# Modeling Ebola Contagion Using Airline Networks in R

I first became interested in networks when reading Matthew O. Jackson’s 2010 paper describing their application to economics. During the 2014 Ebola outbreak, there was a lot of concern over the disease spreading to the U.S. I was caught up with work and classes at the time, but decided to use airline flight data to at least explore the question.

The source for the data can be found in a previous post on spatial data visualization.

I assumed that the disease had a single origin (Liberia) and wanted to explore the question of how the disease could travel to the U.S. through a network of airline flights.

A visualization of the network can be seen below. Each node is a country and each edge represents an existing airline route from one country to another. Flights that take off and land in the same country are omitted to avoid clutter.
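As a sketch of the idea, here is a toy version of that country-level network in Python (the original analysis was done in R). The edge list below is hypothetical, standing in for the real airline-route data:

```python
from collections import deque

# Hypothetical country-level routes; the real network comes from the
# airline data described above. Each edge is a route between countries.
routes = [
    ("Liberia", "Ghana"),
    ("Ghana", "United Kingdom"),
    ("United Kingdom", "United States"),
    ("Liberia", "Morocco"),
    ("Morocco", "France"),
    ("France", "United States"),
]

# Build an adjacency list for the directed flight network.
graph = {}
for origin, dest in routes:
    graph.setdefault(origin, []).append(dest)

def shortest_path(graph, start, goal):
    """Breadth-first search: fewest flight legs from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route exists

print(shortest_path(graph, "Liberia", "United States"))
```

Breadth-first search finds the fewest flight legs connecting the origin country to the U.S.; the same query can be run on the real route data.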

# Finding and Plotting Lorenz Solution using MATLAB

I use MATLAB to solve the following Lorenz initial value problem:
$\begin{cases} x'=-10(x-y) \\ y'=-x(z-28)-y \\ z'=xy-\frac{8}{3}z \\ x(0)=y(0)=z(0)=5 \end{cases}$

I wrote a function, LorenzRK4IVP(), that takes the system of three differential equations as input and solves it using the fourth-order Runge-Kutta method with step size $h=0.01$. I plot the strange attractor and also use MATLAB to produce a GIF of the solution.
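For readers without MATLAB, the same scheme can be sketched in a few lines of Python. This is a minimal stand-in for LorenzRK4IVP(), not the original function:

```python
# Classical fourth-order Runge-Kutta applied to the Lorenz system above.
def lorenz(state):
    x, y, z = state
    return (-10.0 * (x - y),           # x' = -10(x - y)
            -x * (z - 28.0) - y,       # y' = -x(z - 28) - y
            x * y - (8.0 / 3.0) * z)   # z' = xy - (8/3)z

def rk4_step(f, state, h):
    """Advance the state one step of size h with classical RK4."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

h = 0.01
state = (5.0, 5.0, 5.0)        # x(0) = y(0) = z(0) = 5
trajectory = [state]
for _ in range(5000):          # integrate out to t = 50
    state = rk4_step(lorenz, state, h)
    trajectory.append(state)
```

Plotting the (x, z) pairs of `trajectory` traces out the familiar butterfly-shaped strange attractor.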

# Using Markov Chains to Model Mortgage Defaults in R

The goal of this post is to blend the material I’ve been learning in my night class with my day-job and R.

If we have some object that switches between states over time according to fixed probabilities, we can model the long-term behavior of this object using Markov chains*.

A good example is a mortgage. At any given point in time, a loan has some probability of defaulting, staying current on its payments, or being paid off in full. Collectively, we call these “transition probabilities.” These probabilities are assumed to be fixed over the life of the loan**.

As an example, we’ll look at conventional fixed-rate, 30-year mortgages. Let’s suppose that every current loan in time T has a 75% chance of staying current, a 10% chance of defaulting, and a 15% chance of being paid off in time T+1. These transition probabilities are outlined in the graphic above. Obviously, once a loan defaults or gets paid off, it stays defaulted or paid off. We call such states “absorbing states.”

Since we know the transition probabilities, all we need*** is an initial distribution of loans, and we can predict the percentage of loans in each state at any given point in the 30-year period. Suppose we start off at T=0 with 100 current loans and 0 defaulted or paid-off loans. In time T+1, we know (according to our transition probabilities) that 75 of these 100 will remain current on their payments, while 15 loans will be paid off and 10 loans will default. Since we assume that the transition probabilities are constant through the loans’ lives, we can use them to find the number of current loans in time T=2. Of the 75 loans that were current in T+1, 56.25 loans will remain current in T+2 (since 75*.75=56.25).
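That arithmetic is easy to sketch in code. Here is a Python version (the post’s code is in R) that builds the transition matrix and iterates it over the 30 periods:

```python
# Rows of P are "from" states, columns are "to" states:
# order is (current, default, paid off).
P = [
    [0.75, 0.10, 0.15],  # a current loan stays current / defaults / pays off
    [0.00, 1.00, 0.00],  # default is an absorbing state
    [0.00, 0.00, 1.00],  # paid off is an absorbing state
]

def step(dist, P):
    """One period forward: new_dist[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

dist = [100.0, 0.0, 0.0]  # T = 0: 100 current loans, none defaulted or paid off
history = [dist]
for t in range(30):       # iterate through the 30-year life of the loans
    dist = step(dist, P)
    history.append(dist)

print([round(v, 10) for v in history[1]])  # [75.0, 10.0, 15.0]
print(history[2][0])                       # 56.25
```

Plotting the three columns of `history` against T reproduces the kind of time series described below: the current-loan curve decays toward zero while the two absorbing states accumulate the balance.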

If we repeat the process 28 more times (which is done in the posted code) and plot the points, we get the time series plotted above. After 30 years, there are no current loans (since they were all 30-year loans). They have all either paid off or defaulted, with more loans paid off than defaulted.

*There are many drawbacks to using Markov chains to model mortgages. This model assumes that the transition probabilities are the same for all 100 loans in my example. In reality, loans are not identical (e.g. the borrower of one loan may have a much higher credit score than another, giving the former a much lower chance of default) and transition probabilities are not constant throughout the lives of the loans (e.g. if interest rates plummet halfway through a loan’s life, the probability that the loan will be paid off skyrockets since the borrower can refinance at a lower rate). In short, no one actually uses this model because it is too primitive. Interestingly enough, however, I did compare the curves in my plot against empirical data I have at work and the results are strikingly similar.

In industry, survival analysis is used most frequently to model loans (usually implemented using logistic regression with panel data or a proportional hazards model). This is an interesting blend of biostatistics and economics. It’s particularly funny when people apply the biostatistics terminology to mortgages to discuss single-monthly mortality (the monthly probability of prepayment), hazards, or survival functions (i.e. the blue line in my chart).

**In this case, these probabilities can be thought of as “hazard rates.” The hazard of default, for example, is the probability that a loan defaults in time T+1 given that it has survived through time T. This is different from an unconditional probability of default: the former is a conditional probability whereas the latter is not.

***We don’t technically need an initial condition in this case for mathematical reasons which I won’t get into because time is a scarce resource.

# Cobb-Douglas Visualisation – I

I used Excel to construct a visualization of the Cobb-Douglas production function (explicitly presented in the graph titles).

The production function expresses the output of any given firm as a function of two inputs (labor and capital) and two parameters (alpha and beta). When the sum of alpha and beta equals 1, it can be shown that alpha and beta represent labor’s and capital’s shares of output, respectively.

This condition also means that the firm is operating with constant returns to scale: when a firm expands its inputs by a certain percent, output increases by the same percent.

We can plot each amount of labor, capital, and output in x-y-z space if we specify alpha and beta.

We do this for labor and capital ranging from 1 to 100 and alpha=beta=.5. The result is the Cobb-Douglas production surface with capital and labor each commanding a 50% share of output.
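As a sketch of the computation behind the chart (in Python rather than Excel), the surface is just $Q = L^{0.5}K^{0.5}$ evaluated on a grid:

```python
# Cobb-Douglas output Q = L^alpha * K^beta with alpha = beta = 0.5.
alpha = beta = 0.5

def output(L, K):
    return (L ** alpha) * (K ** beta)

# Evaluate on the same 1..100 grid as the Excel surface chart.
surface = [[output(L, K) for K in range(1, 101)] for L in range(1, 101)]

# Constant returns to scale: doubling both inputs doubles output.
print(output(4, 9))             # 6.0
print(round(output(8, 18), 9))  # 12.0
```

Feeding `surface` into any 3-D plotting tool reproduces the production surface shown in the post.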

Notice that the lines that separate the differently colored areas are equally spaced. This is a property of constant returns to scale.

When labor and capital expand, the level of output rises proportionately.

It is also useful to view the surface from above.

Those L-shaped curves are called isoquants; for this production function they are rectangular hyperbolas. They represent the different combinations of labor and capital that produce the same (“iso”) quantity of output (“quant”). For example, L=4 and K=4, L=16 and K=1, and L=1 and K=16 all produce the Q=4 level of output. The L-shaped curve simply connects these points for Q=4. As the curves move northeast, they plot the combinations for higher values of output.
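A quick check of those combinations (Python again, purely illustrative):

```python
# Each (L, K) pair on the Q = 4 isoquant of Q = L^0.5 * K^0.5
# should produce the same output level.
combos = [(4, 4), (16, 1), (1, 16)]
outputs = [(L ** 0.5) * (K ** 0.5) for L, K in combos]
print(outputs)  # [4.0, 4.0, 4.0]
```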

A particular firm’s efficient combination depends on the price of its inputs, but that’s for another day.

# Monte Carlo Simulation of Pi

Partially out of boredom and partially because I was inspired by the movie title “Life of Pi”, I decided to make a Monte Carlo simulator that could approximate the value of pi.

Monte Carlo simulations are used in everything from derivative pricing to biology (and, in this case, boredom alleviation). Basically, they’re good for approximating answers to problems that have no exact analytical solution.

The simulator throws a random point into a 2×2 square with a circle of radius 1 inscribed in it; if the point also lands inside the circle, it counts as a hit. This is one trial. The simulator repeats this (in this case) for 6,000 trials, so the square contains 6,000 points and the circle contains somewhat fewer.

Now that the simulation has done a really good job of filling in the square and the circle, we count the number of points in the circle and in the square; the ratio of these counts approximates the ratio of the areas: areaCir/areaSq.

However, we know that areaSq is 4 for a 2×2 square. So we multiply the ratio by 4 to cancel out the denominator and are left with areaCir. Since areaCir = pi*radius^2, areaCir = pi for our circle because radius = 1.
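Here is the whole procedure as a short Python sketch (the simulators described in this post were built in Excel and VBA; this is the same logic):

```python
import random

def estimate_pi(trials, seed=1):
    """Monte Carlo estimate of pi from points thrown into a 2x2 square."""
    random.seed(seed)  # fixed seed so a given run is reproducible
    hits = 0
    for _ in range(trials):
        # Throw a point uniformly into the 2x2 square centered at the origin.
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # did it land inside the unit circle?
            hits += 1
    # hits/trials ~ areaCir/areaSq = pi/4, so multiply by 4.
    return 4.0 * hits / trials

print(estimate_pi(6000))
```

With only 6,000 trials the estimate typically lands within a few hundredths of pi; raising the trial count tightens it further.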

The more trials we conduct, the more filled our circle and square become and, thus, the more accurate our approximation of pi. Here is the result:

It’s clear that as trials → ∞, error → 0% and Pi(approx) → Pi(true).

Doing more than 6,000 trials really slows down processing unless the simulation is done in code. So, I built a Monte Carlo simulator using VBA.

I ran 50,000,000 trials. It took 34.2187 seconds and approximated pi = 3.14155704. Pi is actually equal to 3.14159265, so the error was about .001%.