The goal of this post is to blend the material I’ve been learning in my night class with my day-job and R.

If we have some object that switches between states over time according to fixed probabilities, we can model the long-term behavior of this object using Markov chains*.

A good example is a mortgage. At any given point in time, a loan has a probability of defaulting, stay current on payments, or getting paid-off in full. Collectively, we call these “transition probabilities.” These probabilities are assumed to be fixed over the life of the loan**.

As an example, we’ll look at conventional fixed-rate, 30-year mortgages. Let’s suppose that every current loan in time T has a 75% chance of staying current, 10% chance of defaulting, and 15% chance of being paid off in time T+1. These transition probabilities are outlined in the graphic above. Obviously, once a loan defaults or gets paid-off, it stays defaulted or paid-off. We call such states “absorbing states.”

Since we know the transition probabilities, all we need*** is an initial distribution of loans and we can predict the percentage of loans in each state at any given point in the 30-year period. Suppose we start off at T=0 with 100 current loans, and 0 defaulted and paid off loans. In time T+1, we know (according to our transition probabilities) that 75 of these 100 will remain current on their payments. However, 15 loans will be paid off and 10 loans will default. Since we assume that the transition probabilities are constant through the loans’ lives, we can use them to find the amount of current loans in time t=2. Of the 75 loans that were current in T+1, 56.25 loans will remain current in T+2 (since 75*.75=56.25).

If we repeat the process 28 more time (which is done in the posted code) and plot the points, we get the time series plotted above. After 30 years, there are no current loans (since they were all 30-year loans). They have all either paid off or defaulted, with more loans paid-off than defaulted.

*There are many drawbacks to using Markov Chains to model mortgages. This model assumes that the transition probabilities are the same for all 100 loans I use in my example. In reality, loans are not identical (e.g. the borrow of one loan may have much higher credit score than another. This difference would give the former a much lower chance of default) and transition probabilities are not constant throughout the lives of the loans (e.g. if interest rates plummet halfway through the loan’s life, the probability that the loan will be paid off skyrockets since the borrower can refinance at a lower rate) . In short, no one actually uses this model because it is too primitive. Interestingly enough, however, I did compare the curves in my plot against empirical data I have at work and the results are strikingly similar.

In industry, Survival Analysis is used most frequently to model loans (usually implemented using logistic regression with panel data or a proportional hazards models). This is an interesting blend of biostatistics and economics. It’s particularly funny when people apply the biostatistics terminology to mortgages to discuss single-monthly mortality (monthly probability of prepayment), hazards, or survival functions (i.e. the blue line in my chart.).

**In this case, these probabilities can be though of as “hazard rates.” The hazard of default, for example, is the probability that a loan defaults in time T+1 given that it has survived in time T. This is different from a probability of default. The former is a conditional probability whereas the latter is not.

***We don’t technically need an initial condition in this case for mathematical reasons which I won’t get into because time is a scarce resource.