# Efficient MCMC with Caching

This post is part of a running series on Bayesian MCMC tutorials. For updates, follow @StableMarkets.

## Metropolis Review

Metropolis-Hastings is an MCMC algorithm for drawing samples from a distribution known up to a constant of proportionality, $p(\theta | y) \propto p(y|\theta)p(\theta)$. Very briefly, the algorithm works by starting with some initial draw $\theta^{(0)}$ then running through the following process for $t=1, \dots, T$ times:

1. Propose a draw from a given proposal distribution $\theta^* \sim p^*(\theta)$.
2. Evaluate posterior up to a proportionality constant under this draw: $T(\theta^*) = p(y|\theta^*)p(\theta^*)$.
3. Evaluate the posterior up to a proportionality constant under the previous draw: $T(\theta^{(t-1)}) = p(y|\theta^{(t-1)})p(\theta^{(t-1)} )$
4. If $T(\theta^*) \geq T(\theta^{(t-1)})$, then we accept $\theta^*$ as a draw from $p(\theta | y)$ and set $\theta^{(t)} \leftarrow \theta^*$
5. If $T(\theta^*) < T(\theta^{(t-1)})$, then we accept $\theta^*$ with probability $T(\theta^*)/T(\theta^{(t-1)})$ and set $\theta^{(t)} \leftarrow \theta^*$ or else we reject it and retain the previous estimate, $\theta^{(t)} \leftarrow \theta^{(t-1)}$.

## Caching

These terms “accept” and “reject” give the impression that we either “keep” or “toss” the results from each iteration. But it’s important to note that while we can toss out $\theta^*$ if it is rejected, the evaluations $T(\theta^*)$ and $T(\theta^{(t-1)})$ should not be tossed. They should be cached and used for future iterations.

If accepted, then in the next iteration we will need to compare $T(\cdot )$ under the new proposal with $T(\cdot )$ from this accepted proposal – which we have from the previous iteration. It’s what we used to accept this proposal in the first place.

If rejected, then in the next iteration we will need to compare $T(\cdot )$ under a new proposal against $T(\cdot )$ from $\theta^{(t-1)}$ – which we have from the previous iterations.

## Logistic Regression

Caching saves us one $T(\cdot)$ evaluation per iteration $t$ – cutting complexity by half. To give you a sense of how much computation this saves, consider the case of a logistic regression with binary $n \times 1$ vector $Y$, $n \times p$ covariate matrix $X$, and $p\times 1$ parameter vector $\beta$. $Y | \beta \sim Ber\Big( expit( X\beta ) \Big)$

Where $expit()$ is applied element-wise to yield an $n\times 1$ vector $p=(p_1, p_2, \dots, p_n)'=expit( X\beta )$. For simplicity, let’s say we have an improper, uniform prior. Then, $\log T(\beta) = p(Y|\beta) = \log \sum_{i=1}^n y_i \log[ p_i] + (1 - y_i) \log[ 1 - p_i]$

This is a costly evaluation. Computing $T(\cdot)$ requires matrix multiplication – $\mathcal{O}(np)$ operation. Without caching, we’d have to do this multiplication twice – once under the proposal and once under the current draw.

# Example in R

I wrote a short code implementing both a vanilla Metropolis algorithm (no caching) and a Metropolis with caching for a logistic regression with 50,000 observations and four parameters. I used microbenchmark package to compare run time. The vanilla algorithm’s runtime is about twice as long as the algorithm using caching – as expected.

The relative runtime is actually slightly less than 2 because we still need to do the work of storing the evaluation at each step, but still a huge efficiency gain.

1. xi'an says:
There are more efficient ways to make use of the computations of $T(\theta^*)$ and $T(\theta^{(t-1)})$, like Rao-Blackwellisation (Gelfand & Smith, 1990; Casella & Robert, 1996; Douc & Robert, 2011) and prefetching (Banterlé et al. (2015).
1. AO says: