Bayesian Inference with Backfitting MCMC

Previous posts in this series on MCMC samplers for Bayesian inference (in order of publication):

  1. Bayesian Simple Linear Regression with Gibbs Sampling in R
  2. Blocked Gibbs Sampling in R for Bayesian Multiple Linear Regression
  3. Metropolis-in-Gibbs Sampling and Runtime Analysis with Profviz
  4. Speeding up Metropolis-Hastings with Rcpp

All the code for this post (and previous posts) is in my GitHub repo

Recently I’ve been reading about Bayesian additive regression trees (BART). For those interested, the paper is here. I use similar notation when describing the backfitting procedure in this post. I believe this MCMC method was first described here.

BART is a Bayesian nonparametric model that can be fit using backfitting MCMC – which is cool in itself. I thought I’d write a post describing the intuition behind backfitting as well as a toy example.

I don’t use any packages; the MCMC is implemented from scratch.

Consider data on a covariate x and an outcome y for n subjects, (y_i, x_i)_{1:n}. In this toy example, the data look something like this:
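The repo simulates the data in R; as a sketch, a comparable toy dataset could be generated as below. This is a Python illustration, and the sample size, coefficients, and noise variance are all assumed values, not the repo’s.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data (the repo's actual simulation is in R):
# the true conditional mean is a cubic in x, matching the
# third-degree polynomial model introduced below.
n = 100
x = rng.uniform(-2.0, 2.0, size=n)
true_beta = np.array([1.0, -2.0, 0.5, 1.5])   # assumed beta_0 .. beta_3
mu = sum(b * x**j for j, b in enumerate(true_beta))
y = rng.normal(loc=mu, scale=1.0)             # assumed variance phi = 1
```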


We might consider the following probabilistic model

 y_i | x_i,  \beta, \phi \sim N(\mu_i, \phi) 


\mu_i = \sum_{j=0}^3 x_i^j \beta_j

Essentially we model the conditional mean using a third-degree polynomial. Note that this is a special case of the more general additive model

\mu_i = \sum_{j=0}^k f_j(x_i)\beta_j

where in this case f_j(x_i) = x_i^j and k=3. The model is completed with a flat prior p(\beta_j) \propto 1 on each element of the parameter vector \beta and an inverse-gamma prior, p(\phi) = InvGam(.5, 10000), with shape .5 and rate 10000, on the variance parameter \phi.

The conditional posterior for each \beta_j is Gaussian (because of conjugacy). We can sample from it using conjugate Gibbs or Metropolis steps. We could also sample the whole parameter vector \beta as a block, but in this post we’ll stick to backfitting – which is itself a Gibbs sampler. We still sample from the conditional posterior of each \beta_j given the others. The key insight is that the conditional posterior of \beta_j depends on the other coefficients, denoted \beta_{-j}, only through the residual

R_{ij} = y_i - \sum_{k \neq j} x_i^k \beta_k

Intuitively, R_{ij} is the portion of the mean of y_i left over after the portions explained by the other terms (k \neq j) are subtracted off. It is also normally distributed,

R_{ij} \sim N(x_i^j\beta_j, \phi)

With the flat prior on \beta_j, the conditional posterior can be computed via conjugacy as

\beta_j | R_j, \phi \sim N\big( \frac{\sum_{i=1}^n R_{ij} \cdot x_i^j}{\sum_{i=1}^n [x_i^j]^2}, \frac{\phi}{\sum_{i=1}^n [x_i^j]^2 } \big)
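As a sketch of this conditional draw, written in Python/NumPy rather than the repo’s R (the function name and interface are my own):

```python
import numpy as np

def draw_beta_j(j, y, x, beta, phi, rng):
    """Draw beta_j from its conditional posterior under a flat prior.

    Implements the formula above: posterior mean
    sum_i(R_ij * x_i^j) / sum_i((x_i^j)^2) and posterior
    variance phi / sum_i((x_i^j)^2).
    """
    # Partial residual: subtract every term's contribution except term j.
    R_j = y - sum(beta[k] * x**k for k in range(len(beta)) if k != j)
    f_j = x**j                  # j-th basis function evaluated at each x_i
    denom = np.sum(f_j**2)
    post_mean = np.sum(R_j * f_j) / denom
    post_sd = np.sqrt(phi / denom)
    return rng.normal(post_mean, post_sd)
```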

The backfitting MCMC proceeds as follows. First, initialize all betas except \beta_1. This is totally arbitrary – you can start with any parameter. Then, within each Gibbs iteration,

  1. Compute R_1 using the values of \beta_{-1} at the current iteration. Then sample from the posterior \beta_1 \sim p(\beta_1 | R_1, \phi), conditional on the current draw of \phi.
  2. Compute R_2 using the values of \beta_{-2} at the current iteration (for \beta_1, use the value just drawn in step 1). Then sample from the posterior \beta_2 \sim p(\beta_2 | R_2, \phi).
  3. Continue this procedure for all beta parameters.
  4. Once all \beta parameters are drawn, sample \phi \sim p(\phi | y, \beta). This posterior is another inverse gamma.
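Putting steps 1 through 4 together, the whole sampler can be sketched as follows. Again, this is a Python illustration, not the repo’s R code; the function name, defaults, and bookkeeping are assumptions. The step-4 update uses standard conjugacy: with an InvGam(a, b) prior on \phi, the conditional posterior is InvGam(a + n/2, b + \frac{1}{2}\sum_i (y_i - \mu_i)^2), sampled here as the reciprocal of a gamma draw.

```python
import numpy as np

def backfitting_gibbs(y, x, degree=3, n_iter=2000, a=0.5, b=10000.0, seed=1):
    """Backfitting Gibbs sampler for the polynomial model above.

    Flat priors on each beta_j; InvGam(shape=a, rate=b) prior on phi.
    (Structure and names are illustrative, not the repo's R code.)
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    p = degree + 1
    F = np.vander(x, p, increasing=True)     # columns: x^0, x^1, ..., x^degree
    beta = np.zeros(p)
    phi = 1.0
    beta_draws = np.empty((n_iter, p))
    phi_draws = np.empty(n_iter)
    for it in range(n_iter):
        # Steps 1-3: cycle through each beta_j via its partial residual.
        for j in range(p):
            R_j = y - F @ beta + F[:, j] * beta[j]   # residual excluding term j
            denom = np.sum(F[:, j] ** 2)
            post_mean = np.sum(R_j * F[:, j]) / denom
            beta[j] = rng.normal(post_mean, np.sqrt(phi / denom))
        # Step 4: conjugate inverse-gamma update for the variance phi.
        resid = y - F @ beta
        shape = a + n / 2
        rate = b + 0.5 * np.sum(resid**2)
        phi = 1.0 / rng.gamma(shape, 1.0 / rate)   # InvGamma via reciprocal gamma
        beta_draws[it] = beta
        phi_draws[it] = phi
    return beta_draws, phi_draws
```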

The term backfitting seems appropriate because, at each iteration, we’re “backing out” the distribution of the \beta_j that we want to sample using the other betas.

To get a fitted regression line, we need to sample from the posterior predictive distribution. We do this after step 4 in each Gibbs iteration by drawing n values from

\hat{y}_i | x_i,  \beta^{[k]}, \phi^{[k]} \sim N(\mu_i^{[k]}, \phi^{[k]} ), i \in \{1, 2, \dots, n \}

The superscript [k] denotes parameter values from the k^{th} Gibbs iteration.
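The posterior predictive step could be sketched like this (same illustrative Python setup as above; F is the n-by-(k+1) basis matrix with columns x_i^j, and the names are my own):

```python
import numpy as np

def posterior_predictive(F, beta_draws, phi_draws, rng):
    """One vector of predictive draws per retained Gibbs iteration.

    For each iteration k: mu^{[k]} = F beta^{[k]}, then draw
    y_hat_i ~ N(mu_i^{[k]}, phi^{[k]}) for every subject i.
    """
    n_iter = beta_draws.shape[0]
    n = F.shape[0]
    y_hat = np.empty((n_iter, n))
    for k in range(n_iter):
        mu_k = F @ beta_draws[k]                 # mu^{[k]} for every subject
        y_hat[k] = rng.normal(mu_k, np.sqrt(phi_draws[k]))
    return y_hat
```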

Here are the results from the simulated example. Again, all code can be found on my GitHub repo.


3 thoughts on “Bayesian Inference with Backfitting MCMC”

    1. We know that the posterior is proportional to the likelihood times the prior: p(\beta_j | R_j, \phi) \propto p(R_j | \beta_j, \phi) \, p(\beta_j). Both p(\beta_j) and p(R_j | \beta_j, \phi) are normal densities, so multiplying them together and doing the algebra identifies the mean and variance of another normal.
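Sketching that algebra for the flat-prior case used in the post, where p(\beta_j) \propto 1:

p(\beta_j | R_j, \phi) \propto \prod_{i=1}^n \exp\Big( -\frac{(R_{ij} - x_i^j \beta_j)^2}{2\phi} \Big) \propto \exp\Big( -\frac{1}{2\phi} \Big[ \beta_j^2 \sum_{i=1}^n (x_i^j)^2 - 2\beta_j \sum_{i=1}^n R_{ij} x_i^j \Big] \Big)

Completing the square in \beta_j identifies the kernel of a normal with mean \sum_{i=1}^n R_{ij} x_i^j / \sum_{i=1}^n (x_i^j)^2 and variance \phi / \sum_{i=1}^n (x_i^j)^2, matching the conditional posterior given in the post.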
