Previous posts in this series on MCMC samplers for Bayesian inference (in order of publication):

- Bayesian Simple Linear Regression with Gibbs Sampling in R
- Blocked Gibbs Sampling in R for Bayesian Multiple Linear Regression
- Metropolis-in-Gibbs Sampling and Runtime Analysis with Profviz
- Speeding up Metropolis-Hastings with Rcpp

All code for this post (and previous posts) is in my GitHub repo

Recently I’ve been reading about Bayesian additive regression trees (BART). For those interested, the paper is here. I use similar notation when describing the backfitting procedure in this post. I believe this MCMC method was first described here.

BART is a Bayesian nonparametric model that can be fit using backfitting MCMC – which is cool in itself. I thought I’d write a post describing the intuition behind backfitting, along with a toy example.

All code for this post can be found in this GitHub repo. I don’t use any packages … the MCMC is implemented from scratch.

Consider data on a covariate, $x_i$, and an outcome, $y_i$, for subjects $i = 1, \dots, n$. In this toy example, the data looks something like this:

We might consider the following probabilistic model

$$ y_i \sim N(\mu_i, \phi) $$

where

$$ \mu_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 $$

Essentially we model the conditional mean using a third-degree polynomial. Note that this is a special case of the more general additive model

$$ \mu_i = \sum_{j=0}^{p} f_j(x_i) $$

where in this case $f_j(x_i) = \beta_j x_i^j$ and $p = 3$. The model is completed with a flat prior on each element of the parameter vector $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)'$ and an inverse-gamma prior, $IG(a, b)$, with shape $a$ and rate $b$ on the variance parameter $\phi$.
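To make the toy example concrete, here's one way the data could be simulated in R (the coefficient values, sample size, and covariate range are my own illustrative choices, not necessarily those used in the repo):

```r
# Simulate toy data from the cubic model above.
# beta, phi, n, and the covariate range are illustrative choices.
set.seed(1)

n    <- 100
beta <- c(0.5, 1, -1, 0.25)   # (beta_0, beta_1, beta_2, beta_3)
phi  <- 0.5                   # error variance

x  <- runif(n, -2, 2)
X  <- cbind(1, x, x^2, x^3)   # design matrix: columns are x^0, ..., x^3
mu <- X %*% beta
y  <- rnorm(n, mean = mu, sd = sqrt(phi))

plot(x, y, main = "Simulated cubic data")
```

The design matrix `X` collects the polynomial terms, so the conditional mean is just `X %*% beta`.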

The conditional posterior for each $\beta_j$ is Gaussian (because of conjugacy). We can sample from it using conjugate Gibbs or Metropolis. We could sample the whole parameter vector as a block as well, but in this post we’ll stick to backfitting – which is itself a Gibbs sampler. We still sample from the conditional posterior of each $\beta_j$ conditional on the other $\beta$s. However, we make use of the key insight that the conditional posterior of each $\beta_j$ depends on the other betas, denoted by $\beta_{-j}$, only through the residual

$$ R_i^{(j)} = y_i - \sum_{k \neq j} \beta_k x_i^k $$

Intuitively, $R_i^{(j)}$ is the portion of the mean of $y_i$ left after the portions explained by the other terms (not the $j$th) are subtracted off. It is also normally distributed,

$$ R_i^{(j)} \sim N(\beta_j x_i^j, \phi) $$

With the flat (limiting-case normal) prior on $\beta_j$, the posterior can be computed via conjugacy as

$$ \beta_j \mid R^{(j)}, \phi \sim N\!\left( \frac{\sum_i x_i^j R_i^{(j)}}{\sum_i x_i^{2j}}, \; \frac{\phi}{\sum_i x_i^{2j}} \right) $$
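The conjugate update can be sketched as a small R function (the function name is mine; this assumes the flat prior, under which the posterior mean and variance reduce to $\sum_i z_i R_i^{(j)} / \sum_i z_i^2$ and $\phi / \sum_i z_i^2$ with $z_i = x_i^j$):

```r
# Draw beta_j from its conditional posterior under a flat prior.
# y:    outcome vector
# X:    design matrix with columns x^0, x^1, x^2, x^3
# beta: current values of all betas
# j:    column index of the beta being updated
# phi:  current value of the error variance
draw_beta_j <- function(y, X, beta, j, phi) {
  # partial residual: subtract off every term except the j-th
  R_j <- y - X[, -j, drop = FALSE] %*% beta[-j]
  z   <- X[, j]
  post_mean <- sum(z * R_j) / sum(z^2)
  post_var  <- phi / sum(z^2)
  rnorm(1, mean = post_mean, sd = sqrt(post_var))
}
```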

The backfitting MCMC proceeds as follows. First, initialize all betas except $\beta_0$. This choice is totally arbitrary – you can start with any parameter. Then, within each Gibbs iteration,

- Compute $R^{(0)}$ with the values of $\beta_1, \beta_2, \beta_3$ at the current iteration. Then sample $\beta_0$ from the posterior $p(\beta_0 \mid R^{(0)}, \phi)$ conditional on the current draw of $\phi$.
- Compute $R^{(1)}$ with the values of $\beta_0, \beta_2, \beta_3$ at the current iteration. Note, for $\beta_0$ use the value drawn in step 1. Then sample $\beta_1$ from the posterior $p(\beta_1 \mid R^{(1)}, \phi)$.
- Continue this procedure for all beta parameters.
- Once all the betas are drawn, sample $\phi$ from $p(\phi \mid \beta, y)$. This posterior is another inverse gamma.
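The steps above can be sketched as a self-contained Gibbs loop in R. This is my own minimal implementation, not the code from the repo, and the hyperparameters `a`, `b` and iteration count are illustrative choices:

```r
# Backfitting Gibbs sampler for the cubic model under flat priors on the
# betas and an IG(a, b) prior on phi (a, b illustrative).
backfit_gibbs <- function(y, x, n_iter = 5000, a = 1, b = 1) {
  n <- length(y)
  X <- cbind(1, x, x^2, x^3)           # columns are x^0, ..., x^3
  p <- ncol(X)

  beta <- rep(0, p)                    # arbitrary initialization
  phi  <- 1
  beta_draws <- matrix(NA_real_, n_iter, p)
  phi_draws  <- numeric(n_iter)

  for (t in seq_len(n_iter)) {
    # Steps 1-3: backfit each beta given the others and phi
    for (j in seq_len(p)) {
      R_j <- y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual
      z   <- X[, j]
      beta[j] <- rnorm(1,
                       mean = sum(z * R_j) / sum(z^2),
                       sd   = sqrt(phi / sum(z^2)))
    }
    # Step 4: sample phi from its inverse-gamma conditional
    resid <- y - X %*% beta
    phi   <- 1 / rgamma(1, shape = a + n / 2, rate = b + sum(resid^2) / 2)

    beta_draws[t, ] <- beta
    phi_draws[t]    <- phi
  }
  list(beta = beta_draws, phi = phi_draws)
}
```

Note the inverse-gamma draw uses the fact that if $\phi \sim IG(a, b)$ then $1/\phi \sim \text{Gamma}(a, \text{rate} = b)$.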

The term backfitting seems appropriate because, at each iteration, we’re “backing out” the distribution of the $\beta_j$ that we want to sample using the other betas.

To get a fitted regression line, we need to sample from the posterior predictive distribution. We do this after step 4 in each Gibbs iteration by drawing values from

$$ \tilde{y}_i \sim N\!\left( \beta_0^{(t)} + \beta_1^{(t)} x_i + \beta_2^{(t)} x_i^2 + \beta_3^{(t)} x_i^3, \; \phi^{(t)} \right) $$

The superscript $(t)$ denotes the parameters using values drawn in the $t$th Gibbs iteration.
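The predictive step can be sketched as a small helper (the function name is mine) that, for each stored Gibbs draw, simulates outcomes at a grid of new covariate values:

```r
# Posterior predictive draws for new covariate values x_new, given stored
# Gibbs draws: beta_draws is an n_iter x 4 matrix, phi_draws a length-n_iter
# vector. Returns a length(x_new) x n_iter matrix of y-tilde draws.
posterior_predictive <- function(x_new, beta_draws, phi_draws) {
  X_new <- cbind(1, x_new, x_new^2, x_new^3)
  sapply(seq_len(nrow(beta_draws)), function(t) {
    rnorm(length(x_new),
          mean = X_new %*% beta_draws[t, ],   # mu at iteration t
          sd   = sqrt(phi_draws[t]))
  })
}
```

The fitted line is then the row-wise mean of the returned matrix, and row-wise quantiles give pointwise predictive bands.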

Here are the results from the simulated example. Again, all code can be found on my GitHub repo.

Can you please elaborate further on how you were able to calculate the posterior via conjugacy?

We know that the posterior is proportional to the likelihood times the prior: $p(\beta_j \mid R^{(j)}, \phi) \propto p(R^{(j)} \mid \beta_j, \phi)\, p(\beta_j)$. Now, $p(\beta_j)$ is a normal density and $p(R^{(j)} \mid \beta_j, \phi)$ is a normal density. Just multiply them together and do a lot of algebra to identify the mean and variance of another normal.
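To make the algebra a bit more concrete, here is a sketch of the complete-the-square step for a generic normal prior $\beta_j \sim N(\mu_0, \tau^2)$ (writing $z_i = x_i^j$; the flat prior used in the post is the limit $\tau^2 \to \infty$):

```latex
\begin{aligned}
p(\beta_j \mid R^{(j)}, \phi)
 &\propto \exp\!\left\{ -\tfrac{1}{2\phi} \sum_i (R_i^{(j)} - \beta_j z_i)^2 \right\}
          \exp\!\left\{ -\tfrac{(\beta_j - \mu_0)^2}{2\tau^2} \right\} \\
 &\propto \exp\!\left\{ -\tfrac{1}{2}\!\left[
      \beta_j^2 \Big( \tfrac{\sum_i z_i^2}{\phi} + \tfrac{1}{\tau^2} \Big)
      - 2\beta_j \Big( \tfrac{\sum_i z_i R_i^{(j)}}{\phi} + \tfrac{\mu_0}{\tau^2} \Big)
    \right] \right\}
\end{aligned}
```

Reading off the quadratic in $\beta_j$, the posterior is $N(vm, v)$ with $v = \big(\sum_i z_i^2/\phi + 1/\tau^2\big)^{-1}$ and $m = \sum_i z_i R_i^{(j)}/\phi + \mu_0/\tau^2$. Letting $\tau^2 \to \infty$ recovers the flat-prior update with mean $\sum_i z_i R_i^{(j)} / \sum_i z_i^2$ and variance $\phi / \sum_i z_i^2$.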