Introduction to Stan for Bayesian Data Analysis

Mark Andrews

Date: 17-20 January, 2022

Location: Online course

This course provides a comprehensive introduction to the Stan language for Bayesian modeling and high-performance statistical computation. It covers the theory behind Stan, including Bayesian inference and Markov Chain Monte Carlo (MCMC) methods. Participants will learn to write Stan models, apply them to practical examples such as linear and logistic regression, and explore more complex models like probabilistic mixture models.

Intended Audience

This course is aimed at anyone who is in interested in doing advanced Bayesian data analysis using Stan. Stan is a state of the art tool for advanced analysis across all academic scientific disciplines, engineering, and business, and other sectors.

Teaching Format

This course will be largely practical, hands-on, and workshop based. For many topics, there will first be some lecture style presentation, i.e., using slides or blackboard, to introduce and explain key concepts and theories. Then, we will work with examples and perform the analyses using R. Any code that the instructor produces during these sessions will be uploaded to a publicly available GitHub site after each session.

The course will take place online using Zoom. On each day, the live video broadcasts will occur between (EST; UTC/GMT-5) at:

12pm-2pm
3pm-5pm

All the sessions will be video recorded, and made available as soon as possible after each 2hr session on a private video hosting website.

Assumed quantitative knowledge

We assume familiarity with inferential statistics concepts like hypothesis testing and statistical significance, and practical experience with linear regression, logistic regression, mixed effects models using R.

Assumed computer background

Some experience and familiarity with R is required. No prior experience with Stan itself is required.

Equipment and software requirements

A computer with a working version of R or RStudio is required. R and RStudio are both available as free and open source software for PCs, Macs, and Linux computers. In addition to R and RStudio, some R packages, particularly rstan, are required. Installing these packages will install Stan, which is a standalone program to R.

Course programme

Day 1

Topic 1: Hamiltonian Monte Carlo for Bayesian inference. We begin by describing Bayesian inference, whose objective is the calculation of a probability distribution over a high dimensional space, namely the posterior distribution. In general, this posterior distribution can not be described analytically, and so to summarize or make predictions from the posterior distribution, we must draw samples from it. For this, we can use Markov Chain Monte Carlo (MCMC) methods including the Metropolis sampler, sometimes known as random-walk Metropolis. Hamiltonian Monte Carlo (HMC), which Stan implements, is ultimately an efficient version of the Metropolis sampler that does not involve random walk behaviour. In this introductory section of the course, we will go through these major theoretical topics in sufficient detail to be able to understand how Stan works.
Topic 2: Univariate models. To learn the Stan language and how to use it to develop Bayesian models, we will start with simple models. In particular, we will look at binomial models and models involving univariate normal distributions. The models will allow us to explore many of the major features of the Stan language, including how to specify priors, in conceptually easy examples. Here, we will also learn how to use rstan to compile the HMC sampler from the defined Stan model, and draw samples from it.

Day 2

Topic 3: Regression models. Having learned the basics of Stan using simple models, we now turn to more practically useful examples including linear regression, general linear models with categorical predictor variables, logistic regression, Poisson regression, etc. All of these examples involve the use of similar programming features and specifications, and so they are easily extensible to other regression models.

Day 3

Topic 4: Multilevel and mixed effects models. As an extension of the regression models that we consider in the previous topic, here we consider multilevel and mixed effects models. We primarily concentrate on linear mixed effects models, and consider the different ways to specify these models in Stan.
Topic 5: Because Stan is a programming language, it essentially gives us the means to create any bespoke or custom statistical model, and not just those that are widely used. In this final topic, we will cover some more complex cases to illustrate it power. In particular, we will cover probabilistic mixture models, which are a type of latent variable model.

GitHub resources

Further resources for this training course can be found on Github at mark-andrews/isbd01.