Bayesian Multilevel Modelling

Mark Andrews

Date: 25-29 March, 2019

Location: Cologne, Germany

The aim of this five day workshop is to give a theoretical and practical introduction to Bayesian approaches to multilevel modelling. Multilevel models are already widely used throughout the social sciences, and their widespread use is likely to continue to grow. Statistical inference in multilevel models presents major challenges for classical, i.e. sampling theory based, inference methods, while inference using Bayesian methods is always possible in principle, and becomes increasingly easy to perform in practice with ever more powerful Markov Chain Monte Carlo (MCMC) techniques.

While the workshop will focus on multilevel models, we will also provide a solid introduction to Bayesian data analysis and Bayesian theory generally. We will also make extensive use of general purpose Bayesian data analysis software that may be used for general data analysis, and not just multilevel models.

There will be five main sections, each occupying one day, as follows:

Introducing Bayesian data analysis

In this section, we aim to provide a solid and comprehensive coverage of all the fundamental principles of Bayesian data analysis. We will begin with the simple and uncontroversial application of Bayes’ rule to calculations of conditional probability, and then examine how Bayes’ rule can be used as the basis of general statistical inference. We will work through some conceptually simple but non-trivial statistical inference problems that are both easy to understand and computationally easy to caclulate yet nonetheless allow us to delve into all the key concepts of all Bayesian statistics such as the likelihood function, prior distributions, posterior distributions, maximum a posteriori estimation, high posterior density intervals, posterior predictive intervals, marginal likelihoods, Bayes factors, model evaluation of out-of-sample generalization.
Introducing Bayesian multilevel models

In this section, we begin by defining what multilevel models are, and why they arise in so many practical or real-world data analysis problems. We will then explore some simple Bayesian multilevel models including examples based on the beta-binomial model and the hierarchical normal model. In addition, in this section, we will introduce some general fundamental concepts in Bayesian multilevel modelling including hyperpriors, exchangeability, empirical Bayesian methods, pooling, shrinkage, etc.
Multilevel linear regression models

In this section, we will thoroughly explore the case of multilevel regression models, also known as linear mixed effects models. These are extremely practically valueable and widely used models, and a solid understanding of them leads to a solid foundation for understanding many but more complex multilevel models. As part of our coverage, we will consider the many alternative, but ultimately equivalent, conceptualizations of multilevel linear regression models, and how the relate to fixed versus random effects, and crossed versus nested effects. We will also consider how these models effectively address many of the shortcomings of the repeated-measures Anova models.

In this section, given that we will be making extensive use of the Stan probabilistic modelling language, primarily via the R based brms interface, we will devote sufficient time to explaining probabilistic modelling languages, and why they are an, if not the, essential tool in any Bayesian modelling toolbox. We will also delve into the details of Markov Chain Monte Carlo methods, and explain their vital role in Bayesian modelling generally.
Multilevel generalized linear models

Building upon the foundation laid down in the previous section, we now explore the plethora of multilevel generalized linear models and multilevel nonlinear regression models. These will include multilevel regression models for binary, categorical, ordinal and count models, including the well known cases such as logistic regression, Poisson regression, negative binomial regression. We will also explore multilevel survival models, response time models, robust regression methods, and nonlinear regression models including basis function regression models and Gaussian process regression models.
Multilevel mixture models

In this final section, we will explore multilevel models that are not regression models per se. In particular, we will explore multilevel probabilistic mixture models. Probabilistic mixture models are examples of latent variable models, and in the social sciences, they sometimes go by the name latent class models. We will provide a solid introduction to these models before considering their multilevel counterparts. One of the most widely used and practically successful multilevel mixture models is the Latent Dirichlet Allocation (LDA) model, which has been widely employed in the modelling of text data. We will explore this model and some of its generalizations. As part of this coverage, we will also delve into the Bayesian nonparametric models, and particularly hierarchical nonparametric models such as the Hierarchical Dirichlet Process mixture model.

Preparatory Reading:

Gelman et al (2014) Bayesian Data Analysis.
McElreath (2016) Statistical Rethinking.
Gelman & Hill (2007) Data Analysis using Regression and Multilevel/Hierarchical Models.

Recommended Literature to look at in advance:

Carpenter et al (2017). Stan: A probabilistic programming language. Journal of statistical software, 76(1).
Burkner (2017): brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of statistical software, 80(1)
Lambert (2018) A Student’s Guide to Bayesian Statistics.
Gill (2008) Bayesian Methods; A Social and Behavioural Sciences Approach.

GitHub resources

Further resources for this training course can be found on Github at mark-andrews/gesis-48-bayesian-multilevel.