Marginal No-U-Turn Sampler for Bayesian Analysis in Pharmacometrics
Mohamed Tarek[1,2]; Jose Storopoli [1, 3]; Chris Elrod [1, 4]; Joga Gobburu [1, 5]; Vijay Ivaturi [1, 5]
[1] Pumas-AI Inc., USA; [2] Business School, University of Sydney, Australia; [3] Universidade Nove de Julho - UNINOVE, São Paulo, Brazil; [4] JuliaHub Inc., USA; [5] University of Maryland Baltimore, USA
Objectives: The number of parameters in hierarchical models increases with the number of sub-groups. In pharmacometrics, each sub-group is a subject/patient with individual parameters per subject, alongside population-level parameters shared between all subjects. In most cases, the number of parameters per subject is < 10 and the number of population parameters is < 100; however, the number of subjects can be ≫ 100. Running Markov Chain Monte Carlo (MCMC) on such models can be extremely time-consuming: the No-U-Turn Sampler (NUTS) [1] with a diagonal mass matrix often settles on very small step sizes in the adaptation phase, leading to slow sampling with estimated completion times of hours or even days. The strong correlation between the population-level parameters and the subject-specific parameters in the posterior means that the conditional posterior of the subject-specific parameters given the population parameters can differ significantly from one MCMC iteration to the next.
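The hierarchical structure described above can be written explicitly (the notation here is ours, introduced for illustration): with population parameters $\theta$, subject-specific parameters $\eta_i$, and observations $y_i$ for subjects $i = 1, \dots, N$, the joint posterior is

```latex
p(\theta, \eta_{1:N} \mid y_{1:N})
  \propto p(\theta) \prod_{i=1}^{N} p(y_i \mid \eta_i, \theta)\, p(\eta_i \mid \theta),
```

so the dimension of the joint sampling space is $\dim(\theta) + N \cdot \dim(\eta_i)$, which grows linearly with the number of subjects $N$.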
Methods: One alternative to the joint MCMC approach is to marginalize the subject-specific parameters for all subjects using the Laplace method or one of its approximations, e.g. the first-order conditional estimation (FOCE) method [2]. The Laplace method assumes a Gaussian approximation of the conditional posterior and is commonly used in pharmacometrics when maximizing the marginal likelihood. This approach is similar in spirit to the integrated nested Laplace approximation (INLA) algorithm [3], which uses a similar Laplace-based approach. In this work, we adopt a “Marginalize-Then-Sample” approach: the subject-specific parameters are integrated out, and only the marginal posterior is sampled. Marginalizing the subject-specific parameters comes at the additional cost of computing the marginal log likelihood of the population parameters as well as its gradient. However, this additional cost is more than offset by the computational gains NUTS achieves on the resulting low-dimensional problem, without significant loss in sampling accuracy. This high-level idea was first proposed in [4]; the main difference between this work and [4], besides the different test model, is that we use the FOCE method [2] for marginalizing the subject-specific parameters, as implemented in Pumas, which works for both Gaussian and non-Gaussian subject-specific parameters.
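A sketch of the marginalization step, in standard Laplace-method notation (introduced here for illustration): for each subject $i$, define $\ell_i(\eta) = \log p(y_i \mid \eta, \theta) + \log p(\eta \mid \theta)$ and let $\hat{\eta}_i = \arg\max_\eta \ell_i(\eta)$ be the conditional mode. The Laplace approximation of the per-subject marginal likelihood is

```latex
\log p(y_i \mid \theta)
  = \log \int e^{\ell_i(\eta)} \, d\eta
  \approx \ell_i(\hat{\eta}_i)
    + \frac{d}{2} \log(2\pi)
    - \frac{1}{2} \log \left| -\nabla^2 \ell_i(\hat{\eta}_i) \right|,
```

where $d$ is the dimension of $\eta_i$. FOCE replaces the exact Hessian $\nabla^2 \ell_i$ with a first-order (Gauss–Newton-type) approximation around the conditional mode, avoiding second derivatives of the model [2]. The marginal log posterior of $\theta$ is then $\log p(\theta) + \sum_{i=1}^{N} \log p(y_i \mid \theta)$ up to a constant, and NUTS is run on this low-dimensional target.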
We ran experiments in Pumas [5] on a non-linear regression model comparing treatment versus placebo arms for a rare disease, using synthetic data. Experiments were run on a Linux virtual machine with an Intel Xeon Platinum 8375C CPU @ 2.90GHz, 32GB RAM, and 8 virtual CPUs. Each arm was fitted using both Marginal and Joint MCMC with NUTS, with 5 to 100 subjects per arm in increments of 5 subjects, each subject having 26 observations. Sampling used 4 parallel MCMC chains with parallel computation of the log probability over the subjects. The total number of samples was 1000, of which 500 were adaptation (warmup) samples, and the target acceptance ratio was 0.8. Marginal MCMC used a dense mass matrix, while Joint MCMC used a diagonal mass matrix for performance reasons. For Marginal MCMC, we used the FOCE algorithm for the approximate marginalization. The model has 6 population parameters and 2 subject-specific parameters, with one parameter representing the efficacy of the treatment arm.
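As a toy illustration of the marginalize-then-sample idea (this is not the Pumas implementation, the study model, or NUTS — the model, parameter values, and sampler below are all assumptions chosen for simplicity), consider a Gaussian random-intercept model. Because the model is Gaussian-Gaussian, the conditional posterior of each subject's random effect is exactly Gaussian, so the Laplace approximation of the per-subject marginal likelihood is exact and available in closed form; a 1-D random-walk Metropolis sampler stands in for NUTS:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchical model (assumed for illustration):
#   y_ij = theta + eta_i + eps_ij,  eta_i ~ N(0, omega^2),  eps_ij ~ N(0, sigma^2)
N, J = 20, 26                       # subjects, observations per subject
theta_true, omega, sigma = 1.0, 0.5, 0.3
eta = rng.normal(0.0, omega, N)
y = theta_true + eta[:, None] + rng.normal(0.0, sigma, (N, J))

def laplace_marginal_loglik(theta):
    """Laplace approximation of log p(y | theta) with all eta_i integrated out.

    The per-subject conditional mode eta_hat and the negative Hessian (a
    scalar precision here) are available in closed form for this model.
    """
    resid = y - theta                               # (N, J) residuals
    prec = J / sigma**2 + 1.0 / omega**2            # -d^2/d eta^2 of log-density
    eta_hat = (resid.sum(axis=1) / sigma**2) / prec # conditional mode per subject
    # log p(y_i | theta, eta_hat_i) + log p(eta_hat_i) for each subject
    ll = (-0.5 * ((resid - eta_hat[:, None]) ** 2).sum(axis=1) / sigma**2
          - 0.5 * J * np.log(2 * np.pi * sigma**2)
          - 0.5 * eta_hat**2 / omega**2 - 0.5 * np.log(2 * np.pi * omega**2))
    # Laplace correction term: (d/2) log(2*pi) - (1/2) log |precision|, d = 1
    ll += 0.5 * np.log(2 * np.pi) - 0.5 * np.log(prec)
    return ll.sum()

# Random-walk Metropolis on the 1-D marginal posterior (flat prior on theta);
# the marginalize-then-sample structure is the same as with NUTS.
theta, logp = 0.0, laplace_marginal_loglik(0.0)
draws = []
for _ in range(2000):
    prop = theta + 0.05 * rng.normal()
    logp_prop = laplace_marginal_loglik(prop)
    if np.log(rng.uniform()) < logp_prop - logp:
        theta, logp = prop, logp_prop
    draws.append(theta)
post_mean = np.mean(draws[500:])    # discard warmup draws
print(round(post_mean, 2))
```

The sampler sees only the single population parameter rather than the 1 + N dimensional joint space. In the general non-Gaussian setting, the conditional mode must be found by an inner numerical optimization per subject, and the gradient of the Laplace (or FOCE) marginal with respect to the population parameters is needed for NUTS, e.g. via adjoint differentiation as in [4].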
Results: Marginal NUTS was found to be 30–100 times faster than full joint NUTS without significant loss of accuracy in the summary statistics of the marginal posterior of the population-level parameters. The computational speedup factor was found to grow with the number of subjects in the test model used.
Conclusions: When analyzing a pharmacometric model with a fully Bayesian approach, where the number of individual parameters per subject multiplied by the number of subjects is generally much larger than the number of population parameters, it can be significantly faster to marginalize the individual parameters before sampling with an MCMC sampler.
References:
[1] Hoffman MD, Gelman A (2014) The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15(1):1593–1623.
[2] Wang Y (2007) Derivation of various NONMEM estimation methods. Journal of Pharmacokinetics and Pharmacodynamics.
[3] Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(2):319–392.
[4] Margossian C, Vehtari A, Simpson D, Agrawal R (2020) Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond. NeurIPS.
[5] Rackauckas C, et al. (2020) Accelerated predictive healthcare analytics with Pumas, a high performance pharmaceutical modeling and simulation platform.