
Improving simulations by learning the true random effects' distribution from a population using generative machine learning

Lorenzo Contento (1), Mohamed Tarek (1)

(1) Pumas-AI

Objectives: Non-linear mixed effects (NLME) modelling aims to estimate the population-level parameters and the distribution of individual-level parameters (random effects, REs) from the observed data. In practice, the distribution of the REs is always restricted to belong to a standard family (usually multivariate Gaussians). Sometimes, the true distribution of the REs cannot be closely described by any such standard distribution. For example, the true distribution may: (i) be skewed, leading to biased estimates of the typical value; (ii) exhibit complex non-linear dependencies between the REs, whereas Gaussians can only capture linear relationships; (iii) be multimodal, in which case Gaussian mixtures may be used, with the downside that the number of modes must be correctly selected. In such cases, the estimated population-level parameters may be biased. Moreover, when generating synthetic populations, a misspecified REs distribution could lead to simulations that are extremely unlikely and very different from the observed data. Simulated data would consequently show inflated variability, and the quality of any conclusion based on such simulations (e.g., study design) would be negatively impacted [1].

To address this problem, we developed a method for using arbitrarily shaped REs distributions in NLME models, enabling the generation of realistic synthetic populations. The developed method takes a fitted NLME model and uses it to learn the true distribution of REs.

Methods: Assume we have a previously fitted NLME model with a potentially misspecified REs distribution. Using the model and assuming there are enough observations per subject, we can sample from the true distribution of data-generating REs in the population by: (i) randomly sampling a subject from the population; (ii) sampling from the posterior distribution of the REs associated with that subject, using either MCMC or an approximation of the posterior. Although the potentially misspecified REs distribution in the model influences this posterior, as long as we have a sufficient amount of data for each subject we can expect this influence to be small.
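
As an illustration of this two-step scheme, the sketch below uses a simple random-walk Metropolis sampler for step (ii). It is not the Pumas implementation: the function names, the data layout, and the availability of a per-subject log-likelihood and an RE-prior density from the fitted model are all assumptions made for the example.

    import numpy as np

    def sample_population_res(subjects, loglik, logprior, n_iter=2000, step=0.2, rng=None):
        """Draw approximate samples from the population distribution of data-generating REs.

        subjects : list of per-subject data records (hypothetical structure with key "n_re")
        loglik   : loglik(eta, subject) -> log p(data_subject | eta) under the fitted NLME model
        logprior : logprior(eta) -> log density of the model's (possibly misspecified) RE prior
        """
        rng = np.random.default_rng() if rng is None else rng
        draws = []
        for subject in subjects:                          # (i) loop over (equivalently, sample) subjects
            eta = np.zeros(subject["n_re"])               # start from the prior mode (assumption)
            logp = loglik(eta, subject) + logprior(eta)
            for _ in range(n_iter):                       # (ii) random-walk Metropolis on the RE posterior
                prop = eta + step * rng.standard_normal(eta.shape)
                logp_prop = loglik(prop, subject) + logprior(prop)
                if np.log(rng.uniform()) < logp_prop - logp:
                    eta, logp = prop, logp_prop
            draws.append(eta)                             # keep the final draw (after burn-in) for this subject
        return np.array(draws)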

Once we have samples from the distribution of data-generating REs in the population, we can fit a generative machine learning (ML) model to these samples to learn their distribution. A normalizing flow (NF) [2] was used as the generative model in this work. An NF is an invertible neural-network transformation of a standard Gaussian distribution whose probability density remains tractable, so that it can be trained by maximizing the likelihood. The learnt NF was then used as the new REs distribution in the model to correct the misspecification.
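
A minimal PyTorch sketch of such an NF, using RealNVP-style affine coupling layers trained by maximum likelihood, is given below. It stands in for the actual DeepPumas implementation; the network sizes, number of layers and training schedule are illustrative assumptions.

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """RealNVP-style coupling layer: dimensions with mask=1 pass through unchanged
        and parameterize an affine transform of the dimensions with mask=0."""
        def __init__(self, dim, mask, hidden=64):
            super().__init__()
            self.register_buffer("mask", mask)
            self.scale_net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
            self.shift_net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

        def forward(self, x):                                       # data -> base direction
            xm = x * self.mask
            s = torch.tanh(self.scale_net(xm)) * (1 - self.mask)    # bounded log-scales for stability
            t = self.shift_net(xm) * (1 - self.mask)
            z = xm + (1 - self.mask) * (x * torch.exp(s) + t)
            return z, s.sum(dim=1)                                  # log|det Jacobian|

        def inverse(self, z):                                       # base -> data direction (for simulation)
            zm = z * self.mask                                      # masked dims are unchanged, so s, t are recoverable
            s = torch.tanh(self.scale_net(zm)) * (1 - self.mask)
            t = self.shift_net(zm) * (1 - self.mask)
            return zm + (1 - self.mask) * (z - t) * torch.exp(-s)

    class NormalizingFlow(nn.Module):
        """Stack of coupling layers with a standard Gaussian base distribution."""
        def __init__(self, dim, n_layers=6, hidden=64):
            super().__init__()
            masks = [torch.tensor([(i + l) % 2 for i in range(dim)], dtype=torch.float32)
                     for l in range(n_layers)]                      # alternate which half is transformed
            self.layers = nn.ModuleList([AffineCoupling(dim, m, hidden) for m in masks])
            self.base = torch.distributions.MultivariateNormal(torch.zeros(dim), torch.eye(dim))

        def log_prob(self, x):                                      # tractable density via change of variables
            logdet = torch.zeros(x.shape[0])
            for layer in self.layers:
                x, ld = layer(x)
                logdet = logdet + ld
            return self.base.log_prob(x) + logdet

        def sample(self, n):                                        # new REs for simulating synthetic subjects
            z = self.base.sample((n,))
            for layer in reversed(self.layers):
                z = layer.inverse(z)
            return z

    def fit_flow(re_samples, epochs=2000, lr=1e-3):
        """Fit the flow to posterior RE draws by maximizing the likelihood."""
        x = torch.as_tensor(re_samples, dtype=torch.float32)
        flow = NormalizingFlow(x.shape[1])
        opt = torch.optim.Adam(flow.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = -flow.log_prob(x).mean()                         # negative log-likelihood
            loss.backward()
            opt.step()
        return flow

In this sketch, fit_flow(re_draws) followed by flow.sample(1000) would yield REs for a synthetic population of 1000 subjects drawn from the learnt distribution.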

The proposed method was implemented in the Pumas/DeepPumas software suite.

Results: To test the method, data was simulated using models with non-Gaussian REs distributions (e.g., multimodal or nonlinearly correlated). The same models, but with (misspecified) Gaussian REs, were then fitted to the data. We then applied our method to each fitted misspecified model to try to recover the true REs distribution. As long as there was enough information in the population (i.e., low residual noise and/or many samples per subject), we were able to recover the true REs distribution (used to simulate the data) from the misspecified models.
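
For instance, a ground-truth REs distribution of this kind (bimodal in one component and nonlinearly dependent in another) could be simulated as in the hypothetical sketch below; the specific mixture and dependence structure are illustrative, not the exact scenarios used in this work.

    import numpy as np

    def simulate_true_res(n_subjects, rng=None):
        """Hypothetical non-Gaussian ground-truth REs: a bimodal first component
        and a second component depending nonlinearly on the first."""
        rng = np.random.default_rng() if rng is None else rng
        mode = rng.integers(0, 2, size=n_subjects)                     # mixture indicator
        eta1 = rng.normal(loc=np.where(mode == 0, -1.5, 1.5), scale=0.4)
        eta2 = 0.8 * eta1**2 + rng.normal(scale=0.3, size=n_subjects)  # nonlinear dependence
        return np.column_stack([eta1, eta2])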

We then simulated from the NF-augmented model and compared its visual predictive check (VPC) plot to that of the misspecified model. The VPC plot of the NF-augmented model was markedly better: the quantiles of the simulated data more closely resembled the quantiles of the observed data.
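
As a rough sketch of the underlying comparison, the quantile bands of a VPC can be computed per time bin for the observed data and for each simulated replicate; the binning and quantile levels below are assumptions, not the settings used here.

    import numpy as np

    def vpc_quantiles(y, t, bin_edges, probs=(0.05, 0.5, 0.95)):
        """Quantiles of the dependent variable per time bin, the building block of a VPC:
        computed once for the observed data and once per simulated replicate."""
        idx = np.digitize(t, bin_edges)
        return {b: np.quantile(y[idx == b], probs) for b in np.unique(idx)}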

We also tested more complex models where exact recovery of the true REs distribution was not possible due to higher noise levels and non-identifiability issues. Even so, the obtained distribution contained only REs that were compatible with the observed population, resulting in improved simulation performance and VPC plots.

Conclusions: Generative ML models can be used to learn the true distribution of REs in a population, fixing NLME model misspecification and improving the quality of simulations, VPC plots and trial design. We can also expect improved predictive performance when few data points per subject are available and the influence of the prior is stronger, leading to more effective adaptive precision medicine.



References:
[1] D. R. Mould and R. N. Upton. "Basic concepts in population modeling, simulation, and model-based drug development - Part 2: introduction to pharmacokinetic modeling methods". CPT Pharmacometrics Syst Pharmacol. 2013 Apr 17;2(4):e38. doi: 10.1038/psp.2013.14
[2] I. Kobyzev, S. Prince and M. Brubaker. "Normalizing Flows: An Introduction and Review of Current Methods". IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 43, no. 11, pp. 3964-3979, 2021. doi: 10.1109/TPAMI.2020.2992934


Reference: PAGE 32 (2024) Abstr 11038 [www.page-meeting.org/?abstract=11038]
Oral: Methodology - New Tools