Penalized regression implementation within the SAEM algorithm to advance high-throughput personalized drug therapy
Julie Bertrand (1), Maria De Iorio (2), David J. Balding (1)
(1) Genetics Institute, University College London, London, UK, (2) Department of Statistical Science, University College London, London, UK
Context: In a previous study, we showed that penalized regression approaches (such as the Lasso), combined with a model-based population analysis, are computationally and statistically efficient for exploring a large array of single nucleotide polymorphisms (SNPs) in association with drug pharmacokinetics (PK) [1]. However, these approaches proceed in two stages: the effect of a SNP on the model parameters is assessed only after those parameters have been estimated.
Objectives: To develop an integrated approach that simultaneously estimates the PK model parameters and the genetic effect sizes, and to compare its performance with that of a penalized regression on Empirical Bayes Estimates (EBEs) and of a classical stepwise procedure.
Methods: At each iteration of the Stochastic Approximation Expectation Maximization (SAEM) algorithm, a penalized regression is performed on the individual parameter values obtained from the stochastic approximation (SA) step to update the vector of fixed effects. In the Lasso procedure, the penalty function is the double-exponential (DE) probability density. Hoggart et al. [2] proposed the HyperLasso, a generalization of the Lasso that allows the penalty function to have flatter tails and a sharper peak. The HyperLasso uses the normal-exponential-gamma (NEG) distribution, which is the DE with the rate parameter drawn from a Gamma distribution. Here, the shape parameter of the NEG was set to 1, and the scale parameter was set using a formula that ensures a given family-wise error rate (FWER) [2], rather than by permutations as in [1].
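To illustrate the kind of penalized update described above, the following is a minimal sketch of a plain Lasso (DE penalty) regression of individual parameters on a SNP matrix, solved by coordinate descent. It is not the authors' implementation and does not implement the NEG penalty of the HyperLasso; the function names (`soft_threshold`, `lasso_update`), the penalty value `lam`, and the simulated inputs are all assumptions chosen for illustration, with `phi` standing in for the individual parameters produced by one SA step.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form Lasso solution in one dimension."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_update(phi, G, lam, n_sweeps=100):
    """Coordinate-descent Lasso of individual parameters phi on genotypes G.

    Minimises (1 / (2 n)) * ||phi - mu - Gc @ beta||^2 + lam * ||beta||_1,
    where Gc is the column-centred genotype matrix and mu is unpenalised.
    """
    n, p = G.shape
    Gc = G - G.mean(axis=0)               # centre each SNP column
    mu = phi.mean()                       # intercept (typical parameter value)
    beta = np.zeros(p)
    col_ss = (Gc ** 2).sum(axis=0) / n    # per-SNP sample variance
    resid = phi - mu                      # residual for beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:          # skip monomorphic SNPs
                continue
            # covariance of SNP j with the partial residual (SNP j removed)
            rho = Gc[:, j] @ resid / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            resid += Gc[:, j] * (beta[j] - new)
            beta[j] = new
    return mu, beta

# Hypothetical stand-in for the individual parameters of one SA step:
# 300 subjects, 50 SNPs, a single causal SNP at index 3.
rng = np.random.default_rng(0)
n_subjects, n_snps = 300, 50
G = rng.binomial(2, 0.3, size=(n_subjects, n_snps)).astype(float)
beta_true = np.zeros(n_snps)
beta_true[3] = 0.5
phi = 1.0 + G @ beta_true + rng.normal(0.0, 0.2, size=n_subjects)

mu_hat, beta_hat = lasso_update(phi, G, lam=0.05)
selected = np.flatnonzero(beta_hat)       # indices of SNPs kept in the model
```

In an integrated scheme of the kind described, an update such as `lasso_update` would be called once per iteration on the current simulated individual parameters, so that SNP selection and PK parameter estimation proceed together rather than in two stages.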
Our simulated PK model is based on a real case study, but with a design of 300 subjects and 6 sampling times, selected to ensure reasonable precision of the parameter estimates. The simulated array includes 1227 SNPs in 171 genes. Under the alternative hypothesis, H1, we randomly picked 6 SNPs per simulated data set, which together explain 30% of the variance in the logarithm of the apparent elimination clearance.
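As a sketch of how causal effects can be scaled so that 6 SNPs jointly explain 30% of the variance in log clearance, the example below assumes equal per-allele effects for all causal SNPs and defines the explained fraction as genetic variance over genetic plus inter-individual variance. Both choices, the helper name `scale_effects`, the allele frequencies, the typical clearance and the variance `omega2` are hypothetical; the abstract does not state how the variance was apportioned.

```python
import numpy as np

def scale_effects(G, omega2, r2=0.30):
    """Equal per-allele effects such that G @ beta explains a fraction r2
    of var(G @ beta) + omega2 (sample variance of the genetic component)."""
    raw = G.sum(axis=1)                    # genetic score with unit effects
    target = r2 / (1.0 - r2) * omega2      # required genetic variance
    b = np.sqrt(target / raw.var())
    return np.full(G.shape[1], b)

rng = np.random.default_rng(1)
n_subjects = 300
maf = np.array([0.10, 0.20, 0.25, 0.30, 0.40, 0.45])    # hypothetical allele frequencies
G_causal = rng.binomial(2, maf, size=(n_subjects, maf.size)).astype(float)

omega2 = 0.09                              # hypothetical random-effect variance
beta_causal = scale_effects(G_causal, omega2)
genetic = G_causal @ beta_causal

# Simulated log clearance: typical value + genetic component + random effect
eta = rng.normal(0.0, np.sqrt(omega2), size=n_subjects)
log_cl = np.log(10.0) + genetic + eta

explained = genetic.var() / (genetic.var() + omega2)
```

Here `explained` is the design-level fraction computed from the theoretical random-effect variance; the fraction realised in any one simulated data set will vary around it because `eta` is sampled.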
Results: The penalized regression on EBEs and the stepwise procedure obtained a FWER not significantly different from the target value of 0.2, while the integrated approach was more conservative, with an empirical FWER of 0.1. Nevertheless, all three approaches obtained similar power to detect each of the 6 causal SNPs, with the integrated approach detecting almost no false positives. Computing times were longer for the integrated approach: 1.8 h under the null and 2.8 h under H1, compared with 0.08 and 0.12 h for the penalized regression on EBEs and 0.08 and 0.73 h for the stepwise procedure.
References:
[1] Multiple single nucleotide polymorphism analysis using penalized regression in nonlinear mixed-effect pharmacokinetic models. Bertrand J, Balding DJ. Pharmacogenet Genomics. 2013, 23(3): 167-74
[2] Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ. PLoS Genet. 2008, 4(7): e1000130