The LASSO - A Novel Method for Predictive Covariate Model Building in Nonlinear Mixed Effects Models
Ribbing, J. (1), J. Nyberg (1), O. Caster (1), E.N. Jonsson (1,2)
(1) Division of Pharmacokinetics and Drug Therapy, Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden; (2) Roche Pharmaceuticals, Basel, Switzerland
Introduction: Covariate models for population pharmacokinetics (PK) and pharmacodynamics (PD) are often built with stepwise covariate-selection procedures such as the SCM[1]. When analysing a small dataset, this method would be expected to produce a covariate model suffering from selection bias and poor predictive performance[2]. A novel method[3] which has been suggested to remedy these problems is the LASSO (acronym for "Least Absolute Shrinkage and Selection Operator"). In addition, the LASSO may require considerably shorter computer run-time.
In the LASSO, all covariates are standardized by their between-individual standard deviation. After this transformation, estimating the LASSO model is the same as estimating the model containing all potential covariate-parameter relations, but with one restriction: the sum of the absolute values of the covariate coefficients must be smaller than a value, t. This restriction forces some covariate coefficients to zero, while the others (with a stronger signal) are shrunk compared to the maximum-likelihood estimate (mle). In practice, this means that when fitting the LASSO model, the covariate relations are tested for inclusion at the same time as the included relations are estimated.
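The standardization and the restriction on the coefficients can be illustrated with a minimal sketch. It assumes that each covariate is centred on its mean before being scaled (the abstract only states the scaling), and it treats the bound as inclusive; the covariate values and coefficients are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Centre a covariate (assumption) and scale it by its between-individual SD."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def satisfies_lasso_constraint(coefficients, t):
    """True if the sum of absolute covariate coefficients does not exceed t."""
    return sum(abs(b) for b in coefficients) <= t

# Hypothetical body weights (kg), one value per individual
weights = [55.0, 62.0, 70.0, 81.0, 94.0]
z = standardize(weights)

# Hypothetical coefficients on the standardized covariate scale
coefs = [0.30, -0.10, 0.0, 0.05]
loose = satisfies_lasso_constraint(coefs, t=0.5)   # sum of |coef| is about 0.45
tight = satisfies_lasso_constraint(coefs, t=0.4)
print(loose, tight)
```

Because all covariates share one scale after standardization, a single value of t can bound coefficients that would otherwise be expressed in very different units.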
In the SCM the model size depends on the p-value required for selection. In the LASSO the model size instead depends on the tuning parameter, t. The optimal value of t can be estimated using cross validation[4].
Objectives: To implement the LASSO for covariate selection within NONMEM and to compare this method to the commonly-used SCM.
Methods: The LASSO was implemented as an automated tool using Perl-speaks-NONMEM (PsN)[5, 6]. Each t-value was evaluated using five-fold cross-validation on the NONMEM objective function value.
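The evaluation of candidate t-values by five-fold cross-validation can be sketched on a toy problem. This is not the PsN implementation: the model fit is replaced by a hypothetical one-coefficient least-squares fit with the restriction |slope| <= t, the held-out squared error stands in for the NONMEM objective function value, and all function names and data are illustrative.

```python
import random

def k_fold_indices(n_units, k, seed=0):
    """Split indices 0..n_units-1 into k disjoint folds of roughly equal size."""
    idx = list(range(n_units))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_constrained_slope(x, y, t):
    """Toy fit: least-squares slope through the origin, restricted to |slope| <= t."""
    sxx = sum(xi * xi for xi in x)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    return max(-t, min(t, slope))

def held_out_loss(x, y, slope):
    """Toy stand-in for the objective function value on held-out data."""
    return sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))

def choose_t(x, y, t_grid, k=5, seed=0):
    """Return the t in t_grid with the lowest summed held-out loss over k folds."""
    folds = k_fold_indices(len(x), k, seed)
    best_t, best_loss = None, float("inf")
    for t in t_grid:
        total = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in range(len(x)) if i not in held]
            slope = fit_constrained_slope(
                [x[i] for i in train], [y[i] for i in train], t)
            total += held_out_loss(
                [x[i] for i in fold], [y[i] for i in fold], slope)
        if total < best_loss:
            best_t, best_loss = t, total
    return best_t

# Simulated data with a true slope of 0.5 (hypothetical)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [0.5 * xi + rng.gauss(0.0, 0.3) for xi in x]
t_grid = [i / 10 for i in range(11)]
best_t = choose_t(x, y, t_grid)
print(best_t)
```

Each fold is fitted independently of the others, so the fits for all folds and all candidate t-values could in principle run at the same time.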
To compare the LASSO to the SCM, both procedures were applied to 100 analysis datasets at each of two sizes, 60 and 180 subjects. The datasets were generated by sampling subjects (with replacement) from a large dataset containing PK data from 721 subjects[7]. For each analysis dataset, a validation dataset was created comprising all subjects among the 721 that were not in the corresponding analysis dataset.
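The construction of paired analysis and validation datasets can be sketched as follows. The subject numbering, the seed and the function name are hypothetical; only the sampling scheme (with replacement, validation set made up of the subjects never drawn) follows the description above.

```python
import random

def make_analysis_and_validation(subject_ids, n_analysis, rng):
    """Sample n_analysis subjects with replacement; validation = subjects never drawn."""
    analysis = [rng.choice(subject_ids) for _ in range(n_analysis)]
    drawn = set(analysis)
    validation = [s for s in subject_ids if s not in drawn]
    return analysis, validation

rng = random.Random(2024)               # fixed seed so the sketch is reproducible
all_subjects = list(range(1, 722))      # 721 subject IDs (numbering is hypothetical)

# 100 analysis/validation pairs with 60 subjects in each analysis dataset
datasets = [make_analysis_and_validation(all_subjects, 60, rng) for _ in range(100)]
analysis, validation = datasets[0]
print(len(analysis), len(validation))
```

Because sampling is with replacement, an analysis dataset may contain the same subject more than once, so the validation dataset holds at least 721 minus 60 subjects.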
A simple starting model (without any covariate coefficients) was developed to be estimable on the smaller datasets.
For the 60-subject datasets, 11 covariates were investigated on CL and 6 on each of the other three structural-model parameters. For the 180-subject datasets, one additional covariate was investigated on CL. Continuous covariates were investigated both as a simple linear relation and as piecewise-linear relations with one breakpoint at the median. The SCM was built with forward inclusion at p<0.05 and backward elimination at p-values 0.05, 0.01 and 0.001. The starting model, the model produced by the LASSO and the three models from the SCM (at different p-values) were evaluated with respect to run-times and predictive performance. The LASSO was run on five parallel processors, while the SCM was run on six parallel processors.
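A piecewise-linear relation with one breakpoint at the median can be written with separate slopes below and above the median. The sketch below assumes a proportional parameterisation, effect = 1 + slope × (covariate − median), which the abstract does not specify; the ages and slopes are hypothetical.

```python
from statistics import median

def covariate_effect(value, breakpoint, slope_below, slope_above):
    """Proportional effect with separate slopes on each side of the breakpoint."""
    below = min(value - breakpoint, 0.0)   # non-zero only below the breakpoint
    above = max(value - breakpoint, 0.0)   # non-zero only above the breakpoint
    return 1.0 + slope_below * below + slope_above * above

ages = [30.0, 45.0, 50.0, 62.0, 80.0]      # hypothetical covariate values
m = median(ages)                           # breakpoint at the median

effect_young = covariate_effect(40.0, m, slope_below=0.01, slope_above=0.02)
effect_old = covariate_effect(60.0, m, slope_below=0.01, slope_above=0.02)
print(effect_young, effect_old)
```

With equal slopes on both sides the function reduces to the simple linear relation, so the piecewise form adds one coefficient per continuous covariate.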
The predictive performance of a model was evaluated on the observations in the validation dataset as the mean absolute relative error, mae(%) = 100 · average(|obs_n - pred_n| / obs_n), where obs_n and pred_n are the nth observation and the corresponding model prediction. The average mae over the 100 validation datasets was calculated for each covariate-selection procedure (and p-value) and for each dataset size.
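The error measure defined above translates directly into code. This minimal sketch assumes strictly positive observations (for example concentrations), since the formula divides by the observation itself; the numbers are hypothetical.

```python
def mae_percent(observations, predictions):
    """Mean absolute relative error in percent: 100 * average(|obs - pred| / obs)."""
    errors = [abs(o - p) / o for o, p in zip(observations, predictions)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical observations and model predictions
obs = [10.0, 20.0, 40.0]
pred = [12.0, 15.0, 40.0]
mae = mae_percent(obs, pred)   # relative errors are 0.20, 0.25 and 0.00
print(mae)
```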
Results: The LASSO was implemented within NONMEM as a fully automated procedure, first estimating the best t using cross-validation and then (selecting and) estimating the covariate model on the analysis dataset using this t value. The LASSO operates on transformed covariates and requires some extra calculations, but the final model obtained from the procedure is a normal NONMEM model file with the selected covariate coefficients on the original scale and initial estimates of the covariate coefficients fixed to the estimates obtained from the LASSO.
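How a coefficient estimated on a standardized covariate maps back to the original scale can be illustrated for the simple linear case. The sketch assumes a proportional linear relation on a centred covariate, in which case the original-scale coefficient is the standardized coefficient divided by the standard deviation; the actual transformation used by the tool is not described in this abstract, and all names and values are hypothetical.

```python
def to_original_scale(beta_std, sd):
    """Coefficient per original covariate unit, assuming a linear relation."""
    return beta_std / sd

def effect_standardized(cov, mean, sd, beta_std):
    """Effect computed from the standardized covariate."""
    return 1.0 + beta_std * (cov - mean) / sd

def effect_original(cov, mean, beta_orig):
    """Effect computed from the centred covariate on its original scale."""
    return 1.0 + beta_orig * (cov - mean)

mean_wt, sd_wt = 72.0, 12.0      # hypothetical mean and SD of body weight (kg)
beta_std = 0.18                  # hypothetical coefficient on the standardized scale
beta_orig = to_original_scale(beta_std, sd_wt)

# Both parameterisations give the same effect for any covariate value
for wt in (50.0, 72.0, 95.0):
    a = effect_standardized(wt, mean_wt, sd_wt, beta_std)
    b = effect_original(wt, mean_wt, beta_orig)
    assert abs(a - b) < 1e-12
print(beta_orig)
```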
The starting model was a two-compartment model, which described the data well, with log-normally distributed inter-individual variability on all four parameters. The correlation between V1 and V2 was assumed to be one, which required one fewer random-effects parameter. The intra-individual error was additive on the log-transformed scale.
In the 60-subject datasets, the best p-value for the SCM (in terms of predictive performance) was 0.01. The SCM with this p-value had approximately the same prediction error as the starting model (mae=51%). Compared to the SCM, the LASSO reduced the prediction error by 1.8% (a statistically significant improvement, p < 0.001). Run-times also favoured the LASSO, which on average required 10.1 minutes on five processors, compared to the SCM, which on average took 15.5 minutes on six. The average number of covariate coefficients in the final models from the 60-subject datasets was 3.2, 1.9 and 0.7 for the SCM with p-values 0.05, 0.01 and 0.001, respectively. The LASSO had on average 1.9 covariate coefficients, albeit shrunken from the mle. For the 180-subject datasets, the prediction error for the starting model was 50%. The SCM with p-values 0.05 and 0.01, as well as the LASSO, all reduced this error by 4%. The number of covariates in the final model increased, especially for the LASSO, which no longer had the shorter run-time.
Discussion: The improvement in prediction error of 1.8% in the 60-subject datasets may seem small, but it should be seen in light of the fact that the intra-individual error was estimated at ~43% on the original (721-subject) dataset and no model could ever overcome this variability. Also, the benefit of the LASSO compared to the SCM would be larger if the user were not to use the optimal p-value. The SCM could be expected to perform better than the LASSO in situations where a single highly influential covariate exists and the others are of negligible explanatory value[3]. However, such a situation can often be foreseen (e.g. a measure of renal function affects CL of a drug which is mainly cleared via renal filtration, and the genotype of a metabolising enzyme affects a drug which is mainly eliminated via that enzyme), and the highly influential covariate could be included in the starting model without statistical testing.
The SCM has (50%) longer run-times in the example with the 60-subject datasets because of the many covariate relations investigated; with fewer relations, the SCM would require less computer power. On the other hand, if running on a large cluster or grid, the SCM would still have to run one step after another, whereas with the LASSO all LASSO models could be estimated in parallel (independently of one another), and run-times might be only a fraction of those required by the SCM. Further, the LASSO search for an optimal t could be improved and the cross-validation estimate of prediction error may be enhanced[8].
Conclusion: On small- to moderately-sized datasets the current implementation of the LASSO will often produce a covariate model which predicts new data better than the SCM. The LASSO is also faster if many covariate-parameter relations are investigated. If a large cluster or grid is available the procedure could be made much faster than the SCM.
The LASSO does not require the user to make a decision about the p-value for covariate inclusion.
References:
1. Jonsson, E.N. and M.O. Karlsson, Automated covariate model building within NONMEM. Pharm Res, 1998. 15(9): p. 1463-8.
2. Ribbing, J. and E.N. Jonsson, Power, selection bias and predictive performance of the population pharmacokinetic covariate model. Journal of Pharmacokinetics and Pharmacodynamics, 2004. 31(2): p. 109-134.
3. Tibshirani, R., Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 1996. 58(1): p. 267-288.
4. Hastie, T., R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. 2001: Springer-Verlag.
5. Lindbom, L., P. Pihlgren, and E.N. Jonsson, PsN-Toolkit--a collection of computer intensive statistical methods for non-linear mixed effect modeling using NONMEM. Comput Methods Programs Biomed, 2005. 79(3): p. 241-57.
6. Lindbom, L., J. Ribbing, and E.N. Jonsson, Perl-speaks-NONMEM (PsN)--a Perl module for NONMEM related programming. Comput Methods Programs Biomed, 2004. 75(2): p. 85-94.
7. Zingmark, P.H., et al., Population pharmacokinetics of clomethiazole and its effect on the natural course of sedation in acute stroke patients. Br J Clin Pharmacol, 2003. 56(2): p. 173-83.
8. Efron, B. and R. Tibshirani, Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 1997. 92(438): p. 548-560.