Influence of Correlated Covariates on Predictive Performance for Different Models
Akash Khandelwal, Andrew C Hooker, and Mats O Karlsson
Dept of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
Objectives: To compare the predictive performance of different model building methods in the presence of covariate correlation.
Methods: A one-compartment model with first-order absorption was used to simulate concentrations. A dataset comprising 100 individuals, each with 3 sampling times and 2 normally distributed covariates (COV1 and COV2), was simulated 100 times under each of three scenarios: (A) COV1, but not COV2, is related to CL; (B) both COV1 and COV2 are related to CL; (C) neither COV1 nor COV2 is related to CL, i.e. the base model. Each scenario was simulated with different correlations between COV1 and COV2 (0, 0.3, 0.5, 0.7 and 0.9). Each scenario was analyzed using four models: no covariate, COV1 only, COV2 only, and both COV1 and COV2. In addition, stepwise covariate model building (SCM; likelihood ratio test; p<0.05) was employed. A dataset with 5000 individuals was simulated from the true models to serve as an external test dataset. The parameter estimates obtained from each model under each scenario were used to predict into the test dataset. The predictive ability of the models was assessed in terms of the prospective OFV.
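As an illustration of two steps in the Methods, the sketch below draws pairs of correlated normal covariates and computes the likelihood ratio test p-value for a drop in OFV. It is a minimal sketch under stated assumptions, not the code used in the study: the covariates are taken as standard normal (the abstract gives no means or variances), each covariate is assumed to add one parameter (1 degree of freedom), and the OFV difference is assumed to follow a chi-square distribution. The function names `simulate_covariates`, `pearson` and `lrt_p_value` are hypothetical.

```python
import math
import random


def simulate_covariates(n, rho, rng):
    """Draw n (COV1, COV2) pairs of standard-normal values with correlation rho.

    Assumption: standard-normal covariates; the abstract only says "normal".
    """
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        cov1 = z1
        # Mixing z1 into COV2 gives corr(COV1, COV2) = rho with unit variance.
        cov2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((cov1, cov2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def lrt_p_value(delta_ofv):
    """p-value for an OFV drop of delta_ofv when one parameter is added.

    Assumption: delta_ofv is chi-square distributed with 1 degree of freedom,
    for which P(X > x) = erfc(sqrt(x / 2)).
    """
    if delta_ofv <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(delta_ofv / 2.0))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sketch is reproducible
    for rho in (0.0, 0.3, 0.5, 0.7, 0.9):
        pairs = simulate_covariates(100, rho, rng)
        print(f"target rho={rho:.1f}  sample r={pearson(pairs):.2f}")
    print(f"p-value for an OFV drop of 3.84: {lrt_p_value(3.84):.3f}")
```

With only 100 individuals per dataset, the sample correlation will scatter around the target value, which is one reason the study replicates each scenario 100 times. Under the 1-degree-of-freedom assumption, p<0.05 corresponds to an OFV drop of roughly 3.84.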
Results: Adding a false covariate to a model with a true covariate effect lowered the predictive ability of the model, as evidenced by an increase in the prospective OFV; the increase was similar across the different strengths of correlation. The best predictive performance came from the true covariate model. The second-best predictive performance was provided by SCM, regardless of scenario. When an informative covariate was not included in the model, the prospective OFV was higher than for the model that included it, regardless of the correlation between the covariates.
Conclusions: The selection of a covariate model can be pre-defined or data-driven. In this limited case, unless the true model was pre-selected, the data-driven approach provided the best predictive performance regardless of covariate correlation. When either of two correlated covariates may contain information about the parameter in question, pre-selection of one may harm the predictive performance of the resulting model.