Conditional distribution modeling for covariates simulation using classification and regression trees methods
Giovanni Smania and E. Niclas Jonsson
Pharmetheus AB
Objectives: Clinical trial simulation (CTS) is a valuable tool in drug development. To obtain realistic scenarios, the subjects included in the CTS must be representative of the target population. Common ways of generating virtual subjects are based on bootstrap procedures or multivariate normal distributions (MVND) [1]. We recently investigated an alternative method based on conditional distributions (CD), which used predictive mean matching (PMM) as the underlying prediction model [2]. Previous studies have shown that, in the context of missing data imputation, CD with classification and regression trees (CD-CART) outperformed CD with PMM (CD-PMM) when there are interactions or other nonlinear effects among the variables [3]. The objective of this study was to investigate the operating characteristics of CD when used to simulate covariate distributions based on CART methods, and to compare them with those of CD-PMM.
Methods: A dataset from a hypertension drug development program containing 233 healthy volunteers (HV) and 706 patients was utilized to extract the following baseline covariates: age, weight (WT), serum creatinine, creatinine clearance (CRCL), sex and race. CD-PMM and CD-CART were implemented using the R package mice [4]. N=30 datasets were simulated. The methods were evaluated based on the observed dataset (internal evaluation) as well as on their ability to predict an older population (extrapolation). Relative root mean square error (RMSE) and relative bias of mean, median, standard deviation (SD), range and variance-covariance matrix (for continuous covariates) and of proportions (for categorical covariates) were used as performance metrics.
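As an illustration of the performance metrics named above, the following is a minimal Python sketch of how relative bias and relative RMSE of a summary statistic could be computed across simulated datasets. The abstract does not state the exact formulas, so the definitions used here (errors scaled by the observed value and expressed as percentages) are an assumption, and the function name and numeric values are hypothetical.

```python
import math
import statistics


def relative_bias_and_rmse(simulated_stats, observed_stat):
    """Return (relative bias %, relative RMSE %) of a summary statistic.

    simulated_stats: the statistic (e.g. mean weight) computed in each
                     simulated dataset.
    observed_stat:   the same statistic computed in the observed dataset.
    Assumed definitions: errors are divided by the observed value.
    """
    rel_errors = [(s - observed_stat) / observed_stat for s in simulated_stats]
    rel_bias = 100.0 * statistics.fmean(rel_errors)
    rel_rmse = 100.0 * math.sqrt(statistics.fmean([e * e for e in rel_errors]))
    return rel_bias, rel_rmse


# Hypothetical example: observed mean weight of 80 kg and the mean weight
# obtained in three simulated datasets (the study itself used N=30).
observed_mean_wt = 80.0
simulated_mean_wt = [78.0, 82.0, 84.0]
bias_pct, rmse_pct = relative_bias_and_rmse(simulated_mean_wt, observed_mean_wt)
print(f"relative bias = {bias_pct:.2f}%, relative RMSE = {rmse_pct:.2f}%")
```

The same function could be applied to any scalar summary (median, SD, range, a single variance-covariance element, or a category proportion), with one value per simulated dataset.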
Results: In the internal evaluation, CD-CART had lower bias and RMSE than CD-PMM for the mean, median, SD and range of continuous covariates and for the proportion of categorical covariates. While bias in the variance-covariance matrix was comparable between the two methods, CD-CART considerably increased the precision of the correlation structure (RMSE gains of up to 11%), particularly for highly non-linearly related covariates. In terms of extrapolation performance, CD-CART slightly improved accuracy and precision in means and medians, but performed markedly worse for SD. With respect to the variance-covariance matrix of the simulated datasets, CD-CART provided better estimates of the off-diagonal terms except for the WT~CRCL and AGE~CRCL relationships. Finally, bias and RMSE in the proportion of males/females were higher for CD-PMM vs. CD-CART.
Conclusions: In our previous work we showed that, if uncertainty about the MVND assumptions exists, CD-PMM can increase the confidence in the simulated covariates compared to MVND [2]. Despite improving the operating characteristics in the internal evaluation, CD-CART does not consistently outperform CD-PMM in terms of extrapolation performance. However, the present work suggests that CD-CART can be a promising alternative to CD-PMM when dealing with covariate distributions characterized by strong non-linearities and/or interaction effects across covariates.
References:
[1] Teutonico D. et al. Generating Virtual Patients by Multivariate and Discrete Re-Sampling Techniques. Pharm Res. 32(10), 3228-3237 (2015).
[2] Smania G, Jonsson EN. Conditional distribution modeling as an alternative method for covariates simulation: comparison with joint multivariate normal and bootstrap techniques. CPT Pharmacometrics Syst Pharmacol. Epub ahead of print (2021).
[3] Doove LL, Van Buuren S, Dusseldorp E. Recursive Partitioning for Missing Data Imputation in the Presence of Interaction Effects. Comput Stat Data Anal. 72, 92-104 (2014).
[4] van Buuren S, Groothuis-Oudshoorn K. MICE: Multivariate Imputation by Chained Equations in R. J Stat Soft. 45(3), 1-67 (2011).