Evaluation of the Boruta Machine Learning Algorithm for Covariate Selection

Ibtissem Rebai (1), Vincent Duval (1), Ayman Akil (1), Nathan Teuscher (1), Anna Largajolli* (1), Floris Fauchet* (1)

(1) Certara, Princeton, NJ, USA; * Contributed equally.

Introduction

Stepwise covariate modeling (SCM) [1] is one of the most widely used approaches for covariate selection. Despite its widespread use, SCM has several weaknesses, including the selection of incorrect covariates and a high computational burden for complex models [2]. In the last few years, machine learning algorithms [3, 4] have been applied to covariate selection. The objective of our work is to evaluate the performance of the Boruta algorithm (BOAL) [5], as implemented in R [6], alone and in combination with Lasso [7], as a new framework for covariate selection.

Methods

Using a target-mediated drug disposition model [8], six scenarios were simulated in NONMEM [9], each adding a different combination of covariate effects to the model parameters. The scenarios were (1) no covariate, (2) body weight (BW) on CL, (3) BW on V1, (4) age on V1, (5) sex on V1, and (6) age and BW on V1. For each scenario, 100 simulated datasets were created. Each dataset included 180 subjects with rich sampling following single-dose administration (from 35 up to 1500 mg), and 20 covariates per subject sampled from the NHANES dataset [10].

For each dataset, the individual parameters were estimated using the model without covariates. For each parameter, two power calculations were then performed for BOAL alone and for BOAL in combination with Lasso: the power to identify the correct covariates, possibly together with additional ones (Power1), and the power to identify exclusively the correct covariates (Power2). BOAL is a feature selection wrapper algorithm that can work with any classification method; the current work focused on the XGBoost classifier with default hyperparameter settings. A Lasso step was added before BOAL to reduce the impact of correlation between covariates on the XGBoost step. Lasso is a regression method that shrinks the regression coefficients of non-informative covariates towards 0.
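As an illustration of the two power definitions, the following Python sketch counts, for one parameter, the datasets in which the selected covariate set contains all true covariates (Power1) or is exactly equal to the true set (Power2). The function name, the input layout and the toy selections are assumptions made for this example; they are not taken from the study code, which was written in R.

```python
from typing import Iterable, Set, Tuple


def selection_power(
    selected_sets: Iterable[Set[str]], true_covariates: Set[str]
) -> Tuple[float, float]:
    """Return (Power1, Power2) in percent for one model parameter.

    Power1: share of datasets whose selection contains every true covariate
            (additional covariates are allowed).
    Power2: share of datasets whose selection equals the true covariate set.
    """
    selections = [set(s) for s in selected_sets]
    if not selections:
        raise ValueError("at least one dataset is required")
    n = len(selections)
    # Subset test: all true covariates were selected, extras tolerated.
    n_power1 = sum(true_covariates <= s for s in selections)
    # Equality test: only the true covariates were selected.
    n_power2 = sum(s == true_covariates for s in selections)
    return 100.0 * n_power1 / n, 100.0 * n_power2 / n


# Hypothetical selections for V1 in four datasets where the truth is BW on V1.
toy_selections = [{"BW"}, {"BW", "AGE"}, {"AGE"}, {"BW"}]
power1, power2 = selection_power(toy_selections, {"BW"})
print(f"Power1 = {power1:.0f}%, Power2 = {power2:.0f}%")
```

With an empty true set, as in a scenario without covariates, the subset test is always satisfied, so only the second value is informative: it is the share of datasets in which no covariate was selected.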

Results

All re-estimated parameters had RSEs below 15% in all scenarios, confirming that the simulation design was appropriate. Shrinkage was low (<5%) for all parameters in all scenarios, except for CL (~20%).

With BOAL alone, in scenario 1 (no covariate), 61 and 79 datasets out of 100 showed no association for CL and V1, respectively. In scenarios 2, 3, 4 and 5, BOAL resulted in a Power1 of 60%, 91%, 100% and 100%, respectively. In scenario 6, Power1 was 92% for BW and age. Power2 was 30%, 44%, 46%, 37% and 42% for scenarios 2, 3, 4, 5 and 6, respectively.

Adding the Lasso step before BOAL increased the proportion of datasets with no association in scenario 1 (91% and 86% for CL and V1, respectively). In scenarios 2, 3, 4 and 5, Power1 was comparable to that of BOAL alone (58%, 93%, 100% and 99%, respectively). In scenario 6, Power1 was 86% for BW and age. Moreover, Power2 improved relative to BOAL alone, to 53%, 63%, 77%, 55% and 57% for scenarios 2, 3, 4, 5 and 6, respectively.

Conclusions

BOAL showed high power to select the correct parameter-covariate relationships, although sometimes together with additional covariates. Preceding BOAL with Lasso can improve the power to detect exactly the correct parameter-covariate relationships. These results show that the covariate selection process can become more efficient by reducing the number of covariates to be assessed for their relevance. These results are in line with what was previously found.

When shrinkage in the parameter was higher (CL, ~20%), both power values were lower: in scenario 2, BOAL alone reached a Power1 of 60% and a Power2 of 30%. As next steps, more complex covariate scenarios will be investigated, and different algorithm settings (e.g., hyperparameter optimization) will be tested together with classifiers other than XGBoost. The power in a few scenarios will also be compared with that of SCM as a reference.

References:
[1] Jonsson E, Karlsson M (1998) Automated covariate model building with NONMEM. Pharm Res 15(9):1463–1468
[2] Ahamadi, M., Largajolli, A., Diderichsen, P.M. et al. Operating characteristics of stepwise covariate selection in pharmacometric modeling. J Pharmacokinet Pharmacodyn 46, 273–285 (2019).
[3] Sibieude, E., Khandelwal, A., Hesthaven, J.S. et al. Fast screening of covariates in population models empowered by machine learning. J Pharmacokinet Pharmacodyn 48, 597–609 (2021).
[4] Nicolò C, Périer C, Prague M, Bellera C, MacGrogan G, Saut O, Benzekry S. Machine Learning and Mechanistic Modeling for Prediction of Metastatic Relapse in Early-Stage Breast Cancer. JCO Clin Cancer Inform. 2020 Mar;4:259-274.
[5] Kursa MB, Jankowski A, Rudnicki WR. Boruta - a system for feature selection. Fundamenta Informaticae. 2010;101(4):271-285.
[6] Kursa, M. B., & Rudnicki, W. R. (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), 1–13.
[7] Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1-22
[8] B. Meibohm, B. Brockhaus, M. Zühlsdorf, A. Kovar. Semi-mechanistic model-based drug development of EMD 525797 (DI17E6), a novel anti-αv integrin monoclonal antibody. PAGE 22 (2013), 11/2015
[9] Beal, S.L., Sheiner, L.B., Boeckmann, A.J. & Bauer, R.J. (Eds). NONMEM 7.4 users guides <https://nonmem.iconplc.com/nonmem743/guides> (1989–2018).
[10] Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data. 2017-2020. Available at: https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?cycle=2017-2020

PAGE 2023: Methodology - AI/Machine Learning

Reference: PAGE 31 (2023) Abstr 10415 [www.page-meeting.org/?abstract=10415]

Poster: Methodology - AI/Machine Learning