2023 - A Coruña - Spain

PAGE 2023: Methodology – AI/Machine Learning
Federico Amato

Explainable Machine Learning prediction of edema adverse events in NSCLC patients treated with tepotinib

Federico Amato (1), Rainer Strotmann (2), Pascal Girard (3), Alberto Garcia Duran (3), Roberto Castello (1), Karthik Venkatakrishnan (4), Nadia Terranova (3)*

(1) Swiss Data Science Centre, EPFL Lausanne and ETH Zurich, Switzerland; (2) Merck Healthcare KGaA, Darmstadt, Germany; (3) Merck Institute for Pharmacometrics, Ares Trading S.A., Lausanne, Switzerland, an affiliate of Merck KGaA; (4) EMD Serono Research and Development Institute, Inc., Billerica, MA, USA, an affiliate of Merck KGaA; *Corresponding author.

Objectives: 

Tepotinib is an ATP-competitive reversible inhibitor of the mesenchymal-epithelial transition factor (c-MET) receptor tyrosine kinase [1], currently approved for the treatment of non-small cell lung cancer (NSCLC) in patients with MET Exon 14 skipping alterations, which represent 3–4% of NSCLC. While edema is known to be the most common adverse event for the MET inhibitor class, including tepotinib, the factors contributing to its occurrence are still poorly understood.

This study investigates the use of Machine Learning (ML) to predict the occurrence of edema in patients receiving tepotinib treatment and to identify the covariates driving this prediction from a large pool of factors, including patient characteristics, laboratory measures, disease covariates, and treatment-related information.

Methods: 

The pooled analysis combined data for 450 patients across five Phase I/II clinical studies (NCT01014936-001, NCT01832506-003, NCT01988493-004, NCT02115373-005, NCT02864992-0022 VISION Cohort A at cut-off date of 01-02-2021). Patients received 30–1400 mg/day tepotinib monotherapy, with the recommended starting dose of 500 mg (equivalent to 450 mg free base form) administered to 320 of them. Adverse events were coded according to MedDRA, and their severity was graded using the NCI-CTCAE toxicity grades.

The analysis dataset included information on the patients' edema occurrence over time, dosing regimen and a set of 55 time-varying and time-invariant covariates. Tepotinib exposure was also assessed by deriving two dose-related features representing the short-term and the long-term exposure of tepotinib.

Two ML algorithms, Random Forest (RF) [2] and Gradient Boosting Trees (GBT) [3], were used to predict edemas and their severity using the collected data. The models were trained using data from 80% of the patients, and their performance was evaluated using the data from the remaining 20% of patients. Model parameter selection and optimization were performed through stratified grouped 5-fold cross validation.
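The patient-level split and the stratified grouped folds described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (`patient_id`, `visit`, `grade`), the synthetic data, and the choice to stratify on each patient's worst observed grade are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_patient(visits, test_fraction=0.2, seed=0):
    """Split visit records so that every patient falls entirely in train or test."""
    patient_ids = sorted({v["patient_id"] for v in visits})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(test_fraction * len(patient_ids)))
    test_ids = set(patient_ids[:n_test])
    train = [v for v in visits if v["patient_id"] not in test_ids]
    test = [v for v in visits if v["patient_id"] in test_ids]
    return train, test


def stratified_grouped_folds(visits, n_folds=5, seed=0):
    """Assign each patient to one fold, spreading worst-grade strata across folds."""
    # Assumption: stratify on the worst edema grade seen for each patient.
    worst = defaultdict(int)
    for v in visits:
        worst[v["patient_id"]] = max(worst[v["patient_id"]], v["grade"])
    strata = defaultdict(list)
    for pid in sorted(worst):
        strata[worst[pid]].append(pid)
    rng = random.Random(seed)
    fold_of = {}
    next_fold = 0
    for grade in sorted(strata):
        ids = strata[grade]
        rng.shuffle(ids)
        # Deal patients round-robin so each fold gets a similar grade mix.
        for pid in ids:
            fold_of[pid] = next_fold % n_folds
            next_fold += 1
    return fold_of


# Synthetic example: 20 hypothetical patients with 3 visits each.
visits = [
    {"patient_id": pid, "visit": k, "grade": (pid % 3 if k == 2 else 0)}
    for pid in range(20)
    for k in range(3)
]
train, test = split_by_patient(visits)
fold_of = stratified_grouped_folds(train)
```

Grouping by patient keeps all visits of one patient on the same side of every split, so the evaluation is not inflated by a model recognising a patient it has already seen.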

To account for the longitudinal changes of the time-varying input covariates, six different feature engineering approaches were assessed with both algorithms, resulting in a total of twelve different models. Probability calibration via Isotonic Regression (IR) [4] was applied to enable the accurate estimation of the likelihood and severity of edemas. Finally, SHapley Additive exPlanations (SHAP) [5] was used to explain and assess which factors had the greatest impact on model predictions, both at the population and individual patient level.
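As an illustration of the calibration step, the sketch below fits an isotonic (non-decreasing) mapping from raw model scores to observed event frequencies using the pool-adjacent-violators algorithm. It is a simplified binary example on made-up scores, assuming distinct score values; the function names are hypothetical and the abstract does not state which implementation was used. Library implementations may interpolate between the fitted points rather than use a step function.

```python
def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators fit: returns (upper_score, probability) steps."""
    pairs = sorted(zip(scores, outcomes))
    # Each block holds [sum of outcomes, number of points, largest score in block].
    blocks = []
    for score, y in pairs:
        blocks.append([float(y), 1, score])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def calibrate(score, steps):
    """Look up the calibrated probability for a raw score (step function)."""
    for upper, prob in steps:
        if score <= upper:
            return prob
    return steps[-1][1]


# Made-up raw classifier scores and observed binary outcomes (1 = edema).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
observed = [0, 0, 1, 0, 0, 1, 1, 1]
steps = fit_isotonic(raw_scores, observed)
calibrated = [calibrate(s, steps) for s in raw_scores]
```

In this toy example the scores 0.3, 0.4 and 0.5 are pooled into one block with a calibrated probability of 1/3, because one of the three corresponding outcomes was an event.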

Results: 

About 56% of patients experienced edema of Grade 1+, with Grade 2+ assessed at fewer than 17% of the safety visits. Edema grades were successfully predicted by leveraging longitudinal and baseline covariates in the developed ML frameworks. Good performance was achieved across models, with edema grades correctly predicted in 85–97% of patients’ visits. RF models (F1 scores 0.93–0.94) outperformed GBT models (F1 scores 0.82–0.90). A sensitivity analysis investigating the impact of the two dose-derived features suggested no deterioration of the classification performance.
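For readers less familiar with the metrics quoted above, the following sketch computes per-visit accuracy and per-class F1 scores on a small set of made-up grade labels. The abstract does not state how F1 was averaged across grades, so the macro average shown here is an assumption, and the labels are not study data.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} computed one-vs-rest for every label present."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (one possible averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up edema grades per visit: observed vs. predicted.
observed_grades = [0, 0, 0, 1, 1, 2, 2, 0]
predicted_grades = [0, 0, 1, 1, 1, 2, 0, 0]

per_class = f1_per_class(observed_grades, predicted_grades)
accuracy = sum(
    t == p for t, p in zip(observed_grades, predicted_grades)
) / len(observed_grades)
overall = macro_f1(observed_grades, predicted_grades)
```

With imbalanced grades, accuracy alone can look high while rarer, more severe grades are missed, which is why an F1-type score is a useful complement.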

The impact of the covariates on predictions was investigated by deriving SHAP values and assessing the resulting feature importance. Several longitudinal covariates appeared among the 15 most relevant covariates, with serum albumin and protein consistently found to be the most informative longitudinal covariates across all model settings. Lower values of these two covariates, together with higher age, were associated with higher probabilities of more severe edema (Grade 2+). Exposure metrics were mainly informative for predictions of low edema grades (Grade 0–1).
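The idea behind SHAP values can be illustrated with an exact Shapley computation on a toy model. Everything in this sketch is hypothetical: the `toy_risk` function, its coefficients and thresholds, and the feature values are arbitrary numbers chosen for illustration and are not derived from the study. A single reference patient stands in for the background data that a SHAP explainer would normally average over.

```python
from itertools import combinations
from math import factorial


def toy_risk(f):
    """Hypothetical risk score; coefficients are arbitrary, not study results."""
    score = 0.02 * (f["age"] - 60) - 0.05 * (f["albumin"] - 40)
    if f["albumin"] < 35 and f["protein"] < 65:
        score += 0.3  # arbitrary interaction term
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Features outside the subset are set to their baseline value.
        inputs = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical patient and reference values (arbitrary units).
x = {"albumin": 30, "protein": 60, "age": 70}
baseline = {"albumin": 40, "protein": 70, "age": 60}
phi = shapley_values(toy_risk, x, baseline)
```

The contributions sum to the difference between the prediction for the patient and the prediction for the reference, which is the additivity property that makes per-patient explanations possible. Exact enumeration is only feasible for a handful of features; for tree ensembles with many covariates, dedicated approximations or tree-specific algorithms are used instead.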

Conclusions: 

A framework using ML algorithms was developed to predict the occurrence of edema in NSCLC patients receiving tepotinib. For both RF and GBT, six feature engineering approaches were used to exploit longitudinal covariates and assess the impact of the covariate time courses. All models predicted the occurrence of edema over time well, with consistent performance across the tested settings. Probability calibration using IR allowed for accurate estimation of the likelihood of edema occurrence in patients. Finally, the use of SHAP provided an explanation of the contribution of the input factors to the model-predicted probability of edema.



References:

[1] Xiong W, et al. Exposure–response analyses for the MET inhibitor tepotinib including patients in the pivotal VISION trial: support for dosage recommendations. Cancer Chemother Pharmacol 2022;90.1:53–69.

[2] Breiman L. Random forests. Machine Learning 2001;45.1:5–32.

[3] Chen T and Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

[4] Niculescu-Mizil A and Caruana R. Predicting good probabilities with supervised learning. Proceedings of the 22nd International Conference on Machine Learning. 2005.

[5] Lundberg SM and Lee S-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 2017;30.


Reference: PAGE 31 (2023) Abstr 10481 [www.page-meeting.org/?abstract=10481]
Poster: Methodology – AI/Machine Learning