RDarwin: An R package for Machine Learning using pyDarwin and Certara’s pharmacometrics modeling language (PML) and NLME engine
Keith Nieforth (1), Mark Sale (1), James Craig (1), Michael Tomashevskiy (1)
(1) Certara
Introduction/Objectives: RDarwin [1] is an R package whose functions let users set up and run pyDarwin searches from the R command line using Certara’s pharmacometrics modeling language (PML) and NLME engine. pyDarwin [2] is an open-source Python package for machine-learning-based population pharmacokinetic/pharmacodynamic (PK/PD) model selection, developed under a grant from, and in collaboration with, the FDA. RDarwin is open source, fully documented, and developed following best-practice guidelines for R package development. Machine-learning-based model selection methods continue to evolve and show promise as a robust, efficient, and less labor-intensive means of pharmacometric model development.
The objective is to demonstrate how to set up and execute a pyDarwin search using the RDarwin package functions. Search results will be summarized using the companion DarwinReporter R package, which produces a report shell (in HTML, PDF, or Word format) as a starting point for a final report.
Five machine learning algorithms are available for model selection:
- Genetic algorithm
- Random forest
- Random tree with gradient boosting
- Gaussian process
- Particle swarm optimization
It has been found that machine learning algorithms alone provide a good global search, identifying the “generally good” regions of the search space, but are inefficient at making the final, individual changes needed to reach the true optimum. This “local search” is done more efficiently by a “downhill search” similar to the traditional forward addition/backward elimination model selection method. The two methods (machine learning and local downhill search) are used in tandem to produce a robust and efficient search.
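The tandem idea can be illustrated with a minimal Python sketch. It is not pyDarwin's implementation. Everything in it is an assumption made for illustration: each candidate model is encoded as a tuple of bits, `toy_fitness` is a hypothetical stand-in for a penalized model fit (lower is better), and plain random sampling stands in for the machine learning step.

```python
import random

# Hypothetical "best" model encoding, used only to define the toy fitness surface.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_fitness(bits):
    """Hypothetical penalized fitness for a bit-encoded model (lower is better)."""
    mismatches = sum(b != t for b, t in zip(bits, TARGET))
    # 10 points per wrong feature plus a small penalty per included feature.
    return 100.0 + 10.0 * mismatches + 0.5 * sum(bits)


def global_search(n_bits, n_candidates, rng):
    """Stand-in for the machine learning step: sample candidates, keep the best."""
    candidates = [
        tuple(rng.randint(0, 1) for _ in range(n_bits))
        for _ in range(n_candidates)
    ]
    return min(candidates, key=toy_fitness)


def downhill_search(start):
    """Local search: move to the best one-bit neighbour until none improves."""
    current = start
    while True:
        neighbours = [
            current[:i] + (1 - current[i],) + current[i + 1:]
            for i in range(len(current))
        ]
        best = min(neighbours, key=toy_fitness)
        if toy_fitness(best) >= toy_fitness(current):
            return current  # no single change improves the fit
        current = best


rng = random.Random(0)
coarse = global_search(n_bits=8, n_candidates=20, rng=rng)
final = downhill_search(coarse)
print("after global search:  ", coarse, toy_fitness(coarse))
print("after downhill search:", final, toy_fitness(final))
```

The global step only needs to land in a good region. The downhill step then tests one change at a time, much as forward addition/backward elimination does, and stops when no single change improves the fitness.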
Methods: A brief overview of machine learning model selection will be provided, followed by a demonstration of the RDarwin and DarwinReporter packages covering the following topics:
- Installation of RDarwin and DarwinReporter using the Certara.R CRAN package
- Definition of the model search space
  - Prerequisites
    - Dataset requirements
  - Creation of base model structures
  - Specifying search options for random effects
  - Specifying search options for residual error
  - Adding covariates to the search space
  - Setting initial parameter estimates
- Translating search space object into required pyDarwin input files
  - Creating template.txt and tokens.json
- Specifying pyDarwin search methods
  - Selecting search algorithm and options
  - Specifying model fitness penalty scores
  - Specifying downhill search step options
  - Creating options.json file
- Executing pyDarwin search
- Evaluating search results
  - Creating Darwin database object and passing to diagnostics toolsets
    - Individual diagnostic functions
    - DarwinReporter user interface
  - Generate VPC for key models
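The search-space and translation steps in the outline above pair a template with sets of interchangeable options. The Python sketch below illustrates that idea only. The group names, option text, and placeholder syntax are hypothetical and simplified. They are not the actual template.txt or tokens.json formats, and none of the functions belong to RDarwin or pyDarwin.

```python
import itertools
import json

# Hypothetical search space: each token group lists mutually exclusive options.
tokens = {
    "STRUCT": ["one-compartment", "two-compartment"],
    "RESID": ["additive", "proportional", "additive + proportional"],
    "WT_ON_CL": ["no weight effect on CL", "weight effect on CL"],
}

# Hypothetical template with one placeholder per token group.
template = "structure: {STRUCT}; residual error: {RESID}; covariate: {WT_ON_CL}"


def search_space_size(token_groups):
    """Number of candidate models: the product of the option counts."""
    size = 1
    for options in token_groups.values():
        size *= len(options)
    return size


def render(template_text, token_groups, choice):
    """Fill the template using one option index per token group."""
    selected = {
        name: options[choice[name]] for name, options in token_groups.items()
    }
    return template_text.format(**selected)


def all_choices(token_groups):
    """Yield every combination of option indices (exhaustive enumeration)."""
    names = list(token_groups)
    index_ranges = [range(len(token_groups[name])) for name in names]
    for combo in itertools.product(*index_ranges):
        yield dict(zip(names, combo))


print(search_space_size(tokens))
print(render(template, tokens, {"STRUCT": 1, "RESID": 1, "WT_ON_CL": 0}))
print(json.dumps(tokens, indent=2))  # the option sets serialize naturally to JSON
```

This toy space holds only 2 × 3 × 2 = 12 candidates, so it could be enumerated exhaustively. The count grows multiplicatively with every added group, which is why a guided search becomes attractive for realistic search spaces.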
Results: The demonstration provides a comprehensive overview of how to execute a pyDarwin machine learning model search from the R command line using Certara’s PML language and NLME engine.
Conclusions: RDarwin simplifies the process of creating and executing machine-learning-based model searches using pyDarwin.
References:
[1] https://certara.github.io/R-Certara/index.html#rdarwin-
[2] https://doi.org/10.1002/cpt.3114