Parallel Processing for Nonlinear Mixed Effects Modeling
Leary, Robert H. and Dunlavey, Michael
Pharsight Corporation, Cary, North Carolina, USA
Background: Nonlinear mixed effects (NLME) modeling can be computationally intensive, particularly if the underlying PK/PD models are defined by differential equations that must be solved numerically, and/or if accurate likelihoods based on MCMC, numerical integration, or nonparametric methods are used. Excessive wall-clock time to complete an analysis or series of analyses often dictates compromises in the analytic quality of the results, such as substituting an FO analysis for FOCE, or FOCE for an accurate-likelihood parametric or nonparametric analysis. Grid computing can address some computationally intensive NLME contexts, such as bootstrapping, by scheduling the resultant independent jobs across a possibly heterogeneous computational grid without reprogramming the underlying serial analysis program. However, speedup of individual analyses requires parallelization of the underlying application. In this talk we describe the parallelization strategies and achieved performance improvements for new parallel implementations of FOCE, adaptive Gaussian quadrature (AGQ), and nonparametric NLME analytic methods on a variety of architectures ranging from dual-processor Windows-based PCs to supercomputers.
Methods: New implementations of parametric FOCE, AGQ, and nonparametric NLME algorithms have been designed and written in Fortran/C++ with both parallel and serial execution capabilities. The FOCE and AGQ variants are novel, while the nonparametric method is an extension of the NPOD algorithm [1] in which the support point positions are initialized at the Nsubjects points defined by the individual subject MAP Bayesian estimates from a preceding FOCE analysis.
The parallelization strategies for all methods take advantage of the independence of the likelihood computations over different subjects, which can thus be performed on separate processors and combined with a global summation operation. The computation of an individual likelihood is a relatively coarse-grained task, particularly if it involves the numerical solution of a differential equation, and thus this approach can achieve good parallel efficiencies even on distributed memory architectures. Additional opportunities for exploitable parallelism and load balancing, which arise in the more computationally intensive algorithms such as the nonparametric and AGQ methods, and in particular in a nonparametric bootstrap, will also be described, but these have yet to be implemented.
Specific user-defined models are written in a high-level NLME modeling language that is translated into a C-language model subroutine, which is then compiled and linked with the analytical engine. All parallelization occurs at a level above the model subroutine, which is thus independent of the target architecture.
Implementations of the parallel algorithms use OpenMP for symmetric multiprocessor systems, and the MPI (Message Passing Interface) library for distributed memory parallel computers.
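The per-subject strategy described above can be sketched for the shared-memory (OpenMP) case as follows. This is only a minimal illustration under stated assumptions, not the implementation reported here: the Subject struct, the SubjectNegLogLik callback type, the function names, and the toy mono-exponential model with unit-variance residuals are all hypothetical, and the toy objective has no random effects, so it stands in for a real individual likelihood only in structure. The callback plays the role of the generated model subroutine, and the parallel loop sits above it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one subject's observations.
struct Subject {
    std::vector<double> times;
    std::vector<double> conc;
};

// Hypothetical callback type standing in for the generated model subroutine.
using SubjectNegLogLik = double (*)(const Subject&, const std::vector<double>&);

// Toy stand-in for an individual likelihood (assumption: mono-exponential
// decay theta[0] * exp(-theta[1] * t) with unit-variance normal residuals).
double toy_neg_log_lik(const Subject& s, const std::vector<double>& theta) {
    assert(theta.size() == 2 && s.times.size() == s.conc.size());
    const double half_log_2pi = 0.9189385332046727;
    double nll = 0.0;
    for (std::size_t j = 0; j < s.times.size(); ++j) {
        const double pred = theta[0] * std::exp(-theta[1] * s.times[j]);
        const double r = s.conc[j] - pred;
        nll += 0.5 * r * r + half_log_2pi;
    }
    return nll;
}

// Subjects are independent, so each iteration can run on a separate thread;
// the reduction clause performs the global summation. The model callback
// itself contains no parallel code.
double total_neg_log_lik(const std::vector<Subject>& subjects,
                         const std::vector<double>& theta,
                         SubjectNegLogLik model) {
    double total = 0.0;
    const long n = static_cast<long>(subjects.size());
    #pragma omp parallel for reduction(+:total) schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        total += model(subjects[i], theta);
    }
    return total;
}
```

Compiled without OpenMP support, the pragma is ignored and the same loop runs serially. On a distributed memory machine the analogous pattern would give each process a subset of subjects and combine the partial sums with a global reduction such as MPI_Allreduce.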
Results: Achieved parallel speedups depend on the specifics of the parallel architecture, algorithm, model, and data set, but were often observed to reach significant fractions of the theoretical limit of Nprocessors. The best results in terms of load balancing and parallel efficiency are achieved when Nprocessors is a relatively small fraction of Nsubjects. For example, a 234-subject nonparametric analysis achieved a speedup of over 16X on a 24-processor distributed memory cluster, while the FOCE and AGQ algorithms often showed speedups approaching the processor count on small (2-8 processors) symmetric multiprocessor systems.
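The arithmetic behind these figures can be made concrete with a short sketch. It assumes, purely for illustration, that every subject costs the same to evaluate, that subjects are divided as evenly as possible among processors, and that communication is free; the function names are hypothetical. Under those assumptions the speedup cannot exceed Nsubjects divided by the largest number of subjects assigned to any one processor.

```cpp
#include <cassert>

// Parallel efficiency: achieved speedup as a fraction of the processor count.
double parallel_efficiency(double speedup, int n_processors) {
    assert(n_processors > 0);
    return speedup / n_processors;
}

// Largest number of subjects on any processor under an even static split,
// i.e. ceil(n_subjects / n_processors).
int max_subjects_per_processor(int n_subjects, int n_processors) {
    assert(n_subjects > 0 && n_processors > 0);
    return (n_subjects + n_processors - 1) / n_processors;
}

// Upper bound on speedup when all subjects cost the same (assumption) and
// communication overhead is ignored (assumption).
double balanced_speedup_bound(int n_subjects, int n_processors) {
    return static_cast<double>(n_subjects) /
           max_subjects_per_processor(n_subjects, n_processors);
}
```

For 234 subjects on 24 processors, the busiest processor handles 10 subjects, so the bound is 23.4, and a speedup of 16 corresponds to an efficiency of about 67%. By contrast, 30 subjects on 24 processors would be bounded at a speedup of 15, because some processors must handle two subjects while others handle one. This is one way to see why a small ratio of Nprocessors to Nsubjects helps load balancing.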
Conclusion: Parallelization of NLME methods based on distributing the various individual likelihood computations across separate processors is an effective and practical method to achieve significant speedups of computationally intensive NLME analyses.
Reference:
[1] Leary, R., Jelliffe, R., Schumitzky, A. and Van Guilder, M., Improved Computational Methods for Statistically Consistent and Efficient PK/PD Analysis, PAGE 12 (2003), Abstract 421.
