2014 - Alicante - Spain

PAGE 2014: Methodology - Model Evaluation
Rikard Nordgren

Automatic binning for visual predictive checks

Christian Sonehag (1), Niklas Olofsson (1), Rasmus Simander (1), Rikard Nordgren (2), Kajsa Harling (2)

(1) Department of Scientific Computing, Uppsala University, Uppsala, Sweden, (2) Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

Objectives: Visual predictive checks (VPCs) [1] require binning of the observations in the dimension of the independent variable. An automatic binning algorithm has previously been proposed [2]. In this study we explore a novel algorithm based on K-means clustering, with a data density function that penalizes placing a bin edge where the data are dense, and implement the algorithm in PsN [3].

Methods: For a given number of bins (K), the algorithm seeks the bin edges that minimize the objective function O(K) = sum(W) + alpha*sum(Phi(e_i)), where sum(W) is the sum of within-bin variabilities, sum(Phi(e_i)) is the sum of the data density function values at the bin edges e_i, and alpha is a scaling factor. The data density function Phi is obtained by kernel density estimation using a Gaussian kernel [4]. The kernel bandwidth is based on the optimal bandwidth for Gaussian data [4], but is reduced by a factor F if the bin data appear non-Gaussian, as indicated by the kurtosis; this increases the resolution of Phi and decreases the penalty for moving an edge into that area. Phi is computed from an initial binning, in which O_0 = sum(W) is minimized. The same optimization algorithm is used both for minimizing sum(W) to get the initial binning and for minimizing O(K) once Phi is fixed. The algorithm alternates between moving bin edges one at a time between their two neighbors and removing edges and placing them elsewhere, and it stops when the objective function cannot be reduced any further. Finally, the optimal K has to be selected. Three criteria were tried: the K that minimized O(K), the K that maximized the function used in the method proposed by Calinski and Harabasz [5], and the ratio between the two. The algorithm was run on test data, and the resulting VPC plots were judged by a panel of experienced modelers to obtain reasonable values for the different parameters.
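
As an illustration only, the objective function described above can be sketched in Python. This is not the PsN implementation. The function names are hypothetical, the within-bin variability W is assumed to be the K-means style sum of squared deviations from the bin mean, and a single global bandwidth of 1.06*sd*n^(-1/5) (the normal reference rule) is assumed instead of the per-bin bandwidth used in the abstract.

```python
import math

def silverman_bandwidth(xs):
    """Normal reference bandwidth: 1.06 * sd * n^(-1/5) (assumed global, not per bin)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sd * n ** (-0.2)

def gaussian_kde(xs, h):
    """Return Phi(t), a Gaussian kernel density estimate with bandwidth h."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    def phi(t):
        return norm * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs)
    return phi

def within_bin_variability(xs, edges):
    """sum(W): squared deviations from the bin mean, summed over all bins.

    ASSUMPTION: the abstract does not define W; this is the K-means choice.
    """
    bounds = [-math.inf] + sorted(edges) + [math.inf]
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        members = [x for x in xs if lo <= x < hi]
        if members:
            m = sum(members) / len(members)
            total += sum((x - m) ** 2 for x in members)
    return total

def objective(xs, edges, phi, alpha):
    """O(K) = sum(W) + alpha * sum(Phi(e_i)) for the given bin edges."""
    return within_bin_variability(xs, edges) + alpha * sum(phi(e) for e in edges)
```

With two well-separated clusters of sampling times, an edge placed in the gap between them gives a lower objective value than an edge placed inside a cluster, since both the within-bin variability and the density penalty are smaller there.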

Results: The best values of the algorithm's parameters were judged to be alpha = 7.8*argmax_k(W), a cutoff C = 2.5 for classifying the kurtosis as Gaussian or non-Gaussian, and a bandwidth reduction factor F = 0.25 for non-Gaussian bin data. The ratio between the objective function and the Calinski and Harabasz function was judged to be the best K selection criterion.
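
To make the roles of C and F concrete, the following is a minimal sketch of how a per-bin bandwidth might be chosen from these values. It is not taken from PsN. The abstract does not state which side of the cutoff counts as non-Gaussian, so the sketch assumes that a non-excess kurtosis below C (flatter than a Gaussian, whose kurtosis is 3) triggers the reduction. The function names and the 1.06*sd*n^(-1/5) reference bandwidth are also assumptions.

```python
import math

C = 2.5   # kurtosis cutoff reported in the abstract
F = 0.25  # bandwidth reduction factor reported in the abstract

def kurtosis(xs):
    """Non-excess sample kurtosis m4 / m2^2 (equals 3 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def bin_bandwidth(xs, cutoff=C, factor=F):
    """Kernel bandwidth for the data in one bin (needs at least two distinct values)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)  # normal reference bandwidth
    # ASSUMPTION: kurtosis below the cutoff is treated as non-Gaussian.
    if kurtosis(xs) < cutoff:
        h *= factor  # narrower kernel gives a higher-resolution Phi in this bin
    return h
```

Under this assumption, evenly spread bin data (kurtosis near 1.8) would get a bandwidth four times narrower than the reference value, while data with kurtosis at or above 2.5 would keep the reference bandwidth.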

Conclusions: We have developed an automatic binning algorithm that allows the modeler to quickly obtain a binning for VPCs. The user can either let the algorithm perform both bin edge placement and K selection, or let it place the bin edges given a user-selected K.



References:
[1] Karlsson M, Holford N. A Tutorial on Visual Predictive Checks. PAGE Meeting, Marseille, 2008.
[2] Lavielle M, Bleakley K. Automatic data binning for improved visual diagnosis of pharmacometric models. J Pharmacokinet Pharmacodyn. 2011 Dec;38(6):861-71.
[3] Lindbom L, Ribbing J, Jonsson EN. Perl-speaks-NONMEM (PsN)--a Perl module for NONMEM related programming. Comput Methods Programs Biomed. 75(2):85-94.
[4] Silverman BW. Density Estimation for Statistics and Data Analysis. 1998.
[5] Milligan G, Cooper M. An examination of procedures for determining the number of clusters in a data set. Psychometrika. 1985 Jun;50(2):159-179.


Reference: PAGE 23 (2014) Abstr 3085 [www.page-meeting.org/?abstract=3085]
Poster: Methodology - Model Evaluation