The performance of model selection criteria in the absence of a fixed-dimensional correct model
Olofsen, E
Department of Anesthesiology, Leiden University Medical Center, The Netherlands
Objectives: Akaike's information-theoretic criterion (AIC) for model discrimination is often stated to "overfit", i.e., to select models with a higher dimension than that of the model that generated the data. However, when no fixed-dimensional correct model exists, for example for pharmacokinetic data, AIC (or its bias-corrected version AICc) might be the selection criterion of choice if the objective is to minimize prediction error [1,2]. The present simulation study was designed to assess the behavior of AIC and other criteria under this type of model misspecification, for various sample sizes and measurement noise levels.
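As a reference point for the comparison below, the standard definitions of the criteria can be written compactly; this is a minimal sketch (function names are our own), with n observations, k estimated parameters, and maximized log-likelihood log L:

```python
import math

def aic(log_lik, k):
    # Akaike's information criterion: -2 log L + 2 k
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    # Small-sample bias-corrected AIC (requires n > k + 1);
    # the correction term vanishes as n grows
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(log_lik, k, n):
    # Bayesian information criterion: -2 log L + k log n
    return -2.0 * log_lik + k * math.log(n)
```

Because the BIC penalty k log n exceeds the AIC penalty 2k once n > e² ≈ 7.4, BIC tends to select smaller models than AIC at realistic sample sizes.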
Methods: Data y(j) = 1/t(j) were simulated at M sampling times t(j) in the range (0.1,10), contaminated by measurement noise with constant relative error variance. A power function was chosen because it often fits pharmacokinetic data well, and it can be approximated by a sum of exponentials [3]. M rate constants were taken as k(j) = 1/t(j), and N of those were chosen by forward selection; the coefficients of the exponential series were determined using weighted linear regression with weights w(j) = t²(j). When N = M, an exact fit is possible; when N is large, the "measurement noise is being modeled", so that the prediction error variance increases. The selection criteria evaluated were AIC, AICc, BIC, and likelihood ratio tests with P < 0.05 and P < 0.01; their performance was assessed against the objective of minimum prediction error (using validation data sets), for different sample sizes and measurement error variances, averaged across a set of simulation trials.
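The simulation and forward-selection procedure can be sketched as follows; this is a minimal illustration under assumed settings (M = 20 log-spaced sampling times, 10% relative noise, AIC scoring only), not the study's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 20                                   # number of sampling times (assumed)
t = np.logspace(np.log10(0.1), np.log10(10.0), M)
y_true = 1.0 / t                         # power-function "true" curve
sigma = 0.1                              # relative error level (assumed)
y = y_true * (1.0 + sigma * rng.standard_normal(M))

k_rates = 1.0 / t                        # candidate rate constants k(j) = 1/t(j)
X_full = np.exp(-np.outer(t, k_rates))   # exponential basis: column j is exp(-k(j) t)
w = t**2                                 # regression weights w(j) = t(j)^2

def wrss(cols):
    """Weighted residual sum of squares for the basis columns in `cols`."""
    Xw = X_full[:, cols] * np.sqrt(w)[:, None]
    yw = y * np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    r = yw - Xw @ beta
    return float(r @ r)

# Forward selection: greedily add the exponential term that most reduces
# WRSS, scoring each dimension N with the Gaussian-likelihood form of AIC,
#   AIC(N) = M log(WRSS/M) + 2 (N + 1),
# where N + 1 counts the coefficients plus the noise-variance parameter.
chosen, remaining = [], list(range(M))
aic_by_n = {}
for _ in range(M // 2):                  # cap the search at N = M/2
    best = min(remaining, key=lambda j: wrss(chosen + [j]))
    chosen.append(best)
    remaining.remove(best)
    aic_by_n[len(chosen)] = M * np.log(wrss(chosen) / M) + 2 * (len(chosen) + 1)

n_aic = min(aic_by_n, key=aic_by_n.get)  # dimension selected by AIC
```

In a full run, the dimension selected by each criterion would be compared against the dimension minimizing prediction error on independently simulated validation data, and the comparison repeated over noise levels, sample sizes, and replicate trials.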
Results: At small noise levels, all criteria selected models with a smaller dimension than those that minimized prediction error. At higher noise levels and larger sample sizes, these discrepancies diminished. Models selected by AIC were usually the closest to those that minimized prediction error. At small sample sizes, AICc usually performed worse than AIC. Models selected by the likelihood ratio tests were usually too small.
Conclusions: Under the circumstances of the present simulation study, AIC performed well, and no worse than competing selection criteria. The fact that all criteria performed badly at small noise levels is most likely related to violations of assumptions underlying the derivation of the criteria, such as uncorrelated residuals.
References:
[1] Shao, J. An asymptotic theory for linear model selection. Statist Sin 7:221, 1997
[2] Burnham, KP and Anderson, DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Meth Res 33:261, 2004
[3] Norwich, KH. Noncompartmental models of whole-body clearance of tracers: a review. Ann Biomed Eng 25:421, 1997