Application of Item Response Theory in Early Phase Clinical Trials: Utilization of a Reference Model to Analyse the Montgomery-Åsberg Depression Rating Scale.
M.E. Otto (1,2), K. Bergmann (1), G. Jacobs (1,2), M.J. van Esdonk (1)
(1) Centre for Human Drug Research (CHDR), Leiden, The Netherlands, (2) Department of Psychiatry, Leiden University Medical Centre (LUMC), Leiden, The Netherlands
Objectives: Item response theory (IRT) has been shown to be a valuable asset in pharmacometrics for the analysis of (late phase) clinical trials with questionnaire-based outcomes as pharmacodynamic endpoints [1]. IRT uses all information available in the individual item responses of such questionnaires and maps it onto a latent variable (ψ) that represents the underlying disease severity, thereby increasing the power to detect a drug effect compared to standard analyses of composite/total scores. However, IRT model development in early phase clinical trials remains challenging because of the limited amount of data available in these trials [2,3]. Therefore, the aim of this study was to assess the underlying assumptions and applicability of a reference IRT model for the analysis of a small early phase clinical dataset investigating ketamine in patients with Major Depressive Disorder (MDD).
Methods: Data were available from a randomized, double-blind, placebo-controlled, cross-over study investigating the antidepressant effects of ketamine in 17 patients with treatment-resistant MDD. Patients received a single dose of racemic ketamine or placebo as an intravenous infusion over 40 minutes. Montgomery-Åsberg Depression Rating Scale (MADRS) questionnaires were completed pre-dose and at 100 min, 24 h and 1 week after dosing on both treatment occasions, and at follow-up (2 weeks). The MADRS consists of 10 items, each scored from 0 to 6. Two previously developed IRT models [4], based on populations of (1) treatment-resistant patients with MDD (n=208) and bipolar disorder (n=25) or (2) non-treatment-resistant MDD patients (n=985), were used as reference models to determine ψ for each individual measurement. Three approaches to determining ψ were investigated with the first reference model, in which (a) ψ does not follow any pre-specified distribution, (b) ψ is normally distributed, with a separate distribution (mean and variance) estimated for each measurement of each treatment occasion, and (c) ψ is part of the distribution of the reference population (mean of 0 and variance of 1). The absolute difference in ψ between approaches was calculated for each observation, and approaches were deemed not significantly different if the 95% confidence interval (CI) included 0. The impact of the choice of reference model on the determination of ψ was evaluated using the second reference model. Additionally, the drug effect of ketamine on the MADRS as measured by the composite score was compared to that measured by ψ using a linear mixed model analysis of variance. ψ was determined in NONMEM (7.5) using the Laplacian estimation method, and further analysis was done in R (4.0.3).
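The contrast between approaches (a) and (c) can be illustrated with a minimal sketch. It assumes a graded response model with made-up item parameters (ITEMS below is hypothetical and is not the reference model [4]) and replaces the NONMEM Laplacian estimation with a simple grid search. Here approach (a) is read as a maximum-likelihood estimate of ψ for a single questionnaire, and approach (c) as the same likelihood penalised by a standard normal prior on ψ; both readings are assumptions made for illustration.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def category_probs(psi, a, b):
    """Graded response model: probability of each score 0..len(b).

    a is the item discrimination, b the increasing thresholds,
    and P(score >= k) = logistic(a * (psi - b[k-1])).
    """
    cum = [1.0] + [logistic(a * (psi - bk)) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]

def log_likelihood(psi, responses, items):
    return sum(math.log(category_probs(psi, a, b)[y])
               for y, (a, b) in zip(responses, items))

def estimate_psi(responses, items, use_prior, lo=-6.0, hi=6.0, step=0.01):
    """Grid search for psi; use_prior adds the log-density of N(0, 1)."""
    n_steps = int(round((hi - lo) / step))
    best_psi, best_obj = lo, -math.inf
    for i in range(n_steps + 1):
        psi = lo + i * step
        obj = log_likelihood(psi, responses, items)
        if use_prior:
            obj -= 0.5 * psi ** 2  # N(0, 1) log-density up to a constant
        if obj > best_obj:
            best_psi, best_obj = psi, obj
    return best_psi

# Hypothetical parameters: 10 identical items, 7 response categories (0-6).
ITEMS = [(1.2, [-1.5, -0.5, 0.5, 1.0, 1.5, 2.5])] * 10

all_zero = [0] * 10                      # every item answered with 0
mixed = [3, 2, 4, 3, 2, 3, 3, 2, 4, 3]   # mid-range questionnaire

ml_zero = estimate_psi(all_zero, ITEMS, use_prior=False)
map_zero = estimate_psi(all_zero, ITEMS, use_prior=True)
ml_mixed = estimate_psi(mixed, ITEMS, use_prior=False)
map_mixed = estimate_psi(mixed, ITEMS, use_prior=True)

print("all zero:", ml_zero, map_zero)
print("mixed:   ", ml_mixed, map_mixed)
```

In this sketch the unconstrained estimate for the all-zero questionnaire runs to the lower edge of the search grid, because the likelihood keeps increasing as ψ decreases, whereas the prior keeps the penalised estimate finite. For the mid-range questionnaire the two estimates stay close together.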
Results: A total of 147 observations were available. Differences in ψ were statistically significant for approaches (a) and (b) versus approach (c) (mean [95% CI] of -0.029 [-0.043, -0.015] and -0.022 [-0.039, -0.005], respectively), but not between approaches (a) and (b) (0.007 [-0.021, 0.035]). Overall, the absolute difference in ψ between approaches was small relative to the range of estimated ψ values [-3.152, 1.465], and individual profiles over time were highly similar. However, approach (a) estimated notably lower ψ values for questionnaires in which most items were answered with 0, whereas approach (b) resulted in more conservative (i.e. closer to 0) ψ values, although multiple distribution parameters of approach (b) were estimated with relative standard errors (RSE) >50%. Use of the second reference population resulted in a significant overall mean increase in ψ of 0.53 (95% CI [0.52, 0.54]), but the individual trends over time remained similar. Lastly, the mixed model analysis showed a significant treatment effect with the composite score (p=0.0009), and the p-value decreased further when using ψ ((a): p=0.0001, (b) and (c): p<0.0001). Use of the second reference population resulted in a minimally different p-value (p=0.0002).
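The interval check behind these comparisons can be sketched as follows. The abstract does not state how the 95% CI was computed, so this sketch assumes a normal approximation on the per-observation differences and ignores the correlation between repeated measurements within a patient. The ψ values below are toy numbers chosen for illustration, not study data, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_diff_ci(psi_x, psi_y, level=0.95):
    """Mean per-observation difference (x - y) with a normal-approximation CI."""
    diffs = [x - y for x, y in zip(psi_x, psi_y)]
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, m - z * se, m + z * se

def significantly_different(ci_low, ci_high):
    """Approaches are deemed different when the CI excludes 0."""
    return not (ci_low <= 0.0 <= ci_high)

# Toy psi estimates for the same six observations under two approaches.
psi_a = [-1.20, 0.35, -0.80, 0.10, -2.10, 0.90]
psi_c = [-1.15, 0.37, -0.78, 0.14, -1.95, 0.93]

m, low, high = mean_diff_ci(psi_a, psi_c)
print(f"mean difference {m:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
print("different:", significantly_different(low, high))
```

A mixed-effects or bootstrap interval that accounts for the cross-over design would be a natural alternative to the plain normal approximation used here.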
Conclusions: Reference IRT models can be used to transform questionnaire data into a measure of disease severity in early phase clinical trials, resulting in higher sensitivity to detect a treatment effect than standard analyses based on composite scores. Assuming that the study data are part of the reference distribution, as in approach (c), avoids estimating parameters from scarce data, yet gives similar results and enables the application of IRT in early phase clinical trials.
References:
[1] Ueckert, S. et al. Improved utilization of ADAS-Cog assessment data through item response theory based pharmacometric modeling. Pharm. Res. 31, 2152–2165 (2014).
[2] Houts, C. R., Morlock, R., Blum, S. I., Edwards, M. C. & Wirth, R. J. Scale development with small samples: a new application of longitudinal item response theory. Qual. Life Res. 27, 1721–1734 (2018).
[3] Ueckert, S. Modeling Composite Assessment Data Using Item Response Theory. CPT Pharmacometrics Syst. Pharmacol. 7, 205–218 (2018).
[4] Carmody, T. J. et al. The Montgomery Äsberg and the Hamilton ratings of depression: A comparison of measures. Eur. Neuropsychopharmacol. 16, 601–611 (2006).