Data mining analysis of survival data in pancreatic cancer: a first exploratory step for identification and validation of explanatory variables
M.Guery (1), G.Herbin (1), C.Pobel (1), E.Kiep (1), C.Donamaria (2)
(1) Clinical pharmacology CH Saintonge - Saintes, (2) Clinical pharmacology Institut Bergonie - Bordeaux
Objectives: Pancreatic cancer is a relatively rare disease for which little evidence-based medicine is available. Given the rapid progression of the disease, treatment at the metastatic stage relies on non-standardized chemotherapies. This study aimed to explore explanatory variables (therapeutic strategies and physiopathological covariates) that may influence individual overall survival. These variables could then be included in time-to-event modelling.
Methods: We explored data from a cohort of unselected patients with metastatic pancreatic cancer. Explanatory variables were analysed with respect to the time-dependent variable using a two-stage bootstrap method (Abdelaziz Faraj, Michel Constant, personal communication). A multivariate analysis was then performed on the selection of possibly discriminant variables with Datapilot software [1] (Philippe Bastien, personal communication).
Results: 42 patients contributed to the time-dependent variable values. After an initial one-stage bootstrap analysis with Datapilot, 16 categorical variables were selected for their clinical significance: age (above or below 65 years), gender, stage at diagnosis (local or metastatic), number of treatment lines (more or fewer than 2), first protocol schedule, prior surgery, dose reduction, introduction of a platinum salt, and the presence of gemcitabine alone, of the GEMOX protocol (gemcitabine-oxaliplatin), of a GEMOX-gemcitabine sequence, of outlier protocols, and of erlotinib. The probability that the first category of a variable exceeded the other ranged from 0.202 to 0.987 (significant difference for 3 variables). Covariance matrix estimation showed no intragroup correlations. We then explored a two-stage approach: stage one, a bootstrap of the observed data (500 replications for each value, to ensure good estimation of the confidence interval (CI)); stage two, a Datapilot analysis of the population generated in stage one. This two-stage approach improved on the single-stage approach in terms of significance for the 16 variables.
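The bootstrap stage can be illustrated with a minimal sketch. It is not the authors' two-stage procedure and does not reproduce the Datapilot analysis; it only shows, under stated assumptions, how 500 resamples of two categories of one binary variable can yield a probability that the first category exceeds the other, together with a percentile confidence interval. The function name `bootstrap_compare`, the use of mean survival as the compared statistic, and the survival times are hypothetical, and censoring is ignored.

```python
import random
import statistics

def bootstrap_compare(group_a, group_b, n_rep=500, seed=0):
    """Resample each group with replacement and compare mean survival.

    Assumption: uncensored survival times and the mean as the statistic.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_rep):
        # Draw a bootstrap sample of the same size as each observed group.
        sample_a = rng.choices(group_a, k=len(group_a))
        sample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.mean(sample_a) - statistics.mean(sample_b))
    diffs.sort()
    # Fraction of replications in which group A has the larger mean.
    prob_a_greater = sum(d > 0 for d in diffs) / n_rep
    # Percentile 95% confidence interval for the difference in means.
    lower = diffs[int(0.025 * n_rep)]
    upper = diffs[int(0.975 * n_rep) - 1]
    return prob_a_greater, (lower, upper)

# Hypothetical survival times in months (not the study data).
under_65 = [4.0, 6.5, 7.2, 9.1, 11.3, 12.8, 15.0]
over_65 = [2.1, 3.4, 5.0, 5.6, 6.9, 8.2, 10.4]

prob, ci = bootstrap_compare(under_65, over_65)
print(f"P(first category > other) = {prob:.3f}, 95% CI of difference = {ci}")
```

A probability close to 0 or 1, or a confidence interval that excludes zero, would flag the variable as possibly discriminant in this simplified setting.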
Conclusions: Selecting statistically and clinically pertinent variables appears to be a useful prerequisite to time-to-event modelling with Weibull or Cox approaches. This exploratory method could also be used during clinical trial simulation, to avoid clinical studies with a high risk of failure and to properly design future clinical studies, especially where regulatory therapeutic references are scarce.
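For orientation, the standard forms of the two time-to-event models named above are recalled here. The notation is an assumption of this note and does not come from the study: x is the vector of selected explanatory variables, beta the corresponding coefficients, k and lambda the Weibull shape and scale parameters.

```latex
% Proportional hazards model with covariate vector x:
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^{\top} x\right)

% Weibull approach: parametric baseline hazard (shape k > 0, scale \lambda > 0)
h_0(t) = \frac{k}{\lambda} \left( \frac{t}{\lambda} \right)^{k-1}

% Cox approach: h_0(t) is left unspecified and \beta is estimated
% from the partial likelihood.
```

In both cases, each selected categorical variable would enter the model as one or more components of x.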
References:
[1] http://www.colorpilot.com/datapilot.html