Briefings in Bioinformatics Advance Access originally published online on April 3, 2009
Briefings in Bioinformatics 2009 10(5):537-546; doi:10.1093/bib/bbp016
Development of biomarker classifiers from high-dimensional data
Corresponding author. Dr James J. Chen, HFT-20, Jefferson, AR 72079, USA. Tel: +1-870-543-7007; Fax: +1-870-543-7662; E-mail: jamesj.chen{at}fda.hhs.gov
Recent developments in high-throughput technology have accelerated interest in molecular biomarker classifiers for safety assessment, disease diagnostics and prognostics, and prediction of response for patient assignment. This article reviews and evaluates some important aspects and key issues in the development of biomarker classifiers. Developing a biomarker classifier from high-throughput data involves two components: (i) model building and (ii) performance assessment. This article focuses on feature selection in model building and on cross-validation for performance assessment. A frequency approach to feature selection is presented and compared with the conventional approach in terms of predictive accuracy and the stability of the selected feature set. The two approaches are compared using four biomarker classifiers, each pairing a different feature selection method with a well-known classification algorithm. In each of the four classifiers, the predictor set selected by the frequency approach is more stable than the set selected by the conventional approach.
Keywords: class prediction, cross-validation, feature selection, frequency of selection, stable feature set
Submitted: January 14, 2009. Received (in revised form): February 21, 2009.