Briefings in Bioinformatics Advance Access originally published online on February 29, 2008
Briefings in Bioinformatics 2008 9(2):119-128; doi:10.1093/bib/bbn008
Machine learning methods for predictive proteomics
Corresponding author. Cesare Furlanello, FBK, via Sommarive 18, I-38100 Povo (Trento), Italy. Tel: +39-0461-314580; Fax: +39-0461-314591; E-mail: furlan{at}fbk.eu
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data requires the same caution as the microarray case. The risk of overfitting and of selection bias is pervasive: not only do potential features easily outnumber samples by a factor of 10^3, but it is also easy to overlook information leakage during the preprocessing steps that turn spectra into peaks. The aim of this review is to explain how to build a general-purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers (e.g. Support Vector Machine, SVM) or feature-ranking methods (recursive feature elimination or I-Relief). A procedure for assessing the stability and predictive value of the resulting biomarker list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies.
Keywords: proteomics, selection bias, feature selection, functional profiling
Submitted: September 14, 2007. Received (in revised form): January 25, 2008.
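The selection-bias risk named in the abstract can be illustrated with a small, self-contained sketch. It is not the DAP from the article: a simple mean-difference ranking and a nearest-centroid classifier stand in, as assumptions, for the feature-ranking methods and the SVM mentioned above, and all function names (rank_features, nearest_centroid_predict, cv_accuracy, demo) are hypothetical. The data are pure noise with many more features than samples. The sketch compares cross-validation with features ranked once on all samples (information leaks from the held-out samples) against cross-validation with features re-ranked inside every training fold.

```python
import random


def rank_features(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X_train, y_train, X_test, feats):
    """Assign each test row to the class whose centroid is closest on `feats`."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    preds = []
    for r in X_test:
        dist = {
            label: sum((r[j] - c) ** 2 for j, c in zip(feats, cen))
            for label, cen in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k, n_folds, select_inside, rng):
    """Cross-validated accuracy; feature ranking is done either once on all
    samples (select_inside=False, leaky) or inside each training fold."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    global_feats = None if select_inside else rank_features(X, y, k)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        feats = rank_features(X_tr, y_tr, k) if select_inside else global_feats
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in test_idx], feats)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)


def demo(n=40, p=500, k=10, n_folds=5, seed=0):
    """Noise-only data: labels carry no information about the features."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    # Same seed for both runs so the two estimates use identical folds.
    biased = cv_accuracy(X, y, k, n_folds, False, random.Random(seed))
    unbiased = cv_accuracy(X, y, k, n_folds, True, random.Random(seed))
    return biased, unbiased


if __name__ == "__main__":
    biased, unbiased = demo()
    print(f"ranking on all samples:   {biased:.2f}")
    print(f"ranking inside each fold: {unbiased:.2f}")
```

Because the labels are unrelated to the features, an honest estimate should be close to chance, so an accuracy well above 0.5 in the first line would be an artifact of ranking features on samples that are later used for testing. The same reasoning applies to any preprocessing or tuning step whose parameters are fitted on the full dataset before resampling.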

