Briefings in Bioinformatics Advance Access published online on February 29, 2008

Briefings in Bioinformatics, doi:10.1093/bib/bbn008

© The Author 2008. Published by Oxford University Press. For Permissions, please email: journals.permissions@oxfordjournals.org

Machine learning methods for predictive proteomics

Annalisa Barla, Giuseppe Jurman, Samantha Riccadonna, Stefano Merler, Marco Chierici and Cesare Furlanello

Corresponding author. Cesare Furlanello, FBK, via Sommarive 18, I-38100 Povo (Trento), Italy. Tel: +39-0461-314580; Fax: +39-0461-314591; E-mail: furlan{at}fbk.eu

The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data calls for the same caution as the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only do potential features easily outnumber samples by a factor of 10³, but it is also easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers, e.g. Support Vector Machine (SVM), or feature ranking methods, e.g. recursive feature elimination (RFE) or I-Relief. A procedure for assessing stability and predictive value of the resulting biomarkers’ list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies.
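
The selection bias mentioned in the abstract can be illustrated with a minimal sketch. It is not the DAP itself: it assumes a single K-fold split instead of many replicate resamplings, a difference-of-class-means score as a stand-in for RFE or I-Relief, and a nearest-centroid rule as a stand-in for an SVM. All function names (rank_features, nearest_centroid_predict, cv_accuracy) are hypothetical. The only point is the contrast between ranking features once on all samples (leaky) and ranking them inside each training fold.

```python
import random
import statistics


def rank_features(X, y, k):
    """Return the indices of the k features with the largest absolute
    difference between class means (illustrative stand-in for RFE/I-Relief)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the class whose centroid is closest,
    using only the selected features (illustrative stand-in for an SVM)."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.fmean([r[j] for r in rows]) for j in features]
    preds = []
    for row in X_test:
        dist = {
            label: sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
            for label in (0, 1)
        }
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k, n_folds=5, select_inside=True):
    """K-fold accuracy. With select_inside=True the ranking sees only the
    training fold; with False it is done once on all samples (leaky)."""
    n = len(X)
    leaky_features = None if select_inside else rank_features(X, y, k)
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_te = [X[i] for i in test_idx]
        features = rank_features(X_tr, y_tr, k) if select_inside else leaky_features
        preds = nearest_centroid_predict(X_tr, y_tr, X_te, features)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


if __name__ == "__main__":
    # Pure-noise data: 20 samples, 500 features, labels carry no signal.
    rng = random.Random(0)
    y = [i % 2 for i in range(20)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in y]
    print("ranking outside the loop:", cv_accuracy(X, y, k=10, select_inside=False))
    print("ranking inside the loop: ", cv_accuracy(X, y, k=10, select_inside=True))
```

On data with no real signal, the estimate obtained with ranking outside the loop tends to be optimistic, while ranking inside the loop typically stays near chance level. The same reasoning applies to any preprocessing step whose parameters are tuned on the labels.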

Keywords: proteomics, selection bias, feature selection, functional profiling

Submitted: September 14, 2007. Received (in revised form): January 25, 2008.

