Briefings in Bioinformatics Advance Access published online on February 3, 2006

Briefings in Bioinformatics, doi:10.1093/bib/bbk001
© The Author 2006. Published by Oxford University Press. For Permissions, please email: journals.permissions@oxfordjournals.org
Received July 11, 2005
Accepted November 24, 2005

Original Article

Statistical significance in biological sequence analysis

Alexander Yu. Mitrophanov and Mark Borodovsky*

* To whom correspondence should be addressed.
Mark Borodovsky, E-mail: mark.borodovsky@biology.gatech.edu


Abstract

One of the major goals of computational sequence analysis is to find sequence similarities, which can serve as evidence of structural and functional conservation, as well as of evolutionary relations among the sequences. Since the degree of similarity is usually assessed by the sequence alignment score, it is necessary to know whether a score is high enough to indicate a biologically interesting alignment. A powerful approach to defining score cutoffs is based on evaluating the statistical significance of alignments. The statistical significance of an alignment score is frequently assessed by its P-value, which is the probability that this score or a higher one can occur simply by chance, given the probabilistic models for the sequences. In this review we discuss the general role of P-value estimation in sequence analysis, and describe theoretical methods and computational approaches to the estimation of statistical significance for important classes of sequence analysis problems. In particular, we concentrate on P-value estimation techniques for single sequence studies (both score-based and score-free), global and local pairwise sequence alignments, multiple alignments, sequence-to-profile alignments and alignments built with hidden Markov models. We anticipate that the review will be useful both to researchers working professionally in bioinformatics and to biomedical scientists interested in using contemporary methods of DNA and protein sequence analysis.

Keywords: sequence analysis; pairwise alignment; multiple alignment; profile; probabilistic model; statistical significance; P-value; E-value.
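
As an illustration of the P-value definition given in the abstract, the following is a minimal Python sketch, not a method taken from the review. It assumes a simple null model in which one sequence is randomly shuffled (preserving its letter composition), and it estimates the probability of seeing the observed local alignment score or a higher one by chance. The function names (local_score, empirical_p_value), the scoring parameters (match, mismatch, gap) and the number of shuffles are all hypothetical choices made for this example.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the review.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


def empirical_p_value(a, b, n_shuffles=200, seed=0):
    """Monte Carlo estimate of P(score >= observed) under a shuffling null model.

    Assumption: the null model keeps the letter composition of `b` and
    randomizes only the order of its letters.
    """
    rng = random.Random(seed)
    observed = local_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if local_score(a, "".join(letters)) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    score, p = empirical_p_value("ACGTTGCAAGCTTAGC", "ACGTTGCAAGCTTAGC")
    print(f"score={score}, estimated P-value={p:.4f}")
```

A sampling estimate of this kind cannot resolve P-values much smaller than 1/n_shuffles, which is one reason analytical approximations to the score distribution are of practical interest.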

Alexander Yu. Mitrophanov is a postdoctoral fellow at the School of Biology, Georgia Institute of Technology. His research interests include applications of probabilistic methods in different areas of bioinformatics and computational biology.

Mark Borodovsky is a Regents’ Professor at the School of Biology, Georgia Institute of Technology, and the Wallace H. Coulter Department of Biomedical Engineering at Georgia Institute of Technology and Emory University. His research interests include development of statistical methods for biological sequence analysis and identifying functionally important features of DNA and proteins in the context of cell function and evolution.

