Briefings in Bioinformatics Advance Access published online on February 3, 2006
Briefings in Bioinformatics, doi:10.1093/bib/bbk001
* To whom correspondence should be addressed.

One of the major goals of computational sequence analysis is to find sequence similarities, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations among the sequences. Since the degree of similarity is usually assessed by the sequence alignment score, it is necessary to know whether a score is high enough to indicate a biologically interesting alignment. A powerful approach to defining score cutoffs is based on evaluating the statistical significance of alignments. The statistical significance of an alignment score is frequently assessed by its P-value, which is the probability that this score or a higher one can occur simply by chance, given the probabilistic models for the sequences. In this review we discuss the general role of P-value estimation in sequence analysis, and describe theoretical methods and computational approaches to estimating statistical significance for important classes of sequence analysis problems. In particular, we concentrate on P-value estimation techniques for single-sequence studies (both score-based and score-free), global and local pairwise sequence alignments, multiple alignments, sequence-to-profile alignments and alignments built with hidden Markov models. We anticipate that the review will be useful both to researchers working professionally in bioinformatics and to biomedical scientists interested in using contemporary methods of DNA and protein sequence analysis.

Alexander Yu. Mitrophanov is a postdoctoral fellow at the School of Biology, Georgia Institute of Technology. His research interests include applications of probabilistic methods in different areas of bioinformatics and computational biology.

Mark Borodovsky is a Regents’ Professor at the School of Biology, Georgia Institute of Technology, and at the Wallace H. Coulter Department of Biomedical Engineering at Georgia Institute of Technology and Emory University. His research interests include the development of statistical methods for biological sequence analysis and the identification of functionally important features of DNA and proteins in the context of cell function and evolution.
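The P-value definition in the abstract (the probability that a score this high or higher arises by chance under a probabilistic model of the sequences) can be made concrete with a simple Monte Carlo estimate. The sketch below is an illustration under stated assumptions, not the authors' method. It assumes a toy scoring scheme (+1 match, -1 mismatch, -2 per gap position), a Smith-Waterman local alignment score, and an i.i.d. uniform null model over the DNA alphabet. The function names `local_alignment_score` and `empirical_p_value` are hypothetical.

```python
import random

# Toy scoring scheme (an assumption for illustration, not from the article):
# +1 for a match, -1 for a mismatch, -2 per gap position (linear gap penalty).
MATCH, MISMATCH, GAP = 1, -1, -2


def local_alignment_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def empirical_p_value(a: str, b: str, alphabet: str = "ACGT",
                      n_trials: int = 1000, seed: int = 0):
    """Estimate P(S >= observed score) under an i.i.d. uniform null model.

    Random sequences with the same lengths as `a` and `b` are drawn letter by
    letter from `alphabet`; the +1 pseudocount keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    hits = 0
    for _ in range(n_trials):
        ra = "".join(rng.choice(alphabet) for _ in range(len(a)))
        rb = "".join(rng.choice(alphabet) for _ in range(len(b)))
        if local_alignment_score(ra, rb) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_trials + 1)


if __name__ == "__main__":
    score, p = empirical_p_value("ACGTACGTTAGCCGATAGCA",
                                 "TTACGTACGTTAGCCGAAGG", n_trials=500)
    print(f"score={score}  empirical P-value ~ {p:.4f}")
```

Simulation of this kind is easy to write but becomes expensive for small P-values: with N random trials the smallest value it can report is 1/(N+1), so resolving very significant scores needs very many alignments. This is one reason the theoretical approximations discussed in the review are attractive in practice.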
Received July 11, 2005
Accepted November 24, 2005
Original Article
Statistical significance in biological sequence analysis
Alexander Yu Mitrophanov and Mark Borodovsky *
Mark Borodovsky, E-mail: mark.borodovsky{at}biology.gatech.edu