Briefings in Bioinformatics Advance Access originally published online on March 15, 2008
Briefings in Bioinformatics 2008 9(3):210-219; doi:10.1093/bib/bbn010
Pfam 10 years on: 10 000 families and still growing
Corresponding author. Alex Bateman, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK. Tel: 44-1223-494950; Fax: 44-1223-494919; E-mail: agb{at}sanger.ac.uk
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10 000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28 000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
Keywords: Pfam, protein families, classification, coverage, hidden Markov model
Submitted: October 15, 2007. Received (in revised form): February 6, 2008.
This article has been cited by other articles:
K. Mochida, T. Yoshida, T. Sakurai, K. Yamaguchi-Shinozaki, K. Shinozaki, and L.-S. P. Tran. In silico Analysis of Transcription Factor Repertoire and Prediction of Stress Responsive Transcription Factors in Soybean. DNA Res, December 1, 2009; 16(6): 353-369.
A. Schlicker and M. Albrecht. FunSimMat update: new features for exploring functional similarity. Nucleic Acids Res., November 18, 2009; (2009) gkp979v1.
M. Levitt. Nature of the protein universe. PNAS, July 7, 2009; 106(27): 11079-11084.
J. Skolnick and M. Brylinski. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform, July 1, 2009; 10(4): 378-391.
A. Rose, S. Lorenzen, A. Goede, B. Gruening, and P. W. Hildebrand. RHYTHM--a server to predict the orientation of transmembrane helices in channels and membrane-coils. Nucleic Acids Res., July 1, 2009; 37(suppl_2): W575-W580.
L. G. Pell, V. Kanelis, L. W. Donaldson, P. Lynne Howell, and A. R. Davidson. The phage λ major tail protein structure reveals a common evolution for long-tailed phages and the type VI bacterial secretion system. PNAS, March 17, 2009; 106(11): 4160-4165.
F. Diella, S. Chabanis, K. Luck, C. Chica, C. Ramu, C. Nerlov, and T. J. Gibson. KEPE--a motif frequently superimposed on sumoylation sites in metazoan chromatin proteins and transcription factors. Bioinformatics, January 1, 2009; 25(1): 1-5.
T. Lima, A. H. Auchincloss, E. Coudert, G. Keller, K. Michoud, C. Rivoire, V. Bulliard, E. de Castro, C. Lachaize, D. Baratin, et al. HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D471-D478.
S. Rajasekaran, S. Balla, P. Gradie, M. R. Gryk, K. Kadaveru, V. Kundeti, M. W. Maciejewski, T. Mi, N. Rubino, J. Vyas, et al. Minimotif miner 2nd release: a database and web system for motif search. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D185-D190.
D. J. Rigden and M. Y. Galperin. Sequence analysis of GerM and SpoVS, uncharacterized bacterial 'sporulation' proteins with widespread phylogenetic distribution. Bioinformatics, August 15, 2008; 24(16): 1793-1797.