Exact mapping of prokaryotic gene starts
PhD student at the Institute of Molecular Biology, RAS. His research is in the area of gene recognition.
Director for Science, Integrated Genomics, Moscow. His research interests are comparative genomics, genome annotation, analysis of regulation of gene expression and gene recognition.
Director for Technology, Integrated Genomics, Moscow. His research interests are the creation of algorithms for sequence and structure alignments, software development and genome annotation.
M. S. Gelfand, Integrated Genomics Moscow, PO Box 348, Moscow 117333, Russia Tel: +7 (095) 135 20 41 Fax: +7 (095) 132 60 80 E-mail: gelfand{at}integratedgenomics.ru
Programs used to find genes in prokaryotic genomes map protein-coding regions reliably, but they often fail to determine gene starts exactly. The problem is aggravated by sequencing errors, most notably insertions and deletions that lead to frame-shifts. Exact mapping of gene starts and identification of frame-shifts are therefore important problems in the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for correcting gene starts and identifying frame-shifts in prokaryotic genomes. The algorithm compares the nucleotide and protein sequences of homologous genes from related organisms, on the assumption that protein-coding regions evolve more slowly than non-coding regions. A dynamic programming algorithm aligns protein sequences obtained by formal translation of the genomic nucleotide sequences, taking the possibility of frame-shifts into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, depending on the genome, 1–10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.
Keywords: gene, genomics, gene recognition, reading frame, start of translation, computer analysis, prokaryotes