

Briefings in Bioinformatics Advance Access originally published online on March 3, 2007
Briefings in Bioinformatics 2007 8(2):129-133; doi:10.1093/bib/bbm005

© The Author 2007. Published by Oxford University Press. For Permissions, please email: journals.permissions@oxfordjournals.org

Abstracts

Briefings in Bioinformatics aims to provide working biologists with an awareness and understanding of the computational approaches available for research and discovery. The Abstracts section of the journal consists of summaries of bioinformatics manuscripts published in the preceding 2 months. Inclusion of an article in this section indicates that the editors consider it to be among the most interesting and/or useful contributions to the field for the quarter covered. The contents of these reports are briefly distilled for the readers with an emphasis placed on their biological context and potential utility. Publications from December 2006 and January 2007 are reviewed here.


    Dynamic usage of transcription start sites within core promoters
Hideya Kawaji, Martin C. Frith, Shintaro Katayama, Albin Sandelin, Chikatoshi Kai, Jun Kawai, Piero Carninci and Yoshihide Hayashizaki
Genome Biology (2006) Vol. 7, No. 12, p. R118
The (post)-genomics era in biology has often been marked by unexpected discoveries driven by new technologies. This phenomenon is exemplified by the cap analysis of gene expression (CAGE) technique. Recent results of CAGE surveys on the human and mouse genomes have the potential to fundamentally alter our understanding of eukaryotic gene regulation. Kawaji et al. have performed an illuminating computational follow-up on previous CAGE experimental studies to further explore the implications of these data. CAGE is a high-throughput, sequence-tag-based technique that allows for the simultaneous identification of transcriptional start sites (TSSs) and measurement of gene expression levels. Briefly, CAGE characterizes 20–21 base pair sequence tags that correspond to the very 5' ends of full-length cDNAs. Analyzing thousands of CAGE tags reveals the expression profiles of numerous TSSs. The most surprising result from the large-scale application of the CAGE technique to the human and mouse genomes was the finding that mammalian transcription is almost never initiated from a single precisely defined TSS. Rather, genes are transcribed from an array of different TSSs; in some cases these TSSs may be tightly clustered and in others they may be broadly distributed. Kawaji et al. sought to address the question of whether and how the selection of TSSs is regulated. To do this, they framed two hypotheses with mutually exclusive expectations regarding the use of alternative TSSs across different tissues. Their null hypothesis rested on the supposition that if TSS selection is governed mainly by genomic sequence context, then it would not change appreciably between different tissue or cell types. Alternatively, if TSS selection is differentially regulated, whether by different transcription factors or epigenetic modifications, then TSS usage may differ systematically between tissues. 
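One way to picture how individual CAGE tags resolve into tightly clustered or broadly distributed TSSs is a simple gap-based grouping of tag 5' positions. The sketch below is illustrative only: the function names, the `max_gap` threshold and the toy coordinates are assumptions, and this is not the clustering procedure used by Kawaji et al.

```python
from collections import Counter


def _summarize(positions, counts):
    """Summarize one cluster as (start, end, tag_count, dominant_position)."""
    total = sum(counts[p] for p in positions)
    dominant = max(positions, key=lambda p: counts[p])
    return (positions[0], positions[-1], total, dominant)


def cluster_tss(tag_starts, max_gap=20):
    """Group tag 5' positions (one chromosome, one strand) into clusters.

    max_gap is a hypothetical threshold: a position further than max_gap
    from the previous distinct position starts a new cluster.
    """
    counts = Counter(tag_starts)
    clusters, current = [], []
    for pos in sorted(counts):
        if current and pos - current[-1] > max_gap:
            clusters.append(_summarize(current, counts))
            current = []
        current.append(pos)
    if current:
        clusters.append(_summarize(current, counts))
    return clusters


# Toy data: one sharply peaked cluster and two sparse ones.
toy_tags = [100, 100, 101, 103, 500, 560, 561]
toy_clusters = cluster_tss(toy_tags)
```

Each tuple reports the cluster span, the number of supporting tags and the most frequently used position, which is roughly the information needed to tell a sharply peaked cluster from a broad one.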
To discriminate between these two hypotheses, they analyzed CAGE data from 22 different mouse tissues. This revealed evidence for tissue-specific TSS utilization for about half of all CAGE tag clusters analyzed. In other words, TSS selection is clearly regulated for numerous genes. Furthermore, the observed fraction of genes with regulated TSS usage is likely to be an underestimate, since the set of tissues they analyzed is in no way exhaustive. Genes that are prone to differential TSS usage include those with CpG island-containing promoters as well as genes with multimodal promoter structures characterized by two or more peaked CAGE tag clusters. For the most part, the specific regulatory mechanisms that give rise to tissue-specific TSS use lie beyond the reach of the computational approach employed here. However, one intriguing possibility arises from the authors’ demonstration that numerous imprinted genes (i.e. genes with different expression according to the parent of origin) use alternate TSSs. This finding points to a connection between TSS selection and epigenetic modifications such as methylation and/or chromatin modification.
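The contrast between the two hypotheses can be made concrete with a test of independence on a table of tag counts, with TSS positions as rows and tissues as columns. The following is a minimal sketch using a Pearson chi-square statistic; the function name and the toy counts are assumptions, and the statistical procedure actually used by Kawaji et al. is not described in this summary.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a TSS-by-tissue table of tag counts.

    table[i][j] is the number of tags observed at TSS i in tissue j.
    Assumes every row total and column total is positive.
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if TSS choice were independent of tissue.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Toy example: two TSSs in one cluster, tag counts from two tissues.
same_usage = [[10, 10], [10, 10]]       # TSS usage identical across tissues
switched_usage = [[20, 0], [0, 20]]     # each tissue uses a different TSS
stat_same, dof_same = chi_square_statistic(same_usage)
stat_switched, dof_switched = chi_square_statistic(switched_usage)
```

A statistic near zero is what the sequence-context (null) hypothesis predicts, while a value that is large relative to the chi-square distribution with the reported degrees of freedom points to tissue-dependent TSS usage.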


    Multiple independent evolutionary solutions to core histone gene regulation
Leonardo Mariño-Ramírez, I. King Jordan and David Landsman
Genome Biology (2006) Vol. 7, No. 12, p. R122
It has long been recognized that changes in gene regulation probably underlie many important aspects of phenotypic evolutionary divergence. However, it has been difficult to systematically analyze the genetic bases of changes in gene regulation and expression due to a lack of comparative data. The increasing availability of genomic sequence data from numerous species, combined with accumulating experimental data on gene regulation, is providing new opportunities to study the evolution of gene regulation. In this manuscript, Mariño-Ramírez and colleagues report surprising and seemingly paradoxical results regarding the evolution of the core histone gene regulatory mechanisms. There are four families of core histone genes, which together encode the proteins that assemble as an octamer to form the nucleosome, the fundamental unit of eukaryotic chromatin. Core histone genes are deeply conserved across eukaryotes, having changed relatively little in terms of both sequence and expression pattern from the yeast Saccharomyces cerevisiae to human. Core histone genes show periodic expression across the eukaryotic cell cycle with a pronounced peak during S-phase. This S-phase specific expression pattern allows for histone proteins to be produced at the same time DNA is being synthesized so that it can be readily bound by nucleosomes and compacted into chromatin. The authors of this report show that, despite the conservation of core histone gene expression patterns, the cis-trans regulatory machinery that controls core histone gene expression has changed greatly among eukaryotic evolutionary lineages. Specifically, the identity of the core histone gene cis-regulatory sequence motifs and the trans protein factors that bind them are distinct for the yeasts S. cerevisiae and Schizosaccharomyces pombe as well as for other fungi, plants, insects and mammals. 
In other words, while core histone gene expression has remained unchanged since these species last shared a common ancestor, the way these genes are regulated has been repeatedly reinvented along different evolutionary lineages. In addition to this rapid turnover of regulatory mechanisms, the authors also show that within evolutionary lineages different core histone gene families employ basically the same regulatory machinery. This is the exact opposite of the evolutionary pattern seen for core histone gene (protein) sequences, where members of the same gene family are more similar to one another than they are to members of other families found in the same species. Thus, the regulatory machinery has changed between evolutionary lineages and has been homogenized within lineages. Taken together, these results underscore just how dynamic the evolution of gene regulation can be. Regulatory systems are somehow able to undergo wholesale changes even in the face of pressure to maintain the same expression patterns. The concerted evolution of the regulatory systems within lineages suggests that this mode of evolution may be due to epistatic pressures that are exerted on regulatory components that play multiple roles.


    Origins and impact of constraints in evolution of gene families
Boris E. Shakhnovich and Eugene V. Koonin
Genome Research (2006) Vol. 16, No. 12, pp. 1529–1536
Decades of research in molecular evolution have shown that rates of gene (protein) evolution are largely determined by purifying natural selection, which consists of the removal of deleterious variants. However, the biological source of this selective constraint was not systematically evaluated until recently. This advance was made possible by the combination of complete genome sequences of related organisms and high-throughput experimental data such as protein interaction data, gene expression profiles and gene dispensability measures. Using thousands of sequence comparisons and large scale functional data sets, researchers performed many correlations between genes’ evolutionary rates and their functional characteristics. Numerous, relatively weak, correlations were found. For instance, essential genes were found to evolve more slowly than nonessential genes, more highly expressed genes were shown to evolve more slowly than genes with lower expression levels, and it was demonstrated that proteins that interact with multiple partners evolve more slowly than those with fewer interactions. However, it quickly became apparent that many of these functional variables were also correlated with one another. Highly expressed genes tend to have more interaction partners and be more essential than lowly expressed genes. These cross-connections together with the general weakness observed for the correlations between evolutionary rates and functional measures have obscured the biological significance of these results. Shakhnovich and Koonin have uncovered a novel determinant of selective constraint related to gene family evolution that may prove to be one of the strongest determinants of evolutionary rates known to date. To do this, they partitioned families of related genes into two classes: those that contain at least one essential gene (E-families) and those that contain no essential genes (N-families). 
The members of N-families were found to be subject to weaker selective constraint, and as a result they are more likely to be fixed and/or to become pseudogenes with no function. The E-families, on the other hand, are subject to greater selective constraint. This means that the genes survive for longer periods of evolutionary time, and accordingly, the E-families accumulate greater evolutionary divergence among their members. Included in this divergence are changes in the upstream regulatory regions that control the expression patterns of E-family genes. Indeed, E-family genes are far less likely to share transcription factor binding sites than N-family genes. This finding is consistent with the evolution of novel functions by E-family genes. The authors conclude that E-family genes typically evolve through the acquisition of new or altered functions while N-family genes lose functionality.
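The E-family/N-family partition described above is simple to state in code. The sketch below assumes that gene families are already available as a mapping from a family name to its member genes and that essentiality is given as a set of gene identifiers; the function name, data layout and gene names are hypothetical, and the sketch does not reproduce how Shakhnovich and Koonin built their families.

```python
def partition_families(families, essential_genes):
    """Split gene families into E-families and N-families.

    families: dict mapping family name -> list of member gene IDs.
    essential_genes: set of gene IDs classified as essential.
    An E-family contains at least one essential gene; an N-family has none.
    """
    e_families, n_families = {}, {}
    for name, members in families.items():
        if any(gene in essential_genes for gene in members):
            e_families[name] = members
        else:
            n_families[name] = members
    return e_families, n_families


# Hypothetical toy input.
toy_families = {
    "famA": ["geneA1", "geneA2", "geneA3"],
    "famB": ["geneB1", "geneB2"],
}
toy_essential = {"geneA2"}
e_fams, n_fams = partition_families(toy_families, toy_essential)
```

Here `famA` lands in the E class because one of its three members is essential, even though the other two are not; this is the sense in which the presence of a single essential paralog labels the whole family.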


    Relating three-dimensional structures to protein networks provides evolutionary insights
Philip M. Kim, Long J. Lu, Yu Xia and Mark B. Gerstein
Science (2006) Vol. 314, No. 5807, pp. 1938–1941
Biological function is carried out by complex networks of interacting players including, but not limited to, genes, proteins, metabolites and even individual organisms. In the last few years, the analytical tools of graph theory have been brought to bear on these biological networks, resulting in some unexpected findings. Perhaps the most intensively studied networks have been those composed of interacting proteins; nodes in such networks correspond to proteins, and two nodes are connected by an edge if the proteins can be shown to physically interact. Studies of protein interaction networks have yielded non-trivial revelations about cellular organization and the role of natural selection in evolution. In a recent issue of Science, Kim et al. add an important new wrinkle to this field of work. Previous studies of protein interaction networks have treated physical interactions as binary; either proteins interact or they do not. Kim et al. took a more nuanced approach to protein interaction by considering the three-dimensional structures of interacting proteins. By now it is well appreciated that the topological properties of protein interaction networks, their connectivity in particular, are dominated by the so-called hubs, proteins that interact with numerous partners. The authors of this work evaluated the structural properties of such hubs with respect to the locations and identities of their binding interfaces. Apparently, some hub proteins interact with multiple partners using the same binding interface. Such interactions are by definition mutually exclusive; only one particular interaction can occur at any given time or under any given condition. Other proteins interact with multiple partners using distinct non-overlapping interfaces, and these interactions may occur simultaneously. 
The distinction between singlish-interface hubs (hub proteins with one or two binding interfaces) and multi-interface hubs helped the authors to resolve a number of hotly debated issues regarding the relationship between protein interaction network topologies and evolution. For instance, while it had previously been noted that hub proteins evolve more slowly than less connected proteins, this matter was very contentious. Kim et al. show that this signal is largely confined to multi-interface hubs, which evolve far more slowly than both non-hub proteins and singlish-interface hubs. In addition, the relatively transient nature of the interactions of singlish-interface hubs was confirmed by gene expression analysis, which showed that multi-interface hubs have more correlated expression patterns with their interaction partners. This work also calls into question a popular model for the evolution of complex networks. Hubs are thought to evolve via preferential attachment of new nodes to already highly connected nodes, and this model has been related to the process of gene duplication. However, this appears to be the case only for singlish-interface hubs, consistent with the idea that duplicated proteins would bind to the same interface. Thus, while the added level of detail provided by the structural analysis of Kim et al. has helped to address some outstanding questions, it has also raised some new questions of its own.
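The hub classification discussed above can be sketched as a small function over interaction records annotated with the binding interface they use. The one-or-two interface cutoff for singlish-interface hubs follows the text; the input format, the function name and the `hub_min_partners` threshold of five partners are assumptions made for illustration, not details taken from Kim et al.

```python
from collections import defaultdict


def classify_hubs(interactions, hub_min_partners=5, singlish_max_interfaces=2):
    """Label proteins as non-hub, singlish-interface hub or multi-interface hub.

    interactions: iterable of (protein, partner, interface_id) tuples, where
    interface_id names the interface on `protein` used to bind `partner`.
    hub_min_partners is a hypothetical degree cutoff for calling a hub.
    """
    partners = defaultdict(set)
    interfaces = defaultdict(set)
    for protein, partner, interface_id in interactions:
        partners[protein].add(partner)
        interfaces[protein].add(interface_id)

    labels = {}
    for protein, partner_set in partners.items():
        if len(partner_set) < hub_min_partners:
            labels[protein] = "non-hub"
        elif len(interfaces[protein]) <= singlish_max_interfaces:
            labels[protein] = "singlish-interface hub"
        else:
            labels[protein] = "multi-interface hub"
    return labels


# Hypothetical toy network.
toy = (
    [("A", f"p{i}", "ifaceA1") for i in range(5)]       # 5 partners, 1 interface
    + [("B", f"q{i}", f"ifaceB{i}") for i in range(5)]  # 5 partners, 5 interfaces
    + [("C", "r0", "ifaceC1"), ("C", "r1", "ifaceC2")]  # only 2 partners
)
toy_labels = classify_hubs(toy)
```

In this toy network, protein A must bind its five partners one at a time through a shared interface, whereas protein B could in principle bind all five at once.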


    Mammalian small nucleolar RNAs are mobile genetic elements
Michel J. Weber
PLoS Genetics (2006) Vol. 2, No. 12, p. e205
Transposable elements (TEs) are the single most abundant class of eukaryotic genomic sequence. For instance, 45% of the human genome is made up of TE sequences while only 1.5% corresponds to protein coding sequences. The genomic abundance of TEs can be explained by their selfish properties. Indeed, TEs are most often thought of as genomic parasites that serve only to replicate themselves at the expense of the host genomes in which they reside. However, in recent years numerous examples of TEs that have been ‘domesticated’ to play some functional role for their hosts have been uncovered. In this article, Michel Weber shows that two classes of structural RNA molecules—small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs)—represent novel families of TEs. snoRNAs are involved in the modification of ribosomal RNAs and scaRNAs help to modify spliceosomal RNAs. Using known human and mouse sno/scaRNA sequences as queries, Weber searched for orthologs and paralogs (i.e. within genome duplicates) among sequenced vertebrate genomes and identified many novel sno/scaRNA sequences. snoRNAs are known to be encoded in the introns of host genes and so it was not surprising to find some snoRNA paralogs that had been generated by duplication of their host genes. Far more unexpected was the finding that most snoRNA paralogs showed characteristics of sequences that are generated by retrotransposition—in other words they appeared to have been copied by the reverse transcription of a snoRNA transcript and inserted in a new location in the genome. Specifically, these snoRNA paralogs have polyA tails, target site duplications and insertion sites that can be clearly defined by comparisons with other genome sequences. Weber dubs these sequences snoRNA retroposons or snoRTs. Sequence alignment and phylogenetic analysis of related snoRTs allows for the determination of the original (parental) snoRNA sequence that gave rise to paralogs through retrotransposition. 
Examination of the genomic structure of these elements revealed three classes of snoRTs that differ with respect to the amount of sequence from the parental host gene that they contain. Weber proposes that the redundancy created by the retrotransposition of snoRNAs helps both to safeguard against the functional degeneration of single-copy snoRNAs through mutation and to provide increased diversity that could lead to novel functions. With respect to the latter prediction, it is most interesting to note that many of the mobile sno/scaRNA retroposons identified in this study are either lineage- or species-specific. Thus, sno/scaRNA evolution by retrotransposition may be related to between-species differences in RNA modifications.


    Exploring genomic dark matter: A critical assessment of the performance of homology search methods on noncoding RNA
Eva K. Freyhult, Jonathan P. Bollback and Paul P. Gardner
Genome Research (2007) Vol. 17, No. 1, pp. 117–125
The importance of non-coding RNA has recently been underscored by the realization that the majority of eukaryotic genome sequences are transcribed, and furthermore, most of these transcripts do not encode proteins. Hundreds of new non-coding RNA (ncRNA) genes are characterized every year, and the rate of discovery appears to still be in the exponential growth phase. Even so, the full complement, i.e. the number, size and extent, of ncRNA genes for any given eukaryotic genome is unknown, and knowledge of ncRNA genes lags far behind the current understanding of protein coding gene repertoires. Since most new genes are discovered and characterized via homology relationships with known genes, the application of homology search methods will be critical for ncRNA discovery. However, homology search methods were designed for protein coding sequences and their performance on ncRNAs is unknown. In light of this problem, Freyhult et al. have assessed how different sequence similarity search algorithms perform in the detection of ncRNAs. They evaluated 12 different search programs that fall into three broad classes: sequence-based methods, profile-based methods and structure-based methods. BLAST and FASTA are the most widely used of the sequence-based methods, while HMMer and Infernal exemplify the profile and structure-based methods respectively. The authors present a detailed accounting, which is beyond the scope of this abstract, of how all 12 of these methods perform, with an emphasis on sensitivity and selectivity and relatively less focus on search speed. Perhaps most usefully, however, they also provide a series of ‘practical recommendations’ to guide homology searches for ncRNAs. First of all, they point out that the most widely used methods often fare the worst in their comparisons. This has the unfortunate effect of rendering the results of many searches suspect. 
Sequence-based methods in particular need to use accurate scoring schemes, based on RNA-optimized substitution matrices, to ensure their reliability. The authors suggest that sequence-based searches be used as a starting point to build training sets, which can be employed to build covariance models for use with the program Infernal. Infernal uses the covariance model to do a structure-based search and was shown to perform very well in terms of both sensitivity and selectivity, as did another structure-based method, RSEARCH. However, both of these structure-based methods are substantially slower than straight profile-based methods such as HMMer. While the combined approach recommended by the authors may well represent the most rigorous way to search for ncRNAs, it also seems unlikely to be embraced by bench scientists who wish to do a quick and easy search on their favorite gene. Thus, it would seem that a good opportunity exists to implement a packaged software suite that could automatically implement this multipart strategy.
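For readers who want to benchmark a search method on their own data, the two accuracy measures emphasized above can be computed from a list of reported hits and a curated list of true homologs. The sketch below assumes that selectivity is measured as the fraction of reported hits that are true homologs (the positive predictive value); that definition, the function name and the toy identifiers are assumptions and may not match the exact measures used by Freyhult et al.

```python
def sensitivity_and_selectivity(hits, true_homologs):
    """Score a homology search against a curated set of true homologs.

    sensitivity = TP / (TP + FN): fraction of true homologs recovered.
    selectivity = TP / (TP + FP): fraction of reported hits that are true
    homologs (an assumed definition, equal to positive predictive value).
    """
    hits = set(hits)
    true_homologs = set(true_homologs)
    tp = len(hits & true_homologs)
    fn = len(true_homologs - hits)
    fp = len(hits - true_homologs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    selectivity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, selectivity


# Hypothetical toy benchmark: 5 known homologs, 4 reported hits, 3 correct.
known = {"rna1", "rna2", "rna3", "rna4", "rna5"}
reported = {"rna1", "rna2", "rna3", "decoy1"}
sens, sel = sensitivity_and_selectivity(reported, known)
```

Running the same scoring on the output of a fast sequence-based search and a slower structure-based search over the same benchmark gives a direct view of the trade-off described above.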

I. King Jordan
