Briefings in Bioinformatics Advance Access originally published online on August 25, 2006
Briefings in Bioinformatics 2006 7(3):209-210; doi:10.1093/bib/bbl029
Annual Progress in Bioinformatics 2006
In this issue, Briefings in Bioinformatics is happy to present the next installment of our special annual issue devoted to reviews of very active subdisciplines within our field. The editors surveyed recent publications in order to identify fields that are moving rapidly and would be good targets for summary and review. We asked authors to provide brief introductions to their field, and then to concentrate on contributions in the last 12–24 months of particular interest. In some cases, they also provided annotated bibliographies in which they highlighted papers of particularly high interest. The result is seven outstanding reviews. The influence of high-throughput genomic experimental techniques and the increasing interest in synthesis of information come through strongly in this year's selections. We have ordered the reviews starting with those discussing tools close to the genome (the HapMap project, function prediction, graph methods for analyzing cellular networks), and then moving toward tool-oriented organization and sharing of knowledge (biological ontologies, the semantic web and open source software). The first set of reviews focuses on tools for understanding the central dogma and basic biological processes. The second set focuses on tools to assist scientists in the process of doing their work. In many ways, these are the two primary foci of bioinformatics, and it is reassuring to see that progress is balanced along both fronts.
In the first review, Barnes provides an overview of the HapMap project for cataloging human genetic diversity. Understanding variation in the human genome is critical for understanding the variation in human phenotypes. The HapMap project is the natural follow-up to the human genome sequencing project, and seeks to characterize the variations in the human genome, initially in four groups of different geographic origin. The review describes the HapMap project's motivation, strategy, data resources and analytic challenges.
Not surprisingly, variation in the human genome is not entirely independent, but shows a correlation structure (expressed as linkage disequilibrium or LD) that is critical for the design of studies that aim to understand the relationship between genotype and phenotype. In addition, this LD structure can be examined in the context of human population history to understand our origins.
In the second review, Iddo Friedberg presents the challenges associated with annotating genes with their biological functions. It seems that nothing is easy about this task. First, genes are typically polyfunctional and therefore multiple experimental and theoretical sources are used to characterize their function. Annotation techniques must be careful to consider multiple sources of data, and must allow multiple annotations. Second, it is not clear what language should be used to describe gene function. Gene function may be understood in many contexts, and controlled terminologies are required in order to guarantee precise semantics when functional labels are used. Finally, the promise of computer algorithms for predicting function must be associated with gold standard methods to validate these predictions. The first two problems make this last one even more challenging.
Aittokallio and Schwikowski present an overview of graph methods for biological networks in the third review. The availability of multiple high-throughput data sources using gene expression, proteomics, literature mining and other techniques provides information sufficient to create networks of interaction. These networks provide a global view of biological systems, and are the first step towards an integrated understanding of the emergent properties of these systems. Of course, the networks are often represented as graphs, and require informatics tools for their analysis, often taking advantage of a mature computer science literature on graphical methods. The analyses that result are truly multiscale because they range from global properties of the networks all the way to the analysis of individual interactions. A particular challenge is the identification of modules, clusters and recurring motifs that perform identifiable functions, and may be conserved across evolution. The authors also point out that integrated analyses that include multiple sources of data may provide better performance.
Nearly every area of biomedical research is currently concerned with the integration, aggregation and annotation of experimental data and the associated knowledge. This concern stems from two observations: (i) the volume of data in most fields is exploding and is impossible to track manually and (ii) these data are useless if they cannot be indexed and retrieved with labels that are standardized and have clear semantics. Thus, biomedical ontologies have emerged as the primary hope for providing the required informatics infrastructure. In the fourth review, Bodenreider and Stevens describe the history of ontologies in biomedicine, and describe some remarkable developments in the last few years that have accelerated progress. Chief among these is the near-universal agreement that ontologies are a critical strategic need (even for individuals who had never heard of ontologies a decade ago), and that scientific institutions have been formed around this goal, introducing more resources and more constraints on their development.
The progress on ontology is a critical prerequisite for the long-awaited emergence of the semantic web for life science. The impact of the internet and world-wide-web on biomedical research has been profound, but some believe it could be even greater with better integration of tools, data and other resources, based on an ability to represent the semantics of these resources and link them appropriately. In the fifth review, Good and Wilkinson complement Bodenreider and Stevens by providing a systems level view of progress towards building the next generation of web-based tools for scientists. An early success has been the rapid penetration of web services in bioinformatics, a low-cost method for making computational services available. The authors point out, however, that further progress may require a move from centralized control of data and algorithms toward a more open and inter-linked model. They argue that the primary challenges may be sociological and not technical.
In the sixth and final review, Stajich provides an overview of open-source software in bioinformatics. The success of the Linux operating system introduced the paradigm of open-source development, and many bioinformatics software developers embraced this paradigm as a way to accelerate progress in the field, by avoiding redundancy and promoting transparency. But has open-source software made an impact in bioinformatics, and how should individual developers decide whether to join an open-source project or build their own? This review provides several useful case studies to show the diversity of approaches that have led to valuable software libraries covering a range of applications in molecular biology. The review also revisits a theme it shares with the articles on ontologies and the semantic web: the need for standards. Taken together, these last three reviews provide a fascinating view on how a maturing field of bioinformatics is organizing itself for the coming decades.
In addition to the annual progress reviews, this issue includes an outstanding review of statistical methods for association studies by Montana. Increasingly, bioinformatics professionals are being asked to participate in projects with goals of associating genotype with phenotype. However, the literature on the analysis of genotype is rich and mature, and the complexities often overwhelm scientists who are otherwise very familiar with computing with DNA sequences. This review is a very useful primer on the state and progress of methods for statistical association analysis.
We hope that you will find the six annual progress reviews and the review on statistical methods for genetic studies useful. Taken together, they demonstrate that bioinformatics continues to be an active and evolving field, responding to scientific and technical opportunities.
Associate Editor
Stanford University