Briefings in Bioinformatics Advance Access published online on October 31, 2006
Briefings in Bioinformatics, doi:10.1093/bib/bbl019
* To whom correspondence should be addressed.

Translating the overwhelming amount of data generated in high-throughput genomics experiments into biologically meaningful evidence, which may for example point to a series of biomarkers or hint at a relevant pathway, is a matter of great interest in bioinformatics these days. Genes showing similar experimental profiles, it is hypothesized, share biological mechanisms that, if understood, could provide clues to the molecular processes leading to pathological events. It is the topic of further study to learn if or how a priori information about the known genes may serve to explain coexpression. One popular method of knowledge discovery in high-throughput genomics experiments, enrichment analysis (EA), seeks to infer whether an interesting collection of genes is enriched for a particular set of a priori Gene Ontology Consortium (GO) classes. For the purposes of statistical testing, the conventional methods offered in EA software implicitly assume independence between the GO classes. Genes may be annotated for more than one biological classification, and therefore the resulting test statistics of enrichment between GO classes can be highly dependent if the overlapping gene sets are relatively large. There is a need to formally determine whether conventional EA results are robust to the independence assumption. We derive the exact null distribution for testing enrichment of GO classes by relaxing the independence assumption using well-known statistical theory. In applications with publicly available data sets, our test results are similar to those of the conventional approach, which assumes independence. We argue that the independence assumption is not detrimental.

David L. Gold is currently a Ph.D. candidate in the Department of Statistics at Texas A&M University. Kevin R. Coombes is the Section Chief of Bioinformatics at M.D. Anderson Cancer Center. He received his Ph.D. in Mathematics from The University of Chicago, IL. Jing Wang is a statistician in the Section of Bioinformatics at M.D. Anderson Cancer Center. He received his Ph.D. in Biophysics from the University of Manitoba, Winnipeg. Bani Mallick is a full professor in the Department of Statistics at Texas A&M. He received his Ph.D. in Statistics from the University of Connecticut.
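
For readers unfamiliar with the conventional approach the abstract refers to, the following is a minimal sketch of a per-class enrichment test that treats each GO class separately, that is, under the independence assumption. It assumes the conventional test is a one-sided hypergeometric test, which the abstract does not state; the function names (`hypergeom_upper_tail`, `enrichment_pvalues`), the gene identifiers, and the two toy classes are hypothetical. It is not the exact null distribution derived in the article.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them
    annotated to the class, n genes drawn (the interesting list)."""
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment_pvalues(gene_universe, interesting, go_classes):
    """One p-value per class, each computed as if the classes were
    independent (illustrative assumption: hypergeometric test)."""
    universe = set(gene_universe)
    selected = set(interesting) & universe
    pvalues = {}
    for name, members in go_classes.items():
        annotated = set(members) & universe
        k = len(selected & annotated)
        pvalues[name] = hypergeom_upper_tail(
            k, len(universe), len(annotated), len(selected)
        )
    return pvalues


# Toy data (hypothetical): two classes that share genes g3 and g4, so their
# test statistics are not independent even though each test ignores the other.
universe = [f"g{i}" for i in range(1, 11)]
interesting = ["g1", "g2", "g3"]
go_classes = {
    "class_A": ["g1", "g2", "g3", "g4"],
    "class_B": ["g3", "g4", "g5", "g6"],
}

if __name__ == "__main__":
    for name, p in enrichment_pvalues(universe, interesting, go_classes).items():
        print(name, round(p, 4))
```

Because `class_A` and `class_B` overlap, a gene such as `g3` contributes to both counts at once; this shared membership is the source of the dependency between test statistics that the article examines.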
Received March 14, 2006
Accepted May 25, 2006
Original Papers
Enrichment analysis in high-throughput genomics--accounting for dependency in the NULL
David L. Gold *, Kevin R. Coombes, Jing Wang, and Bani Mallick
David L. Gold, E-mail: dlgold{at}tamu.edu
This article has been cited by other articles:
D. W. Huang, B. T. Sherman, and R. A. Lempicki. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res., January 1, 2009; 37(1): 1-13.
