Briefings in Bioinformatics Advance Access originally published online on February 3, 2006
Briefings in Bioinformatics 2006 7(1):48-54; doi:10.1093/bib/bbk004

© The Author 2006. Published by Oxford University Press. For Permissions, please email: journals.permissions@oxfordjournals.org

Studying statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the eukaryotic genomes

Irina Abnizova and Walter R. Gilks

Corresponding author. Irina Abnizova, Institute of Public Health, Forvie Site, Robinson Way, Cambridge CB2 2SR, UK. Tel: 01 223 330385; Fax: 01 223 330388. E-mail: irina.abnizova{at}mrc-bsu.cam.ac.uk


    ABSTRACT
 
There are no well-known properties in regulatory DNA analogous to those in coding sequences: the spatial location of regulatory elements is not regular, their consensus sequences are often degenerate, and there are no well-understood rules governing their evolution. This makes it difficult to recognize regulatory regions within a genome. We review developments in the statistical characterization of regulatory regions and methods for their recognition in eukaryotic genomes.

Keywords: regulatory regions, statistical methods, transcription factor binding sites


    INTRODUCTION
 
Revealing the typical statistical properties of regulatory regions and regulatory elements may improve our understanding of their evolutionary and functional constraints, and permit recognition of these regions computationally. The study of regulatory DNA is more difficult than that of coding sequences [1, 2]. There are no well-known properties in regulatory DNA analogous to open reading frames and non-uniform codon usage in coding sequences. This makes it difficult to define the consensus and location of functional regulatory elements.

Basics about gene transcription
Temporal and spatial gene expression is regulated by transcription control and mediated by a complex cis-regulatory system. Transcription factors activate or repress gene expression by binding to their respective binding sites within regulatory regions: comparatively short sequences (several hundred to several thousand base pairs, depending on the species) located upstream, downstream or far away from the transcription start sites. Specific sites within such regions, which are generally arranged in dense clusters, are recognized by regulatory proteins (transcription factors, TFs), which control the rate of gene transcription [3–5].

Types of regulatory regions
Regulatory regions of higher eukaryotes can be subdivided into proximal regulatory units (promoters), which are located close to the 5' end of the gene, and distal transcription regulatory units called enhancers or cis-regulatory modules (CRMs). CRMs may be located far upstream or downstream of the target gene. They lack the proximal transcription signals that mark promoters, such as a fixed position relative to the coding sequence, the TATA box, the CAAT box and the transcription start site consensus, so recognizing CRMs is even more difficult than recognizing promoters.

Experimental determination of regulatory region function
Biochemical characterization can identify binding sites precisely and is the only way to determine whether consensus sequences differ among species. Several methods are available for producing DNA–protein interaction data. The nitrocellulose-binding assay [6], electrophoretic mobility shift assay (EMSA) [7], enzyme-linked immunosorbent assay (ELISA) [8], DNase I footprinting [9], DNA–protein crosslinking (DPC) [10] and reporter constructs [11] are examples of in vitro techniques used to determine DNA binding sites and to analyse differences in binding specificity between protein–DNA complexes. The major disadvantage of these methods is that they are not suited to high-throughput experiments.

Recently, a microarray-based assay using chromatin immunoprecipitation (ChIP) was developed for genome-wide determination of protein binding sites on DNA [12]. Two other types of experiment, systematic evolution of ligands by exponential enrichment (SELEX) [13] and phage display (PD) [14], offer a high-throughput way to select high-affinity binders: DNA targets and protein targets, respectively. Both SELEX and PD suffer from the same disadvantage: most of the sequences obtained are good binders, but it is hard to say anything about their relative affinities. It is assumed that the best binders occur more frequently.

In [15], dsDNA microarrays are presented for exploring sequence-specific protein–DNA binding. The major advantage over the methods discussed above is that this is a high-throughput method that yields data with associated relative binding affinities.

Finally, X-ray crystallographic and NMR spectroscopic data provide a basis for studying the structural details of protein–DNA interactions. Protein–DNA complexes have been successfully co-crystallized [16], and the data have been deposited in the Protein Data Bank (PDB) and the Nucleic Acid Database (NDB). However, these experiments are very time-consuming.

Unfortunately, for many technical reasons, the number of experimentally verified binding sites is nearly always an underestimate, and the physical length of regulatory regions is rarely well defined [17]. Experimental studies of the function of regulatory regions therefore almost always yield incomplete and biased information on binding sites. Even for binding sites that have been identified, we are often ignorant of changes in the time, space and level of transcription.

Statistical computational recognition of regulatory regions is desirable but very difficult
Characterizing regulatory DNA and functional combinations of transcription factor binding sites (TFBSs) is key to understanding gene regulation, but remains a difficult computational problem. The reasons are:

  • Lack of known properties: There are no well-known properties in regulatory DNA analogous to the genetic code, open reading frames and non-uniform codon usage in coding sequences. This makes it difficult to define the consensus and location of functional regulatory elements.
  • Degeneracy of TFBS: Transcription factors have low specificity for their binding site motifs, and the motifs are short and imprecise, which makes it hard to detect the sites accurately.
  • Lack of evolutionary understanding of transcriptional regulation: The rules that govern the evolution of regulatory elements have not yet been clearly established.
  • Complicated and non-regular structure of regulatory regions: No consistent sequence motifs exist for regulatory regions. These regions comprise a collection of diverse TFBS whose composition and organization vary enormously among genes and which are dispersed sparsely and unevenly.
Experimental verification, however, is expensive and time-consuming. Therefore, to address the growing volumes of available genomic sequence, a number of statistical computational algorithms have been developed that identify putative cis-regulatory modules and transcription factor binding sites using evolutionary comparisons, whole-genome data and known descriptions of TFBS. These are reviewed in the next section.


    COMPUTATIONAL METHODS TO RECOGNISE REGULATORY REGIONS
 
The underlying biological phenomena [18] exploited by computational methods are:

  1. multiple transcription factors tend to regulate gene activity in distinct regulatory modules;
  2. individual transcription factors often have multiple binding sites within a regulatory module and
  3. binding sites within a regulatory module tend to be spatially clustered.
Methods for recognizing regulatory DNA may be broadly divided into six main groups, as follows:

Recognition of regulatory DNA regions based on statistics of known TFBS
This approach exploits the clustering of binding sites for known, often co-operatively acting, transcription factors (TFs). Extracting clustered recognition motifs is one of the most reliable techniques, but it is limited to the recognition of similarly regulated cis-regulatory regions. Computational algorithms have been developed for large-scale discovery of regulatory regions [19–25]. One of the major sources of known TFBS is the TRANSFAC database [26], which contains consensus sequences as well as their probability weight matrices.
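To make the clustering idea concrete, the following is a minimal sketch, not the algorithm of any of the cited tools. It assumes a small hypothetical count matrix (`COUNTS`, not taken from TRANSFAC), a uniform background, and arbitrary values for the score threshold, window length and minimum number of sites. It scores every window of a sequence with a log-odds weight matrix and then reports stretches in which several hits fall close together.

```python
import math

# Hypothetical count matrix for an illustrative 4-bp motif (AGCA); not taken from TRANSFAC.
COUNTS = {
    "A": [8, 0, 0, 8],
    "C": [0, 0, 8, 0],
    "G": [0, 8, 0, 0],
    "T": [0, 0, 0, 0],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return matrix


def scan(seq, matrix, threshold):
    """Return start positions of windows scoring at or above the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set("ACGT"):  # skip windows with ambiguous bases
            score = sum(matrix[i][b] for i, b in enumerate(window))
            if score >= threshold:
                hits.append(start)
    return hits


def dense_windows(hits, window, min_sites):
    """Merge runs of hits in which at least min_sites starts fall within `window` bp."""
    hits = sorted(hits)
    spans = []
    for i, first in enumerate(hits):
        inside = [h for h in hits[i:] if h - first < window]
        if len(inside) >= min_sites:
            span = (first, inside[-1])
            if spans and span[0] <= spans[-1][1]:
                spans[-1] = (spans[-1][0], max(spans[-1][1], span[1]))
            else:
                spans.append(span)
    return spans
```

In practice the matrix would come from a curated library and the threshold would be calibrated against background sequence; the values here are placeholders chosen only so that the toy example is easy to follow.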

Other methods in this group are based on the arrangement of known TFBS. They exploit known interactions between TFs and their spatial arrangement: in [27], a neurogenic enhancer was identified in distantly related species on the basis of a similar arrangement of a subset of putative TFBS. The work in [28] explored distance preferences in the arrangement of TFBS positions and suggested using them to detect regulatory regions genome-wide. Some works exploit the combinatorial activity of TFs: they are often arranged in functional pairs that act synergistically to activate or repress promoter activity. These pairs are called composite elements, and their combined clustering is used to detect regulatory regions in [29].

Recognition of regulatory DNA based on evolutionary conservation: phylogenetic footprinting [30–35]
This is an actively progressing area as more and more sequenced genomes appear. Methods of this type assume that regulatory regions are highly conserved in cross-genomic comparison, and that conserved segments can be extracted from evolutionarily related genomes. Recently, several highly conserved non-coding sequences were identified in vertebrate genomes [36–39]. When some of these sequences were tested in vivo, the majority appeared to drive tissue-specific gene expression during early development. However, recent studies indicate that the system is more complicated and fluid, with regulatory regions having an underlying pattern of evolution not directly visible from simple sequence comparisons [40–42]. Indeed, the performance of phylogenetic footprinting depends on the evolutionary distance between the species compared and on the conservation level of individual genes. Such an approach offers little information about the specific function of the conserved sequences. Furthermore, it is still an open question how many and which genomes are required for reliably extracting regulatory regions.
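The core operation behind conservation-based methods can be sketched as follows. This is a simplified illustration rather than any of the cited footprinting algorithms: it assumes that two orthologous non-coding sequences have already been aligned (gaps written as `-`), and it uses a hypothetical function name and arbitrary window and identity settings. It slides a window along the alignment and merges windows whose percent identity reaches the cut-off.

```python
def conserved_windows(aligned_a, aligned_b, window=20, min_identity=0.8):
    """Return (start, end) alignment-column spans with windowed identity >= min_identity.

    `end` is exclusive. Gap columns never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(x == y and x != "-") for x, y in zip(aligned_a, aligned_b)]
    spans = []
    running = sum(matches[:window])  # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide the window by one column: add the new column, drop the old one.
            running += matches[start + window - 1] - matches[start - 1]
        if running / window >= min_identity:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))
    return spans
```

The choice of window length and identity cut-off plays the role of the evolutionary-distance dependence noted above: closely related species need a stricter cut-off to separate constrained segments from sequence that has simply not had time to diverge.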

Content-based methods
These methods are based on differences in local nucleotide composition between regulatory and non-regulatory DNA [43–47]. The difference is assumed to arise from the presence of multiple transcription signals, such as binding motifs for TFs, in regulatory regions. The works of [43–45] are based on constructing a global interpolated Markov model and are applied to promoter recognition only. In [46], the authors perform an exhaustive statistical analysis of local short-word frequencies. They identify as candidate regulatory regions those sequence segments that contain a specific word distribution, which is inferred from a training set of experimentally verified Drosophila enhancers. In [47], the authors describe a tool, the so-called ‘fluffy-tail test’, to distinguish regulatory modules from coding and from non-coding non-regulatory DNA. They hypothesize that the abundance of regulatory motifs within regulatory regions takes the form of an over-representation of ‘similar words’ (which are not simple repeats), whose distribution has a thick right tail. The fluffy-tail test is designed to identify such significantly tailed regions and mark them as putatively regulatory.
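The intuition behind testing for a heavy right tail in the word distribution can be illustrated with a deliberately simplified stand-in. The sketch below is not the published fluffy-tail test: it counts exact k-mers rather than ‘similar words’, uses the single largest word count as the tail statistic, and compares it with mononucleotide-shuffled copies of the same sequence. The function names, `k`, the number of shuffles and the seed are all assumptions made for illustration.

```python
import random
import statistics
from collections import Counter


def max_word_count(seq, k):
    """Highest count of any k-mer in seq (overlapping occurrences included)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return max(counts.values()) if counts else 0


def tail_z_score(seq, k=4, shuffles=200, seed=0):
    """Z-score of the top k-mer count against shuffled copies of seq.

    Shuffling preserves base composition but destroys word structure, so a
    large positive score suggests over-represented words in the original.
    """
    rng = random.Random(seed)
    observed = max_word_count(seq, k)
    letters = list(seq)
    null = []
    for _ in range(shuffles):
        rng.shuffle(letters)
        null.append(max_word_count("".join(letters), k))
    spread = statistics.pstdev(null)
    if spread == 0:
        return 0.0  # degenerate null distribution: no evidence either way
    return (observed - statistics.mean(null)) / spread
```

Because simple repeats also inflate this statistic, a practical version would need to mask them first, which is the distinction the ‘similar words’ formulation is meant to capture.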

Motif recognition and discovery as basis for regulatory region recognition
Because computational recognition of regulatory regions is mainly based on regulatory motif recognition (both supervised and unsupervised), we describe this area only briefly; a detailed review of motif recognition algorithms is beyond the scope of this article. Methods to recognize regulatory elements may be divided into two large categories:

  1. supervised, i.e. based on description of known TFBS; these approaches typically constitute methods to screen a set of DNA sequences against a precompiled library of motifs [48] and assess which of the motifs are statistically significant in the sequences [49–51].
  2. unsupervised (ab initio), i.e. without prior knowledge of TFBS sequences; these methods search for recurrent patterns of any kind.
Unsupervised (ab initio) algorithms may be roughly divided into three categories: enumerative (including phylogenetic footprinting), iterative, and content-based:
  1. Enumerative, or word-counting, methods [52–60], build or assume a background model of base pair distribution in the DNA non-coding regions that do not contain TFBS, and look for any motifs in the given set of sequences (often upstream regions of co-expressed genes) that are statistically significant against this background. An important type of enumerative method is phylogenetic footprinting [61–65]. These methods are based on comparative genomic studies which show that conserved non-coding sequences are good candidates for transcription regulatory elements [36, 66, 67]. Phylogenetic footprinting methods are helpful in constructing regulatory databases: thus, the COmparative Regulatory Genomics (CORG) [68] database consists of putative regulatory elements, conserved between human and mouse, derived by comparison of upstream sequences of orthologous genes. However, not every functionally important TFBS is conserved even between closely related species [69–71] and not every conserved pattern may be functional [72].
  2. Iterative methods [73, 74] build on Gibbs sampling [75] and expectation maximization [76]. They address the same task as enumerative methods: they find, in a probabilistic way, the motifs that are most statistically over-represented against a background in a set of sequences. They are usually faster, though their speed of convergence depends on the initial guess.
  3. Content approaches [77, 78] are based on the observation that functional binding sites are often found in clusters within regulatory regions and thus cause a biased word distribution within a given sequence.
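The enumerative (word-counting) strategy from the list above can be sketched in a few lines. This is an illustrative simplification, not a reimplementation of any cited method: the background model is just smoothed k-mer frequencies from a background sequence set, significance is a normal-approximation z-score, and the function names, `k` and `min_z` are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count all overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented(foreground, background, k=6, min_z=3.0):
    """Return (word, observed, z) for k-mers enriched in foreground vs background."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    n_words = 4 ** k
    results = []
    for word, observed in fg.items():
        # Add-one smoothing so words absent from the background get a non-zero probability.
        p = (bg[word] + 1) / (n_bg + n_words)
        expected = n_fg * p
        z = (observed - expected) / math.sqrt(n_fg * p * (1 - p))
        if z >= min_z:
            results.append((word, observed, round(z, 2)))
    return sorted(results, key=lambda r: -r[2])
```

With realistic inputs the foreground would be, for example, upstream regions of co-expressed genes and the background a large sample of non-coding sequence. On very small inputs the normal approximation is poor and nearly every word passes the cut-off, so only the ranking is informative.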
We recommend a number of excellent reviews: [79] on statistical computational methods for transcriptional regulation in yeast and fly; [80] on supervised computational methods; [81] on frequency enumerative methods and [82] on ab initio methods.

Some more features which help to distinguish regulatory regions
Complexity of regulatory regions
Orlov and Potapov [83] found that the complexity of regulatory regions, both promoters and enhancers, is intermediate between that of coding DNA and that of non-coding non-regulatory DNA.
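Sequence complexity can be measured in several ways. As one possible illustration, and without implying that it is the measure used in [83], the sketch below computes the Shannon entropy of the k-mer distribution in sliding windows. The function names and the window, step and `k` values are assumptions.

```python
import math
from collections import Counter


def word_entropy(seq, k=3):
    """Shannon entropy (in bits) of the k-mer distribution in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(seq, window=100, step=50, k=3):
    """Return (start, entropy) pairs for windows sliding along seq."""
    last_start = max(len(seq) - window, 0)
    return [
        (start, word_entropy(seq[start:start + window], k))
        for start in range(0, last_start + 1, step)
    ]
```

Low values indicate repetitive, low-complexity sequence and high values indicate a near-uniform word usage, so a profile of this kind gives a simple scale on which different classes of DNA could be compared.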

Existence of homotypic clusters of TFBS
When analysing the clustering of known TFBS in known Drosophila CRMs, Lifanov et al. [84] showed that each type of recognition motif forms significant clusters within regulatory regions.
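Whether a cluster of sites of one type is ‘significant’ can be judged against a simple null model. The sketch below assumes, purely for illustration, that sites occur independently along the genome, so that the number of sites in a window follows a Poisson distribution. This is a textbook approximation and not necessarily the statistic used in [84]; the function names are hypothetical.

```python
import math


def poisson_tail(n_observed, expected):
    """P(X >= n_observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(n_observed)
    )
    return max(0.0, 1.0 - below)


def cluster_p_value(n_sites_in_window, window_len, genome_sites, genome_len):
    """Probability of seeing at least this many sites in one window by chance.

    The expected count is the genome-wide site density times the window length.
    """
    expected = genome_sites / genome_len * window_len
    return poisson_tail(n_sites_in_window, expected)
```

For example, with 1000 predicted sites in a 1 Mb sequence, a 500 bp window is expected to hold 0.5 sites, so five sites in one such window would be a rare event under this null. A genome-wide scan tests many windows, so the resulting p-values would still need correction for multiple testing.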

Clustering of motifs in aligned regulatory region
It may be useful to study the average statistical properties of regulatory regions by analysing a large set of them as a whole. In [85], the authors aligned a set of 13010 human promoters relative to the transcription start site (TSS). As a result, they computationally identified eight DNA sequences in 5082 promoters that are important for regulating gene expression. Such motifs may in turn help to find new regulatory regions.
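The idea of looking for positional clustering in TSS-aligned promoters can be sketched as follows. The code assumes that all promoter sequences have been cut to the same coordinates relative to the TSS, so that string index corresponds to aligned position; the function name and bin size are assumptions, and this is not the procedure of [85].

```python
from collections import Counter


def positional_profile(promoters, word, bin_size=10):
    """Count how many promoters contain `word` in each position bin.

    Each promoter contributes at most once per bin, so the profile reflects
    how widely a positional preference is shared across promoters.
    """
    profile = Counter()
    for seq in promoters:
        seen_bins = set()
        start = seq.find(word)
        while start != -1:
            seen_bins.add(start // bin_size)
            start = seq.find(word, start + 1)
        profile.update(seen_bins)
    return profile
```

A word whose occurrences pile up in one or two bins, rather than being spread evenly along the promoters, is a candidate for a position-dependent regulatory element.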

Combination of additional experimental information with statistics of DNA sequence
With a purely computational approach, uncertainty remains as to whether a predicted CRM actually possesses the expected function. In [21, 25], the authors experimentally evaluated the candidate CRMs obtained from their respective genome-wide searches. These two groups used related but distinct computational strategies for the prediction of co-expressed genes and their associated CRMs in the Drosophila melanogaster genome. Whereas Johansson et al. [21] used clustering of a single class of TF, Berman et al. [25] employed five different TFs with known concerted functions during Drosophila embryogenesis.

It is helpful to incorporate expression data into regulatory region search algorithms, as derived from genome-wide expression profiling or from high-throughput in situ hybridization screens of cDNA collections. This approach can help to filter for the most likely co-regulated genes or groups of biologically related TFBS. Aerts et al. [86] search for CRMs by finding the optimal combination of TFBS in a set of co-expressed genes.

Computational predictions can also be combined with data on evolutionary conservation. The authors of [35, 87] applied a statistical methodology to find modules within human–mouse conserved promoter segments, focusing on cell cycle regulated genes and stress response genes. They also tested whether these genes are co-expressed.

Further important information can be obtained from chromatin immunoprecipitation (ChIP) studies, which help to assign genes to common regulatory networks [88, 89].

Finally, particular computational combinatorial models can be tested by the construction and functional assessment of synthetic enhancers [4, 90].


    CONCLUSION
 
In conclusion, combined experimental and statistical approaches are probably the most promising ways to increase the precision of computational identification of regulatory regions and co-regulated genes. Clear understanding of processes governing the evolution of regulatory regions is another important requirement for creating reliable regulatory region recognition methods.


Key Points

  • Identification of regulatory regions in DNA sequences is difficult.
  • Reliable experimental confirmation of regulatory sequences is hard to come by, and informatic approaches based on sequence motifs, conservation or nucleotide composition are only weakly predictive.
  • The most promising outlook is for statistical approaches to cis-regulatory module prediction which combine all of the above.

 


    Acknowledgements
 
We thank Tanya Vavouri and Rene te Boekhorst for help and comments on this manuscript. We are grateful to two anonymous reviewers for their valuable comments.


    FOOTNOTES
 
Irina Abnizova is a Research Scientist at the MRC Biostatistics Unit. Her current research interests are the development of computational statistics and software focusing on regulatory regions and elements.

Walter Gilks is Programme Leader at MRC Biostatistics Unit. His interests are in the development of statistical methods for the analysis of genomic data.

Submitted: May 23, 2005. Received (in revised form): October 12, 2005.


    References
 

  1. Wasserman W, Palumbo M, Thompson W, et al. Human-mouse genome comparisons to locate regulatory sites. Nat Genet 2000; 26:225–28.
  2. Dermitzakis ET, Clark AG. Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol 2002; 19:1114–21.
  3. Yuh C, Bolouri H, Davidson E. Genomic cis-regulatory logic: functional analysis and computational model of a sea urchin gene control system. Science 1998; 279:1896–1902.
  4. Davidson E. Genomic Regulatory Systems. Academic Press, London 2001.
  5. Davidson E, Rast JP, Oliveri P, et al. A genomic regulatory network for development. Science 2002; 295:1669–78.
  6. Woodbury CJ, von Hippel P. On the determination of deoxyribonucleic acid-protein interaction parameters using the nitrocellulose filter-binding assay. Biochemistry 1983; 22:4730–37.
  7. Garner M, Revzin A. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res 1981; 9:3047–60.
  8. Choo Y, Klug A. A role in DNA binding for the linker sequences of the first three zinc fingers of TFIIIA. Nucleic Acids Res 1993; 21:3341–46.
  9. Galas D, Schmitz A. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res 1978; 5:3157–70.
  10. Molnar G, O'Leary N, Pardee A, et al. Quantification of DNA-protein interaction by UV cross-linking. Nucleic Acids Res 1995; 23:3318–26.
  11. Hanes S, Brent R. A genetic model for interaction of the homeodomain recognition helix with DNA. Science 1991; 251:426–30.
  12. Ren B, Robert F, Wyrick J, et al. Genome-wide location and function of DNA binding proteins. Science 2000; 290:2306–9.
  13. Choo Y, Klug A. Selection of DNA binding sites for zinc fingers using rationally randomized DNA reveals coded interactions. Proc Natl Acad Sci USA 1994; 91:11168–72.
  14. Choo Y, Klug A. Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage. Proc Natl Acad Sci USA 1994; 91:11163–67.
  15. Bulyk M, Gentalen E, Lockhart D, et al. Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat Biotechnol 1999; 17:573–77.
  16. Kim J, Burley S. 1.9 A resolution refined structure of TBP recognizing the minor groove of TATAAAAG. Nat Struct Biol 1994; 1:638–53.
  17. Wray L, Hahn M, Abouheif E, et al. The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol 2003; 20:1377–1419.
  18. Arnone M, Davidson EH. The hardwiring of development: organization and function of genomic regulatory system. Development 1997; 124:1851–64.
  19. Brazma A, Jonassen I, Vilo J, et al. Predicting gene regulatory elements in silico on a genomic scale. Genome Res 1998; 8:1202–15.
  20. Markstein M, Markstein P, Markstein V, et al. Genome wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc Natl Acad Sci USA 2002; 99:763–68.
  21. Johansson O, Alkema W, Wasserman WW, et al. Identification of functional clusters of transcription factor binding motifs in genome sequences: the MSCAN algorithm. Bioinformatics 2003; 19:(Suppl.)I169–76.
  22. Rajewsky N, Vergassola W, Gaul U, et al. Computational detection of genomic cis-regulatory modules, applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 2002; 3:30.
  23. Bailey T, Elkan C. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 1995; 21:51–80 (MEME).
  24. Lifanov AP, Makeev VJ, Nazina AG, et al. Homotypic regulatory clusters in Drosophila. Genome Res 2003; 13:579–88.
  25. Berman B, Nibu Y, Pfeiffer P, et al. Exploiting TFBS clustering to identify CRM involved in pattern formation in Drosophila genome. Proc Natl Acad Sci USA 2002; 99:757–62.
  26. Wingender E. The TRANSFAC System on Gene Regulation. Trends in Glycoscience and Glycotechnology 2000; 12:255–264. http://www.gene-regulation.com/pub/databases.html
  27. Erives A, Levine M. Coordinate enhancers share common organization features in the Drosophila genome. Proc Natl Acad Sci USA 2004; 101:3851–56.
  28. Makeev VJ, Lifanov AP, Nazina AG, et al. Distance preferences in the arrangement of binding motifs and hierarchical levels in organization of transcription regulatory information. Nucleic Acids Res 2003; 31:6016–26.
  29. Kel-Margoulis OV, Romanshchenko AG, Kolchanov NA, et al. COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation. Nucleic Acids Res 2000; 28:311–15.
  30. Duret L, Bucher P. Searching for regulatory elements in human non coding sequences. Curr Opin Struct Biol 1997; 7:399–406.
  31. Blanchette M, Schwikowski B, Tompa M. Algorithms for phylogenetic footprinting. J Comput Biol 2002; 2:11–23.
  32. Couronne O, Poliakov A, Bray N, et al. Strategies and tools for whole-genome alignments. Genome Res 2003; 13:73–80.
  33. Boffelli D, McAuliffe J, Ovcharenko D, et al. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 2002; 299:1391–4.
  34. Elnitski L, Hardison RC, Li J, et al. Distinguishing regulatory DNA from neutral sites. Genome Res 2003; 13:64–72.
  35. Berman BP, Pfeiffer B, Laverty RT, et al. Computational identification of developmental enhancers: conservation and function of TFBS clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol 2004; 5:R61.
  36. Woolfe A, Goodson M, Goode D, et al. Highly conserved non-coding sequences are associated with developmental control genes in vertebrates. PLoS Biol 2005; 3:e7.
  37. Boffelli D, Nobrega M, Rubin E. Comparative genomics at the vertebrate extremes. Nat Rev Genet 2005; 6:151–57.
  38. Dermitzakis M, Reymond A, Antonarakis S. Conserved non-genic sequences - an unexpected feature of mammalian genomes. Nat Rev Genet 2005; 6:151–57.
  39. Bejerano G, Pheasant M, Makunin I, et al. Ultraconserved elements in human genome. Science 2004; 304:1321.
  40. Hancock JM, Shaw P, Bonneton F, et al. High sequence turnover in the regulatory regions of the developmental gene hunchback in insects. Mol Biol Evol 1999; 16:253–65.
  41. Ludwig MZ, Bergman C, Patel NH, et al. Evidence for stabilizing selection in eukaryotic enhancer element. Nature 2000; 403:564–67.
  42. Tautz D. Evolution of transcriptional regulation. Curr Opin Genet Dev 2000; 10:575–79.
  43. Ohler U, Harbeck S, Niemann H, et al. Interpolated Markov chains for eukaryotic promoter recognition. Bioinformatics 1999; 15:362–9.
  44. Ohler U. Promoter prediction on a genomic scale – the Adh experience. Genome Res 2000; 10:539–42.
  45. Ohler U, Niemann H, Liao G, et al. Joint modelling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 2001; 17S:199–206.
  46. Nazina A, Papatsenko D. Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics 2003; 4:65–78.
  47. Abnizova I, te Boekhorst R, Walter K, et al. Some statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in Drosophila genome: the fluffy-tail test. BMC Bioinformatics 2005; 6:1–12.
  48. Heinemeyer T, Wingender E, Reuter I, et al. Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res 1998; 26:362–67.
  49. Frith M, Fu Y, Chen J, et al. Detection of functional motifs via statistical representation. Nucleic Acids Res 2004; 32:1372–81.
  50. Liu R, McEachin R, States D. Computationally identifying novel NF-kappa B-regulated immune genes in the human genome. Genome Res 2003; 13:654–8.
  51. Zheng J, Wu J, Sun Z. An approach to identify over-represented cis-elements in related sequences. Nucleic Acids Res 2003; 31:1995–2005.
  52. van Helden J, Andre B, Collado-Vides J. Extracting regulatory sites from the upstream regions of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 1998; 281:827–42.
  53. van Helden J, Rios A, Collado-Vides J. Discovering regulatory elements in non coding sequences by analysis of spaced dyads. Nucleic Acid Res 2000; 28:1808–18.[Abstract/Free Full Text]
  54. Tompa M. In Proceedings of the 7th Int. Conf. On Intell. Systems for Mol. BiologyAn exact method for finding short motifs in sequences, with application to the ribosome binding site problem 1999 Heidelberg, Germany: pp. 262–71.
  55. Brazma A, Jonassen I, Vilo J, et al. Pedicting gene regulatory elements in silico on a genomic scale. Genome Res 1998; 8:1202–15.[Abstract/Free Full Text]
  56. Herts GS, Stormo G. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 1999; 15:563–77.[Abstract/Free Full Text]
  57. Tavazoie S, Hughes J, Campbell M, et al. Systematic determination of genetic network architecture. Nature Genetics 1999; 22:281–85.[CrossRef][Web of Science][Medline]
  58. Chu S, DeRisi J, Eisen M, et al. The transcriptional programm of sporulation in budding yeast. Science 2001; 282:699–705.
  59. Hampson S, Kibler S, Baldi P. Distribution patterns of over-represented k-mers in non coding yeast genome. Bioiformatics 2002; 18:513–28.
  60. Marsan L, Sagot M. Algorithms for extracting structured motifs using a suffix tree with application to promoter and regulatory site consensus identification. J Comput Biol 2000; 7:345–60.[CrossRef][Web of Science][Medline]
  61. Blanchette M, Tompa M. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Research 2002; 12:(5)739–48.[Abstract/Free Full Text]
  62. Dieterich C, Rahman S, Vingron M. ISMB/ECCB, 2004 – The 12th International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 3rd European Conference on Computational Biology, BioinformaticsFunctional inference from non-random distributions of conserved predicted transcription factor binding sites; 20:suppl 1, pp. i109–i115.
  63. Zhang Z, Gerstein M. Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements. J Biol 2003; 2:11–17.[CrossRef][Medline]
  64. Moses A, Chiang D, Pollard D, et al. MONKEY: Identifying conserfed transcription factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 2004; 5:R98.[CrossRef][Medline]
  65. Xie X, Jun Lu, Kulbokas E, et al. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 2005; 434:3441.
  66. Hardison RC. Conserved non coding sequences are reliable guides to regulatory elements. Trends Genet 2000; 16:369–72.
  67. Nobrega MA, Ovcharenko I, Afzal V, et al. Scanning human gene deserts for long-range enhancers. Science 2003; 302:413.
  68. Dieterich C, Wang H, Rateitschak K, et al. CORG: a database for COmparative Regulatory Genomics. Nucleic Acids Res 2003; 31:(1)55–57.
  69. Dermitzakis E, Clark A. Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol 2002; 19:1114–21.
  70. Costas J, Casares F, Vieira J. Turnover of binding sites for transcription factors involved in early Drosophila development. Gene 2003; 310:215–20.
  71. Emberly E, Rajewsky N, Siggia E. Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics 2003; 4:57.
  72. Cheng J, Kapranov P, Drenkow J, et al. Transcriptional Maps of 10 Human Chromosomes at 5-Nucleotide Resolution. Science 2005; 308:1149–54.
  73. Workman C, Stormo G. ANN-Spec: A method for discovering transcription factor binding sites with improved specificity. Pac Symp Biocomput 2000; 467–78.
  74. Down T, Hubbard T. NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence. Nucleic Acids Res 2005; 33:(5)1445–53.
  75. Lawrence CE, Altschul SF, Boguski MS, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 1993; 262:208–14.
  76. Bailey T, Elkan C. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 1995; 21:51–80. (MEME).
  77. Bussemaker H, Li H, Siggia E. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis ("MobyDick"). PNAS 2000; 97:(18)10096–99.
  78. Papatsenko D, Makeev V, Lifanov A, et al. Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res 2002; 12:(1)470–81.
  79. Siggia E. Computational methods for transcriptional regulation. Curr Opin Genet Dev 2005; 15:214–21.
  80. Vavouri T, Elgar G. Prediction of cis-regulatory elements using binding site matrices - the success, the failures and the reasons for both. Curr Opin Genet Dev 2005; 15:395–402.
  81. van Helden J. Metrics for comparing regulatory sequences on the basis of pattern counts. Bioinformatics 2004; 20:399–406.
  82. Frith M, Fu Y, Chen J, et al. Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res 2004; 32:1372–81.
  83. Orlov Y, Potapov V. Complexity: an internet resource for analysis of DNA sequence complexity. Nucleic Acids Res 2004; 32 (on-line).
  84. Lifanov AP, Makeev VJ, Nazina AG, et al. Homotypic regulatory clusters in Drosophila. Genome Res 2003; 13:(4)579–88.
  85. FitzGerald P, Shlyakhtenko A, Mir A, et al. Clustering of DNA Sequences in Human Promoters. Genome Res 2004; 14:1562–74.
  86. Aerts S, Thijs G, Coessens B, et al. Toucan: deciphering the cis-regulatory logic of co-regulated genes. Nucleic Acids Res 2003; 31:1753–64.
  87. Sharan R, Ovcharenko I, Ben-Hur A, et al. CRÈME: a framework for identifying cis-regulatory modules in human-mouse conserved segments. Bioinformatics 2003; 19:283–91.
  88. Ren B, Robert F, Wyrick JJ, et al. Genome-wide location and function of DNA binding proteins. Science 2000; 290:2306–309.
  89. Iyer VR, Horak CE, Scafe CS, et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 2001; 409:533–38.
  90. Guss KA, Nelson CE, Hudson A, et al. Control of a genetic regulatory network by a selector gene. Science 2001; 292:1164–67.