Briefings in Bioinformatics Advance Access published online on July 12, 2007
Briefings in Bioinformatics, doi:10.1093/bib/bbm027
Combined experimental and computational approaches to study the regulatory elements in eukaryotic genes
Corresponding author. Elena A. Ananko, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk, Russia. E-mail: eananko{at}bionet.nsc.ru
| ABSTRACT |
|---|
The recognition of transcription factor binding sites (TFBSs) is the first step toward deciphering the DNA regulatory code. A large variety of experimental approaches provide information on TFBS locations in genomic sequences. Many computational approaches to TFBS recognition, built on the experimental data obtained, are available, each with its own advantages and shortcomings. This article provides a short review of approaches to the computational recognition of TFBSs in genomic sequences and of methods for the experimental verification of predicted sites. We also present a case study of the interplay between experimental and theoretical approaches that led to the successful prediction of Steroidogenic Factor 1 (SF1) binding sites.
Keywords: Genome annotation, transcription factor binding site recognition, computer-assisted and experimental methods, sequence training-samples
| INTRODUCTION |
|---|
Deciphering the DNA regulatory code (the second genetic code) is among the most important topics in modern molecular biology. The basic elements of this code have traditionally been thought to be short (5–20 bp) DNA sequences recognized by transcription factors (TFs). The set of regulatory elements of a gene determines its expression during ontogenesis, its tissue specificity and its ability to respond to various external signals [1, 2]. In silico identification of transcription factor binding sites (TFBSs) in nucleotide sequences is the first, and a common, step towards understanding the molecular mechanisms of gene expression regulation. TFBS recognition by computational methods can be used to detect groups of coordinately regulated genes, either alone or in conjunction with network reconstructions based on changes in transcriptome profiles. Such analyses provide a window on the molecular mechanisms of an organism's response to external stimuli. For example, studying the response to hypoxia requires detecting the target genes of HIF-1 [3]. Likewise, studying the immune response, in particular the response induced by type I interferons, involves searching for the target genes of ISGF3 [4]. There are many more examples.
Developing validated computer methods for TFBS recognition faces considerable challenges, principally caused by the intricate organization and high redundancy of this second genetic code. Another challenge is determining whether or not the predicted sites are real. The theoretical methods used for this purpose, for example comparative genomics and gene expression profile analysis, give only an indication of the functionality of a predicted TFBS, and hence only an indirect measure of a recognition method's validity. The best approaches to estimating the quality of a TFBS prediction method are based on direct experimental verification. Unfortunately, such laboratory techniques are not widely used [5–7] because they are expensive: each predicted site must be verified individually. An approach that has been successful for us involves tight cooperation between computational biologists, who develop the methods for TFBS recognition, and laboratory experimentalists, who are able to validate a site's functionality.
This article outlines the computational methods for TFBS prediction, their experimental verification, and a case study of how the interaction between computational and laboratory biologists has led to validated methods for predicting Steroidogenic Factor 1 (SF1) binding sites.
| COMPUTATIONAL APPROACHES |
|---|
Source data for TFBS recognition methods
Development of TFBS recognition methods has traditionally been based on training sets of sequences that laboratory experiments have shown to interact with the TF in question. Several specialized information resources compiling eukaryotic TFBSs are now available, namely TRANSFAC [2], TRED [8], TRRD [9], ooTFD [10] and MPromDb [11].
Usually, these databases record the extent to which each site has been studied experimentally. This information is given either as a site quality score together with the type of experiment (TRED and TRANSFAC) or as a digital code for the experimental technique used (TRRD). This allows users to assemble TFBS sequence sets according to a range of different criteria.
A number of information resources contain ready-made TFBS weight matrices, namely TRANSFAC [2], JASPAR [12] and ARTSITE [13]. These matrices fall into two types: those constructed from natural (genomic) sequences and those constructed from sequences selected artificially in vitro. The experimental method used to obtain the sites can clearly be critical.
Important points in forming a training sample
Good TFBS training samples require reliable, experimentally confirmed site data obtained with laboratory techniques such as (i) electrophoretic mobility-shift assay (EMSA) using purified protein; (ii) DNase I footprinting with purified protein; and (iii) EMSA using nuclear extracts and antibodies to the TF. This matters because indirect methods can lead to the inclusion of erroneous sequences in training sets. Such sequences correspond to binding sites of other TFs present in the gene-regulation complex, the so-called tethering elements. For example, the well-studied glucocorticoid-responsive elements (GREs) include elements to which the glucocorticoid receptor (GR) binds directly. There is, however, also a group of tethering GREs. GR does not bind to these, but exerts its effect through protein–protein interactions with the transcription factors Fos/Jun [14], Stat5 [15] and Smad3 [16], which do bind directly to the site.
In certain situations, TFBS datasets contain subsets corresponding to structural variants of the site. For example, analysis of 160 GR binding sites from TRRD [9] has demonstrated that only 54% of these sites are homologous to palindromic GREs (AGAACAnnnTGTTCT), to which the GR homodimer binds. The remainder are half-sites, which bind a GR monomer; yet most of these sites are active in glucocorticoid regulation [17]. Functional estrogen receptor binding sites [18] and CTF1/NF1 sites [19] are likewise represented by both palindromes and half-sites.
The situation is further complicated when binding sites of the same TF occur as both direct and inverted repeats. Examples are the binding sites for the androgen receptor [20] and sterol-responsive element-binding protein (SREBP) [21]. TFBSs can also differ in the length of the spacer between the conserved motifs of the site. For example, RAR/RXR heterodimers bind to direct repeats of AGGTCA with a spacer of either 1 or 5 nucleotides (DR1 and DR5) [22], whereas PPAR/RXR interacts with direct repeats of the DR1, DR0 and DR2 types [22, 23]. Partitioning such TFBS variants into separate subsamples is critical for ensuring that sites of different lengths are not missed (which would produce false negatives) and that the nucleotide distribution at a given position is not distorted (which would produce false positives).
In certain cases, the criteria for splitting a sample into subsamples are unclear, even though the need for such partitioning is evident. For example, Shelest et al. [24] applied a subtractive approach that allowed a sample of 350 C/EBP binding sites to be divided into subsamples with essentially different consensus sequences.
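To make the palindrome/half-site split concrete, the sketch below sorts GR site sequences by how closely they match the palindromic consensus quoted above or one of its halves. It is a minimal illustration under stated assumptions: the mismatch thresholds (`max_pal_mm`, `max_half_mm`) and all function names are hypothetical choices, not the procedure used in the TRRD analysis [17].

```python
# Minimal sketch (hypothetical helper names and thresholds): sort GR binding
# sites into palindromic GREs and half-sites by counting mismatches against
# the consensus AGAACAnnnTGTTCT quoted in the text.

PALINDROME = "AGAACANNNTGTTCT"  # GR homodimer consensus; N matches any base
HALF_SITE = "TGTTCT"            # one half of the palindrome

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(window: str, pattern: str) -> int:
    """Positions where window differs from pattern; N in pattern matches anything."""
    return sum(1 for w, p in zip(window, pattern) if p != "N" and w != p)


def best_hit(seq: str, pattern: str) -> int:
    """Fewest mismatches between pattern and any window of seq, on either strand."""
    best = len(pattern)  # also the result when seq is shorter than pattern
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - len(pattern) + 1):
            best = min(best, mismatches(strand[i:i + len(pattern)], pattern))
    return best


def classify_gr_site(seq: str, max_pal_mm: int = 2, max_half_mm: int = 0) -> str:
    """Label a site 'palindrome', 'half-site' or 'unclassified' (assumed thresholds)."""
    seq = seq.upper()
    if best_hit(seq, PALINDROME) <= max_pal_mm:
        return "palindrome"
    if best_hit(seq, HALF_SITE) <= max_half_mm:
        return "half-site"
    return "unclassified"


if __name__ == "__main__":
    for site in ["ggAGAACAgcaTGTTCTgg", "ccAGAACAcc", "GGGGGGGGGG"]:
        print(site, classify_gr_site(site))
```

Counting the labels over a list of experimentally confirmed GR sites would give a palindrome/half-site breakdown of the kind described above, although the proportions depend strongly on the assumed mismatch thresholds.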
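Spacer variants can be separated before any matrix is trained. The sketch below assigns AGGTCA direct repeats to DR0–DR5 subsamples by regular-expression matching on both strands. It assumes exact half-site matches only and uses hypothetical function names; real sites often deviate from AGGTCA, so a practical version would tolerate mismatches.

```python
import re

HALF_SITE = "AGGTCA"  # nuclear receptor half-site named in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def direct_repeat_spacers(seq: str, max_spacer: int = 5) -> list[int]:
    """Spacer lengths n (0..max_spacer) for which seq holds an exact DRn,
    i.e. AGGTCA + n arbitrary bases + AGGTCA, on either strand."""
    seq = seq.upper()
    found = set()
    for strand in (seq, revcomp(seq)):
        for n in range(max_spacer + 1):
            # e.g. n = 1 gives the pattern AGGTCA[ACGT]{1}AGGTCA
            pattern = f"{HALF_SITE}[ACGT]{{{n}}}{HALF_SITE}"
            if re.search(pattern, strand):
                found.add(n)
    return sorted(found)


def split_by_spacer(sites: list[str], max_spacer: int = 5) -> dict[int, list[str]]:
    """Group sites into DRn subsamples; a site matching several n joins each group."""
    groups: dict[int, list[str]] = {}
    for site in sites:
        for n in direct_repeat_spacers(site, max_spacer):
            groups.setdefault(n, []).append(site)
    return groups


if __name__ == "__main__":
    sites = ["AGGTCAAAGGTCA", "AGGTCACCCCCAGGTCA", "TGACCTATGACCT"]
    print(split_by_spacer(sites))
```

Training one matrix per DRn group, rather than a single matrix over the pooled sample, is what keeps a DR1 column from being averaged with a DR5 column at the same offset.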
| COMPUTATIONAL APPROACHES TO TFBS RECOGNITION |
|---|
The functional role of a TF arises from the specificity of the physical interactions it makes with its target DNA regulatory sequences. The specific features of these interactions impose certain limitations on the DNA sequence, which manifest themselves in a partial conservation of TFBS nucleotide sequences. This provides the basis for the majority of TFBS recognition methods.
The Classical Approach
One of the most widespread strategies in the search for potential sites is based on weight matrices, which describe the frequencies of the four nucleotides at each position of the site. These are calculated from a set of experimentally confirmed TFBS sequences [25]. This approach, based on methods of statistical physics [26], is a natural development of the search for consensus sequences [27] and assumes that each nucleotide interacts with the TF independently of the others. Hundreds of weight-matrix variants for TFBS recognition have been computed and applied with varying degrees of success. For example, MatInspector [28] used over 700 weight matrices from the TRANSFAC database [2]. However, its recognition quality (i.e. its ability to predict sites correctly with minimal false positives and false negatives) has been shown to vary markedly between situations [29]. In addition, it is now widely accepted that using weight matrices for TFBS prediction results in a high rate of false positives [30].
A detailed analysis of experimental data on the affinity of various TFs for DNA sites has raised doubts about the independent interaction of nucleotides with the protein [29, 31–33]. It has been demonstrated that taking weak correlations between non-adjacent nucleotides into account alongside weight-matrix scores can increase recognition quality [33]. In particular, weight matrices can be based on dinucleotide frequencies [34, 35]. However, because the dinucleotide alphabet has sixteen characters (AA, AC, AG, ..., TG, TT) rather than the four of the mononucleotide alphabet, far more functional sequences are required to obtain accurate statistics. Data sets of sufficient size are not yet available for most TFs, especially given the variability in site length mentioned above.
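The weight-matrix idea can be sketched in a few lines: count nucleotides per aligned position, convert the counts to log-odds scores against a background, and add the per-position scores along a candidate window. The addition is exactly the independence assumption discussed above. This is a generic sketch, not MatInspector's scoring scheme; the uniform background, the pseudocount of 0.5 and the function names are assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log-odds position weight matrix from equal-length aligned sites.

    Each column holds log2(p(base at position) / p(base in background)),
    with a pseudocount added to every base to avoid log(0).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i].upper()] += 1  # assumes sites contain only A/C/G/T
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds: the independence assumption in action."""
    return sum(column[base] for column, base in zip(pwm, window.upper()))


def scan(pwm, seq, threshold):
    """(position, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        s = score(pwm, seq[i:i + width])
        if s >= threshold:
            hits.append((i, s))
    return hits


if __name__ == "__main__":
    training_sites = ["TGACGT", "TGACGC", "TGACGT", "TAACGT"]  # toy alignment
    pwm = build_pwm(training_sites)
    print(scan(pwm, "CCCTGACGTCCC", threshold=0.0))
```

The threshold is the main tuning knob: raising it removes false positives at the cost of false negatives, which is the trade-off behind the variable recognition quality noted above. Only the forward strand is scanned here for brevity.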
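The data requirements of dinucleotide models can be made concrete by counting free parameters and by measuring how sparsely a small training set fills a dinucleotide matrix. The sketch below uses adjacent dinucleotides only (a weight-array-style model in the spirit of [34]); the uniform 1/16 background, the pseudocount and the function names are assumptions.

```python
import math
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT


def build_dinucleotide_matrix(sites, pseudocount=0.5):
    """Log-odds matrix over adjacent dinucleotides (positions i, i+1),
    against an assumed uniform background of 1/16 per dinucleotide."""
    length = len(sites[0])
    matrix = []
    for i in range(length - 1):
        counts = {d: pseudocount for d in DINUCLEOTIDES}
        for site in sites:
            counts[site[i:i + 2].upper()] += 1
        total = sum(counts.values())
        matrix.append({d: math.log2(counts[d] / total * 16) for d in DINUCLEOTIDES})
    return matrix


def dinucleotide_score(matrix, window):
    """Sum of log-odds over the adjacent dinucleotide steps of window."""
    window = window.upper()
    return sum(column[window[i:i + 2]] for i, column in enumerate(matrix))


def free_parameters(site_length):
    """Free parameters per model: 3 per column for mononucleotides,
    15 per adjacent pair for dinucleotides (frequencies sum to 1)."""
    return {"mononucleotide": 3 * site_length,
            "dinucleotide": 15 * (site_length - 1)}


def unseen_fraction(sites):
    """Share of matrix cells never observed in the training sites."""
    length = len(sites[0])
    seen = {(i, s[i:i + 2].upper()) for s in sites for i in range(length - 1)}
    return 1 - len(seen) / (16 * (length - 1))


if __name__ == "__main__":
    training_sites = ["TGACGT", "TGACGC", "TGACGT", "TAACGT"]  # toy alignment
    print(free_parameters(15))
    print(f"unobserved cells: {unseen_fraction(training_sites):.0%}")
```

For a 15-bp site the dinucleotide model has 210 free parameters against 45 for the mononucleotide model. With only a handful of training sites, most dinucleotide cells are never observed and are filled purely by the pseudocount, which is the statistical problem described above.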
Alternative Approaches
There are alternative approaches to developing TFBS recognition methods. One of them is based on the context-dependent conformational and physicochemical properties of DNA sequences [36]. The DNA double helix exhibits sequence-dependent structural variation, which can be described by a set of parameters (major groove width, interbase-pair roll, propeller twist, etc.) determined for di- or trinucleotides [37]. These properties are among the main factors underlying the specificity and affinity of interactions between regulatory sequences and TFs [38, 39]. In particular, analysis of MetJ binding sites demonstrated that recognition quality increases when DNA duplex structural properties are taken into account in combination with conventional methods based on sequence analysis [40]. Accordingly, analysis of DNA duplex properties in TFBSs forms the basis of the recognition method SITECON [41] and of a method for determining TFBS activity [42]. An evident advantage of this approach is that it requires fewer sequences for training. However, as with weight matrices, this method cannot take deletions or insertions in a site sequence into account [43].
The hidden Markov model method [44, 45] makes it possible to model insertions and deletions, to generate sequences in accordance with the model, and to assess the similarity between any sequence and the chosen model [46]. Discriminant analysis [47, 7] is also less sensitive to deletions and insertions. For nucleotide sequences, discriminant recognition is based on their statistical characteristics, usually oligonucleotide frequencies.
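The idea behind conformational-property methods can be sketched briefly: convert each aligned site into a profile of a dinucleotide-step property, keep the steps where that property is conserved across sites, and score candidates by how closely they follow the conserved profile. This is an illustration in the spirit of SITECON, not its actual algorithm or scoring function. As a stand-in property the sketch uses the G+C count of each dinucleotide step, which is trivially computable; a real application would use measured conformational parameters such as twist or roll [37]. All names and thresholds are hypothetical.

```python
import statistics

# Stand-in property: number of G/C bases in each dinucleotide step (0, 1 or 2).
# Real methods use experimentally derived conformational parameters instead.
GC_STEP = {a + b: (a in "GC") + (b in "GC") for a in "ACGT" for b in "ACGT"}


def step_values(seq, prop):
    """Property value for each overlapping dinucleotide step of seq."""
    seq = seq.upper()
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


def conserved_profile(sites, prop, max_sd):
    """(step, mean, SD) for steps whose property SD across sites is <= max_sd."""
    columns = zip(*(step_values(s, prop) for s in sites))
    profile = []
    for i, column in enumerate(columns):
        sd = statistics.pstdev(column)
        if sd <= max_sd:
            profile.append((i, statistics.mean(column), sd))
    return profile


def profile_score(profile, window, prop, sd_floor=0.1):
    """Negative mean normalised deviation over conserved steps (0 = perfect fit)."""
    if not profile:
        raise ValueError("no conserved steps to score against")
    values = step_values(window, prop)
    deviations = [abs(values[i] - mean) / max(sd, sd_floor) for i, mean, sd in profile]
    return -sum(deviations) / len(deviations)


if __name__ == "__main__":
    training_sites = ["GCGCAT", "GCGCAA", "GCGCTT"]  # toy alignment
    profile = conserved_profile(training_sites, GC_STEP, max_sd=0.5)
    for candidate in ["GCGCAT", "ATATAT"]:
        print(candidate, profile_score(profile, candidate, GC_STEP))
```

The toy sites differ in sequence at their last two positions yet share an identical property profile, which illustrates why a property-based description can be estimated from fewer sequences than a full nucleotide matrix.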
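Discriminant approaches of this kind start by turning each sequence into a vector of oligonucleotide frequencies. The sketch below shows that feature step and, for brevity, replaces a proper discriminant function with a nearest-centroid rule. That substitution, the choice k = 2 and all function names are assumptions, so the code illustrates the representation rather than the methods of [47, 7].

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Relative frequencies of all 4**k oligonucleotides in seq (fixed order)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # guard against empty input
    return [counts[m] / total for m in kmers]


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(site_seqs, background_seqs, k=2):
    """Class centroids of oligonucleotide-frequency vectors."""
    return {
        "site": centroid([kmer_profile(s, k) for s in site_seqs]),
        "background": centroid([kmer_profile(s, k) for s in background_seqs]),
    }


def classify(model, seq, k=2):
    """Assign seq to the class whose centroid is nearest (Euclidean distance)."""
    profile = kmer_profile(seq, k)
    return min(model, key=lambda label: math.dist(profile, model[label]))


if __name__ == "__main__":
    model = train(["CACGTGAC", "GCACGTGC", "TCACGTGA"],   # toy "sites"
                  ["ATTATAAT", "TTAATATA", "AATTTATA"])   # toy background
    print(classify(model, "ACACGTGT"), classify(model, "TATAATTA"))
```

Because the frequency vector ignores position, a single-base insertion or deletion changes only a few oligonucleotide counts, which is why such methods tolerate indels better than position-by-position matrices.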
Other approaches are also used in TFBS recognition, such as Gibbs sampling [48], expectation maximization [49, 50] and neural networks [51, 52]. These have been reviewed in detail elsewhere [53, 54]. Their recognition quality is comparable with that provided by the method of weight matrices.
The main problem in TFBS recognition is a high overprediction (false positive) rate [53, 55, 56]. Prediction accuracy can be improved by taking a number of additional factors into account. These include (i) similarity between the expression profiles of a TF and the genes it regulates [57]; (ii) membership of the target genes in a single functional class [58]; (iii) overrepresentation of certain cis-element sequences in a given set of co-regulated genes [59, 60]; (iv) similarity between the TFBS sets in the regulatory regions of orthologous genes (comparative genomics) [59, 61, 62]; and (v) the location of the TFBS relative to the transcription start site [58].
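Such additional criteria are often applied as successive filters on the raw matrix hits. The sketch below combines two of them, criterion (v) (position relative to the transcription start site) and criterion (iv) (conservation in an orthologous promoter). The record fields, the default window of 1000 bp upstream to 100 bp downstream and the conservation flag are hypothetical; in practice they would come from upstream analyses not shown here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    gene: str
    pos_rel_tss: int   # hypothetical field: position relative to TSS (negative = upstream)
    score: float       # score from the primary recognition method
    conserved: bool    # hypothetical field: site also predicted in an orthologous promoter


def filter_predictions(preds, upstream=1000, downstream=100, require_conservation=True):
    """Keep predictions inside the TSS window and, optionally, conserved in an
    ortholog; return them ordered by decreasing score."""
    kept = []
    for p in preds:
        if not -upstream <= p.pos_rel_tss <= downstream:
            continue  # criterion (v): too far from the transcription start site
        if require_conservation and not p.conserved:
            continue  # criterion (iv): no support from comparative genomics
        kept.append(p)
    return sorted(kept, key=lambda p: p.score, reverse=True)


if __name__ == "__main__":
    raw_hits = [
        Prediction("g1", -250, 8.1, True),
        Prediction("g2", -5000, 9.0, True),   # strong, but far upstream
        Prediction("g3", -100, 7.5, False),   # not conserved
        Prediction("g4", 50, 6.0, True),
    ]
    print([p.gene for p in filter_predictions(raw_hits)])
```

Each filter trades some sensitivity for specificity; the highest-scoring raw hit in the example is discarded purely because of its location.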
| EXPERIMENTAL APPROACHES USED FOR VALIDATION OF TFBS PREDICTIONS |
|---|
Almost all experimental methods developed to identify TFBSs are used to verify sites predicted by computational tools such as TESS [63], MatInspector [28], MATCH [64] and Matrix Search [65]. However, only some experimental approaches are appropriate for TFBS verification. The main requirements are rapid analysis of several tens of sites simultaneously, unambiguous results, and low consumable and labour costs.
EMSA meets precisely these requirements, making it possible to assess TF binding to dozens of double-stranded oligonucleotides corresponding to predicted sites. The TF source can be either cell nuclear extracts (in which case the method is supplemented with competition analysis or specific antibodies) or purified protein.
For example, Tronche et al. [5] successfully applied a variant of EMSA, in which the test oligonucleotides were used as competitors for a known TFBS, to verify predicted HNF1 binding sites. The potential sites were found with a weight matrix in annotated vertebrate genomic sequences. Overall, 54 potential binding sites were verified experimentally, of which 52 proved to be strong HNF1 binding sites. The overprediction error was thus below 5%.
We used EMSA with specific antibodies to verify SF1 binding sites predicted by the SiteGA [66] and SITECON [41] methods in previously unstudied genes related to steroidogenesis. We confirmed 15 of 18 and 18 of 18 predicted sites experimentally, corresponding to overprediction rates of 17% and 0%, respectively [7, 67]. EMSA with purified recombinant protein has also been used to verify SREBP binding sites predicted by SITECON (Figure 1): all 23 potential SREBP binding sites were shown to bind this protein specifically.
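The overprediction rates quoted above are simple proportions of tested sites that failed to bind. The helper below (a hypothetical name) recomputes them from the counts given in the text; it adds nothing beyond the arithmetic.

```python
def overprediction_rate(tested: int, confirmed: int) -> float:
    """Share of experimentally tested predicted sites that failed to bind the TF."""
    if tested <= 0 or not 0 <= confirmed <= tested:
        raise ValueError("need tested > 0 and 0 <= confirmed <= tested")
    return (tested - confirmed) / tested


# (tested, confirmed) counts as reported in the text
EMSA_RESULTS = {
    "HNF1, weight matrix": (54, 52),
    "SF1, SiteGA": (18, 15),
    "SF1, SITECON": (18, 18),
    "SREBP, SITECON": (23, 23),
}

if __name__ == "__main__":
    for method, (tested, confirmed) in EMSA_RESULTS.items():
        print(f"{method}: {overprediction_rate(tested, confirmed):.1%}")
```

This prints 3.7% for HNF1 (consistent with "below 5%"), 16.7% for SF1 with SiteGA (the 17% in the text) and 0.0% for both SITECON sets.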
The main advantage of EMSA is that its results are unambiguous: the bands either shift in response to TF binding or they do not. This makes it possible to assess overprediction rates accurately and to select operating conditions that minimize false positives. One can also begin to estimate how often functional TFBSs occur in the genome and thus gain a realistic picture of the regulatory potential of DNA. For example, the above estimates for HNF1 and SF1 sites (SiteGA and SITECON) amount to 1, 1.5 and 3 sites per 10 kbp, respectively [5, 7, 67]. Consequently, the human genome may contain hundreds of thousands of such TFBSs, and only experimental validation of more sites will reveal their true specificity. Thus, the EMSA results of TFBS verification point to a high regulatory potential of eukaryotic DNA. However, this method cannot address which of the actual TFBSs in the genome are involved in transcription regulation (as opposed to merely binding the TF).
Chromatin immunoprecipitation (ChIP) assays provide a much better assessment of the DNA regulatory potential of a given TF. This method is based on in vivo fixation of DNA–protein interactions [53]. However, the false positive rate of ChIP assays is probably higher than that of EMSA, for two reasons. First, a TF may be cross-linked not only to DNA but also to other chromatin components; second, ChIP locates a TFBS only to within 100–500 bp. In addition, the method is more expensive, because immunoprecipitation must be carried out with antibodies to all members of a TF family capable of interacting with the same sites. Nonetheless, ChIP assays have been used successfully to verify several TFBSs predicted by various methods. For example, antibodies to six members of the E2F family confirmed the existence of predicted E2F binding sites in 10 cell-cycle genes [6], and ChIP likewise confirmed Myc binding sites identified by phylogenetic footprinting in glycolytic genes [68].
Of special importance are data from the high-throughput variant of ChIP (ChIP-chip), which makes it possible to detect all genomic binding sites for a given protein [53]. These data suggest that tens of thousands of sites in the genome interact with a given TF in vivo. For example, ChIP-chip gives estimates of 12 000 sites for SP1, 25 000 sites for c-Myc and 1600 sites for p53 [69]. According to other data, the number of p53 binding sites amounts to 65 000 [70]. The estimate for CREB is 19 000 [71], which is of the same order of magnitude as the estimated number of CREB molecules in the cell (40 000) [72].
Thus, there is a drastic discrepancy (exceeding one order of magnitude) between the number of DNA sites capable of binding a particular TF in vitro, as estimated by EMSA (hundreds of thousands), and the number of TFBSs revealed by ChIP-chip, even though the latter include false positives due to cross-linking with other chromatin components. In part, this discrepancy can be explained by tissue specificity: a ChIP-chip sample from one tissue finds only a subset of the total, and a different tissue or physiological state could reveal other sites. Alternatively, the abundant DNA sites estimated by EMSA may act as traps that accumulate TFs near transcription start sites for efficient formation of the RNA polymerase II preinitiation complex.
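The step from a density of sites per 10 kbp to "hundreds of thousands" of sites is a back-of-the-envelope extrapolation. The sketch below makes it explicit, assuming a haploid human genome of roughly 3.1 × 10⁹ bp and a density that holds uniformly across the genome; both are simplifying assumptions, and the function name is hypothetical.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def genome_wide_count(sites_per_10kb: float, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Scale a site density measured per 10 kbp up to a whole genome.

    Assumes (simplistically) that the density estimated from the tested
    regions applies uniformly across the genome.
    """
    return sites_per_10kb * genome_bp / 10_000


# Densities quoted in the text
DENSITIES = {"HNF1": 1.0, "SF1, SiteGA": 1.5, "SF1, SITECON": 3.0}

if __name__ == "__main__":
    for factor, density in DENSITIES.items():
        print(f"{factor}: ~{genome_wide_count(density):,.0f} sites")
```

Under these assumptions the three densities correspond to roughly 310 000, 465 000 and 930 000 sites, all within the "hundreds of thousands" range stated in the text.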
| WET–DRY COLLABORATION: THE STORY OF SF1 |
|---|
There is a widely held view that biology is on the brink of adopting the same approach to research that physics adopted at the end of the nineteenth century, namely that theoretical and experimental scientists work together to uncover the processes taking place. In biology, the theoreticians are nowadays the bioinformaticians and systems biologists. For many years, the research team at the Institute of Cytology and Genetics (ICG) has consisted of both computational and laboratory scientists, who meet almost daily to explore aspects of gene regulation together. What follows is a case study of SF1 research. It has involved a cyclical process in which laboratory work yields data for computational studies, which in turn provide hypotheses for laboratory testing, and so on.
In the first stage, a variety of laboratories gradually determined a relatively small number of SF1 TFBSs experimentally. These were collated by annotators into TRRD [9] and used to construct the prediction techniques SiteGA and SITECON [66, 67]. These techniques were used to identify potential TFBSs in the upstream regions of genes known to be regulated by SF1, which then became the subject of the EMSA experiments described above. SiteGA was then developed further to increase its recognition accuracy.
For this purpose, a combined approach was developed that integrates the SiteGA method with an optimized position weight matrix method. This combined approach proved more efficient: its false negative rate was 20% and its false positive rate 7 × 10⁻⁵. When its predictions were subjected to experimental verification, 33 of the 35 predicted sites were capable of interacting with SF1. Thus, the experimentally detected false positive rate decreased almost 3-fold. At this level of specificity it becomes feasible to carry out a genome-wide analysis, especially if one focuses on regions close to the transcription start site. We applied this criterion when searching for SF1-regulated human genes using the EPD database [73]. By our estimate, the human genome contains approximately 900 SF1-regulated genes. Judging from their Gene Ontology assignments, some of these are known or expected to be regulated by SF1; others are currently undergoing further testing.
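Two figures in this paragraph follow from simple arithmetic. The "almost 3-fold" decrease compares the experimental false positive rate of SiteGA alone (3 of 18 tested sites failed) with that of the combined method (2 of 35), and a false positive rate of 7 × 10⁻⁵ can be turned into an expected number of false hits for a scan of a given size. The sketch assumes that 7 × 10⁻⁵ is a per-position rate; the function names and the promoter-scan dimensions in the usage lines are hypothetical.

```python
def fold_reduction(tested_a, confirmed_a, tested_b, confirmed_b):
    """Ratio of experimental false positive rates, method A over method B."""
    rate_a = (tested_a - confirmed_a) / tested_a
    rate_b = (tested_b - confirmed_b) / tested_b
    return rate_a / rate_b


def expected_false_hits(fp_rate_per_bp, scanned_bp):
    """Expected number of false positive predictions when scanning scanned_bp
    positions, assuming fp_rate_per_bp is a per-position rate."""
    return fp_rate_per_bp * scanned_bp


if __name__ == "__main__":
    # SiteGA alone: 15 of 18 confirmed; combined SiteGA + weight matrix: 33 of 35.
    print(f"fold reduction: {fold_reduction(18, 15, 35, 33):.2f}")
    # Hypothetical scan: 1000 bp near the start site of each of 20 000 promoters.
    print(f"expected false hits: {expected_false_hits(7e-5, 1000 * 20_000):.0f}")
```

The ratio works out to 105/36, about 2.9, matching "almost 3-fold". The second calculation shows why restricting the search to regions near transcription start sites matters: the expected number of false hits grows linearly with the length of sequence scanned.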
| CONCLUSION |
|---|
Numerous computational approaches are now available for TFBS recognition, a necessary component of computer-assisted genome annotation if we are to fully understand how organisms live, grow and respond to their environment. They are also an invaluable tool for analysing microarray gene expression data. The main disadvantage of most of these approaches is a high overprediction rate. However, only experimental verification of computationally predicted sites can distinguish overpredicted binding sites from silent TFBSs, which are not involved in the regulation of expression.
Analysis of available EMSA data has demonstrated that a higher eukaryotic genome probably contains several hundred thousand binding sites for a given TF [5, 7, 67], only a very small proportion of which are false positives. Some predicted TFBSs are located in genes that are not regulated by the factor in question.
One factor that limits excess TF binding is the inaccessibility of extended DNA regions. Several levels of structural organization in chromatin could contribute to modulating the accessibility of TFBSs [74]. Alternatively, we hypothesize that the excess of TFBSs in the genome may be explained by sites acting as traps whose sole role is to increase the local TF concentration in a particular region of the nucleus.
The observed excess of TFBSs in genomic DNA, as well as the intricate nature of the mechanisms underlying transcription regulation, suggests that additional criteria should be used in the search for the target genes of a given TF. These criteria may include patterns of TFBS location, clustering of sites of one type, or patterns in the mutual arrangement of sites of different types. Nonetheless, the successful application of these criteria must rest on high-quality recognition of each TFBS type, which can be achieved only through tight collaboration between experimenters and theorists.
| Funding |
|---|
This work was supported by the Program on Physicochemical Biology (10.4) of the Russian Academy of Sciences and by the program Origin and Evolution of the Biosphere of the Russian Academy of Sciences Presidium, RFBR (grants Nos. 05-04-49111 and 07-04-00441), U.S. CRDF (REC-008, grant Y2-B-08-02; RUX0-008-NO-06), and the RF Ministry of Education (grant No. DSP.2.1.1.4935).
| Acknowledgements |
|---|
The authors are grateful to Galina B. Chirikova for her assistance in translation.
| FOOTNOTES |
|---|
Nikolay A. Kolchanov is a Computer Scientist, Novosibirsk State University, ICG SB RAS, Novosibirsk, Russia.
Tatyana I. Merkulova is a Head of Laboratory of Gene Expression Control, ICG SB RAS, Novosibirsk, Russia.
Elena V. Ignatieva is a Computer Scientist, ICG SB RAS, Novosibirsk, Russia.
Elena A. Ananko is a Computer Scientist, ICG SB RAS, Novosibirsk, Russia.
Dmitry Yu. Oshchepkov is a Computer Scientist, ICG SB RAS, Novosibirsk, Russia.
Viktor G. Levitsky is a Computer Scientist, Novosibirsk State University, ICG SB RAS, Novosibirsk, Russia.
Gennady V. Vasiliev is an Experimental Scientist, ICG SB RAS, Novosibirsk, Russia.
Nataly V. Klimova is an Experimental Scientist, ICG SB RAS, Novosibirsk, Russia.
Vasily M. Merkulov is an Experimental Scientist, ICG SB RAS, Novosibirsk, Russia.
T. Charles Hodgman is Professor of Bioinformatics and Systems Biology, Nottingham University.
Submitted: March 6, 2007. Received (in revised form): June 7, 2007.
| References |
|---|
- Latchman DS. Eukaryotic Transcription Factors (2004) Elsevier Academic Press. 299–330.
- Matys V, Kel-Margoulis OV, Fricke E, et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res (2006) 34(Database issue):D108–10.
- Wenger RH, Stiehl DP, Camenisch G. Integration of oxygen signaling at the consensus HRE. Sci STKE (2005) 306:re12.
- Platanias LC. Mechanisms of type-I- and type-II-interferon-mediated signalling. Nat Rev Immunol (2005) 5:375–86.
- Tronche F, Ringeisen F, Blumenfeld M, et al. Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J Mol Biol (1997) 266:231–45.
- Kel AE, Kel-Margoulis OV, Farnham PJ, et al. Computer-assisted identification of cell cycle-related genes: new targets for E2F transcription factors. J Mol Biol (2001) 309:99–120.
- Klimova NV, Levitskii VG, Ignateva EV, et al. Recognition of the potential SF-1 binding sites by SiteGA method, their experimental verification and search for new SF-1 target genes. Mol Biol (Mosk) (2006) 40:512–23.
- Jiang C, Xuan Z, Zhao F, et al. TRED: a transcriptional regulatory element database, new entries and other development. Nucleic Acids Res (2007) 35:D137–40.
- Kolchanov NA, Ignatieva EV, Ananko EA, et al. Transcription Regulatory Regions Database (TRRD): its status in 2002. Nucleic Acids Res (2002) 30:312–7.
- Ghosh D. Object-oriented transcription factors database (ooTFD). Nucleic Acids Res (2000) 28:308–10.
- Sun H, Palaniswamy SK, Pohar TT, et al. MPromDb: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-chip experimental data. Nucleic Acids Res (2006) 34(Database issue):D98–103.
- Vlieghe D, Sandelin A, De Bleser PJ, et al. A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res (2006) 34(Database issue):D95–7.
- Khlebodarova T, Podkolodnaya O, Oshchepkov D, et al. ARTSITE database: comparison of in vitro selected and natural binding sites of eukaryotic transcription factors. In: Bioinformatics of Genome Regulation and Structure II. Kolchanov N, Hofestaedt R, eds. (2006) Springer Science+Business Media, Inc. 55–65.
- Jonat C, Rahmsdorf HJ, Park KK, et al. Antitumor promotion and antiinflammation: down-modulation of AP-1 (Fos/Jun) activity by glucocorticoid hormone. Cell (1990) 62:1189–204.
- Stoecklin E, Wissler M, Moriggl R, et al. Specific DNA binding of Stat5, but not glucocorticoid receptor, is required for their functional cooperation in the regulation of gene transcription. Mol Cell Biol (1997) 17:6708–16.
- Song CZ, Tian X, Gelehrter TD. Glucocorticoid receptor inhibits transforming growth factor-beta signaling by directly targeting the transcriptional activation function of Smad3. Proc Natl Acad Sci USA (1999) 96:11776–81.
- Merkulov VM, Merkulova TI. Structural variants of binding sites for glucocorticoid receptor and the mechanisms of glucocorticoid regulation: analysis of GR-TRRD database. Proceedings of fifth International Conference on Bioinformatics of Genome Regulation and Structure (BGRS'2006) (2006) 1:106–109.
- O'Lone R, Frith MC, Karlsson EK, et al. Genomic targets of nuclear estrogen receptors. Mol Endocrinol (2004) 18:1859–75.
- Roulet E, Bucher P, Schneider R, et al. Experimental analysis and computer prediction of CTF/NFI transcription factor DNA binding sites. J Mol Biol (2000) 297(4):833–48.
- Schoenmakers E, Alen P, Verrijdt G, et al. Differential DNA binding by the androgen and glucocorticoid receptors involves the second Zn-finger and a C-terminal extension of the DNA-binding domains. Biochem J (1999) 341:515–21.
- Kim JB, Spotts GD, Halvorsen YD, et al. Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol Cell Biol (1995) 15:2582–8.
- Khorasanizadeh S, Rastinejad F. Nuclear-receptor interactions on DNA-response elements. Trends Biochem Sci (2001) 26:384–90.
- Okuno M, Arimoto E, Ikenobu Y, et al. Dual DNA-binding specificity of peroxisome-proliferator-activated receptor gamma controlled by heterodimer formation with retinoid X receptor alpha. Biochem J (2001) 353:193–8.
- Shelest E, Kel AE, Goessling E, et al. Prediction of potential C/EBP/NF-kappaB composite elements using matrix-based search methods. In Silico Biol (2003) 3:71–9.
- Stormo GD, Schneider TD, Gold L. Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Res (1986) 14:6661–79.
- Berg OG, von Hippel PH. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J Mol Biol (1987) 193:723–50.
- Mulligan ME, Hawley DK, Entriken R, et al. Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity. Nucleic Acids Res (1984) 12:789–800.
- Cartharius K, Frech K, Grote K, et al. MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics (2005) 21:2933–42.
- Roulet E, Fisch I, Junier T, et al. Evaluation of computer tools for the prediction of transcription factor binding sites on genomic DNA. In Silico Biol (1998) 1:21–8.
- Stormo GD. DNA binding sites: representation and discovery. Bioinformatics (2000) 16:16–23.
- Man TK, Stormo GD. Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res (2001) 29:2471–8.
- Bulyk ML, Johnson PL, Church GM. Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res (2002) 30:1255–61.
- Barash Y, Elidan G, Friedman N, et al. Modeling dependencies in protein-DNA binding sites. Proceedings of RECOMB '03 (2003) 28–37.
- Zhang M, Marr T. A weight array method for splicing signal analysis. Comput Appl Biosci (1993) 9:499–509.
- Gershenzon NI, Stormo GD, Ioshikhes IP. Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res (2005) 33:2290–301.
- Gorin AA, Zhurkin VB, Olson WK. B-DNA twisting correlates with base-pair morphology. J Mol Biol (1995) 247:34–48.
- Arauzo-Bravo MJ, Sarai A. Knowledge-based prediction of DNA atomic structure from nucleic sequence. Genome Inform (2005) 16:12–21.
- Starr DB, Hoopes BC, Hawley DK. DNA bending is an important component of site-specific recognition by the TATA binding protein. J Mol Biol (1995) 250:434–46.
- Meierhans D, Sieber M, Allemann RK. High affinity binding of MEF-2C correlates with DNA bending. Nucleic Acids Res (1997) 25:4537–44.
- Liu R, Blackwell TW, States DJ. Conformational model for binding site recognition by the E. coli MetJ transcription factor. Bioinformatics (2001) 17:622–33.
- Oshchepkov DY, Vityaev EE, Grigorovich DA, et al. SITECON: a tool for detecting conservative conformational and physicochemical properties in transcription factor binding site alignments and for site recognition. Nucleic Acids Res (2004) 32(Web Server issue):W208–12.
- Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics (1999) 15:654–68.
- Oshchepkov DYu, Turnaev II, Pozdnyakov MA, et al. SITECON: a tool for analysis of DNA physicochemical and conformational properties: E2F/DP transcription factor binding site analysis and recognition. In: Bioinformatics of Genome Regulation and Structure. Kolchanov N, Hofestaedt R, eds. (2004) Boston/Dordrecht/London: Kluwer Academic Publishers. 93-102.
- Yada T, Nakao M, Totoki Y, et al. Modeling and predicting transcriptional units of Escherichia coli genes using hidden Markov models. Bioinformatics (1999) 15:987-93.
- Durbin R, Eddy S, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (1998) Cambridge: Cambridge University Press.
- Eddy SR. Profile hidden Markov models. Bioinformatics (1998) 14:755-63.
- Gunewardena S, Jeavons P, Zhang Z. Enhancing the prediction of transcription factor binding sites by incorporating structural properties and nucleotide covariations. J Comput Biol (2006) 13:929-45.
- Lawrence CE, Altschul SF, Boguski MS, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science (1993) 262:208-14.
- Bailey TL, Elkan C. The value of prior knowledge in discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol (1995) 3:21-9.
- Grundy WN, Bailey TL, Elkan CP, et al. Meta-MEME: motif-based hidden Markov models of protein families. Comput Appl Biosci (1997) 13:397-406.
- O'Neill MC. Training back-propagation neural networks to define and detect DNA-binding sites. Nucleic Acids Res (1991) 19:313-8.
- Horton PB, Kanehisa M. An assessment of neural network and statistical approaches for prediction of E. coli promoter sites. Nucleic Acids Res (1992) 20:4331-8.
- Elnitski L, Jin VX, Farnham PJ, et al. Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res (2006) 16:1455-64.
- Gelfand MS. Prediction of function in DNA sequence analysis. J Comput Biol (1995) 2:87-115.
- Bucher P. Regulatory elements and expression profiles. Curr Opin Struct Biol (1999) 9:400-7.
- Jolly ER, Chin CS, Herskowitz I, et al. Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis. BMC Bioinformatics (2005) 6:275.
- Holloway DT, Kon M, DeLisi C. Integrating genomic data to predict transcription factor binding. Genome Inform (2005) 16:83-94.
- Long F, Liu H, Hahn C, et al. Genome-wide prediction and analysis of function-specific transcription factor binding sites. In Silico Biol (2004) 4:395-410.
- Ho Sui SJ, Mortimer JR, Arenillas DJ, et al. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res (2005) 33:3154-64.
- Elkon R, Linhart C, Sharan R, et al. Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res (2003) 13:773-80.
- Li X, Zhong S, Wong WH. Reliable prediction of transcription factor binding sites by phylogenetic verification. Proc Natl Acad Sci USA (2005) 102:16945-50.
- Chang LW, Nagarajan R, Magee JA, et al. A systematic model to predict transcriptional regulatory mechanisms based on overrepresentation of transcription factor binding profiles. Genome Res (2006) 16:405-13.
- Stoeckert CJ Jr, Salas F, Brunk B, et al. EpoDB: a prototype database for the analysis of genes expressed during vertebrate erythropoiesis. Nucleic Acids Res (1999) 27:200-3.
- Kel AE, Gossling E, Reuter I, et al. MATCH: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res (2003) 31:3576-9.
- Chen QK, Hertz GZ, Stormo GD. MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Comput Appl Biosci (1995) 11:563-6.
- Levitsky VG, Ignatieva EV, Ananko EA, et al. Method SiteGA for transcription factor binding sites recognition. Biofizika (2006) 51:633-9.
- Ignatieva EV, Oshchepkov DYu, Klimova NV, et al. SITECON: a quality tool for prediction of transcription factor binding sites now handles those for SF-1. Experimental verification and analysis of regulatory regions of orthologous genes. Proceedings of Fifth International Conference On Bioinformatics of Genome Regulation and Structure (2006) 1:525.
- Kim JW, Zeller KI, Wang Y, et al. Evaluation of myc E-box phylogenetic footprints in glycolytic genes by chromatin immunoprecipitation assays. Mol Cell Biol (2004) 24:5923-36.
- Cawley S, Bekiranov S, Ng HH, et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell (2004) 116:499-509.
- Wei CL, Wu Q, Vega VB, et al. A global map of p53 transcription-factor binding sites in the human genome. Cell (2006) 124:207-19.
- Euskirchen G, Royce TE, Bertone P, et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol (2004) 24:3804-14.
- Hagiwara M, Brindle P, Harootunian A, et al. Coupling of hormonal stimulation and transcription via the cyclic AMP-responsive factor CREB is rate limited by nuclear entry of protein kinase A. Mol Cell Biol (1993) 13:4852-9.
- Schmid CD, Perier R, Praz V, et al. EPD in its twentieth year: towards complete promoter coverage of selected model organisms. Nucleic Acids Res (2006) 34:D82-5.
- Beato M, Eisfeld K. Transcription factor access to chromatin. Nucleic Acids Res (1997) 25:3559-63.
- Guan G, Dai PH, Osborn TF, et al. Multiple sequence elements are involved in the transcriptional regulation of the human squalene synthase gene. J Biol Chem (1997) 272:10295-302.
