Briefings in Bioinformatics Advance Access published online on April 12, 2007
Briefings in Bioinformatics, doi:10.1093/bib/bbm007
Bayesian methods in bioinformatics and computational systems biology
Corresponding author. D.J. Wilkinson, School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK. Tel: +44-191-2227320; E-mail: d.j.wilkinson{at}ncl.ac.uk
ABSTRACT
Bayesian methods are valuable, inter alia, whenever there is a need to extract information from data that are uncertain or subject to any kind of error or noise (including measurement error and experimental error, as well as noise or random variation intrinsic to the process of interest). Bayesian methods offer a number of advantages over more conventional statistical techniques that make them particularly appropriate for complex data. It is therefore no surprise that Bayesian methods are becoming more widely used in the fields of genetics, genomics, bioinformatics and computational systems biology, where making sense of complex noisy data is the norm. This review provides an introduction to the growing literature in this area, with particular emphasis on recent developments in Bayesian bioinformatics relevant to computational systems biology.
Keywords: Bayesian inference, computational systems biology, networks, graphical models, quantitative, predictive biology
INTRODUCTION
Bioinformatics and computational systems biology are undergoing a Bayesian revolution similar to that already seen in genetics [1]. The reason is the same: biology is complex, and data are noisy. Traditional statistical techniques struggle to cope with complex non-linear models that are only partially observed. Because the Bayesian statistical paradigm is fully probabilistic, there is no fundamental distinction between any of the unknowns in a statistical model (parameters, hidden variables and observations are all treated together in a consistent manner), and it is from this that the power of the methodology is derived [2]. Provided that you can write down a statistical model relating the quantities you are interested in to the data you can observe (possibly via many unobserved intermediary variables), then you can (in principle) carry out Bayesian inference to extract the information in the data to give fully probabilistic information on all unobserved model variables. The main limiting factor in applying Bayesian methods is computational. For non-trivial problems, analytic approaches to Bayesian inference are not possible, and their numerical solution is often challenging due to the need to solve high-dimensional integration problems (which in the discrete case translate to combinatorial summation problems). Advances in the speed of commodity computing hardware in recent decades have been paralleled by developments in computationally intensive algorithms for Bayesian inference. Arguably the most important advance has been the development of a range of techniques based on Markov chain Monte Carlo (MCMC). The ideas originate from statistical physics [3], but are now widely used for Bayesian inference [4, 5]. Although by no means a panacea, carefully crafted MCMC algorithms executed on fast computers are able to solve a phenomenal range of problems that would have been considered completely intractable only a few years ago.
In the simplest (continuous) setting, we are interested in making inferences about the parameter vector $\theta$ of a probability (density) model $p(y \mid \theta)$ giving rise to an observed data vector $y$. If we treat the parameters as uncertain, and allocate to them a prior probability density $\pi(\theta)$, then Bayes theorem gives the posterior density

$$\pi(\theta \mid y) = \frac{\pi(\theta)\, p(y \mid \theta)}{\int \pi(\theta')\, p(y \mid \theta')\, d\theta'}.$$

Since $\pi(\theta \mid y)$ is regarded as a function of $\theta$ for fixed (observed) $y$, we can re-write this as

$$\pi(\theta \mid y) \propto \pi(\theta)\, p(y \mid \theta).$$

In practice, the normalising constant of $\pi(\theta \mid y)$ will not be known explicitly or marginalisation over some components of $\theta$ will be required. Whilst analytically intractable, these integration problems are typically amenable to a Monte Carlo or MCMC solution. In the high-dimensional context, it is often necessary to decompose the full problem according to the underlying conditional independence structure of the model, and it is in this context that graphical models [6] (also known as conditional independence graphs) are particularly useful. In non-statistical communities, the term 'Bayes(ian) network' is often used to describe a discrete graphical model. However, it is important to note that graphical models can be used to describe any probabilistic conditional independence structure, and that many of the techniques that are often used to learn Bayesian networks are not Bayesian.
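As a small illustration of Bayes theorem in a case where the integration can be checked, the following sketch approximates a one-dimensional posterior on a grid. The binomial likelihood, the Beta(1, 1) prior, the data (7 successes in 20 trials) and the function name `grid_posterior` are all assumptions chosen for illustration; they do not come from any of the cited applications.

```python
import math

def grid_posterior(successes, trials, a=1.0, b=1.0, n_grid=2000):
    """Approximate the posterior of a binomial success probability on a grid.

    Prior: Beta(a, b) (an illustrative choice). Likelihood: binomial.
    Returns the grid points and the normalised posterior density values.
    """
    # Midpoints of equal-width cells avoid log(0) at theta = 0 or 1.
    h = 1.0 / n_grid
    thetas = [(i + 0.5) * h for i in range(n_grid)]
    # Unnormalised log posterior = log prior + log likelihood (constants dropped).
    log_post = [
        (a - 1 + successes) * math.log(t)
        + (b - 1 + trials - successes) * math.log(1 - t)
        for t in thetas
    ]
    m = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    # Normalising constant: the integral in the denominator of Bayes theorem,
    # approximated here by the midpoint rule.
    z = sum(unnorm) * h
    return thetas, [u / z for u in unnorm]

thetas, density = grid_posterior(successes=7, trials=20)
h = thetas[1] - thetas[0]
post_mean = sum(t * d for t, d in zip(thetas, density)) * h
# Conjugate check: the exact posterior is Beta(1 + 7, 1 + 13), with mean 8 / 22.
print(post_mean, 8 / 22)
```

Grid evaluation of this kind is only feasible in one or two dimensions, which is one way of seeing why Monte Carlo methods are needed for realistic models.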
The simplest example of an MCMC method is the Gibbs sampler [7, 8]. Here a Markov chain is constructed with equilibrium distribution $\pi(\theta \mid y)$. Each iteration of the sampler involves cycling through each component of the $p$-dimensional vector $\theta$ in order and sampling from $\pi(\theta_i \mid \theta_{-i}, y)$, $i = 1, \ldots, p$, where $\theta_{-i}$ denotes the vector of all components of $\theta$ except $\theta_i$. Knowledge of the conditional independence graph for the model can simplify the computation of these so-called full-conditional distributions. In many cases, the full-conditionals will be straightforward to sample directly, but in others, a Metropolis-Hastings method will be required [9, 10]. Here a proposed new value is simulated from a largely arbitrary proposal distribution, and accepted with a probability carefully chosen to preserve the detailed balance of the chain. Many practical details of the method are presented in [11, 12].
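Both samplers can be sketched in a few lines for toy targets. The example below is a minimal sketch under stated assumptions: the Gibbs sampler targets a standard bivariate normal with correlation `rho` (whose full-conditionals are known normals), and the Metropolis-Hastings sampler uses a symmetric random-walk proposal on a one-dimensional standard normal. The targets, function names and tuning values are illustrative choices, not taken from the references.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    sd = math.sqrt(1.0 - rho * rho)  # standard deviation of each full-conditional
    x1, x2 = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        # Sample each component in turn from its full-conditional distribution.
        x1 = rng.gauss(rho * x2, sd)
        x2 = rng.gauss(rho * x1, sd)
        samples.append((x1, x2))
    return samples

def metropolis_hastings(log_target, x0, step, n_iter, rng):
    """Random-walk Metropolis-Hastings for a one-dimensional target density."""
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)  # symmetric proposal
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

rng = random.Random(1)
burn_in = 1000

draws = gibbs_bivariate_normal(rho=0.8, n_iter=20000, rng=rng)[burn_in:]
# With zero means and unit variances, E[x1 * x2] equals the correlation.
est_rho = sum(a * b for a, b in draws) / len(draws)

mh = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 2.4, 20000, rng)[burn_in:]
mh_mean = sum(mh) / len(mh)
mh_var = sum((v - mh_mean) ** 2 for v in mh) / len(mh)
print(est_rho, mh_mean, mh_var)
```

Because the random-walk proposal is symmetric, the proposal densities cancel in the acceptance ratio; an asymmetric proposal would need the full Hastings correction.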
BIOINFORMATICS
Biological sequence analysis
One of the first areas to benefit from the application of Bayesian approaches was biological sequence analysis. Here it had already been recognised that working with probabilistic models was extremely useful [13]. Whilst for some simple hidden Markov models (HMMs) it is possible to estimate parameters using conventional statistical techniques (such as maximum likelihood via the EM algorithm) [14, 15], there are many interesting problems where a conventional approach would be inconvenient or unsatisfactory in terms of the information provided by the analysis; see [16] for a good introduction to the use of Bayesian methods in this area. Good examples of this include simultaneous multiple sequence alignment [17, 18], motif discovery and transcription factor binding site prediction [19, 20] and protein secondary structure prediction [21]. One of the key benefits of the Bayesian approach is that it allows proper propagation of uncertainty across different levels of modelling. So whilst a traditional approach to phylogeny estimation would use a pre-calculated multiple alignment, uncertainty in the alignment will not propagate through to uncertainty in the phylogeny. In fact, the converse is also true: models for alignment depend implicitly on an assumed phylogeny, so uncertainty in phylogeny induces alignment uncertainty. Using a Bayesian approach, simultaneous estimation is possible [22]. Even in the relatively simple context of HMM-based ab initio DNA sequence segmentation, the Bayesian approach enables the convenient inclusion of prior information, and provides much richer information about the model parameters [23]. Furthermore, since uncertainty about model structure is treated consistently with parameter uncertainty in the Bayesian context, variable dimension algorithms such as reversible jump MCMC (RJMCMC) [24] can be used to estimate the number of segments and the order of the base dependence along with all other aspects of the model [25]. 
Liu and Logvinenko [26] provide a detailed review of Bayesian methods in sequence analysis.
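The basic computation underlying HMM-based sequence segmentation is the likelihood of the observed sequence with the hidden segment labels summed out. The sketch below implements the scaled forward algorithm for a hypothetical two-state model (AT-rich and GC-rich segments). All parameter values are invented for illustration; in a Bayesian analysis they would be given priors and sampled rather than fixed.

```python
import math

def hmm_log_likelihood(seq, init, trans, emit):
    """Scaled forward algorithm: log p(seq), with hidden states summed out."""
    n_states = len(init)
    # alpha[k] is proportional to p(state_t = k, seq[0..t]).
    alpha = [init[k] * emit[k][seq[0]] for k in range(n_states)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[j] * trans[j][k] for j in range(n_states))
                * emit[k][seq[t]]
                for k in range(n_states)
            ]
        scale = sum(alpha)  # rescale to avoid underflow on long sequences
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical model: state 0 is AT-rich, state 1 is GC-rich.
init = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [
    {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
]
print(hmm_log_likelihood("ATTAGCGCGGCA", init, trans, emit))
```

The same forward recursion is the starting point for sampling the hidden segmentation itself, which is what allows uncertainty about segment boundaries to be reported alongside the parameter estimates.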
Microarray data analysis
The analysis of gene microarray data [27] is another area where Bayesian methods have proven to offer many advantages over more conventional approaches [28, 29]. Although amenable to simple statistical analyses such as ANOVA, microarray data analysis is often broken down into a collection of distinct steps that fail to correctly propagate uncertainty. For example, a typical analysis may begin with some kind of normalisation process that produces corrected expression levels. These normalised data will then be subject to a secondary statistical analysis (such as identification of differentially expressed genes) that ignores any uncertainty in the normalisation process. Often, the differentially expressed genes will then be used for a further analysis that ignores the uncertainty in the identification procedure. Using Bayesian techniques it is possible to develop integrated models for the analysis of unnormalised cDNA microarray data that correctly propagate uncertainty across the various levels of analysis [30, 31]. Detailed modelling combined with a carefully designed experiment can allow coherent estimation of absolute transcript concentrations from cDNA array data [32, 33]. It is also much more convenient to pool information across multiple experiments and studies using a Bayesian approach [34]. For Affymetrix GeneChip data, developing probabilistic models of the hybridisation process down at the probe level again allows extraction of information likely to be missed using simpler stepwise approaches [35, 36]. Bayesian methods also offer advantages when clustering of expression profiles is felt to be relevant [37-39]. In fact, the initial task of segmentation and raw intensity estimation can also benefit from a Bayesian approach [40]. Further modelling approaches and applications are discussed in [41-46]. Some recent developments in the field are described in [29], which also covers some proteomic applications.
Protein informatics
There are many applications of Bayesian techniques to problems in protein informatics. Down at the structure level, Bayesian techniques for site matching and alignment have been shown to be particularly valuable [47-49]. A Bayesian method for predicting protein-protein interactions from genomic data is given in [50]. Mass spectrometry data are widely used for understanding the peptide/protein composition of a sample, but these data are subject to many sources of variation, making Bayesian approaches to data analysis highly desirable. Some methods for processing raw spectra are discussed in [51, 52] in the volume [29]. Bayesian methods can also be useful in the context of mass spectrometry clustering and classification [53, 54], as well as protein identification [55, 56].
COMPUTATIONAL SYSTEMS BIOLOGY
The analysis of microarray data is also central to much research in computational systems biology, although here the emphasis is slightly different. A major concern of computational systems biology is the development of dynamic predictive models of biological (especially genetic and biochemical) processes [57]. The first stage in this process is the identification of interacting partners (used in a loose sense). One approach to identifying gene-gene interactions is to attempt to use observed correlations in gene microarray data to infer networks of interaction.
Network inference
A variety of different approaches to network inference are possible, and many widely used techniques are fundamentally Bayesian in nature. Again, it is worth emphasising the apparent confusion between discrete Bayesian networks and more general Bayesian methods. The term 'Bayes net' is generally used in non-statistical communities to refer to discrete probabilistic graphical models, irrespective of whether the techniques used to analyse them are Bayesian. Despite some suggestions to the contrary in the literature, there is no need to discretise continuous data in order to learn a Bayesian network; discretisation is needed only to learn a discrete Bayes net. As mentioned above, graphical models can be estimated without using Bayesian methods, but there are advantages in using them. This is particularly true when the number of observations is small compared to the number of variables, which is typically the case in the context of microarray data analysis.
An early, influential paper on Bayesian networks for expression data was [58]; also see [59] for a more recent perspective. An approach based on manipulation experiments for inferring directed networks is described in [60]. An efficient method for inferring undirected Gaussian graphical models is described in [61]. More recently, a detailed comparison of various methods for static network inference has been carried out in [62]. Such methods do not have to be based on microarray data. Typically, using more quantitative data on a (small) system of interest will lead to more reliable conclusions. Single-cell flow cytometry data is potentially useful in this context, and a strategy for using it for inferring network structure is described in [63]. It should be pointed out, however, that most of these papers are not especially Bayesian in their approach. More Bayesian approaches to the problem of inferring sparse undirected (Gaussian) graphical models are described in [64] and [65], based on earlier work for graphical Gaussian model selection [66], and these are likely to provide more robust inferences in high-dimensional settings, particularly since most methods are able to provide marginal posterior probabilities for the presence of individual network edges.
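The idea behind an undirected Gaussian graphical model is that a missing edge corresponds to a zero partial correlation, that is, conditional independence given the remaining variables. The sketch below illustrates this on simulated data for a hypothetical three-gene chain, using the closed-form partial correlation for three variables. It is a plain (non-Bayesian) plug-in calculation intended only to show the quantity being estimated; it is not the method of any of the cited papers, and with more variables the partial correlations would instead be read off the inverse covariance matrix.

```python
import math
import random

def correlation(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation between x and y after conditioning on the single variable z."""
    rxy, rxz, ryz = correlation(x, y), correlation(x, z), correlation(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Simulated chain g1 -> g2 -> g3: g1 and g3 are related only through g2.
rng = random.Random(42)
g1 = [rng.gauss(0, 1) for _ in range(5000)]
g2 = [0.9 * a + rng.gauss(0, 0.5) for a in g1]
g3 = [0.9 * b + rng.gauss(0, 0.5) for b in g2]

print("marginal correlation g1, g3:", correlation(g1, g3))
print("partial correlation g1, g3 given g2:", partial_correlation(g1, g3, g2))
```

The marginal correlation between `g1` and `g3` is large, yet the partial correlation is close to zero, so a graphical model would contain the edges g1-g2 and g2-g3 but not g1-g3. A Bayesian treatment would replace the point estimate with a posterior probability for each edge.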
Time-course expression data provide some information about system dynamics, and therefore dynamic network models provide a useful starting point for top-down systems biology modelling. Dynamic Bayesian networks (DBNs) have been widely used in this context; see [67, 68] for details. For dynamic networks based on linear Gaussian models, a fast Bayesian-inspired algorithm has recently been proposed [69]. As for static networks, fully Bayesian approaches to this problem are likely to offer significant advantages, and are currently the subject of ongoing research.
Using Bayesian inference for integrating multiple sources of data offers great potential, but currently remains largely unexplored; see [70-72] for initial attempts and perspectives.
Quantitative network models
As has already been stated, a key aim of systems biology is to develop quantitative, dynamic models of biological processes of interest. One approach to this problem is to extend the top-down network models so that they provide some quantitative information regarding dynamics [73]. However, this approach has some shortcomings because the elements of the model do not link directly to physical parameters of interest. There is therefore great interest in a different approach, based on using data to parameterise bottom-up mechanistic models of biological processes. Obviously, non-Bayesian approaches to this problem are possible [74-76], but are limited in terms of the information they can provide. Even in the context of deterministic models of biochemical networks based on ordinary differential equations (ODEs), there is considerable utility in using a Bayesian approach in order to properly address issues of noise modelling and parameter uncertainty [77, 78]. It is also possible to improve parameter estimation using proper prior modelling of parameter uncertainty [79].
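To make the point about noise modelling and parameter uncertainty concrete, the following sketch computes a posterior distribution for a single rate constant in a deterministic ODE model. Everything here is an assumption made for illustration: the model is first-order decay dx/dt = -k x (which has a closed-form solution), the measurement error is Gaussian with known standard deviation `sigma`, the prior on `k` is flat over the grid range, and the data are simulated.

```python
import math
import random

def decay_solution(x0, k, t):
    """Solution of the ODE dx/dt = -k * x with x(0) = x0."""
    return x0 * math.exp(-k * t)

def rate_posterior(times, obs, x0, sigma, k_grid):
    """Posterior probabilities for k on a grid, under a flat prior on the grid."""
    log_post = []
    for k in k_grid:
        # Gaussian measurement-error model around the deterministic ODE solution.
        sse = sum((y - decay_solution(x0, k, t)) ** 2 for t, y in zip(times, obs))
        log_post.append(-0.5 * sse / sigma ** 2)
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Simulated noisy observations from a known rate constant.
rng = random.Random(0)
true_k, x0, sigma = 0.5, 10.0, 0.3
times = [0.5 * i for i in range(1, 11)]
obs = [decay_solution(x0, true_k, t) + rng.gauss(0, sigma) for t in times]

k_grid = [0.001 * i for i in range(1, 2001)]  # k in (0, 2]
post = rate_posterior(times, obs, x0, sigma, k_grid)
post_mean = sum(k * p for k, p in zip(k_grid, post))

# Equal-tailed 95% credible interval from the cumulative posterior.
cum, lower, upper = 0.0, None, None
for k, p in zip(k_grid, post):
    cum += p
    if lower is None and cum >= 0.025:
        lower = k
    if upper is None and cum >= 0.975:
        upper = k
print(post_mean, (lower, upper))
```

The output is a full distribution over `k` rather than a single fitted value, so the credible interval directly quantifies how well the data constrain the parameter. For models with many parameters the grid would be replaced by MCMC.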
A nice application of Bayesian modelling in the context of quantitative modelling is the Characterizing Loss Of Cell Cycle Synchrony (CLOCCS) model [80] for loss of synchrony in yeast populations. A simple application of this model is in the alignment of data sets collected under different conditions. However, this model can also be combined with population level data (such as gene expression array data) in order to recover information about single-cell dynamics from the population averaged data. This detailed modelling of both the process of interest and its relationship with the experimental data is a powerful technique in this context, and similar strategies are likely to lead to many other examples of extracting better information from high-throughput data.
There is increasing evidence that stochasticity plays an important role in intracellular processes [81], and there is therefore a great deal of interest in developing stochastic kinetic models of biological processes [82-85]. Furthermore, experimental technology is improving rapidly, so that (semi-)quantitative high-resolution single-cell data of the type that is most informative for the building of stochastic models is now realistically attainable [86]. Typically, data is generated via fluorescence microscopy, then processed to extract gene expression time series [87]. Although fully Bayesian approaches to this image-analysis step are likely to be extremely useful, such techniques do not yet seem to have been described in the literature. Stochastic kinetic models are particularly difficult to estimate using non-Bayesian methods. A valiant attempt is described in [88], but the applicability of the methods described is limited by the extent to which non-Bayesian methods can cope with hidden data. In particular, the parsimony assumptions that are typically required have the effect of downward-biasing parameter estimates. However, whilst a fully Bayesian approach to inference for discrete stochastic models is possible [85, 89], it is computationally problematic for models of realistic size and complexity. Also see [90] for a related approach. It turns out to be possible to instead work with a continuous (approximate) formulation of stochastic kinetics, known as the chemical Langevin equation [91, 85]. This model seems to be quite adequate for inferential purposes, and is advantageous because inference for this diffusion approximation is more computationally amenable than for the discrete formulation. A basic inferential algorithm for this model is described in [92].
A better algorithm for models of this type, based on ideas of sequential Monte Carlo [93], is developed in [94], and applied to a general and flexible class of stochastic kinetic models in [95]. Finally, an efficient non-sequential MCMC algorithm for stochastic kinetic models is described in [96]. A recent review of fitting models to data by Jaqaman and Danuser [97] includes references to both the Bayesian and non-Bayesian literature.
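The relationship between the discrete formulation and the chemical Langevin equation can be seen by forward-simulating both for a very small system. The sketch below uses a hypothetical immigration-death model (production at constant rate `k1`, degradation at rate `k2 * x`), simulating the discrete process exactly with the Gillespie algorithm and the diffusion approximation with an Euler-Maruyama scheme. The model, rate values and step size are illustrative assumptions, and this is forward simulation only, not one of the inference algorithms cited above.

```python
import math
import random

def gillespie_birth_death(k1, k2, x0, t_end, rng):
    """Exact discrete simulation of 0 -> X (rate k1) and X -> 0 (rate k2 * x)."""
    t, x = 0.0, x0
    while True:
        h_birth, h_death = k1, k2 * x
        h_total = h_birth + h_death
        t += rng.expovariate(h_total)  # waiting time to the next reaction event
        if t > t_end:
            return x
        # Choose which reaction fires, with probability proportional to its hazard.
        if rng.random() * h_total < h_birth:
            x += 1
        else:
            x -= 1

def cle_birth_death(k1, k2, x0, t_end, dt, rng):
    """Euler-Maruyama simulation of the chemical Langevin equation for the same model."""
    x = float(x0)
    for _ in range(int(round(t_end / dt))):
        h_birth, h_death = k1, k2 * max(x, 0.0)
        drift = h_birth - h_death
        # One independent Gaussian noise source per reaction channel.
        noise = (math.sqrt(h_birth * dt) * rng.gauss(0, 1)
                 - math.sqrt(h_death * dt) * rng.gauss(0, 1))
        x = max(x + drift * dt + noise, 0.0)  # crude truncation keeps x non-negative
    return x

rng = random.Random(7)
k1, k2, t_end, n_runs = 20.0, 0.5, 20.0, 200
discrete = [gillespie_birth_death(k1, k2, 0, t_end, rng) for _ in range(n_runs)]
diffusion = [cle_birth_death(k1, k2, 0, t_end, 0.02, rng) for _ in range(n_runs)]
mean_discrete = sum(discrete) / n_runs
mean_diffusion = sum(diffusion) / n_runs
# Both should be close to the stationary mean k1 / k2 = 40.
print(mean_discrete, mean_diffusion)
```

For molecule counts of this size the two formulations give very similar summary statistics, which is consistent with the diffusion approximation being adequate for inference; the approximation would be expected to degrade when counts are very low.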
There is another area of statistical methodology that has obvious applications to systems biology modelling: Bayesian analysis of computer code outputs (BACCO) [98]. Here, a complex (but typically deterministic) computer simulation model is treated as a black box from a statistical perspective, and the relationships between model inputs, outputs and experimental data are studied in a non-parametric way, often utilising Gaussian processes [99]. Although these techniques do not yet seem to have been applied to systems biology modelling problems, they have been applied to challenging problems in other application areas [100, 101]. It therefore seems inevitable that, as systems biology models become larger and more complex, and BACCO techniques become more sophisticated (better suited to high-dimensional inputs and outputs, and intrinsic stochasticity in the computer models), applications of BACCO methods to problems in computational systems biology will become commonplace.
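The core of the emulation idea can be sketched with a Gaussian process fitted to a handful of runs of a simulator. In the example below, the `simulator` function is a cheap stand-in for an expensive deterministic model, and the squared-exponential kernel with fixed length-scale and variance is an assumed choice; a real analysis would estimate or place priors on these hyperparameters. The linear algebra is written out with the standard library only, to keep the sketch self-contained.

```python
import math

def sq_exp_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def chol_solve(low, b):
    """Solve (L L^T) x = b given the Cholesky factor L."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def fit_emulator(inputs, outputs, length=1.0, var=1.0, jitter=1e-8):
    """Condition a zero-mean Gaussian process on noise-free simulator runs."""
    k = [[sq_exp_kernel(a, b, length, var) + (jitter if i == j else 0.0)
          for j, b in enumerate(inputs)] for i, a in enumerate(inputs)]
    low = cholesky(k)
    alpha = chol_solve(low, outputs)

    def predict(x_new):
        k_star = [sq_exp_kernel(x_new, a, length, var) for a in inputs]
        mean = sum(ks * al for ks, al in zip(k_star, alpha))
        v = chol_solve(low, k_star)
        variance = var - sum(ks * vi for ks, vi in zip(k_star, v))
        return mean, max(variance, 0.0)

    return predict

def simulator(x):
    """Hypothetical stand-in for an expensive deterministic simulation model."""
    return math.sin(x) + 0.1 * x

design = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
emulator = fit_emulator(design, [simulator(x) for x in design])
print(emulator(2.5), simulator(2.5))
```

The emulator reproduces the simulator at the design points and reports a predictive variance that grows away from them, which is what makes it useful for deciding where further expensive runs would be most informative.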
DISCUSSION
It is impossible in an article of this nature to give a fully comprehensive review of all Bayesian work in bioinformatics. Here the focus has been on work which clearly demonstrates the advantages of the Bayesian approach, and that which is most directly relevant to the new science of computational systems biology. Of course, this latter area is still an emerging field, and it is not yet clear which (if any) of the methods and techniques described here will stand the test of time. The main drawback of fully Bayesian methods is the computational demand associated with their computer implementation. This has so far limited their application to certain challenging problems in the bioinformatics arena (such as whole-genome annotation). The Bayesian framework provides a coherent mathematical solution to the problem, but not always an efficient computational algorithm for practical implementation. Even in difficult scenarios, however, probabilistic statistical models (such as hidden Markov models) are becoming the accepted framework for analysis [13], and are used in conjunction with point estimation methods (such as the EM algorithm) for parameter fitting. However, experience from closely related disciplines suggests that fully Bayesian approaches will turn out to provide the most satisfactory solutions to the complex statistical inference problems which lie at the heart of computational systems biology. Improvements in computing hardware, the widespread availability of parallel computer clusters and the development of computational Bayesian algorithms that are able to exploit them [102] mean that there is likely to be an increasing tendency to push for fully Bayesian solutions to the challenging inferential problems in this area, in order to maximise the information that can be extracted from expensive experimental data.
FOOTNOTES
Darren Wilkinson is a Senior Lecturer in Statistics within the School of Mathematics & Statistics at Newcastle University. He has a background in computational Bayesian statistics, and in recent years has become increasingly interested in applications to statistical bioinformatics and computational systems biology. At Newcastle, he is a member of the Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN) and the Systems Biology Resource Centre (SBRC).
References
- Beaumont MA, Rannala B. The Bayesian revolution in genetics. Nat Rev Genet 2004; 5:25161.[Web of Science][Medline]
- O'Hagan A, Forster JJ. Bayesian Inference. Kendall's Advanced Theory of Statistics.London: Arnold 2004; Vol. 2B.
- Metropolis N, Rosenbluth AW, Rosenbluth MN, et al. Equations of state calculations by fast computing machines. J Chem Phys 1953; 21:108792.[CrossRef]
- Gamerman D. Markov Chain Monte Carlo. Texts in Statistical Science.New York: Chapman and Hall 1997.
- Brooks SP. Markov chain Monte Carlo method and its application. Statistician 1998; 47:69100.
- Lauritzen SL. Graphical Models.Oxford: Oxford Science Publications 1996.
- Geman S, Geman D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 1984; 6:72141.[CrossRef]
- Cassella G, George EI. Explaining the Gibbs sampler. Am Stat 1992; 46:16774.[CrossRef]
- Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970; 57:97109.
[Abstract/Free Full Text] - Tierney L. Markov chains for exploring posterior distributions (with discussion). Ann Statist 1994; 21:170162.
- Geyer CJ. Practical Markov chain Monte Carlo. Statistical Sci 1992; 7:473511.[CrossRef]
- Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd edn Boca Raton, FL: Chapman & Hall/CRC Press 2003.
- Durbin R, Eddy SR, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.Cambridge: Cambridge University Press 1998.
- Bishop MJ, Thompson EA. Maximum likelihood alignment of DNA sequences. J Mol Biol 1986; 190:15965.[CrossRef][Web of Science][Medline]
- Churchill GA. Stochastic models for heterogeneous DNA sequences. Bull Math Biol 1989; 51:7994.[Web of Science][Medline]
- Liu JS, Lawrence CE. Bayesian inference on biopolymer models. Bioinformatics 1999; 15:3852.
[Abstract/Free Full Text] - Liu JS, Neuwald AF, Lawrence CE. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J Am Stat Assoc 1995; 90:115670.[CrossRef][Web of Science]
- Liu J, Neuwald A, Lawrence C. Markovian structures in biological sequence alignments. J Am Stat Assoc 1999; 94:115.[CrossRef][Web of Science]
- Zhou Q, Liu JS. Modelling within-motif dependence for transcription factor binding site predictions. Bioinformatics 2004; 20:90916.
[Abstract/Free Full Text] - Narlikar L, Gordân R, Ohler U, Hartemink AJ. Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics 2006; 22:e38492 ISMB06.
[Abstract/Free Full Text] - Schmidler SC, Liu JS, Brutlag DL. Bayesian segmentation of protein secondary structure. J Comput Biol 2000; 7:23348.[CrossRef][Web of Science][Medline]
- Lunter G, Miklós I, Drummond A, et al. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 2005; 6:(83).
- Boys RJ, Henderson DA, Wilkinson DJ. Detecting homogeneous segments in DNA sequences by using hidden Markov models. J R Stat Soc 2000; C:49:26985.[CrossRef]
- Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 1995; 82:71132.
[Abstract/Free Full Text] - Boys RJ, Henderson DA. A Bayesian approach to DNA sequence segmentation (with discussion). Biometrics 2004; 60:57388.[CrossRef][Web of Science][Medline]
- Liu JS, Logvinenko T. Bayesian methods in biological sequence analysis. In Balding DJ, Bishop M, Cannings C (Eds.). Handbook of Statistical Genetics. 2nd edn New York: Wiley 2003 Chap. 3.
- In Speed TP (Ed.). Statistical Analysis of Gene Expression Microarray Data.Florida: Chapman & Hall/CRC, Boca Raton 2003.
- Wit E, McClure J. Statistics for Microarrays: Design, Analysis and Inference.New York: Wiley 2004.
- In Vanucci M, Do K-A, Müller P (Eds.). Bayesian Inference for Gene Expression and Proteomics.New York: Cambridge University Press 2006.
- Zhang D, Wells MT, Smart CD, Fry W. Bayesian normalization and identification for differential gene expression data. J Comput Biol 2005; 12:391406.[CrossRef][Web of Science][Medline]
- Bhattacharjee M, Pritchard CC, Nelson PS, Arjas E. Bayesian integrated functional analysis of microarray data. Bioinformatics 2004; 20:294353.
[Abstract/Free Full Text] - Frigessi A, van de Wiel MA, Holden M, et al. Genome-wide estimation of transcript concentrations from spotted cDNA microarray data. Nucleic Acids Res 2005; 17:(e143).
- van de Wiel MA, Holden M, Glad IK, et al. Bayesian process-based modeling of two-channel microarray experiments: estimating absolute mRNA concentrations. In Vanucci M, Do K-A, Müller P (Eds.). Bayesian Inference for Gene Expression and Proteomics.New York: Cambridge University Press 2006 pp. 7596.
- Conlon EM, Song JJ, Liu JS. Bayesian models for pooling microarray studies with multiple sources of variation. BMC Bioinformatics 2006; 7:(247).
- Hein A-MK, Richardson S, Causton HC, et al. BGX: a fully Bayesian integrated approach to the analysis of Affymetrix GeneChip data. Biostatistics 2005; 6:34973.
[Abstract/Free Full Text] - Lewin A, Richardson S, Marshall C, et al. Bayesian modelling of differential gene expression. Biometrics 2006; 62:108.[Web of Science][Medline]
- Yeung K, Fraley C, Murua A, et al. Model-based clustering and data transformations for gene expression data. Bioinformatics 2001; 17:97787.
[Abstract/Free Full Text] - Wakefield JC, Zhou C, Self SG. Modelling gene expression data over time: curve clustering with informative prior distributions. In Bernardo J-M, Bayarri MJ, Berger JO (Eds.), et al. Bayesian Statistics 7.Oxford: Oxford University Press 2003 pp. 72132.
- Heard NA, Holmes CC, Stephens DA, et al. Bayesian coclustering of Anopheles gene expression time series: study of immune defense response to multiple experimental challenges. Proceedings of the National Acadamy of Sciences 2005; 102:1693944.[CrossRef]
- Gottardo R, Besag J, Stephens M, Murua A. Probabilistic segmentation and intensity estimation for microarray images. Biostatistics 2006; 7:8599.
[Abstract/Free Full Text] - West M, Blanchette C, Dresden H, et al. Predicting the clinical status of human breast cancer utilizing gene expression profiles. Proc Natl Acad Sci 2001; 98:1146267.
[Abstract/Free Full Text] - Ibrahim JG, Chen M-H, Gray RJ. Bayesian models for gene expression with DNA microarray data. J Am Stat Assoc 2002; 97:8899.[CrossRef][Web of Science]
- Gottardo R, Pannucci JA, Kuske CR, Brettin T. Statistical analysis of microarray data: A Bayesian approach. Biostatistics 2003; 4:597620.[Abstract]
- Mertens BJA. On the application of logistic regression modelling in microarray studies. In Bernardo J-M, Bayarri MJ, Berger JO (Eds.), et al. Bayesian Statistics 7.Oxford: Oxford University Press 2003 pp. 60718.
- West M. Bayesian factor regression models in the large p, small n paradigm. In Bernardo J-M, Bayarri MJ, Berger JO (Eds.), et al. Bayesian Statistics 7.Oxford: Oxford University Press 2003 pp. 73342.
- Gottardo R, Raftery AE, Yeung KY, Bumgarner RE. Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 2006; 62:1018.[Web of Science][Medline]
- Green PJ, Mardia KV. Bayesian alignment using hierarchical models, with applications in protein bioinformatics. Biometrika 2006; 93:23554.
[Abstract/Free Full Text] - Schmidler SC. Fast Bayesian shape matching using geometric algorithms. In Bernardo J-M, Bayarri MJ, Berger JO (Eds.), et al. Bayesian Statistics 8.Oxford: Oxford University Press 2007 in press.
- Mardia KV, Green PJ, Nyirongo VB, et al. Bayesian refinement of protein functional site matching. BMC Bioinformatics. 2007 in press.
- Jansen R, Yu H, Greenbaum D, et al. Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 2003; 302:44953.
[Abstract/Free Full Text] - Morris JS, Brown PJ, Baggerly K, Coombes K. Analysis of mass spectrometry data using Bayesian wavelet-based functional mixed models. In Vanucci M, Do K-A, Müller P (Eds.). Bayesian Inference for Gene Expression and Proteomics.New York: Cambridge University Press 2006 Chap 14.
- Clyde M, House L, Wolpert R. Nonparametric models for proteomic peak identification and quantification. In Vanucci M, Do K-A, Müller P (Eds.). Bayesian Inference for Gene Expression and Proteomics.New York: Cambridge University Press 2006 Chap 15.
- Bensmail H, Golek J, Moody MM, et al. A novel approach for clustering proteomics data using Bayesian fast Fourier transform. Bioinformatics 2005; 21:221024.
[Abstract/Free Full Text] - Saksena A, Lucarelli D, Wang I-J. Bayesian model selection for mining mass spectrometry data. Neural Netw 2005; 18:8439.[CrossRef][Web of Science][Medline]
- Zhang W, Chait BT. Profound: an expert system for protein identification using mass spectrometric peptide mapping information. Ann Chem 2000; 72:248289.[CrossRef]
- Chen SS, Deutsch EW, Yi EC, et al. Improving mass and liquid chromatography based identification of proteins using Bayesian scoring. J Proteome Res 2005; 4:217484.[CrossRef][Web of Science][Medline]
- Kitano H. Computational systems biology. Nature 2002; 420:206210.[CrossRef][Medline]
- Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian networks to analyse expression data. J Comput Biol 2000; 7:60120.[CrossRef][Web of Science][Medline]
- Friedman N. Inferring cellular networks using probabilistic graphical models. Science 2004; 303:799805.
[Abstract/Free Full Text] - Pournara I, Wernisch L. Reconstruction of gene networks using Bayesian learning and manipulation experiments. Bioinformatics 2004; 20:293442.
[Abstract/Free Full Text] - Schäfer J, Strimmer K. An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 2005; 21:75464.
[Abstract/Free Full Text] - Werhli AV, Grzegorczyk M, Husmeier D. Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics 2006; 22:252331.
[Abstract/Free Full Text] - Sachs K, Perez O, Pe'er D, et al. Causal protein-signaling networks derived from multiparameter single-cell data. Science 2005; 308:52329.
- Dobra A, Hans C, Jones B, et al. Sparse graphical models for exploring gene expression data. J Multivariate Anal 2004; 90:196–212.
- Jones B, Carvalho C, Dobra A, et al. Experiments in stochastic computation for high-dimensional graphical models. Statist Sci 2005; 20:388–400.
- Giudici P, Green PJ. Decomposable graphical Gaussian model determination. Biometrika 1999; 86:785–801.
- Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 2003; 19:2271–82.
- Yu J, Smith VA, Wang PP, et al. Advances to Bayesian network inference for generating causal networks from observational data. Bioinformatics 2004; 20:3594–603.
- Opgen-Rhein R, Strimmer K. Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregressive process. BMC Bioinformatics 2007; 8:PMSB06 supplement, in press.
- Lee I, Date SV, Adai AT, Marcotte EM. A probabilistic functional network of yeast genes. Science 2004; 306:1555–58.
- Bernard A, Hartemink AJ. Informative structure priors: joint learning of dynamic regulatory networks from multiple types of data. In Altman R, Dunker AK, Hunter L (Eds.), et al. Pacific Symposium on Biocomputing 2005. New Jersey: World Scientific 2005 pp. 459–70.
- West M, Ginsburg GS, Huang AT, Nevins JR. Embracing the complexity of genomic data for personalised medicine. Genome Res 2006; 16:559–66.
- Nachman I, Regev A, Friedman N. Inferring quantitative models of regulatory networks from expression data. Bioinformatics 2004; 20:supp. 1, 248–56.
- Ronen M, Rosenberg R, Shraiman BI, Alon U. Assigning numbers to the arrows: parameterising a gene regulation network by using accurate expression kinetics. Proc Nat Acad Sci USA 2002; 99:10555–60.
- Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res 2003; 13:2467–74.
- Gadkar KG, Gunawan R, Doyle FJ III. Iterative approach to model identification of biological networks. BMC Bioinformatics 2005; 6:(155).
- Brown KS, Sethna JP. Statistical mechanical approaches to models with many poorly known parameters. Phys Rev E 2003; 68:(021904).
- Barenco M, Tomescu D, Brewer D, et al. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biol 2006; 7:(R25).
- Liebermeister W, Klipp E. Biochemical networks with uncertain parameters. IEE Syst Biol 2005; 152:97–107.
- Orlando D, Lin C, Bernard A, et al. A probabilistic model for cell cycle distributions in synchrony experiments. Cell Cycle 2007; 6:(4)478–488.
- Bahcall OG. Single cell resolution in regulation of gene expression. Mol Syst Biol 2005; 1: doi:10.1038/msb4100020.
- McAdams HH, Arkin A. Stochastic mechanisms in gene expression. Proc Nat Acad Sci USA 1997; 94:814–9.
- Arkin A, Ross J, McAdams HH. Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells. Genetics 1998; 149:1633–48.
- McAdams HH, Arkin A. It's a noisy business: genetic regulation at the nanomolecular scale. Trends Genet 1999; 15:65–9.
- Wilkinson DJ. Stochastic Modelling for Systems Biology. Boca Raton, Florida: Chapman & Hall/CRC Press 2006.
- Pepperkok R, Ellenberg J. High-throughput fluorescence microscopy for systems biology. Nat Rev Mol Cell Biol 2006; 7:690–6.
- Shen H, Nelson G, Nelson DE, et al. Automated tracking of gene expression profiles in individual cells and cell compartments. J R Soc Interface 2006; 3:787–794.
- Reinker S, Altman RM, Timmer J. Parameter estimation in stochastic biochemical reactions. IEE Syst Biol 2006; 153:168–78.
- Boys RJ, Wilkinson DJ, Kirkwood TBL. Bayesian inference for a discretely observed stochastic kinetic model. in press.
- Rempala GA, Ramos KS, Kalbfleisch T. A stochastic model of gene transcription: an application to L1 retrotransposition events. J Theor Biol 2006; 242:101–16.
- Gillespie DT. The chemical Langevin equation. J Chem Phys 2000; 113:297–306.
- Golightly A, Wilkinson DJ. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics 2005; 61:781–88.
- Doucet A, de Freitas N, Gordon N (Eds.). Sequential Monte Carlo Methods in Practice. New York: Springer 2001.
- Golightly A, Wilkinson DJ. Bayesian sequential inference for nonlinear multivariate diffusions. Statist Comput 2006; 16:323–38.
- Golightly A, Wilkinson DJ. Bayesian sequential inference for stochastic kinetic biochemical network models. J Comput Biol 2006; 13:838–51.
- Golightly A, Wilkinson DJ. Bayesian inference for nonlinear multivariate diffusion models observed with error. in press.
- Jaqaman K, Danuser G. Linking data to models: data regression. Nat Rev Mol Cell Biol 2006; 7:813–19.
- O'Hagan A. Bayesian analysis of computer code outputs: a tutorial. Reliab Eng Sys Safe 2006; 91:1290–300.
- Kennedy MC, O'Hagan A. Bayesian calibration of computer models (with discussion). J R Stat Soc, Series B 2001; 63:425–64.
- Goldstein M, Rougier J. Bayes linear calibrated prediction for complex systems. J Am Stat Assoc 2006; 101:1132–43.
- Challenor PG, Hankin RKS, Marsh R. Towards the probability of rapid climate change. In Schellnhuber HJ, Cramer W, Nakicenovic N (Eds.), et al. Avoiding Dangerous Climate Change. Cambridge: Cambridge University Press 2006 pp. 53–63.
- Wilkinson DJ. Parallel Bayesian computation. In Kontoghiorghes EJ (Ed.). Handbook of Parallel Computing and Statistics. New York: Marcel Dekker/CRC Press 2005 pp. 481–512.