Briefings in Bioinformatics Advance Access published online on July 18, 2007
Briefings in Bioinformatics, doi:10.1093/bib/bbm034
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Towards a calculus of biomolecular complexes at equilibrium
Corresponding author. Eric Mjolsness, Department of Computer Science, University of California, Irvine, CA 92617, USA. Tel: 949 824 3533; Fax: 949 824 4056; E-mail: emj{at}uci.edu
| ABSTRACT |
|---|
|
|
|---|
An overview is presented of the construction and use of algebraic partition functions to represent the equilibrium statistical mechanics of multimolecular complexes and their action within a larger regulatory network. Unlike many applications of equilibrium statistical mechanics, multimolecular complexes may operate with various subsets of their components present and connected to the others, the rest remaining in solution. Thus they are variable-structure systems. This aspect of their behavior may be accounted for by the use of fugacity variables as a representation within the partition functions.
Four principles are proposed by which the combinatorics of molecular complex construction can be reflected in the construction of their partition functions. The corresponding algebraic operations on partition functions are multiplication, addition, function composition and a less commonly used operation called contraction. Each has a natural interpretation in terms of probability distributions on multimolecular structures. Possible generalizations to nonequilibrium statistical mechanics are briefly discussed.
Keywords: biomolecular complex, allosteric enzyme, transcription complex, statistical mechanics, equilibrium, partition function, fugacity, composition, contraction
| INTRODUCTION |
|---|
|
|
|---|
With the recent flourishing of systems biology, interest has increased in methods for quantitatively modeling the effects that biomolecular complexes have on the larger regulatory networks and regulatory systems within which they are embedded. For example, transcription complexes can integrate a number of transcription factor inputs into an overall decision for or against the initiation of transcription of a specific gene. Thus it provides a probability per unit time, or a rate, of transcription initiation. How can we quantitatively model this kind of integration of inputs% In the transcription complex case we would like to translate from structural aspects of molecular interactions within the complex, such as protein:DNA binding and protein:protein interactions, which may be partly known and partly hypothesized, to some kind of rate law for the process of transcription. More generally we would like to translate from structure and experimentally measurable parameters to an algebraic formula or very fast algorithm, depending on the measurable parameters, for evaluating a rate law governing a process mediated by the complex. From such a rate law we can create an ordinary differential equation model for the effect of the complex within a larger network. If we also receive an algebraic formula for the standard deviation of actual rate from expected rate, as is possible in statistical mechanics, then we can create more realistic stochastic models such as stochastic differential equations.
Quite a variety of techniques can be used to achieve the goal of constructing such quantitative models. They can be divided into equilibrium and nonequilibrium models for the biomolecular complex or either can be used within a network-scale model, which must generally be nonequilibrium in a living system. Equilibration can happen at a fast time scale for the complex, which forms a small part of a larger system that changes on a slow time scale and is at equilibrium; the result is a quasi-equilibrium model of the complex and a rate law for its effect on the larger system.
In this note I will review a simplified way of thinking about these problems that can straightforwardly yield algebraic rate laws for the activities of multimolecular complexes in quasi-equilibrium, starting from hypotheses about the interaction connectivity of their state variables having to do with conformation and binding sites. Foundations of equilibrium and nonequilibrium statistical mechanics are recapitulated and related to one another in Section Equilibrium and nonequilibrium statistical mechanics, below. Very small examples of equilibrium models are discussed in Section Equilibrium for binding sites. More complex algebraic models and partition functions can then be built up step by step, using four main principles of construction introduced in Section Principles for combining partition functions. Multiplication of partition functions corresponds to independent probability distributions. Addition corresponds to mixtures of probability distributions. Function composition corresponds to a tree-like (acyclic) topology of interactions between subcomplexes that may or may not bind to one another. Cycles can be added to such a topology using a contraction operation on a partition function. Applications of each of these operations will be shown in the examples of Section Branching tree structure and Section Examples, drawn largely from quasi-equilibrium models of transcriptional regulation and allosteric enzymes.
The result of successfully applying this calculus of model-building operations will be a specialized but relatively simple reaction rate law that models the activity of a multimolecular complex within larger, network-level models. This law will take the form of an algebraic expression that can be used not only in simulations but also in mathematical analyses. Specific examples of the analyses that become available include (i) determining the qualitative behavior and stability of a network model using methods of dynamical systems theory [1], for example oscillations arising from Hopf bifurcations in the Repressilator-like [2] and p53/Mdm2 [3] network models, and (ii) determining system identifiability for automatic inference of pharmacokinetic parameters from data [4]. Such analyses can enable one to understand more deeply the actual biological systems being modeled.
| THEORY |
|---|
|
|
|---|
Equilibrium and nonequilibrium statistical mechanics
This section serves to encapsulate the needed elementary statistical mechanics in simple notation.
In an equilibrium statistical mechanics model, the probability of any discrete state I of the multimolecular complex, including all information about its discrete conformation and binding status, is proportional to the Boltzmann factor exp(–ßGI). Here GI is the Gibbs free energy of the state and ß is inversely proportional to the temperature: ß = 1/(kT) where k is Boltzmann's constant in appropriate units. Consequently the relative probabilities of any two states I and J have the ratio
|
| (1) |
|
| (2) |
|
| (3) |
IpI = 1. The discrete-state probabilities pI then must evolve exactly according to the master equation that would ensue from the law of mass action for such a set of reactions:
|
| (4) |
Equilibrium and nonequilibrium formulations of statistical mechanics are related, in that equilibrium can be achieved by constraining the forward and backward rates KIJ and KJI so as to satisfy the condition of detailed balance (
for some
). This implies a steady state for a Boltzmann distribution defined by
|
|
The nonequilibrium approach may be essential for many molecular machines, possibly in combination with equilibrium models for selected substructures. A general and systematic approach to constructing nonequilibrium stochastic models taking a form similar to the W matrix in Equation (4), starting from models specified in terms of reactions similar to Equation (3) augmented with parameters for the reactants, and resulting in simulation algorithms, has been proposed [5].
I now turn to the algebraic construction of solvable free energy functions GI.
Equilibrium for binding sites
A single binding site
As an example, consider a single binding site which may be unoccupied or may be occupied by a single molecule of only one particular molecular species A. The change in free energy for occupying the binding site is the sum of the binding energy (which we expect to be negative) and the change in free energy due to taking one molecule of A out of solution:
|
|
|
|
AzA and 1: |
|
|
|
{ 0, 1}, denoted by an open circle, in total isolation from any other variables (Figure 1A). The value of s is the number of occupying A's, either zero or one.
|
Two independent sites
To build up larger models we can use rules for appropriately combining partition functions. For two binding sites that have no influence on one another, and are thus independent, the partition functions multiply:
|
|
|
|
|
|
On the other hand if sites 1 and 2 both bind A and we want to know the average total number of A molecules bound, the answer is given by a different logarithmic derivative of Z (Supplementary Material Section S2).
The graphical model for this system would consist of two binary variable nodes s
{ 0, 1}, one for each site, with no connecting link between them or any other variable (Figure 1B).
Two nonindependent sites
If we now allow energetic interactions between two binding sites b = 1 and b = 2 that can each be empty or occupied by molecules of species 1 or 2, respectively, and no other internal states, then
|
|
A protein with a single binding site that can be empty or occupied by species 1 or 2 would be modeled the same way except that both occupiers cannot be present simultaneously, i.e.
must be true (Figure 2A), hence the impossible state s1 = s2 = 1 is omitted from the sum over all possible states in the partition function and
|
|
|
If the protein is itself regarded as another species that can be present or absent, with fugacity z0, then it must be present, so s0
must be true, and the partition function is Z(z1, z2) = z0(1 +
1z1 +
2z2). Likewise, a heterodimer consisting only of species 1 and 2 with no internal states would satisfy s1
s2 (Figure 2B) and therefore Z(z1, z2) =
12z1z2 (This is a trivial case since there is only one state, but it will be useful when the species are given internal states as well). In each case, as for any probability generating function, the coefficients can be normalized to give the probabilities of each possible configuration of bindings.
Thus partition functions may be expressed as polynomials in fugacity variables. This is a particularly convenient notation for molecules in a dilute solution which acts as a reservoir, since in that case fugacities zi are proportional to concentrations ci = [Si]. Their logarithmic derivatives, which commonly describe the average activity of a multimolecular complex, are then just ratios of such polynomials. Such a polynomial partition function can be cast as a polynomial with homogeneous degree by introducing the complementary fugacity variables
and
and substituting
, where di is the degree of zi in Z. This formulation is entirely equivalent mathematically, since we may set
in Zhomog to recover Z, and it can be useful when we want to treat pairs of binding/unbinding or activation/inactivation states in a symmetric manner.
Principles for combining partition functions
There is a useful set of principles for creating partition functions that reflect the structure of multimolecular complexes. The principles are named after the algebraic operations on partition functions that they invoke: multiplication, addition, composition and contraction.
Multiplication
From the foregoing example of two independent binding sites, a first principle for combining partition functions is: The product of two partition functions corresponding to probability distributions P1(s1) and P2(s2) is itself a partition function representing the independent distribution P1(s1) P2(s2). In other words, the partition functions of independent random variables multiply. In this case the logarithmic derivatives just add up.
Addition
A second principle results from adding partition functions. The weighted sum w1Z1(z) + w2Z2(z) reweights all the monomial terms within either Z1(z) or Z2(z). Thus, the nonnegatively weighted sum of partition functions represents a mixture distribution of the component probability distributions with related weights, in this case
.
Composition
A third principle is a powerful tool for constructing complex structures and will be illustrated in Section Branching tree structure: that functional composition of partition functions corresponds to a tree-like structure in the construction of the molecular system represented. The functional composition is achieved by substituting a whole partition function Zf for an appropriate fugacity variable zi occurring within another partition function Zc obtaining Zc(Zf (z'), z). The assumed molecular tree structure is reflected in the tree structure of partition functions and their arguments. Logarithmic derivatives may then be taken using the chain rule of differential calculus, as is done in Equation (6) in Section Transcriptional regulation by hierarchical cooperative activation subsequently. A similar phenomenon arises in the theory of birth and death processes or branching processes, in which a single generating function may be composed with itself many times [6].
Contraction
A fourth principle is not quite as simple. A special contraction operation on partition functions, defined in Section Contraction example: 1D Ring, allows cycles to be introduced into the treelike, noncyclic structure resulting from the operation of the third principle. This is important due to the many non-treelike structures, such as rings, that may be present in multimolecular complexes. The resulting computations appear to be generally more difficult, so that a premium is placed on minimizing the number of contractions.
These principles can be applied repeatedly in various orders to generate interesting structure, as we will show in the remainder of the article using their italicized names to highlight how each principle is employed. Together, these principles point towards a symbolic calculus for creating and reasoning about equilibrium multimolecular complex models.
Branching tree structure
Dyson [7] first solved the equilibrium statistical mechanics of a model with branching structure. Here we will provide notation to make this type of calculation convenient for molecular complexes that can be built up in solution.
Transcriptional regulation by multiple binding sites
In a simple quasi-equilibrium model of transcriptional regulation of gene i, which has a set of transcription factor binding sites labeled by (i,b), by transcription factors j, it is assumed that the complex has two states that permit or prohibit transcriptional initiation. We start out using the addition principle just as for the single-binding-site example, except that now the single binary variable s1 = ±1 refers to conformation and not occupancy. We give these two alternatives fugacity-like variables
, creating a homogeneous polynomial in z:
|
|
|
|
|
|
i to mark terms associated with si = 1, i.e. the transcriptionally active states. The fraction of maximal transcriptional activation is then [8]
|
| (5) |
The interaction graph corresponding to this model consists of a parent node si connected separately to each of a set of children nodes sib
{ 0, 1} representing the occupancy of the corresponding sites. There are no direct connections among the children. This situation is is illustrated in Figure 3A.
|
Further partition function engineering of this general flavor was performed in order to generalize the MWC model to allosteric enzymes with multiple activators, inhibitors and/or substrates [11]. Logarithmic derivatives were used to derive rate laws for the action of such enzymes for inclusion within a larger-scale ordinary differential equation model for the synthesis of branched chain amino acids in Escherichia coli. Since good agreement with experimental data was obtained, the resulting Generalized MWC (GMWC) model has been added to the Cellerator software [12] for cell model generation and analysis, which runs within the Mathematica computer algebra system.
| EXAMPLES |
|---|
|
|
|---|
Transcriptional regulation by hierarchical cooperative activation
We may take the foregoing model one step further by interposing an extra level of hierarchical structure in transcriptional activation corresponding to modules of interacting binding sites. Using addition, multiplication and composition exactly as before,
|
|
An example calculation using logarithmic derivatives and the chain rule is to calculate the fractional site occupancy of one binding site, which may be an observable quantity:
|
| (6) |
1D Chain
An example of a different sort has state information, but no occupancy information and no variable connectivity due to varying occupancy of binding sites. Instead there is one binary variable si
{ ± 1 } at every node in a one-dimensional chain of N identical nodes. We may regard the first node as the top one in a downward growing lineage tree that never branches, having additive partition function at the top level:
|
| (7) |
|
|
|
|
Similar chain models, albeit nonhomogeneous, can be used to augment the HCA model by describing the equilibrium of a chain of transcription factor binding sites that compete for occupancy by overlapping with one another along DNA [8].
Contraction example: 1D ring
A final example demonstrates the contraction principle by which we can go beyond treelike topologies by adding cycles. The graph interpretation of contraction is to identify two nodes representing state variables si and sj, demanding that they are identically equal and that any redundant probability factors be removed. Graphically this operation can add a cycle to a tree. Here we will demonstrate it on the trivial tree consisting of the chain in the 1D Ising model of length N + 1, tying the first and last state variables to each other as SN + 1 = S1.
In order to remember the value of s1 we introduce the temporary fugacity-like variable
in the previous calculation:
|
|
(s1) and must cancel out
if SN + 1 = S1: |
|
p term of the series expansion of ZN(
, z) in
, denoted here Coef[ZN(
, z),
, p], for p = 0, in this way performing the contraction operation:
|
| (8) |
For example, for N = 4 the resulting partition function is
|
|
|
|
| DISCUSSION |
|---|
|
|
|---|
The foregoing examples can be augmented with many others by repeatedly applying the four principles proposed for constructing partition functions. For example, one could create a ring-of-trees topology. The Drosophila transcriptional regulatory model family of [13], and the multiply regulated allosteric enzyme model of [11], can also be derived by using these principles.
Each of the model construction principles has an interpretation in terms of interaction graphs in equilibrium statistical mechanics, now widely applied as Markov random fields in pattern recognition or graphical models in machine learning. Multiplication corresponds to disconnected subgraphs. Addition corresponds to a mixture model gated by a discrete selection variable. Composition corresponds to a tree topology, which unlike conventional graphical models may have a variable structure due to subtrees that can be present or absent. Contraction corresponds to identifying and fusing two existing nodes and thereby possibly creating cycles. These latter two principles still require experience to apply correctly since they are not yet fully formalized and automated using a computer algebra representation such as that of [12].
| Funding |
|---|
|
|
|---|
Supported in part by the Biomedical Information Science and Technology Initiative (BISTI) grant (number 4 R33 GM069013) from the US National Institute of General Medical Sciences and Frontiers in Biological Research (FIBR) Award EF-0330786 from the the US National Science Foundation.
Key Points
|
| SUPPLEMENTARY MATERIALS |
|---|
|
|
|---|
Supplementary Materials are available at Briefings in Bioinformatics Online.
| Acknowledgements |
|---|
|
|
|---|
The author wishes to acknowledge useful discussions with Lee Bardwell and Vitali Likoshvai.
| FOOTNOTES |
|---|
|
|
|---|
Eric Mjolsness is an Associate Professor in the Department of Computer Science and the Institute for Genomics and Bioinformatics at the University of California, Irvine. His research interests include computational biology, mathematical methods and machine learning applied to the sciences.
Submitted: April 16, 2007. Received (in revised form): June 18, 2007.
| References |
|---|
|
|
|---|
- Guckenheimer J, Holmes P. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields (1983) New York: Springer-Verlag.
- Golubyatnikov VP, Makarov EV. Closed trajectories in the gene networks. (2004) The Fourth International Conference on Bioinformatics of Genome Regulation and Structure (BGRS'2004): Novosibirsk. Siberian Branch: Russian Academy of Sciences. 42–45. http://www.bionet.nsc.ru/meeting/bgrs2004/tom2.pdf (10 July 2007, date last accessed).
- Ciliberto A, Novak B, Tyson JJ. Steady states and oscillations in the p53/Mdm2 Network. Cell Cycle (2005) 4:488–93.[Web of Science][Medline]
- Verdiere N, Denis-Vidal L, Joly-Blanchard G, et al. Identifiability and estimation of pharmacokinetic parameters of ligands of macrophage mannose receptor. Int J Applied Math Comput Sci (2005) 15:517–26.
- Mjolsness E, Yosiphon G. Stochastic process semantics for dynamical grammars. Ann Math Artif Intel (2007) 47:329–95.[CrossRef]
- Athreyea KB, Ney PE. Branching Processes (1972) Springer-Verlag: Dover.
- Dyson F. Existence of a phase-transition in a one-dimensional ising ferromagnet. Comm Math Phys (1969) 12:91–107.[CrossRef]
- Mjolsness E. On cooperative quasi-equilibrium models of transcriptional regulation. J Bioinformat Computat Biol (2007) 5(4).
- Monod J, Wyman J, Changeaux JP. On the nature of allosteric transitions: a plausible model. J Mol Biol (1965) 12:88–118.[Web of Science][Medline]
- Jaeger J, Surkova S, Blagov M, et al. Dynamic control of positional information in the early drosophila blastoderm. Nature (2004) 430:368–71.[CrossRef][Medline]
- Najdi TS, Yang CR, Shapiro BE, et al. Application of a generalized MWC model for the mathematical simulation of metabolic pathways regulated by allosteric enzymes. J Bioinformat Computat Biol (2006) 4:335–55.[CrossRef]
- Shapiro BE, Levchenko A, Meyerowitz EM, et al. Cellerator: extending a computer algebra system to include biochemical arrows for signal transduction simulations. Bioinformatics (2003) 19:677–8.
[Abstract/Free Full Text] - Zinzen RP, Senger K, Levine M, et al. Computational models for neurogenic gene expression in the drosophila embryo. Curr Biol (2006) 16:1–8.[CrossRef][Web of Science][Medline]
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||










