Mathematical approaches to differentiation and gene regulation (original) (raw)

1 Introduction

In living organisms, many proteins are transcription factors: they can bind to DNA and regulate the transcription of specific genes, i.e. the synthesis of RNA from coding regions of chromosomal DNA. This regulation of transcription is a very complex mechanism, which can involve up to dozens of genes, and other factors as well. Furthermore, regulation occurs during the full process of gene expression. In all cases, one will say that a gene A activates (resp. inhibits) a gene B when A produces a protein that has a positive (resp. negative) effect on the expression of gene B. If several genes are involved in a given biological system, they form a gene network from which one can draw an interaction graph G. In mathematical terms, G is a finite oriented graph, the edges of which are endowed with a sign: the vertices are the genes, and a positive (resp. negative) edge from j to i means that j activates (resp. inhibits) i. Note that a gene can activate or inhibit itself, i.e. G can have edges that end where they start. Furthermore, depending on the concentrations of the proteins in the system, the effect of j on i can be positive, negative, or absent. In other words, G is a function of the concentrations.

In general, very little is known about the strength of the interactions between genes. One is thus faced with the following difficult problem: which dynamical properties of a gene network can be inferred from the topology of its interaction graph (despite the lack of quantitative information)? Several methods have been used to tackle this question. One of them is numerical simulation, which requires choosing kinetic parameters in a realistic way. Another method is to study the statistical properties of gene networks, by comparing their interaction graphs with random ones. One can also try to decompose a given graph into submodules of biological significance. Finally, some authors have focused their attention on special motifs, i.e. subgraphs with simple topology, for instance, those involving few vertices that are overrepresented in gene networks [1].

In this paper we shall study circuits. A circuit C in an interaction graph G is a sequence e1,…,ek of edges such that the end point of ei, i=1,…,k−1 (resp. ek) is the origin of ei+1 (resp. e1), each vertex of G occurring at most once in C. The sign of a circuit is the product of the signs of its edges. When gene regulation was discovered, it was soon noticed that circuits (or ‘feedback loops’) are often present in gene networks (or at least in mixed networks, with interactions between genes, proteins and metabolites), and that their biological role depends on their sign. For instance, if G=C consists of a single positive circuit, the network can have two possible stable stationary states. For instance, when G looks as follows:

where denotes a negative edge, common sense suggests that either A or B will win the competition and B (resp. A) will be shut off. Now, according to an idea of Delbrück [2], the possibility for a gene network to have several stationary states is one possible mechanism for biological differentiation.

However, when the interaction graph G contains a positive circuit C, there is no reason to expect that C will govern the dynamic behaviour of the underlying network; this depends on the strength of the interactions. Thomas [3] had the idea to turn the statement another way: positive circuits are necessary if not sufficient for multistationarity. He proposed the following:

Thomas' rule: Assume a gene network has several non-degenerate stationary states. Then its interaction graph contains, somewhere in phase space, a positive circuit.

Here, the expression “somewhere in phase space” refers to the fact, mentioned earlier, that the concentrations of proteins have to be given for the graph to be defined.

This rule of Thomas, if true, can be quite useful for geneticists. For instance, if one knows that a given biological system can differentiate, one will be led to search for a positive circuit relating the genes involved. It is therefore worthwhile to decide how general this rule is. One way to check it is the following. First, describe a gene network in a formal mathematical way (a model). Then, within this formalism, make sense of all the terms figuring in Thomas' rule (non-degenerate stationary state, interaction graph, phase space...). Thomas' rule then becomes, in this context, a precise mathematical statement (a conjecture), which one can try to prove (or to contradict) logically.

For instance, Thomas and Kaufman phrased Thomas' rule [3] as a precise mathematical conjecture by using a differential model of gene networks [4]. This conjecture was proved under additional assumptions in [5–7], and in general in [8]. Later, Thomas' rule was checked for Boolean models [9]. Many more models of gene networks have been proposed (see [10] for a thorough survey). In this paper, we shall show that Thomas' rule is true for five different types of models of gene networks: the Boolean, differential, differential with decay, piecewise-linear and multivalued discrete models. Of course, knowing that the rule is true for one type of models does not imply automatically that it is true for another one. Still, our arguments will exhibit interesting connections between several ways of modelling. When the spontaneous decay of all proteins is taken into account, Thomas' rule happens to be more robust, and this allows us to study the piecewise-linear case by approximation. And, in some cases, the piecewise-linear models can in turn be described in discrete terms. We conclude the paper by discussing open problems. Among them is whether Thomas' rule remains valid for other biological networks, or when the stochastic nature of gene regulation is taken into account.

2 Boolean models

Let n⩾1 be an integer and Ω={0,1}n, the set of strings of n letters in the alphabet {0,1}. Consider a map

The pair (Ω,F) is usually called a Boolean network. [According to Kaufmann [11], we can view the data (Ω,F) as a model for the dynamic of a network of n genes. A point x=(xi)∈Ω describes a state of the network: xi=1 (resp. xi=0) when the gene i is active (resp. inactive). The map F describes the evolution of this network: if it is in state x at a given time t, it will be in state F(x) at time t+1.]

To every x in Ω we attach an interaction graph G(x) which is described as follows. Fix j∈{1,…,n} and let y∈Ω be defined by:

Given i∈{1,…,n}, there is an edge from j to i when F(y)i≠F(x)i. This edge is positive if xj=F(x)i and it is negative otherwise. [To illustrate this definition, assume that xj=1, i.e. gene j is active in x and inactive in y. If F(x)i=xj=1 and F(y)i=0, we can say that, by inhibiting j in x, we have inhibited i in F(x). In other words, j is an activator of i in the state x, and we have a positive edge from j to i in the graph G(x).]

A stationary state of the network is a fixed point of F, i.e. a point x∈Ω such that F(x)=x. Part (2) of the following theorem says that Thomas' rule is true for Boolean models.

Theorem 1

(1) [10]Assume that none of the graphsG(x),x∈Ω, contains a circuit. Then F has a unique fixed point.
(2) [12]Assume that F has several fixed points. Then there existsx∈Ωsuch thatG(x)contains a positive circuit.

3 Differential models

Let n⩾1 be an integer and Ω=Rn the standard real vector space of dimension n. Consider a differential map:

and the system of differential equations: where x:R→Rn is any differential path in Ω. [According to Thomas and Kaufman [4], this is a model for a network of n genes. For every i=1,…,n, the number xi(t) is the concentration of the protein i at time t. Eq. (1) says that the variation of xi(t) is a function of all the concentrations xj(t),j=1,…,n.]

Given x∈Ω, we define an interaction graph G(x) as follows. Its set of vertices is {1,…,n} and there is a positive (resp. negative) edge from j to i when the partial derivative ∂Fi∂xj(x) is positive (resp. negative). [To illustrate this, assume ∂Fi∂xj(x)>0. If we increase the concentration of protein j in state x, the number Fi(x) will increase, and the production of i will accelerate. In other words, j is an activation of i in state x.]

From (1), we see that a stationary state of the network is a zero of F, i.e. x ∈Ω such that F(x)=0. This zero is called non-degenerate when the determinant det(∂Fi∂xj(x)) of the Jacobian matrix at x is different from zero. Thomas' rule is true for differentiable models.

Theorem 2 [8]

Assume that F has at least two non-degenerate zeroes. Then there existsx∈Ωsuch thatG(x)contains a positive circuit.

For previous results, see [5–7].

4 Differentiable models with decay

Since concentrations cannot be negative, one would like to get a version of Theorem 2 where F(x) is only defined for those x=(xi) such that xi⩾0, i=1,…,n. But it turns out that it is not true as stated with this restriction [8 (§3.5)]. However, a more realistic modelling of gene networks consists in taking into account that the concentration of every protein is submitted to a spontaneous decay, due to degradation and to the growth of cells. We are thus led to the following model.

For every i=1,…,n let Ωi⊂R be a real interval (i.e. Ωi=[ai,bi],]ai,bi],[ai,bi[ or ]ai,bi[, with −∞⩽ai and bi⩽+∞). On the product Ω=∏iΩi⊂Rn, consider a differentiable map

and the system of differential equations

dxidt=Fi(x)−γixi,i=1,…,n	(2)

where γ1>0,…,γn>0 are fixed constants [the degradation rates].

For every x∈Ω, let G(x) be the interaction graph defined from the signs of the partial derivatives ∂Fi∂xj(x) as in §2 above.

Theorem 3

Assume that there exists two pointsx≠yin Ω such that

| |Fi(x)−γixi−Fi(y)+γiyi|<γi|xi−yi| | | ---------------------------------- |

for all indices i such thatxi≠yi. Then there exists a point z in Ω such thatG(z)contains a positive circuit.

Remarks

(1) A stationary state is a point x∈Ω such that, for all i=1,…,n,
If x≠y are two stationary states of (x) we have, for all i=1,…,n
therefore the hypotheses of Theorem 3 are satisfied. In other words, Thomas' rule is again true, for arbitrary Ω, when we take the spontaneous decay of proteins into account. It is also more ‘robust’, since it remains valid for two states x≠y that are almost stationary.
(2) In the conclusion of Theorem 3, we can assume that zi = xi whenever xi=yi. To see that, let I be the set of indices i such that xi=yi, and apply Theorem 3 to the restriction of the functions Fi, i∉I, to the linear subspace of those z ∈Rn such that zi=xi when i∈I. A similar remark can be made in Theorems 1, 2 and 4.

5 Piecewise-linear models

Let Ω=∏iΩi be as in §4. Another model for gene networks [13–15] is given by the system of equations

where γi>0 and each function Fi is a polynomial combination of step functions.

More precisely, for every real number θ, define the step function s(x,θ) from R to subsets of R by:

s(x,θ)={1ifx>θ]0,1[ifx=θ0ifx<θ

For every j=1,…,n, choose finitely many distinct real thresholds θjk in the interior of Ωj, k=1,…,mj. For each i=1,…,n, fix a real polynomial Pi(Tjk) in ∑j=1nmj variables and, for every x in Ω, let:

By definition, Fi(x) is a subset of R, reduced to a single point if xj≠θjk for all j and k.

Now we consider the system of piecewise linear differential inclusions:

dxidt∈Fi(x)−γixi,i=1,…,n	(4)

for some fixed real constants γ1>0,…,γn>0.

A stationary state of (4) is a point x in Ω such that 0 lies in (Fi(x)−γixi). For every x in Ω, we define an interaction graph G(x) as follows. Its set of vertices is {1,…,n} and there is a positive (resp. negative) edge from j to i when xj=θjk is a threshold and the value of the partial derivative ∂Pi/∂Tjk is positive (resp. negative) at the some point in the set (s(xa,θab)). Note that in the case considered in [16] the interaction graph of §1 of op. cit. is the superposition of all the graphs G(x),x∈Ω.

Theorem 4

Assume that(4)has several stationary states. Then there existsx∈Ωsuch thatG(x)contains a positive circuit.

6 Discrete models

In [16] (Theorem 1) and [17] (Theorem 2), it is shown that the stationary states of some piecewise linear models can be described by fixed points of a map Ω→Ω, where Ω is a product of n finite sets Ω1,…,Ωn, with Ωi of order ni+1, where ni is the number of threshold values of the variable xi. Theorem 4 gives therefore a proof that Thomas' rule holds for the fixed points of these discrete models.

For example, assume that Ωi={0,1} for all i=1,…,n and let F=(Fi):Ω→Ω be any map as in Theorem 1 above. Consider the system of piecewise-linear differential equations:

dxidt=F˜i(x)−xi,i=1,…,n	(5)

where F˜i is defined below and x lies in the open set Ω˜⊂Rn consisting of those (xi) such that xi ≠0 for all i. Let: be the map defined by We let

F˜i(x)={1ifFi(d(x))=1−1ifFi(d(x))=0

and F˜=(F˜i):Ω˜→Rn. Notice that, for every x∈Ω˜: Assume that the system (5) satisfies the hypotheses of [16] §1, i.e., for every i, the function F˜ is written as a positive combination of sums and products of the functions sj(x), j=1,…,n, where sj(x) is equal either to s(xj,0) or to 1−s(xj,0).

Then one can check that, for every x∈Ω˜, if y lies in the closure in Rn of the component of Ω˜ containing x, the interaction graph G(y) defined from F˜ as in §4 is contained in the interaction graph G(d(x)) of §1. Furthermore, according to Theorem 1 [16], if d(x) is a fixed point of F, the point x∈Ω˜ is an ‘asymptotically stable’ steady state of (5). From Theorem 4, we thus get a new (and quite indirect!) proof of Theorem 1(2) under the above hypothesis. It would remain to decide which discrete models, and in particular which Boolean models, can be obtained from the piecewise-linear models considered in [16,17].

7 Concluding remarks

The results above give support to the validity of Thomas' rule. They do not ‘prove’ any kind of ‘biological law’, since they depend on the quality of the models we used for describing gene networks, and these are, clearly, gross simplifications of the biological reality. For instance, we did not consider the role of chromatin conformation in the regulation of transcription. Neither did we include any discussion of the alternate splicing phenomenon. Therefore, it might be worth checking this rule in new and more refined setups. Let us mention a few possible extensions of the results presented here.

Gene networks do not appear in isolation. They are usually coupled with other biological networks, like the metabolic networks, and those involving interactions between proteins. People have tried to describe these mixed networks in a single picture (see for example [18] or [19]), but this is not so easy, since edges in these enlarged graphs do not have the same meaning as in the case of gene regulation. On the other hand, one should notice that an enlarged mixed network may well contain a positive circuit that is not visible in any smaller pure one, as soon as its vertices are of different natures.

Thomas' rule is of course valid for any system that can be modelled by one of the five methods described above. However, one has to be careful that the interaction graphs defined in these models by means of a mathematical recipe need not have an obvious intuitive meaning. In particular, they might not be simply related to the usual way of representing these systems (as noted in [20], Remark 5)), and a positive circuit needs not be visible in the traditional graphic representation. For instance, let us consider a chemical system containing, among others, two compounds X and Y, and a reversible reaction:

The concentrations [X] and [Y] obey the usual laws of chemical reactions, which look like: where a and b are positive (products of a reaction rate with some concentrations). These equations are a special case of (1). If we compute the partial derivatives as in Section 2, we see that the corresponding interaction graph contains the motif which is a positive circuit. Thus, any reversible reaction gives rise to a positive circuit (although it does not imply any kind of active chemical regulation). This type of situation has been considered in [21,22], where the authors gave examples of what they called “differentiation with no positive feedbacks”. These are not counterexamples to Thomas' rule, since the corresponding interaction graphs do contain a positive circuit.

Another way to pursue our discussion would be to take into account the location of the proteins. Since the famous work of Turing in the 1950s, many models of development have considered diffusion phenomena. It might be worth noting, though, that the scenario of spatial differentiation proposed by Turing and its followers requires that at least one of the chemicals be a self-activator. In that sense, there is still a positive circuit involved, and if we delete the diffusion terms in Turing's equations we get back to the situation described in Theorem 1. I do not know if diffusion in space can lead to differentiation in the absence of any positive circuit in the appropriate interaction graph.

A third direction is the following. All the models presented above are deterministic. Now, many recent works on gene expression insist on the importance of stochastic effects. These are manifest in the variation of expression levels from one cell to another. This forces us to view the binding of a protein to DNA, and the whole gene regulation, as a stochastic event. In [23], Gillespie proposed to represent the elementary chemical reactions in a gene network as a discrete jump Markov process, and, when there are enough copies of each protein, to approximate the chemical master equation by Fokker–Planck stochastic differential equations. It would be very interesting to decide if a variant of Thomas' rule still remains true in such a context. It is known that introducing stochasticity in a deterministic model allows for occasional switching between different stationary states (and such switches have been observed experimentally, see [24] for a survey). But can Gillespie's model lead to completely new scenarios of differentiation (violating Thomas' rule)?

Finally, one can also seek upper bounds for the number of possible stationary states in a gene network [11]. Such a bound is obtained in [25] when gene networks are modelled by means of IN and OR networks. Similar upper bounds cannot be valid for an equation like (1), since, obviously, the number of stationary states depends on the complexity of the function F (e.g., its degree when F is algebraic). For a general result on the complexity of gene networks, we refer to [26].

Acknowledgements

The author wishes to thank H. De Jong, J. Demongeot, M. Kaufman, N. Le Novère, O. Radulescu, E. Rémy, P. Ruet, D. Thieffry, R. Thomas, and A. Wagner for helpful discussions and comments.

Appendix A Proof of Theorem 3

We proceed by contradiction and assume that none of the graphs G(z),z∈Ω, contains a positive circuit. Then, according to [8] (Lemma 2i), all the principal minors of the Jacobian matrix of (−Fi) at every point z are nonnegative. Fix a constant ɛ>0. By [8] (4), we conclude that all the principal minors of the Jacobian matrix of (−Fi(x)+ɛxi) are positive in Ω. By the univalence theorem of Gale–Nikaido ([27,28 (p. 20a)]), this implies that the map (−Fi(x)+ɛxi) is a P-function on Ω, i.e., when x≠y are two points in Ω, there exists k∈{1,…,n} such that:

(xk−yk)(−Fk(x)+ɛxk+Fk(y)−ɛyk)>0

Since this is true for all ɛ>0, there must exist k such that xk≠yk and: This means that xk ≠yk and

(xk−yk)(Fk(x)−γkxk−Fk(y)+γkyk)⩾γk(xk−yk)2

hence

| |Fk(x)−γkxk−Fk(y)+γkyk|⩾γk|xk−yk| | | ---------------------------------- |

This contradicts our hypothesis and proves Theorem 3.

Appendix B Proof of Theorem 4

Following a suggestion of J.-L. Giavitto, we use an argument of approximation. For every θ∈ R and every integer m ⩾0, choose a differentiable function sm(x,θ) on R such that sm(x,θ)=s(x,θ) when |x−θ|⩾1m, sm(x,θ) is strictly increasing when |x −θ|<1m, and, given x ∈ **R** and u ∈s(x,θ), for every ɛ>0, there exists m0 such that, if m⩾m0, there is ξ∈R with:

and Assume x∈Ω. It follows from (3) that, for every u∈F(x) and every ɛ>0, there exists ξ∈Ω such that, for all i=1,…,n: and where the functions Fi(⋅,m) are defined by the formula:

Assume now that the system (4) has two non-degenerate stationary states x≠y in Ω. The assertion 0∈(Fi(x)−γixi) means that ui∈Fi(x), where ui=γixi. Similarly, vi∈Fi(y) with vi=γiyi. For every ɛ>0, and m big enough, choose ξ and η in Ω such that, for all i=1,…,n:

| |Fi(ξ,m)−ui|<ɛ,and|Fi(η,m)−vi|<ɛ | | --------------------------------- |

From these inequalities we conclude that:

| |Fi(ξ,m)−γiξi−Fi(η,m)+γiηi|<|ui−γiξi−vi+γiηi|+2ɛ<2γiɛ+2ɛ | | --------------------------------------------------------- |

On the other hand, since γi>0, when xi≠yi we can choose ɛ small enough so that

It follows that (Fi(⋅,m)) satisfies the assumption of Theorem 3, hence its interaction graph contains a positive circuit somewhere in phase space. It remains to notice that, for every z ∈Ω, when m is big enough, the interaction graph of (Fi(.,m)) at z is the same as the one of F at some point in Ω.