Statistical Physics and population genetics

Heleen Niele 10217142

07-07-2014

Report of the Bachelor Project in Physics and Astronomy, 15 EC, carried out between 31-03-2014 and 27-06-2014

ITFA, UvA

Supervisor: Jean-Sébastien Caux. Second assessor: Jacopo De Nardis

Abstract

Population genetics resulted from the unification of evolutionary biology and Mendelian genetics. This could be done by considering the change of gene frequencies as a statistical physics system and applying some of the standard methods of this discipline. The founders were R. A. Fisher, S. Wright and J. Haldane. They used a mathematical approach to describe the change in gene frequencies. The outcome depended on whether mutations, migration and selection were present. The randomness of survival and reproduction, demographics and the mating system always influence the gene frequency distribution. Due to the random sampling of gametes a population will slowly become more homozygous. Rare mutations subject to natural selection, on the other hand, will enlarge the genetic variation. According to Fisher this effect was larger in a larger population. Wright, however, thought that the genetic variation would be bigger in small subpopulations.

M. Kimura was the first to derive a general equation for the change in gene frequencies. By treating this as a continuous stochastic Markovian process he derived the approximate diffusion equation for the change in gene frequencies. When the initial condition and the mean and variance of the change in gene ratio are known, this partial differential equation can in principle be solved. Such a diffusion equation can be turned into a path integral, summing over all possible paths in time for the gene ratios. De Vladar and Barton did this for the specific case of natural selection with random sampling of gametes.

This thesis is written to get a better grasp of the origins of population genetics, so that further research can be carried out in this field. The point of view is a physical one, therefore mathematical approaches are studied, specifically Kimura's, for it is the first general mathematical approach.

Contents

1 Popular science summary

2 Glossary

3 Introduction

4 Fisher, Wright and Haldane

4.1 Gene ratios

4.2 Random sampling of gametes

4.3 Natural selection without mutations

4.4 Mutations

4.5 Migration

5 Kimura's diffusion equation

5.1 Treatment

5.2 Derivation of the diffusion equation

5.3 The net probability flow across x

5.4 The backward diffusion equation

5.5 Random Drift

5.6 Random drift with migration, selection and mutations

5.7 Summary

5.8 Further research

6 Discussion

1 Popular science summary

Charles Darwin was the first who could explain why species evolve. Differences between individuals lead to differences in their chances of survival. The advantageous differences will thus spread through the population, while the disadvantageous ones disappear. Darwin could not say what caused these differences. This was discovered later by Mendel, who found that hereditary traits are passed on via genes, and that each gene exists in different forms, called alleles. Once this was known, quantitative research on evolution became possible. This branch of evolutionary biology is called population genetics.

Three scientists, Ronald Fisher, Sewall Wright and John Haldane, were the founders of population genetics. They used their knowledge of statistical physics to deal with the large number of genes and individuals. In statistical physics one deals with many particles that can move in many directions. By looking at the behaviour of a collection of particles instead of at each particle separately, calculations of, for example, velocity become far less extensive. In the case of population genetics one looks at the frequency of a gene in the whole population. The influence of, for example, random reproduction and natural selection can then be investigated. General effects, such as genetic drift, can be derived from this.

It was now possible to do calculations on various evolutionary processes. A general description, however, did not yet exist. Motoo Kimura changed this. He set up a formula with which, in principle, every evolutionary process could be described. He did this by making use of the frequency distribution and the probability density. He thus arrived at a formula that is equal to the Fokker-Planck equation, which describes a diffusion process and looks as follows:

$$\frac{\partial \phi(p,x;t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\{V_{\delta x}\,\phi(p,x;t)\} - \frac{\partial}{\partial x}\{M_{\delta x}\,\phi(p,x;t)\}. \tag{1}$$

This report discusses the work of Fisher, Wright, Haldane and especially Kimura, and the derivation of his diffusion equation.

2 Glossary

All definitions in the glossary originate from Drossel's article [5].

allele: one of multiple alternative forms of a gene

fitness: a measure of the degree of adaptation to the environment; the number of offspring that will be able to reproduce

gamete: a cell that fuses with another cell during sexual reproduction

heterozygote: an individual that has two different alleles for a specific gene

homozygote: an individual that has two of the same alleles for a specific gene

3 Introduction

Physics has played a big role in several biological disciplines. X-ray diffraction, for example, revealed the double helix structure of DNA and thereby generated many applications in molecular biology [8]. Another biological discipline to which physics has contributed is evolutionary biology. By comparing biological concepts with statistical physics concepts, techniques already used in the latter could be applied to the former. Both deal with populations, rather than specific molecules or genotypes. This results in a statistical description of the macrostates, summarizing all the possible configurations in the microstates. No reference to the microstate has to be made anymore when predicting certain other properties. This reduces many problems that have too many degrees of freedom for any feasible calculation to solvable problems [4].

In this thesis the aim is to get a better understanding of the applications of statistical physics to evolutionary biology. Specifically, the contributions that Ronald Fisher, Sewall Wright, John Haldane and Motoo Kimura have made to modeling the change in gene frequencies will be taken into account. The reason for this is that the first three are considered the most influential scientists in the founding of population genetics [1]. All of them studied individual problems, but did not succeed in constructing a model for the entire process of the change of gene frequencies. Kimura did accomplish this, by formulating a diffusion equation that approximates the change in gene frequencies [13]. De Vladar and Barton used a form of this diffusion equation to construct a path ensemble [4]. They used the path integral formulation of quantum mechanics, where the notion of a single, unique trajectory is replaced with a sum or functional integral over all possible trajectories, computing an amplitude [3].

Before Fisher, Wright and Haldane, natural selection and Mendelian genetics were separate disciplines [1]. It is generally believed that Darwin is the founder of evolutionary biology. He not only noticed changes in species through generations, but also proposed a mechanism for these changes: natural selection. He presumed variations within species, some of which were advantageous, some disadvantageous. By natural selection the organisms with advantageous variations were chosen over others to reproduce with and were more fit to survive in their habitat. When this variation-selection-reproduction algorithm is successively applied, many different, complex species can evolve from one or a few simple forms of life. However, what these variations were, and how they originated, Darwin could not tell [5]. Mendel discovered that traits inherited from parents do not blend in offspring, but that there exist units of heredity, alleles, of which there are different forms per gene. Each offspring gets half of its alleles from its father and half from its mother. Depending on the combination of alleles, which specifies its genotype, the offspring will show a certain trait. When both the dominant and the recessive allele are present, the only one expressed in the phenotype will be the dominant one [5]. In 1930 Fisher published his genetic theory of natural selection, unifying evolutionary biology and Mendelian genetics. He stated that natural selection was the selection of alleles, where some might be more fit than others [6]. This was the beginning of population genetics, and one of the discoveries leading to the evolutionary synthesis, where several biological disciplines were unified in a theory for evolution [1].

The discovery of the structure of DNA in the 1950s has shed light on the origin of variation and on the origin of alleles. Variations were caused by mutations in the DNA sequence. This sequence is determined by four nucleobases: adenine (A), thymine (T), guanine (G) and cytosine (C). A pairs up with T and C with G in a double helix structure. The DNA can be transcribed, resulting in RNA, a single strand of nucleobases. The RNA can be read and used to produce molecules the cell needs. A section of these nucleobases that codes for a certain molecule, like a protein, is called a gene. During the replication of DNA many things can go wrong or the DNA can be deliberately altered, both leading to mutations. By accident, the wrong nucleobase can be placed in the copy of the DNA, or one or more nucleobases can get deleted or duplicated. Depending on the locus and the number of nucleobases this can change the working of the specific gene. Certain radiation and chemicals can also cause these random mutations. However, not all mutations are random. In some places in the DNA mutations occur more often than in others, and the mutation rate in general can differ depending on change and selection. Furthermore, the gene expression can be regulated by DNA segments that can move from one locus to another and turn genes on or off, or copy genes from one place to another. These mutations are overall beneficial and probably contribute a lot to the evolutionary process [5].

Fisher, Haldane and Wright had to do without the knowledge of DNA to construct their models and carry out their experiments. Yet some of their findings are still not outdated, such as their finding of genetic drift by treating the change of allele frequencies as a chance event evolving in time [13]. In the following sections each of their findings will be discussed in anticipation of the main subject of this thesis: Kimura's diffusion equation. The author considers the application of mathematics and physics to population genetics in his 1964 paper to be of such quality that it will serve as a leading thread in the exploration of the applications of statistical physics to population genetics. Furthermore, this will open the way to constructing path ensembles from different forms of the diffusion equation by using the method of path integrals.

4 Fisher, Wright and Haldane

In the early 1930s Fisher, Wright and Haldane all wrote books and papers unifying evolutionary theory and Mendelian genetics by means of mathematics. They were the first to investigate the influence that natural selection of random mutations could have on the evolutionary process. They also studied the effects of natural selection and mutations alone, the influence of randomness in death and reproduction, migration, isolation, inbreeding and many other contributing factors. All of them had their own thoughts, methods and points of interest [16].

Fisher occupied himself with the question whether natural selection of random mutations could and did explain adaptation. Each allele had its own occurrence in the population, which could be influenced by the randomness of survival and reproduction and by natural selection. He formed a mathematical theory for determining the rate of improvement of a species in relation to its environment, based on its present condition. He determined the mean and the variance of a gene frequency for randomness of survival, mutations and selection, and composed differential equations for these three cases. The one for random survival looked as follows:

$$\frac{\partial y}{\partial T} = \frac{1}{4n}\left\{\frac{\partial}{\partial\theta}(y\cot\theta) + \frac{\partial^2 y}{\partial\theta^2}\right\}, \tag{2}$$

where $\frac{\partial y}{\partial T}$ is the flux past every value of θ in one generation. A change of coordinates from the gene frequency p led to θ. Solving this equation led to a time of relaxation of 2n generations and the variance being halved after 4n generations. This time is large for most populations, therefore Fisher believed random survival had little influence on evolutionary systems [7]. He also believed that the larger a population was, the smaller the selective advantage of an allele had to be for it to get fixed. The nature of genes and the possibility for fitness to depend on combinations of genes did not bother Fisher [7, 16].

Wright also wondered about which factors contributed to adaptation. In contrast to Fisher, Wright was aware of several complexities that arise in the translation from gene to population. For example, when considering the influence of the gene frequency of one gene on the population, the influence different genes can have on each other is ignored. Furthermore, gene frequency distributions could be influenced by dominance of alleles, inbreeding and mating systems. He thought that the evolutionary process was quickened when a population was divided into isolated subpopulations. More different gene combinations could then arise and get fixed [16].

Haldane investigated all sorts of effects natural selection could have on a population, depending on its composition, dominance and the process of reproduction. In A Mathematical Theory of Natural and Artificial Selection he addresses many specific cases with a systematic mathematical approach [10]. He showed that the number of generations needed for a certain change is inversely proportional to the selection intensity. All of his calculations were deterministic, but can still be very useful for large populations [14]. Furthermore, he wondered why evolution so often seemed to be non-adaptive. He thought it probably had to do with conflicts between the individual's and the population's gain. He is also known for his primordial soup theory, stating that life arose from chemical compounds in the oceans [16].

Fisher's, Wright's and Haldane's works are too extensive for all of them to be treated in this thesis. Only their assumptions and calculations considering factors that determine the mean and variance of the gene frequency distribution will be studied to some extent.

4.1 Gene ratios

All of the calculations are based on changing gene ratios. The ratio between the alleles A and a is p:q. If these are the only two alleles present, this ratio is equal to p:(1−p). Alleles always come in pairs in individuals, so the ratio AA:Aa:aa is equal to $p^2 : 2pq : q^2$. To keep it as simple as possible, a diploid population of N individuals is assumed, so that for every generation there are 2N gametes. Furthermore, the generations do not overlap and the number of males and females is equal.

4.2 Random sampling of gametes

Fisher states that the variance was $\frac{pq}{2N}$ in the case of no mutation, migration or selection [6]. This is the binomial variance. The probability that the number x of alleles A equals X is given in equation 3.

$$P(x = X) = \binom{2N}{X} p^{X} q^{2N-X}. \tag{3}$$

For x the expected value is 2Np and the variance 2Npq. In the next generation A has the frequency $\frac{x}{2N}$, so the expected allele frequency is $\frac{2Np}{2N} = p$. The expected value of p has not changed, and therefore the mean stays the same. The variance of the allele frequency is $\frac{2Npq}{(2N)^2} = \frac{pq}{2N}$, as stated before. On average, no change is expected in gene frequency, but in a finite population random drift will be observed due to the non-zero variance [17].
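The mean and variance quoted above can be checked numerically by summing directly over the binomial distribution of equation 3. The following is a minimal sketch; the function names and the example values N = 50, p = 0.3 are illustrative assumptions and do not come from the cited sources.

```python
from math import comb


def allele_count_pmf(n_individuals, p):
    """P(x = X) for X = 0..2N copies of allele A, as in equation 3."""
    n_gametes = 2 * n_individuals
    q = 1.0 - p
    return [comb(n_gametes, X) * p**X * q**(n_gametes - X)
            for X in range(n_gametes + 1)]


def frequency_mean_and_variance(n_individuals, p):
    """Mean and variance of the allele frequency x / 2N in the next generation."""
    n_gametes = 2 * n_individuals
    pmf = allele_count_pmf(n_individuals, p)
    mean = sum((X / n_gametes) * w for X, w in enumerate(pmf))
    variance = sum((X / n_gametes - mean) ** 2 * w for X, w in enumerate(pmf))
    return mean, variance
```

Calling `frequency_mean_and_variance(50, 0.3)` should return a mean close to p and a variance close to pq/2N, up to floating-point rounding.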

4.3 Natural selection without mutations

Selection pressures influence the gene frequency distribution. For simplicity, assume the selection coefficient s to be constant through time, and assume there are only two alleles for the gene concerned. s is the selective advantage of A over a, so A and a reproduce in the ratio 1:(1−s). With q the frequency of A, the gene ratio A:a changes from q:(1−q) to q:(1−s)(1−q). After normalising so that the frequencies of A and a sum to 1, the gene array and the change of gene frequency for A become as follows:

$$\frac{(1-s)(1-q)\,a + q\,A}{1 - s(1-q)}; \tag{4}$$

$$\Delta q = \frac{s\,q(1-q)}{1 - s(1-q)}. \tag{5}$$

If s is sufficiently small:

$$\Delta q \simeq s\,q(1-q). \tag{6}$$

This is in accordance with the assumptions of the Wright-Fisher model. Equilibrium can only be reached when q=1, that is, when only the selectively advantageous allele remains [20].
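To see how close the weak-selection approximation of equation 6 is to the exact change of equation 5, and how q moves towards 1 under repeated selection, the recursion can be iterated directly. This is a minimal deterministic sketch; the function names and the parameter values suggested below are illustrative assumptions.

```python
def delta_q_exact(q, s):
    """Change in the frequency q of allele A per generation (equation 5)."""
    return s * q * (1.0 - q) / (1.0 - s * (1.0 - q))


def delta_q_weak_selection(q, s):
    """Approximation of the change for small s (equation 6)."""
    return s * q * (1.0 - q)


def iterate_selection(q0, s, generations):
    """Deterministic trajectory of q under selection only, no drift or mutation."""
    q = q0
    trajectory = [q]
    for _ in range(generations):
        q += delta_q_exact(q, s)
        trajectory.append(q)
    return trajectory
```

For example, `iterate_selection(0.1, 0.01, 5000)` gives a monotonically increasing trajectory that ends very close to q = 1, in line with the statement that equilibrium is only reached once the advantageous allele is fixed.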

4.4 Mutations

In the simplest case only mutations from A to a and from a to A take place. The rate of the first transition is µ, that of the second is ν. The change in the gene frequency q of A is then given by equation 7. When no other pressures are present, equilibrium is reached when ∆q = 0, and q then takes the fixed value of equation 8, which depends on µ and ν [20].

$$\Delta q = -\mu q + \nu(1-q); \tag{7}$$

$$q = \frac{\nu}{\mu + \nu}. \tag{8}$$
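The approach to the mutational equilibrium of equation 8 can be illustrated by iterating the per-generation change. The sketch below assumes the change has the form ∆q = −µq + ν(1 − q), which is the linear form consistent with the equilibrium q = ν/(µ + ν); the function names and example rates are illustrative assumptions.

```python
def delta_q_mutation(q, mu, nu):
    """Assumed per-generation change in q: loss A -> a at rate mu, gain a -> A at rate nu."""
    return -mu * q + nu * (1.0 - q)


def mutation_equilibrium(mu, nu):
    """Fixed value of q at which the change vanishes (equation 8)."""
    return nu / (mu + nu)


def iterate_mutation(q0, mu, nu, generations):
    """Frequency of A after the given number of generations of mutation only."""
    q = q0
    for _ in range(generations):
        q += delta_q_mutation(q, mu, nu)
    return q
```

Under this assumption the distance to equilibrium shrinks by a factor (1 − µ − ν) each generation, so with µ = 0.001 and ν = 0.002 a few thousand generations are enough to come very close to q = 2/3 from any starting frequency.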

4.5 Migration

Migration may also influence the gene distribution. Suppose that the average gene frequency for a particular gene in the entire population is $q_m$ and that each subgroup exchanges a fraction m of its individuals with a random sample of the whole population. For a subgroup the change in gene frequency then depends on its own gene frequency and that of the population as a whole [20].

$$\Delta q = m(q_m - q). \tag{9}$$
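Equation 9 is a linear recursion, so the subgroup frequency relaxes geometrically towards the population average: after t generations the difference from $q_m$ has shrunk by a factor $(1-m)^t$. The sketch below compares the iterated recursion with this closed form; the function names and example values are illustrative assumptions.

```python
def iterate_migration(q0, m, q_m, generations):
    """Apply the change of equation 9 once per generation, migration only."""
    q = q0
    for _ in range(generations):
        q += m * (q_m - q)
    return q


def migration_closed_form(q0, m, q_m, generations):
    """Closed-form solution of the same recursion: geometric relaxation to q_m."""
    return q_m + (q0 - q_m) * (1.0 - m) ** generations
```

Both functions should agree up to rounding, for instance for q0 = 0.9, m = 0.05 and q_m = 0.4.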

5 Kimura's diffusion equation

Kimura was the first to present a model for the entire process of the change of allele frequencies. Starting from an initial frequency he could solve several cases of evolution. He used a diffusion equation approximation to obtain these solutions [13]. A review of how exactly he obtained this equation and these solutions now follows. Additional explanation of the physical equations has been added where it was thought necessary.

5.1 Treatment

Gene frequencies can be approximated as continuous variables, for natural populations should be large to play a significant role in the evolutionary process. Furthermore, based on fossil findings, evolution has been shown to be a slow process most of the time. Therefore the assumption is that gene frequencies change slowly as well. For these two reasons, evolution is depicted as a continuous stochastic process. It is also treated as a Markovian process, for the gene frequencies at time t are taken to depend only on the gene frequencies at time $t_0 < t$, but not on the events that led to the frequency distribution at time $t_0$ [13].

When all of the previous properties hold, the Kolmogorov equations for a diffusion process are very useful. There are respectively a forward and a backward equation, which look as follows in one dimension [9]:

$$\frac{\partial}{\partial t} f(s,x,t) = -\frac{\partial}{\partial x}[A(x,t)\,f(s,x,t)] + \frac{\partial^2}{\partial x^2}[B(x,t)\,f(s,x,t)]; \tag{10}$$

$$\frac{\partial}{\partial s} f(s,x,t) = -A(s,x)\,\frac{\partial}{\partial x} f(s,x,t) - B(s,x)\,\frac{\partial^2}{\partial x^2} f(s,x,t). \tag{11}$$

s is the value of x at time t=0. f(s,x;t) is the probability density of the velocity of a particle, and A(x;t) and B(x;t) are respectively the first and second moment of δx for δt → 0. The forward equation is also called the Fokker-Planck equation by physicists. It is a partial differential equation describing the time evolution of the probability density function, depending on drag (A(x;t)) and random forces (B(x;t)). The initial condition is known, and the probability density function at a later time is calculated by integrating forward in time. The backward equation, on the other hand, is used for calculating the probability of ending in a certain state s at time t for every possible state x at $t_0 < t$ [9].

5.2 Derivation of the diffusion equation

Only diploid populations with a fixed number of N individuals are considered. From this it follows that for each locus there are 2N genes. A locus is the specific location of a gene on a chromosome. Now look at the cases where there are only 2 alleles: $A_1$ with frequency x and $A_2$ with frequency (1−x). Define the function φ(p, x; t) to be the transition probability that the gene frequency is x at time t given that the initial frequency at t=0 is p. When p is fixed, φ(p, x; t) is the frequency distribution of the gene frequencies. The function $f = \phi(p,x;t)\,dx = \phi(p,x;t)\frac{1}{2N}$, with $dx = \frac{1}{2N}$, then approximates the frequency of the class with gene frequency x at time t for (0<x<1). When x=0 or x=1 the case has to be handled differently [13].

Define g(δx, x; δt, t) to be the probability density function of the change in gene frequency from x to x + δx during (t, t + δt). Because the process studied is continuous stochastic, the following equation holds for every ε > 0 as δt → 0:

$$\int_{|\delta x| \geq \varepsilon} g(\delta x, x; \delta t, t)\,d(\delta x) = o(\delta t). \tag{12}$$

Then, when integrating over all possible δx:

$$\phi(p,x;t+\delta t) = \int \phi(p, x-\delta x; t)\,g(\delta x, x-\delta x; \delta t, t)\,d(\delta x). \tag{13}$$

So the probability that the gene frequency is x at time t + δt is the sum of all the probabilities that the gene frequency is x − δx at time t while the gene frequency increases by δx during (t, t + δt). δx may take any value as long as (0 < x − δx < 1). However, because the process is continuous stochastic, only the cases with |δx| < ε are significant. This equation is a form of the Kolmogorov-Chapman identity, which relates the joint probability distributions of a stochastic process at different sets of coordinates.
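A discrete analogue may make equation 13 more concrete. In the sketch below the integral over δx becomes a sum over the 2N + 1 possible allele counts, and g is taken to be the binomial sampling probability of the Wright-Fisher model. That choice of g, the small population size and all function names are assumptions made for illustration only.

```python
from math import comb


def wright_fisher_matrix(n_gametes):
    """Row i holds the probabilities of going from i to j copies of A in one generation."""
    matrix = []
    for i in range(n_gametes + 1):
        x = i / n_gametes
        matrix.append([comb(n_gametes, j) * x**j * (1.0 - x)**(n_gametes - j)
                       for j in range(n_gametes + 1)])
    return matrix


def step(distribution, matrix):
    """Discrete form of equation 13: sum over all states one generation earlier."""
    size = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(size))
            for j in range(size)]


def compose(first, second):
    """Two-generation transition matrix, obtained by summing over the intermediate state."""
    size = len(first)
    return [[sum(first[i][k] * second[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]


def evolve(distribution, matrix, generations):
    """Apply the one-generation step repeatedly."""
    for _ in range(generations):
        distribution = step(distribution, matrix)
    return distribution
```

Starting from a distribution concentrated on one allele count, two applications of `step` give the same result as one application with the composed matrix, which is the Chapman-Kolmogorov property in this discrete setting. The mean frequency stays equal to its initial value while probability accumulates in the classes x = 0 and x = 1.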

Now expand the term in the integral in powers of δx and obtain:

$$\phi(p, x-\delta x; t)\,g(\delta x, x-\delta x; \delta t, t) = \phi g - \delta x\,\frac{\partial}{\partial x}(\phi g) + \frac{(\delta x)^2}{2!}\frac{\partial^2}{\partial x^2}(\phi g) - \dots \tag{14}$$

Here, φ = φ(p, x; t) and g = g(δx, x; δt, t). So combining 13 and 14 gives:

$$\phi(p,x;t+\delta t) = \phi\int g\,d(\delta x) - \frac{\partial}{\partial x}\left\{\phi\int(\delta x)\,g\,d(\delta x)\right\} + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{\phi\int(\delta x)^2 g\,d(\delta x)\right\} - \dots \tag{15}$$

By expanding the term in the integral, φ no longer depends on x − δx, but on x only. Therefore φ can be put in front of the integral, for the integration is over δx. Furthermore, $\int g\,d(\delta x) = 1$ because the probability that an allele has a frequency between 0 and 1 is 1. Now obtain the derivative of φ with respect to t by moving the first term on the right of equation 15 to the left and dividing by δt:

$$\frac{\phi(p,x;t+\delta t) - \phi(p,x;t)}{\delta t} = -\frac{\partial}{\partial x}\left\{\phi\,\frac{1}{\delta t}\int(\delta x)\,g\,d(\delta x)\right\} + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{\phi\,\frac{1}{\delta t}\int(\delta x)^2 g\,d(\delta x)\right\} - \dots \tag{16}$$

Now take the limit δt → 0, so that the partial derivative of φ with respect to t is obtained. Assume that only the limits for the first and second moment of δx are non-zero.

$$\lim_{\delta t\to 0}\frac{1}{\delta t}\int(\delta x)\,g\,d(\delta x) = M(x,t); \tag{17}$$

$$\lim_{\delta t\to 0}\frac{1}{\delta t}\int(\delta x)^2 g\,d(\delta x) = V(x,t); \tag{18}$$

$$\lim_{\delta t\to 0}\frac{1}{\delta t}\int(\delta x)^n g\,d(\delta x) = 0 \quad \text{for } n \geq 3. \tag{19}$$

M(x,t) and V(x,t) are respectively the first and second moment of δx during the infinitesimal time interval (t, t+δt). M(x,t) contains the systematic pressures such as mutation and migration rates. V(x,t) contains the random fluctuations caused by the random sampling of gametes and systematic evolutionary pressures. The limit δt → 0 can only be extrapolated, because the smallest time interval is that of one generation. Therefore, $M(x,t) \to M_{\delta x}$ and $V(x,t) \to V_{\delta x}$, so that they are respectively the mean and the variance of the change in gene frequency per generation. The final diffusion equation that approximates the change in gene frequencies is now equation 20. This equation is the Fokker-Planck or Kolmogorov forward equation, derived for the change in gene frequencies [13].

$$\frac{\partial\phi(p,x;t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\{V_{\delta x}\,\phi(p,x;t)\} - \frac{\partial}{\partial x}\{M_{\delta x}\,\phi(p,x;t)\}. \tag{20}$$

5.3 The net probability flow across x

Kimura [13] also derives a function that describes the rate of net probability flow across x. To obtain this he subtracts the flow in the negative direction from the flow in the positive direction. During the time interval δt the probability flow in the positive direction over x due to the class with gene frequency ξ is:

$$\phi(p,\xi;t)\,d\xi\int_{\delta\xi > x-\xi} g(\delta\xi, \xi; \delta t, t)\,d(\delta\xi) \quad \text{for } \xi < x. \tag{21}$$

To flow across x, the change δξ should be larger than the difference between x and ξ. Because the positive probability flow is considered, ξ should be smaller than x. So then the total positive probability flow during δt is as follows:

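$$P_+(x)\,\delta t = \int_{\xi<x}\phi(p,\xi;t)\,d\xi\int_{\delta\xi>x-\xi} g(\delta\xi,\xi;\delta t,t)\,d(\delta\xi) = \int_{\delta\xi>0} d(\delta\xi)\int_{x-\delta\xi}^{x}\phi(p,\xi;t)\,g(\delta\xi,\xi;\delta t,t)\,d\xi. \tag{22}$$

This reformulation is allowed, for still only the cases where ξ < x and the probability flows over x are considered. The first is enclosed in δξ > 0 in the first integral and the second in integrating the second integral from x − δξ to x. Now approximate $P_+(x)\,\delta t$ by substituting ξ = x + η and expanding the term in the second integral in powers of η:

$$\phi(p,\xi;t)\,g(\delta\xi,\xi;\delta t,t) = \phi g + \eta\,\frac{\partial(\phi g)}{\partial x} + \dots \tag{23}$$

Here the abbreviations φ = φ(p, x; t) and g = g(δξ, x; δt, t) are used. The integral then becomes equation 24.

$$P_+(x)\,\delta t = \int_{\delta\xi>0} d(\delta\xi)\int_{-\delta\xi}^{0}\left\{\phi g + \eta\,\frac{\partial(\phi g)}{\partial x} + \dots\right\}d\eta = \int_{\delta\xi>0}\left\{(\delta\xi)\,\phi g - \frac{(\delta\xi)^2}{2!}\frac{\partial(\phi g)}{\partial x} + \dots\right\}d(\delta\xi). \tag{24}$$

Similarly, the approximation for $P_-(x)\,\delta t$ can be derived, now with the requirement that δξ < 0 and integrating from x to x − δξ instead of the other way around.

$$P_-(x)\,\delta t = \int_{\delta\xi<0}\left\{(-\delta\xi)\,\phi g + \frac{(\delta\xi)^2}{2!}\frac{\partial(\phi g)}{\partial x} + \dots\right\}d(\delta\xi). \tag{25}$$

So then the net probability flow is:

$$P_+(x)\,\delta t - P_-(x)\,\delta t = \int(\delta\xi)\,\phi g\,d(\delta\xi) - \frac{1}{2}\int(\delta\xi)^2\,\frac{\partial(\phi g)}{\partial x}\,d(\delta\xi) + \dots \tag{26}$$

The limit δt → 0 again leads to M(x,t) and V(x,t) in the first two terms: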

$$P(x,t) = M(x,t)\,\phi(p,x;t) - \frac{1}{2}\frac{\partial}{\partial x}\{V(x,t)\,\phi(p,x;t)\}. \tag{27}$$

From equations 20 and 27 it follows that $\frac{\partial\phi(x;t)}{\partial t} = -\frac{\partial P(x;t)}{\partial x}$. By determining P(x;t) it is possible to calculate rates of change in the terminal classes x=0 and x=1. This was not possible with the diffusion equation itself, for it is only valid in the interval (0 < x < 1). The probability flow in the cases x=0 and x=1 can be calculated if these are considered to be absorbing boundaries. So once the gene ratio is x=0 or x=1, it will keep this ratio forever, if we consider the case where no mutations occur. Naturally,

$$\partial P(x;t) = \frac{\partial\phi(x;t)}{\partial t}\,\delta x \;\longrightarrow\; P(x;t) = \frac{\partial f(x;t)}{\partial t} \quad \text{if } \delta x = \frac{1}{2N}. \tag{28}$$

5.4 The backward diffusion equation

Kimura [13] also derives the diffusion equation for n variables and the probability flow in the case of two independent variables. Furthermore, he derives the Kolmogorov backward equation by fixing x instead of p. The diffusion equation then takes the form of equation 29. By fixing x at 1, the probability that a gene with initial frequency p gets fixed after time t can be calculated.

$$\frac{\partial\phi(p,x;t)}{\partial t} = \frac{1}{2}V_{\delta p}\,\frac{\partial^2\phi(p,x;t)}{\partial p^2} + M_{\delta p}\,\frac{\partial\phi(p,x;t)}{\partial p}. \tag{29}$$

5.5 Random Drift

Random drift is caused by the random sampling of gametes. Mutation, migration and selection take no part in this. Therefore M=0, for the mean stays the same, and $V = \frac{x(1-x)}{2N}$ now represents the binomial variance. The diffusion equation in this specific case then is:

$$\frac{\partial\phi(p,x;t)}{\partial t} = \frac{1}{4N}\frac{\partial^2}{\partial x^2}\{x(1-x)\,\phi(p,x;t)\}.$$

Because the initial frequency is known to be p, φ(p, x; 0) = δ(x − p). The diffusion equation in this case can be solved by separation of variables. So try the solution φ = X(x)T(t); this leads to:

$$\frac{1}{T}\frac{\partial T}{\partial t} = \frac{1}{4NX}\frac{\partial^2}{\partial x^2}\{x(1-x)X\} = -\lambda. \tag{30}$$

−λ is a constant, for the term on the left in equation 30 only depends on t and the term on the right only on x. For T the solution is straightforward: $T = Ce^{-\lambda t}$.

When the derivative with respect to x is worked out, the hypergeometric differential equation for X appears:

$$x(1-x)\frac{\partial^2 X}{\partial x^2} + 2(1-2x)\frac{\partial X}{\partial x} - 2X = -4N\lambda X. \tag{31}$$

This equation can be solved by writing it in the standard form $x(1-x)\frac{\partial^2 X}{\partial x^2} + (\gamma - (\alpha+\beta+1)x)\frac{\partial X}{\partial x} - \alpha\beta X = 0$ and noting that α + β = 3, αβ = 2 − 4Nλ and γ = 2. From this it follows that $\alpha = \frac{3+\sqrt{1+16N\lambda}}{2}$ and $\beta = \frac{3-\sqrt{1+16N\lambda}}{2}$. There is a standard way to solve this equation, which has singularities at x=0 and x=1, so at the boundaries. Because it is a second order differential equation, there will be two independent solutions.

γ is a positive integer, and therefore the only solution that is regular at x=0 is $F(\alpha,\beta,2,x) = \sum_{k=0}^{\infty}\frac{(\alpha)_k(\beta)_k}{(2)_k}\frac{x^k}{k!}$. This solution is not regular at x=1. Therefore Kimura uses the following relation:

$$F(\alpha,\beta,2,x) = \frac{\Gamma(2)\Gamma(2-\alpha-\beta)}{\Gamma(2-\alpha)\Gamma(2-\beta)}F(\alpha,\beta,-1+\alpha+\beta,1-x) + \frac{\Gamma(2)\Gamma(\alpha+\beta-2)}{\Gamma(\alpha)\Gamma(\beta)}(1-x)^{2-\alpha-\beta}F(2-\alpha,2-\beta,3-\alpha-\beta,1-x). \tag{32}$$

Equation 32 only terminates in the limit x → 1 if either 2 − α or 2 − β is equal to a negative integer or zero. Kimura chose 2 − α to be a negative integer. So $\alpha_m = 2 + m$ and $\beta_m = 1 - m$ if m is a positive integer. Then equation 33 holds for λ.

$$2 - \frac{3+\sqrt{1+16N\lambda}}{2} = -m, \quad m = 1, 2, 3, \dots; \qquad \lambda = \frac{m(m+1)}{4N}. \tag{33}$$

Because 2 − α is always a negative integer, α is always a positive integer of at least 3. β then must be zero or a negative integer for α + β = 3 to hold. The solution for X is thus F(2 + m, 1 − m, 2, x) multiplied by some constant. The total solution for φ looks like $\phi = \sum_{m=1}^{\infty} C_m F(2+m, 1-m, 2, x)\,e^{-\frac{m(m+1)}{4N}t}$.

However, in this form it is impossible to derive $C_m$ with Fourier's trick, for the hypergeometric series are not orthogonal to each other. Therefore, Kimura uses the Gegenbauer polynomial of equation 35. In general, equation 34 is the formula for the Gegenbauer polynomial written as a hypergeometric series. In this specific case, n = m − 1 and κ = 3/2. For these coefficients the Gegenbauer polynomial becomes equation 35.

$$T_n^{(\kappa)}(z) = \frac{\Gamma(2\kappa+n)}{\Gamma(n+1)\Gamma(2\kappa)}F\left(2\kappa+n; -n; \kappa+\tfrac{1}{2}; \frac{1-z}{2}\right); \tag{34}$$

$$T_{m-1}^{1}(z) = \frac{m(m+1)}{2}F\left(m+2, 1-m, 2, \frac{1-z}{2}\right). \tag{35}$$

Now if z = 1 − 2x then $X = T_{m-1}^{1}(z)$ and $\phi = \sum_{m=1}^{\infty} D_m T_{m-1}^{1}(z)\,e^{-\frac{m(m+1)}{4N}t}$. Use that

$$\delta(x-p) = \sum_{m=1}^{\infty} D_m T_{m-1}^{1}(z).$$

By multiplying with $(1-z^2)\,T_{m-1}^{1}(z)$ and integrating, the orthogonality property

$$\int_{-1}^{1}(1-z^2)\,T_{m-1}^{1}(z)\,T_{n-1}^{1}(z)\,dz = \delta_{m,n}\,\frac{2(m+1)m}{2m+1}$$

can be used. So then the solution for $D_m$ can be obtained:

$$2\{1-(1-2p)^2\}\,T_{m-1}^{1}(1-2p) = D_m\,\frac{2(m+1)m}{2m+1}; \qquad D_m = 4p(1-p)\,\frac{2m+1}{m(m+1)}\,T_{m-1}^{1}(1-2p). \tag{36}$$

The final solution then is equation 37. The Gegenbauer polynomials are expressed in hypergeometric functions again and the coefficients have been worked out.

$$\phi(p,x;t) = \sum_{m=1}^{\infty} p(1-p)(2m+1)m(m+1)\,F(m+2; 1-m; 2; x)\,F(m+2; 1-m; 2; p)\,e^{-\frac{m(m+1)}{4N}t}. \tag{37}$$

The series is convergent for t > 0, because of the exponential term at the end of equation 37. The first few terms of φ(p, x; t) have been worked out in equation 38.

$$\begin{aligned}\phi(p,x;t) = \sum_{m=0}^{\infty}\; & p(1-p)(2m+1)(m+1)m\left[1 + \frac{(2+m)(1-m)}{2\cdot 1}x + \frac{(2+m)(3+m)(1-m)(2-m)}{2\cdot 3\cdot 1\cdot 2}x^2 + \dots\right] \\ & \cdot\left[1 + \frac{(2+m)(1-m)}{2\cdot 1}p + \frac{(2+m)(3+m)(1-m)(2-m)}{2\cdot 3\cdot 2\cdot 1}p^2 + \dots\right]e^{-\frac{m(m+1)}{4N}t} \\ = \; & 0 + 6p(1-p)\,e^{-\frac{t}{2N}} + 30p(1-p)(1-2x)(1-2p)\,e^{-\frac{3t}{2N}} + \dots \end{aligned} \tag{38}$$
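The series of equation 37 can be evaluated numerically, because each hypergeometric function F(m + 2; 1 − m; 2; x) terminates after m terms. The sketch below sums the series term by term; the function names, the default truncation at 20 terms and the example values are illustrative assumptions. The naive summation loses precision for large m, so it is only meant for times that are not very small compared with N.

```python
from math import exp


def hypergeometric_polynomial(m, x):
    """F(m + 2, 1 - m; 2; x); the factor (1 - m + k) vanishes at k = m - 1, ending the series."""
    total, term = 0.0, 1.0
    for k in range(m):
        total += term
        # Ratio of consecutive series terms: (a + k)(b + k) / ((c + k)(k + 1)) * x
        term *= (m + 2 + k) * (1 - m + k) / ((2 + k) * (k + 1)) * x
    return total


def series_term(m, p, x, t, n_individuals):
    """The m-th term of equation 37."""
    coefficient = p * (1.0 - p) * (2 * m + 1) * m * (m + 1)
    decay = exp(-m * (m + 1) * t / (4.0 * n_individuals))
    return (coefficient * hypergeometric_polynomial(m, x)
            * hypergeometric_polynomial(m, p) * decay)


def phi(p, x, t, n_individuals, terms=20):
    """Truncated sum of equation 37 over m = 1..terms."""
    return sum(series_term(m, p, x, t, n_individuals)
               for m in range(1, terms + 1))
```

The terms for m = 1 and m = 2 reproduce the two explicit terms of equation 38, and for t of the order of 10N the sum is dominated by the m = 1 term.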

Now if p is known the function φ(p, x; t) can be graphed for different values of x at time t. The two graphs Kimura made are depicted in Figure 1. The area under each graph equals the probability that the alleles A and a coexist. This area reduces in time. So the larger t, the smaller the values of φ(p, x; t) will be. Therefore, the function will flatten, which means that the frequency of unfixed classes will decrease. The boundaries x=0 and x=1 work as absorbing boundaries. For t → ∞, $\phi(p,x;t) \sim 6p(1-p)\,e^{-\frac{t}{2N}}$, and so the height of the probability function decreases approximately at the rate of $\frac{1}{2N}$. This rate was already determined by Wright [20]. The value of p determines how fast the function will flatten. The population will become more homogeneous in time if solely the random sampling of gametes influences the gene frequency distribution.
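The rate 1/2N can be cross-checked against exact binomial sampling without using the diffusion approximation: under random sampling of 2N gametes the expected heterozygosity 2x(1 − x) shrinks by the factor 1 − 1/2N in each generation, which is close to $e^{-1/2N}$ for large N. The sketch below computes this expectation directly; the function name and the example values are illustrative assumptions.

```python
from math import comb


def expected_heterozygosity_next(n_individuals, p):
    """Expected value of 2x(1 - x) after one generation of binomial sampling from frequency p."""
    n_gametes = 2 * n_individuals
    total = 0.0
    for X in range(n_gametes + 1):
        probability = comb(n_gametes, X) * p**X * (1.0 - p)**(n_gametes - X)
        x = X / n_gametes
        total += probability * 2.0 * x * (1.0 - x)
    return total
```

For N = 50 and p = 0.3 the result should equal 2p(1 − p)(1 − 1/2N) up to rounding, so the per-generation loss of heterozygosity matches the decay rate of the leading term of the diffusion solution.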

The case where there are more than two alleles to begin with is also considered. It is shown that as the number of coexisting alleles increases, the effect of random drift increases rapidly as well. Therefore, it could be that random drift keeps down the number of coexisting alleles in a population.

5.6 Random drift with migration, selection and mutations

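The random sampling of gametes only influences $V_{\delta x}$, not $M_{\delta x}$. Migration, mutations and selection, on the other hand, only affect $M_{\delta x}$. Kimura states that the effect of migration is much larger than that of mutations. In the case where only migration is present, the diffusion equation is equation 39, with equation 9 used for $M_{\delta x}$. Again only two alleles A and a are considered. $\bar{x}$ is the gene ratio of A in the total population, $x$ the gene ratio of A in the subpopulation in question. To include the specific case of mutations from A to a and from a to A, $m$ should be replaced by $m + n + o$ and $m\bar{x}$ by $m\bar{x} + o$, where $n$ is the mutation rate from A to a and $o$ the mutation rate from a to A.

$$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{\frac{x(1-x)}{2N}\phi(x; t)\right\} - \frac{\partial}{\partial x}\left\{m(\bar{x} - x)\phi(x; t)\right\}. \tag{39}$$

Treating mutations separately leads to equation 40, where equation 7 is used for $M_{\delta x}$. Here $\mu$ and $\nu$ are the mutation rates from A to a and from a to A, respectively.

$$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{\frac{x(1-x)}{2N}\phi(x; t)\right\} - \frac{\partial}{\partial x}\left\{(-\mu x + \nu - \nu x)\phi(x; t)\right\}. \tag{40}$$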
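The split between the mean term and the variance term in equation 40 can be made concrete with a small simulation. The sketch below is not a solution of equation 40; it assumes a discrete-generation model in which mutation first shifts the frequency of A by $-\mu x + \nu(1 - x)$ and 2N gametes are then drawn at random. All function names, the parameter `n_pop` (standing for N) and the seed are hypothetical choices for this illustration.

```python
import random


def mean_change(x, mu, nu):
    """Mean change per generation from mutation alone: -mu*x + nu*(1 - x)."""
    return -mu * x + nu * (1.0 - x)


def mutation_drift_step(x, n_pop, mu, nu, rng):
    """Apply mutation deterministically, then sample 2N gametes at random."""
    x_after_mutation = x + mean_change(x, mu, nu)
    copies_of_A = sum(1 for _ in range(2 * n_pop) if rng.random() < x_after_mutation)
    return copies_of_A / (2 * n_pop)


def mean_frequency(x0, n_pop, mu, nu, generations, replicates, seed=1):
    """Average frequency of A over independent replicate populations."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    total = 0.0
    for _ in range(replicates):
        x = x0
        for _ in range(generations):
            x = mutation_drift_step(x, n_pop, mu, nu, rng)
        total += x
    return total / replicates
```

In this model the sampling step leaves the mean unchanged, so the replicate average relaxes towards $\nu/(\mu + \nu)$ as dictated by the mean term alone, while the random sampling only adds scatter around it.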


Figure 1: The process of change in gene frequencies due to the random sampling of gametes. The probability function for the gene frequency $x$ in the interval (0, 1) is shown at times $t = N/10, N/5, N/2, N, 2N$ for $p = 0.1$ (left figure) and $p = 0.5$ (right figure). The function flattens down in both cases. $x = 0$ and $x = 1$ act as absorbing boundaries, leading to a loss of heterozygosity [13].


The solutions of equations 39 and 40 will not be treated in this thesis. The simplest case of natural selection and random sampling of gametes is expressed by the diffusion equation with $M_{\delta x} = sx(1-x)$ and $V_{\delta x} = \frac{x(1-x)}{2N}$, where $s$ is small. In order to solve this equation Kimura puts $\phi \propto e^{2Nsx}V(x)e^{-\lambda t}$. Substituting this in diffusion equation 41 leads to equation 42.

$$\frac{\partial \phi}{\partial t} = -\left\{\frac{1}{2N} + s(1-2x)\right\}\phi + \left\{\frac{1-2x}{2N} - sx(1-x)\right\}\frac{\partial \phi}{\partial x} + \frac{x(1-x)}{4N}\frac{\partial^2 \phi}{\partial x^2}; \tag{41}$$

$$x(1-x)\frac{d^2V}{dx^2} + 2(1-2x)\frac{dV}{dx} - \left\{2 + 4(sN)^2x(1-x) - 4N\lambda\right\}V = 0. \tag{42}$$

Equation 42 can be recognized as the oblate spheroidal equation if the substitution $x = \frac{1-z}{2}$ is carried out, with $-1 < z < 1$ and $c = Ns$:

$$(1-z^2)\frac{d^2V}{dz^2} - 4z\frac{dV}{dz} + \left\{(4N\lambda - 2 - c^2) + c^2z^2\right\}V = 0. \tag{43}$$
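The steps from equation 41 to equations 42 and 43 can be checked directly. The sketch below writes $c = Ns$ (read off by comparing equations 42 and 43) and uses primes for derivatives with respect to $x$. It is a verification under these notational assumptions, not a reproduction of Kimura's own working.

```latex
% Ansatz with the abbreviation c = Ns; primes denote d/dx
\begin{align*}
\phi &= e^{2cx}\,V(x)\,e^{-\lambda t}, &
\frac{\partial \phi}{\partial t} &= -\lambda\,\phi, \\
\frac{\partial \phi}{\partial x} &= e^{2cx-\lambda t}\,(2cV + V'), &
\frac{\partial^2 \phi}{\partial x^2} &= e^{2cx-\lambda t}\,(4c^2 V + 4cV' + V'').
\end{align*}
% Insert into equation 41 and multiply by 4N e^{-2cx+\lambda t}
\begin{align*}
-4N\lambda V ={}& -\{2 + 4c(1-2x)\}V + \{2(1-2x) - 4c\,x(1-x)\}(2cV + V') \\
 & + x(1-x)(4c^2 V + 4cV' + V'') \\
 ={}& x(1-x)V'' + 2(1-2x)V' - \{2 + 4c^2 x(1-x)\}V ,
\end{align*}
% which rearranges to equation 42. With x = (1-z)/2:
\begin{equation*}
x(1-x) = \frac{1-z^2}{4}, \qquad 1-2x = z, \qquad
\frac{d}{dx} = -2\frac{d}{dz}, \qquad \frac{d^2}{dx^2} = 4\frac{d^2}{dz^2},
\end{equation*}
% so that equation 42 turns into equation 43
\begin{equation*}
(1-z^2)\frac{d^2V}{dz^2} - 4z\frac{dV}{dz} + \{(4N\lambda - 2 - c^2) + c^2 z^2\}V = 0 .
\end{equation*}
```

The terms linear in $c$ cancel, which is why only $c^2$ appears in equations 42 and 43.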

When $s = 0$ the solution should reduce to the solution obtained for genetic drift. Stratton and others used the solution in equation 44, where $V^{(1)}_{1k}(z) = \sideset{}{'}\sum_{n=0,1} f^k_n T^1_n(z)$ and $\lambda_k$ is the $k$th eigenvalue. The primed summation is over even values of $n$ if $k$ is odd and over odd values if $k$ is even.

$$\phi(p, x; t) = \sum_{k=0}^{\infty} C_k\, e^{-\lambda_k t + 2Nsx}\left\{\sideset{}{'}\sum_{n=0,1} f^k_n T^1_n(z)\right\}. \tag{44}$$

$C_k$ can be determined by using Fourier's trick, the fact that $\phi(p, x; 0) = \delta(x - p)$ and the orthogonality relation 45.

$$\int_{-1}^{1} (1-z^2)\,V^{(1)}_{1k}(z)\,V^{(1)}_{1l}(z)\,dz = \delta_{kl}\sideset{}{'}\sum_{n=0,1} (f^k_n)^2\,\frac{2(n+2)!}{n!\,(2n+3)}. \tag{45}$$

Thus for $C_k$:

$$C_k = \frac{\left(1-(1-2p)^2\right)e^{-2Nsp}\,V^{(1)}_{1k}(1-2p)}{\sideset{}{'}\sum_{n=0,1}\frac{(n+1)(n+2)}{2n+3}(f^k_n)^2}. \tag{46}$$

5.7 Summary

Kimura treated the change of gene frequencies as a continuous stochastic Markovian process. He defined $\phi(p, x; t)$ to be the transition probability that the gene frequency is $x$ at time $t$, given that the initial frequency was $p$. If $p$ is fixed, this is a frequency distribution of the gene frequencies. He also defined $g(\delta x, x; \delta t, t)$ as the probability density of the change in gene frequency from $x$ to $x + \delta x$ during $\delta t$. He derived the following equation:

$$\frac{\phi(p, x; t + \delta t) - \phi(p, x; t)}{\delta t} = -\frac{\partial}{\partial x}\left\{\phi\,\frac{1}{\delta t}\int (\delta x)\,g\,d(\delta x)\right\} + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{\phi\,\frac{1}{\delta t}\int (\delta x)^2\,g\,d(\delta x)\right\} - \dots \tag{47}$$


He defined $M(x, t)$ and $V(x, t)$ as in equations 17 and 18 and stated the condition 19. He thereby obtained the general approximated diffusion equation:

$$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{V_{\delta x}\,\phi(x; t)\right\} - \frac{\partial}{\partial x}\left\{M_{\delta x}\,\phi(x; t)\right\}. \tag{48}$$

He studied several cases of evolution, of which genetic drift and selection were considered here. Due to the random sampling of gametes the frequency distribution function flattens down at the rate of $\frac{1}{2N}$. Finally, the allele becomes either fixed or extinct. The more alleles there are, the faster this process is. Random drift therefore seems to keep the number of alleles down. Kimura also obtained a solution for random fluctuations in selection intensity. While on average the selection coefficient can be zero or constant, it will fluctuate through time and thereby influence the frequency distribution. The cases of equilibrium and the fixation of a mutant gene are also considered. For the latter he used the backward diffusion equation.
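The decay rate 1/(2N) for pure drift can be illustrated with a short simulation. The sketch below assumes a discrete-generation model in which each generation consists of 2N gametes drawn at random from the previous one; in that model the expected value of $2x(1-x)$ shrinks by the factor $1 - \frac{1}{2N}$ per generation. The function names, the parameter `n_pop` (standing for N) and the seed are hypothetical and only serve this example.

```python
import random


def drift_generation(x, n_pop, rng):
    """Draw 2N gametes at random; return the new frequency of allele A."""
    copies_of_A = sum(1 for _ in range(2 * n_pop) if rng.random() < x)
    return copies_of_A / (2 * n_pop)


def mean_heterozygosity(p, n_pop, generations, replicates, seed=7):
    """Average of 2x(1-x) over replicate populations started at frequency p."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    total = 0.0
    for _ in range(replicates):
        x = p
        for _ in range(generations):
            x = drift_generation(x, n_pop, rng)
        total += 2.0 * x * (1.0 - x)
    return total / replicates
```

Comparing `mean_heterozygosity(0.5, 20, 20, 2000)` with `0.5 * (1 - 1 / 40) ** 20` shows the loss of heterozygosity at roughly the rate 1/(2N) per generation.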

5.8 Further research

Apart from composing the diffusion equation, Kimura also formed the neutral theory of molecular evolution. Through insights into the DNA sequence, the unification of molecular and evolutionary biology came within reach. He discovered that most evolutionary changes at the molecular level are nearly neutral. Therefore, most fixations are random, and mutations and random drift are the main explanatory factors. Natural selection still plays a role, but a smaller one than was assumed before [14].

6 Discussion

Assumptions have to be made in order to use the diffusion equation. The real biological situation is too complex to capture in a model. The number of individuals in the population determines whether the change of gene frequencies can be seen as a continuous process, and how large the influence of randomness is. The number of alleles and the way they possibly interact determine both the mean and the variance. The selection coefficient has to be determined, based on present knowledge. It might as well be a linear function, or there may be no clear pattern at all. There are numerous possibilities for the number and kind of mutations. Which of these have to be taken into consideration depends on the timescale. How well the real situation can be expressed in a solvable diffusion equation determines how useful the equation is.

The diffusion equation has been used in several cases. Campos and Wahl studied the competition within asexual populations in which multiple beneficial mutations have arisen. The experimental work on this subject was all performed on populations growing exponentially between bottlenecks, but the theories were all for populations of constant size. Therefore, they used the diffusion equation and considered one time step as a complete cycle of population growth and sampling. The number of individuals was also not kept constant, but grew exponentially between the bottlenecks. Their results are in agreement with an individual-based simulation [2].

Based on the diffusion equation, a path ensemble can be made, considering all the possible histories that lead from one fixed state to another in the interval $(0, T)$. For one history the functional reaches a minimum. The standard deviation from this minimum can be calculated, and predictions can be made based on this information. The probability of a trajectory $\rho(t)$ for the case $V = \frac{p(1-p)}{2N}$ is proportional to [4]:

$$\exp\left(-N\int_0^T \frac{\left(\frac{dp}{dt} - M_{\delta p}\right)^2}{p(1-p)}\,dt\right). \tag{49}$$

This function is obtained by using the path integral formulation of quantum mechanics, where the notion of a single, unique trajectory is replaced with a sum or functional integral over all possible trajectories, computing an amplitude [3]. In this case the trajectories were all possible gene frequency histories in time [4]. Only natural selection and the random sampling of gametes were taken into account in the ensemble De Vladar and Barton composed. With a thorough understanding of Kimura's diffusion equation and all that led to it, similar path ensembles for other evolutionary processes such as migration can be made in future research.
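To make the weight in equation 49 more tangible, the sketch below evaluates a discretized version of the exponent for a given gene frequency history. It assumes selection of the form $M_{\delta p} = sp(1-p)$ and a simple midpoint discretization; the function names, the parameter `n_pop` (standing for N) and the two example paths are hypothetical choices and are not taken from De Vladar and Barton [4].

```python
import math


def selection_drift(p, s):
    """Assumed mean change under selection: M = s * p * (1 - p)."""
    return s * p * (1.0 - p)


def path_action(path, dt, n_pop, s):
    """Discretized exponent of equation 49 (without the minus sign)."""
    action = 0.0
    for p_now, p_next in zip(path, path[1:]):
        p_mid = 0.5 * (p_now + p_next)      # midpoint frequency on this step
        velocity = (p_next - p_now) / dt    # finite-difference dp/dt
        mismatch = velocity - selection_drift(p_mid, s)
        action += n_pop * mismatch ** 2 / (p_mid * (1.0 - p_mid)) * dt
    return action


def logistic_path(p0, s, dt, steps):
    """Deterministic history solving dp/dt = s * p * (1 - p)."""
    path = []
    for k in range(steps + 1):
        growth = math.exp(s * k * dt)
        path.append(p0 * growth / (1.0 - p0 + p0 * growth))
    return path


def straight_path(p_start, p_end, steps):
    """Straight-line history between the same two end points."""
    return [p_start + (p_end - p_start) * k / steps for k in range(steps + 1)]
```

A history has relative weight `math.exp(-path_action(...))`. The deterministic history gives an exponent close to zero, whereas a straight line between the same end points gives a larger value and is therefore less probable.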

There is possibly a more fundamental problem with dynamical approaches based on fitness to describe the process of evolution. New species continually arise, thereby influencing the conditions for already existing species. The functions and coefficients mostly used in the methods of population geneticists, however, are based only on the conditions without the new species. Predictions therefore cannot allow for the new species. Fitness can thus be measured afterwards, but cannot be used for predictions, or so Klimek, Thurner and Hanel say. They propose a variational model instead, in a spin-model-like setup of evolutionary systems, and define an evolutionary potential. The description now does not start from an initial gene frequency $p$ at $t = 0$. It describes the coevolution of the species and their habitat from the very beginning. Macroscopic events can therefore be described better than before [15]. Whether predictions for evolutionary systems based on their present state are indeed bound to be inaccurate remains to be seen. Speciation is probably mostly caused by small mutations and takes many generations [20], so it will only have a significant influence on predictions for larger time scales. Furthermore, by treating evolution as a continuous stochastic process, deviations caused by new species can partly be taken into account in the variance, apart from the specific direction they could have. Regardless of whether the old methods are intrinsically wrong, this new method can give valuable new insights into evolutionary systems.

References

[1] Bowler, P. J. (2003). Evolution: The History of an Idea. University of California Press Science [327-328]

[2] Campos, P. R. A. & Wahl, L. M. (2009). The Effects of Population Bottlenecks on Clonal Interference, and the Adaptation Effective Population Size. Evolution, Volume 63, Issue 4 [950-958]

[3] Chaichian, M. & Demichev, A. (2001). Path Integrals in Physics: Volume I Stochastic Processes and Quantum Mechanics. Institute of Physics Publishing [1]

[4] De Vladar, H. P. & Barton, N. H. (2011). The contribution of statistical physics to evolutionary biology. Trends in Ecology & Evolution, Volume 26, Issue 8 [424-432]

[5] Drossel, B. (2010). Biological evolution and statistical physics. Advances in Physics, Volume 50, Issue 2 [209-295]

[6] Fisher, R. A. (1930). The Genetical Theory of Natural Selection. Oxford at the Clarendon Press [8-47]

[7] Fisher, R. A. (1930). The Distribution of Gene Ratios for Rare Mutations. Proceedings of the Royal Society of Edinburgh, Volume 50 [205-220]

[8] Fuller, W. (1967). Physical contributions to the determination of biological structure and function. Reports on Progress in Physics, Volume 30 [445]

[9] Girlich, H.-J. (2003). A. N. Kolmogoroff (1903-1987) und die Ursprünge der Theorie stochastischer Prozesse. Universität Leipzig, Mathematisches Institut [7]

[10] Haldane, J. B. S. (1990; originally published between 1924-1932). A Mathematical Theory of Natural and Artificial Selection. Bulletin of Mathematical Biology, Volume 52 [209-240]

[11] Huxley, J. (1942). Evolution: the Modern Synthesis. John Wiley & Sons

[12] Kimura, M. (1955). Solution of a Process of Random Genetic Drift with a Continuous Model. PNAS, Volume 41, Issue 3 [144-155]

[13] Kimura, M. (1964). Diffusion Models in Population Genetics. Journal of Applied Probability, Volume 1, Issue 2 [177-232]

[14] Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press [1-10]

[15] Klimek, P., Thurner, S. & Hanel (2010). Evolutionary dynamics from a variational principle. The American Physical Society 011901 [1-10]

[16] Leigh, E. G. (1990). Preview and Introduction in 'The Causes of Evolution'. Princeton Science Library [viii and ix]

[17] Templeton, A. R. (2006). Population Genetics and Microevolutionary Theory. John Wiley & Sons [104-105]

[18] Wright, S. (1920). Systems of Mating I. The Biometric Relations Between Parent and Offspring. Genetics, Volume 6 [111-123]

[19] Wright, S. (1920). Systems of Mating II. The Effects of Inbreeding on the Genetic Composition of a Population. Genetics, Volume 6 [124-143]

[20] Wright, S. (1930). Evolution in Mendelian Populations. Genetics, Volume 16 [97-159]

[21] Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress of Genetics [356-366]
