ZOOL 304

Mathematical Models in Evolutionary Genetics

IMPORTANT NOTE:  Mathematical models are NOT laws of nature.  Equations apply ONLY to the formal models in which they are derived. This is a general problem with formal models.  Without exception, models in population genetics depend on assumptions whose applicability to particular cases may be doubted.  (Indeed, common assumptions are both simplistic and unrealistic.)  

"In so far as the propositions of mathematics apply to reality they are not certain,
and in so far as they are certain they do not apply to reality."

Albert Einstein

What are mathematical models in population genetics?

HARDY-WEINBERG EQUILIBRIUM

Modelling selection

Modelling genetic drift

For a readable discussion of history, assumptions, and limitations of models in population genetics, see:

John Wakeley (2005) The limits of theoretical population genetics, Genetics 169: 1-7.

304 index page

MODELLING BASIC POPULATION GENETICS

Mathematical models of populations begin with some assumptions and then proceed by analyzing the quantitative consequences of the assumptions, typically by calculating repeated population cycles.  The cycle can begin at any point but for convenience let us begin with an initial pool of gametes.  Then...

The initial assumptions may be simple or complex, but in either case they are intended to reflect our understanding of some aspect of biological reality.

For some relatively simple models, the long-term effects can be calculated directly from the initial conditions.  But for most models with any degree of complexity, the modelling must proceed by iteration, one generation at a time.

In the example below, we shall use the following conventions.  [NOTE:  Every italicized word is critical; if you don't confidently remember this terminology from introductory biology, please ask ASAP.]

We shall assume a sexually-reproducing population of diploid organisms.   Adult organisms produce haploid gametes by meiosis.  Gametes then fertilize one another to form zygotes.  Zygotes then mature to become adults.  This cycle then repeats indefinitely, with each cycle comprising one generation.  

We shall assume two different alleles, A and a, at a single locus.  These alleles can combine to form any of three possible genotypes, AA, Aa, and aa.

We shall use the variable p to represent the frequency of allele A, and we shall use the variable q to represent the frequency of allele a.  Since the sum of all the allele frequencies must equal unity, p + q = 1 .


The Hardy-Weinberg Equilibrium is a mathematical idealization that serves as a useful starting point for thinking about population genetics.

The concept of a Hardy-Weinberg Equilibrium is built upon several important, simplifying ASSUMPTIONS.

Note that none of these assumptions is strictly applicable to ordinary natural populations.  The main purpose of this model is to permit subsequent exploration of the consequences when any of these assumptions is altered.  Nevertheless, these assumptions are approximately true for many real populations, close enough to make the model applicable in nature.

Let us consider a Hardy-Weinberg model of a diploid population with two alleles A and a.  We define p and q to be the frequencies of these alleles, respectively.

p = frequency of A      We abbreviate this as [freq A].
         and
q = frequency of a      We abbreviate this as [freq a].

Since there are only two alleles,

p + q = 1.  

If this isn't obvious, just remember that the frequencies of all the elements which comprise a population must sum to one.  That is, all the parts equal the whole.  Also, by simple algebra,

p = 1 - q  and q = 1 - p .  

These algebraic relationships will prove useful later, if you try to work through the steps for yourself.

Of course, alleles are usually found within genotypes.  Allele frequencies, therefore, depend on the genotype frequencies.  If we want to determine allele frequencies, we usually measure genotype frequencies.  Once we know genotype frequencies, it is then easy to calculate allele frequencies:

The frequencies of alleles A and a (p and q, respectively) are:

p = [freq A] = [freq AA] + 1/2 x [freq Aa]
         and
q = [freq a] = [freq aa] + 1/2 x [freq Aa]

If this is not obvious, then remember that each homozygote contains two alleles of the same type, while the heterozygote contains one of each allele.  Therefore, we can calculate the allele frequencies for any combination of genotype frequencies, basically, by counting the alleles in each genotype.  
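As a minimal sketch of this counting rule, the short Python function below converts genotype counts into allele frequencies.  The function name and the sample counts are illustrative assumptions, not data from the course or the textbook.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) for alleles A and a, given counts of each genotype.

    Hypothetical helper for illustration; the counts are made-up inputs.
    """
    total = n_AA + n_Aa + n_aa
    if total <= 0:
        raise ValueError("need at least one individual")
    freq_AA = n_AA / total
    freq_Aa = n_Aa / total
    freq_aa = n_aa / total
    # Each homozygote carries two copies of one allele; each heterozygote
    # carries one copy of each, hence the factor of 1/2.
    p = freq_AA + 0.5 * freq_Aa
    q = freq_aa + 0.5 * freq_Aa
    return p, q


p, q = allele_frequencies(n_AA=50, n_Aa=20, n_aa=30)
print(p, q)
```

With 50 AA, 20 Aa, and 30 aa individuals, the genotype frequencies are 0.5, 0.2, and 0.3, so p = 0.5 + 0.1 = 0.6 and q = 0.3 + 0.1 = 0.4.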

Next, since frequencies must sum to one (all the parts equal the whole),

[freq AA] + [freq Aa] + [freq aa] = 1

And we can see that, as expected,

p + q = [freq AA] + [freq Aa] + [freq aa] = 1.

This is not a new result; it merely confirms that we can "prove" what we already knew, i.e. that p + q = 1.

After random mating, new genotypes are formed at frequencies which correspond to the products of the frequencies of their respective alleles (for explanation of this, see Hardy-Weinberg frequencies). 

[freq AA] = p^2

[freq Aa] = 2pq

[freq aa] = q^2

This is basically the mathematical definition of what is meant by "random mating" in a Hardy-Weinberg population.  In a Hardy-Weinberg (i.e., infinitely large) population, the probability that a particular gamete contains a particular allele is equal to the frequency of that allele in the gene pool.  The probability that any two particular alleles will come together to form a particular genotype is equal to the probability of choosing one of the alleles multiplied by the probability of choosing the other allele, in other words the product of the frequencies of the two alleles.  And (in an infinite population), the expected frequency of a genotype is exactly equal to the probability that that genotype will be formed from the random combination of its alleles.  (The expected frequency of genotype Aa is twice pq because there are two different types of fertilization which yield the Aa genotype -- sperm A fertilizes egg a, and sperm a fertilizes egg A.)
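The random union of gametes described above can be sketched by listing every sperm-egg combination and adding up the products of their allele frequencies.  This is only an illustration; the function name and the value p = 0.7 are assumptions chosen for the example.

```python
from itertools import product


def random_mating(p):
    """Genotype frequencies from random union of gametes (hypothetical helper)."""
    q = 1.0 - p
    gamete_freq = {"A": p, "a": q}
    genotype_freq = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Four fertilizations are possible: A+A, A+a, a+A, a+a.
    for sperm, egg in product("Aa", repeat=2):
        # Sorting puts "A" before "a", so both A+a and a+A are counted as "Aa".
        genotype = "".join(sorted(sperm + egg))
        genotype_freq[genotype] += gamete_freq[sperm] * gamete_freq[egg]
    return genotype_freq


print(random_mating(0.7))
```

The heterozygote collects two of the four terms (pq from sperm A with egg a, and pq from sperm a with egg A), which is where the factor of 2 in 2pq comes from.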

We now recompute new allele frequencies (p' and q') in the next generation,

p' = [freq AA] + 1/2 x [freq Aa] = p^2 + 1/2 x 2pq = p^2 + pq

Substituting and simplifying,

p' = p^2 + pq  =  p x p + p x q  =  p x (p + q)  =  p x 1  =  p , or

p' = p    (and, similarly, q' = q )
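For readers who want to check the "similarly" for themselves, the parallel calculation for q can be written out step by step.  The block below is a worked sketch in LaTeX notation (it assumes the amsmath package for the align* environment) and uses only the definitions already given.

```latex
\begin{align*}
q' &= [\text{freq } aa] + \tfrac{1}{2}\,[\text{freq } Aa] \\
   &= q^2 + \tfrac{1}{2}\,(2pq) \\
   &= q^2 + pq \\
   &= q\,(q + p) \\
   &= q \times 1 = q
\end{align*}
```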

Hence allele frequencies do not change in a population which meets the Hardy-Weinberg assumptions.  Furthermore, random matings will always produce zygotes with the same standard genotype frequencies:

Genotypes      AA      Aa      aa
Frequencies    p^2     2pq     q^2

These are the Hardy-Weinberg equilibrium genotype frequencies.  As long as the conditions of Hardy-Weinberg equilibrium are met, allele frequencies will not change and these genotype frequencies will occur.
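The population cycle can also be followed numerically.  The sketch below, which assumes nothing beyond random mating, starts from a hypothetical population with no heterozygotes and repeats the cycle a few times; the function name and the starting frequencies are illustrative choices.

```python
def next_generation(freq_AA, freq_Aa, freq_aa):
    """One cycle: adults -> gametes -> zygotes, with random mating only."""
    p = freq_AA + 0.5 * freq_Aa      # frequency of A in the gamete pool
    q = freq_aa + 0.5 * freq_Aa      # frequency of a in the gamete pool
    return p * p, 2 * p * q, q * q   # zygote genotype frequencies


# Hypothetical starting population: 60% AA, no Aa, 40% aa.
genotypes = (0.6, 0.0, 0.4)
for generation in range(4):
    freq_AA, freq_Aa, freq_aa = genotypes
    p = freq_AA + 0.5 * freq_Aa
    print(generation, round(p, 6), [round(f, 6) for f in genotypes])
    genotypes = next_generation(*genotypes)
```

Although this starting population is far from Hardy-Weinberg proportions, p stays at 0.6 throughout, and the genotype frequencies settle at p^2, 2pq, q^2 after a single round of random mating.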


MODELS OF SELECTION.

To model selection, we simply repeat what we did above to model the Hardy-Weinberg equilibrium, but introduce some change in genotype frequency or mating pattern along the way.  In other words, we alter one of the Hardy-Weinberg assumptions.  Typical introductory models assume that one genotype increases (or decreases) in frequency during the step from zygote maturation to mating.  In the model, that change in frequency is selection.

Since "frequency" is always measured relative to total standing population size, it really doesn't matter whether we model selection as increasing the frequency of a favored genotype or decreasing the frequency of a disadvantaged genotype.
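A small numerical check may make this point concrete.  The sketch assumes that selection multiplies each genotype's Hardy-Weinberg frequency by a weight and that the results are then rescaled to sum to one; the function name, the weights, and the values s = 0.1 and q = 0.3 are all hypothetical.

```python
def next_q(q, w_AA, w_Aa, w_aa):
    """Frequency of allele a after one round of selection (illustrative)."""
    p = 1.0 - q
    rel_AA = p * p * w_AA
    rel_Aa = 2 * p * q * w_Aa
    rel_aa = q * q * w_aa
    total = rel_AA + rel_Aa + rel_aa   # rescales the result to true frequencies
    return (0.5 * rel_Aa + rel_aa) / total


s = 0.1
q = 0.3
# Version 1: the favored genotype aa is increased by a factor of (1 + s).
boosted = next_q(q, 1.0, 1.0, 1.0 + s)
# Version 2: the other two genotypes are decreased by the same factor instead.
reduced = next_q(q, 1.0 / (1.0 + s), 1.0 / (1.0 + s), 1.0)
print(boosted, reduced)
```

The two versions give the same new allele frequency (up to rounding error), because multiplying all three weights by a common constant cancels when the relative frequencies are rescaled.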

Here is a step-by-step reconstruction of the example on p. 77 of our textbook.

We begin with the Hardy-Weinberg allele frequencies and genotype frequencies,

Genotypes      AA      Aa      aa
Frequencies    p^2     2pq     q^2

 

We assume that selection favors genotype aa.  The degree of selection is quantified by a variable s, called the selection coefficient, such that after selection the relative proportion of aa is increased by a factor of (1 + s).  That is,

[freq aa before selection] = q^2
[freq aa after selection] = q^2 x (1 + s)

(1 + s) is a mathematical definition of the fitness of genotype aa.  (More complex models may have more complex definitions of fitness.)

Furthermore, in this example we assume that the heterozygote Aa has an intermediate fitness value of (1 + hs).  

[freq Aa before selection] = 2pq 
[freq Aa after selection] = 2pq x (1 + hs)

The variable h describes the degree of dominance.  Note that if h = 0, then the fitness of Aa is equal to that of AA.  With respect to this pattern of selection, when h = 0 the a allele is effectively recessive to the A allele.  Conversely, if h = 1, the fitness of Aa equals that of aa, and a is effectively dominant.  Intermediate values 0 < h < 1 represent partial dominance.  If h is greater than one (with s positive, as assumed here), the heterozygote is fitter than either homozygote; this condition is called "overdominance" (also called heterosis or heterozygote advantage).
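The fitness scheme can be tabulated for a few values of h.  This is a sketch only: the function names are hypothetical, s = 0.1 is an arbitrary example value, and AA is taken as the reference genotype with fitness 1, as the model implies.

```python
def genotype_fitnesses(s, h):
    """Fitness of each genotype, with AA as the reference (fitness 1)."""
    return {"AA": 1.0, "Aa": 1.0 + h * s, "aa": 1.0 + s}


def dominance_of_a(h):
    """Describe allele a using the categories named in the text (s > 0 assumed)."""
    if h == 0:
        return "a is recessive"
    if h == 1:
        return "a is dominant"
    if 0 < h < 1:
        return "partial dominance"
    if h > 1:
        return "overdominance"
    return "h < 0: not covered in these notes"


for h in (0, 0.5, 1, 2):
    print(h, dominance_of_a(h), genotype_fitnesses(s=0.1, h=h))
```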

Before selection, the genotype frequencies were:

Genotypes      AA      Aa      aa
Frequencies    p^2     2pq     q^2

Now, after selection, the relative genotype frequencies are:

Genotypes               AA      Aa                aa
Relative frequencies    p^2     2pq x (1 + hs)    q^2 x (1 + s)

To find the actual genotype frequencies, each relative frequency must be divided by the sum of them all.  This sum of relative genotype frequencies after selection is called the mean fitness of the population, and is designated w (Sewall Wright, who introduced w as a designation for fitness, is said to have explained that w stands for "worth").  (See the definition of fitness.)

w  =  mean fitness  =  p^2 + 2pq x (1 + hs) + q^2 x (1 + s)
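The division by w can be seen in a short numerical sketch.  The function name and the values q = 0.5, s = 0.2, h = 0.5 are arbitrary assumptions chosen to keep the arithmetic easy to follow.

```python
def genotypes_after_selection(q, s, h):
    """Return (actual genotype frequencies after selection, mean fitness w)."""
    p = 1.0 - q
    relative = {
        "AA": p * p,
        "Aa": 2 * p * q * (1 + h * s),
        "aa": q * q * (1 + s),
    }
    w = sum(relative.values())   # mean fitness: the sum of relative frequencies
    actual = {genotype: freq / w for genotype, freq in relative.items()}
    return actual, w


frequencies, w = genotypes_after_selection(q=0.5, s=0.2, h=0.5)
print(w, frequencies, sum(frequencies.values()))
```

Here the relative frequencies are 0.25, 0.55, and 0.30, so w = 1.10; dividing each by w gives frequencies that once again sum to one.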

From this we calculate the new frequency for a as:

q'  =  [freq a, after selection]  =  { 1/2 x 2pq x (1 + hs) + q^2 x (1 + s) } / w

Simplifying (slightly) yields equation 4.2 in the textbook (p. 77):  

q'  =  q x { 1 + s (q + hp) } / w
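The simplification can be spelled out as follows.  This is a worked sketch in LaTeX notation (assuming the amsmath package for align*); it uses only p + q = 1 and the definitions above, and the last line, an equivalent form of the mean fitness, is added here for convenience and is not quoted from the textbook.

```latex
\begin{align*}
q' &= \frac{\tfrac{1}{2}\,(2pq)(1 + hs) + q^2 (1 + s)}{w} \\
   &= \frac{q\,\bigl[\,p\,(1 + hs) + q\,(1 + s)\,\bigr]}{w} \\
   &= \frac{q\,\bigl[\,(p + q) + s\,(hp + q)\,\bigr]}{w} \\
   &= \frac{q\,\bigl[\,1 + s\,(q + hp)\,\bigr]}{w} \\[6pt]
w  &= p^2 + 2pq\,(1 + hs) + q^2 (1 + s)
    = 1 + s\,(2hpq + q^2)
\end{align*}
```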

Note that q' (the frequency of allele a) depends not only on s but also on q (the frequency of allele a in the previous generation).  

Such models are not much fun to analyze by hand algebraically.  But with a computer program it is quite straightforward to solve for allele frequencies generation after generation and observe how they behave in these particular conditions or with some other set of assumptions.
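A minimal sketch of such a program follows.  It simply applies the recursion for q' once per generation; the function name, the starting frequency, and the parameter values are illustrative assumptions, not the program mentioned in these notes.

```python
def selection_trajectory(q0, s, h, generations):
    """Iterate q' = q * (1 + s*(q + h*p)) / w and return every value of q."""
    trajectory = [q0]
    q = q0
    for _ in range(generations):
        p = 1.0 - q
        w = p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)   # mean fitness
        q = q * (1 + s * (q + h * p)) / w
        trajectory.append(q)
    return trajectory


# Hypothetical run: a rare favored allele, 10% advantage, partial dominance.
run = selection_trajectory(q0=0.01, s=0.1, h=0.5, generations=200)
for generation in range(0, 201, 25):
    print(generation, round(run[generation], 4))
```

Changing s, h, or the starting value q0 and rerunning is an easy way to see how strongly the pace of change depends on dominance and on the current allele frequency.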

You may request from Dr. King a program that plots allele frequency change in response to selection while you experiment (i.e., play) with the parameters of selection coefficient, allele frequency, and dominance.  The program, which will be returned as an e-mail attachment, should run on PCs with Windows or DOS-based operating systems.

An alternative resource is The Natural Selection Model (from the University of Tennessee at Martin), a website which allows you to enter allele frequencies and selection coefficients and then returns a list of allele frequencies for up to 200 generations.

Several general conclusions have been reached from analyzing such models.

This points us toward R.A. Fisher's (1930) Fundamental Theorem of Natural Selection:

Alternatively, Sewall Wright conceived a plot of mean fitness vs. gene frequency as an adaptive landscape.  Selection always drives populations "uphill" on a single-locus adaptive landscape.
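The "uphill" claim can be checked numerically within the single-locus model used in these notes (constant fitnesses 1, 1 + hs, and 1 + s, with random mating).  The sketch below records mean fitness along a trajectory and tests that it never decreases; the function names and parameter values are hypothetical.

```python
def mean_fitness(q, s, h):
    """Height of the adaptive landscape at allele frequency q."""
    p = 1.0 - q
    return p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)


def next_q(q, s, h):
    """One generation of selection on the frequency of allele a."""
    p = 1.0 - q
    return q * (1 + s * (q + h * p)) / mean_fitness(q, s, h)


def fitness_along_trajectory(q0, s, h, generations):
    """Mean fitness in each generation, starting from allele frequency q0."""
    q = q0
    heights = [mean_fitness(q, s, h)]
    for _ in range(generations):
        q = next_q(q, s, h)
        heights.append(mean_fitness(q, s, h))
    return heights


heights = fitness_along_trajectory(q0=0.05, s=0.2, h=0.5, generations=50)
# A tiny tolerance allows for floating-point rounding near the top of the hill.
climbs = all(later >= earlier - 1e-12
             for earlier, later in zip(heights, heights[1:]))
print(climbs)
```

Trying a value of h greater than one (overdominance) is instructive: the population still climbs, but toward a peak at an intermediate allele frequency rather than toward fixation of either allele.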

 


Comments and questions: dgking@siu.edu


SIUC / College of Science / Zoology / Faculty / David King / ZOOL 304
URL: http://www.science.siu.edu/zoology/king/304/models.htm
Last updated:  28 January 2005 / dgk