Zoology 304, Evolution

ZOOL 304

Mathematical Models in Evolutionary Genetics

IMPORTANT NOTE: Mathematical models are NOT laws of nature. Equations apply ONLY to the formal models in which they are derived. This is a general problem with formal models. Without exception, models in population genetics depend on assumptions whose applicability to particular cases may be doubted. (Indeed, common assumptions are both simplistic and unrealistic.)

"In so far as the propositions of mathematics apply to reality they are not certain,
and in so far as they are certain they do not apply to reality."
Albert Einstein

What are mathematical models in population genetics?

HARDY-WEINBERG EQUILIBRIUM

Modelling selection

Modelling genetic drift

For a readable discussion of history, assumptions, and limitations of models in population genetics, see:

John Wakeley (2005) The limits of theoretical population genetics, Genetics 169: 1-7.

304 index page

MODELLING BASIC POPULATION GENETICS

Mathematical models of populations begin with some assumptions and then proceed by analyzing the quantitative consequences of the assumptions, typically by calculating repeated population cycles. The cycle can begin at any point but for convenience let us begin with an initial pool of gametes. Then...

We begin by assuming initial values for allele frequencies.

From the initial allele frequencies, we calculate the frequencies for zygotes of each genotype, using some assumptions about mating. Typically random mating is assumed (i.e., random pairings among all gametes), but we could also assume some degree of assortative mating such that a genotype is more (or less) likely to mate with a matching genotype.

We then follow zygotes to adulthood, through production of new gametes, mating, and production of the next generation of zygotes. At each step, we may introduce assumptions about the relative proportion of each genotype which survives and mates, and about the number of zygotes produced by each mating.

We then repeat the cycle, for as many generations as we wish.

The initial assumptions may be simple or complex, but in either case they are intended to reflect our understanding of some aspect of biological reality.

For some relatively simple models, the long-term effects can be calculated directly from the initial conditions. But for most models with any degree of complexity, the modelling must proceed by iteration, one generation at a time.

In the example below, we shall use the following conventions. [NOTE: Every italicized word is critical; if you don't confidently remember this terminology from introductory biology, please ask ASAP.]

We shall assume a sexually-reproducing population of diploid organisms. Adult organisms produce haploid gametes by meiosis. Gametes then fertilize one another to form zygotes. Zygotes then mature to become adults. This cycle then repeats indefinitely, with each cycle comprising one generation.

We shall assume two different alleles, A and a, at a single locus. These alleles can combine to form any of three possible genotypes, AA, Aa, and aa.

We shall use the variable p to represent the frequency of allele A, and we shall use the variable q to represent the frequency of allele a. Since the sum of all the allele frequencies must equal unity, p + q = 1 .


The Hardy-Weinberg Equilibrium is a mathematical idealization that serves as a useful starting point for thinking about population genetics.

The concept of a Hardy-Weinberg Equilibrium is built upon several important, simplifying ASSUMPTIONS.

Random mating and fertilization. We assume that alleles pair by chance according to their frequencies. (Imagine that adults all pour equal numbers of gametes into a common gene pool, much like what actually happens for many marine invertebrates where gametes are shed en masse into the seawater.)

No selection. We assume each genotype has an equal probability of surviving to adulthood and of mating and that all matings produce similar numbers of offspring.

No mutation. We assume that no allele ever changes from one form into another.

No movement in or out of the population. We assume no introduction or loss of alleles due to immigration or emigration.

Indefinitely large population. We assume that the population is mathematically infinite. This absurd assumption allows us to calculate exact frequencies and probabilities, thereby eliminating chance statistical fluctuations ("genetic drift").

Note that none of these assumptions is strictly applicable to ordinary natural populations. The main purpose of this model is to permit subsequent exploration of the consequences when any of these assumptions is altered. Nevertheless, these assumptions are approximately true for many real populations, close enough to make the model applicable in nature.

Let us consider a Hardy-Weinberg model of a diploid population with two alleles A and a. We define p and q to be the frequencies of these alleles, respectively.

p = frequency of A. We abbreviate this as [freq A].
and
q = frequency of a. We abbreviate this as [freq a].

Since there are only two alleles,

p + q = 1.

If this isn't obvious, just remember that the frequencies of all the elements which comprise a population must sum to one. That is, all the parts equal the whole. Also, by simple algebra,

p = 1 - q and q = 1 - p .

These algebraic relationships will come in useful later, if you try to work through the steps for yourself.

Of course, alleles are usually found within genotypes. Allele frequencies, therefore, depend on the genotype frequencies. If we want to determine allele frequencies, we usually measure genotype frequencies. Once we know genotype frequencies, it is then easy to calculate allele frequencies:

The frequencies of alleles A and a (p and q, respectively) are:

p = [freq A] = [freq AA] + 1/2 x [freq Aa]
and
q = [freq a] = [freq aa] + 1/2 x [freq Aa]

If this is not obvious, then remember that each homozygote contains two alleles of the same type, while the heterozygote contains one of each allele. Therefore, we can calculate the allele frequencies for any combination of genotype frequencies, basically, by counting the alleles in each genotype.
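As a minimal sketch of this counting rule, here is the same calculation in Python. The function name allele_freqs and the example genotype frequencies are illustrative assumptions, not part of the course material.

```python
def allele_freqs(f_AA, f_Aa, f_aa):
    """Return (p, q) computed from genotype frequencies that sum to 1."""
    total = f_AA + f_Aa + f_aa
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    # Each homozygote carries two copies of one allele;
    # each heterozygote carries one copy of each allele.
    p = f_AA + 0.5 * f_Aa
    q = f_aa + 0.5 * f_Aa
    return p, q


# Hypothetical genotype frequencies, chosen only for illustration.
p, q = allele_freqs(0.5, 0.25, 0.25)
print(p, q)  # 0.625 0.375
```

Note that the function works for any genotype frequencies, not only for Hardy-Weinberg proportions.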

Next, since frequencies must sum to one (all the parts equal the whole),

[freq AA] + [freq Aa] + [freq aa] = 1

And we can see that, as expected,

p + q = [freq AA] + [freq Aa] + [freq aa] = 1.

This is not a new result; it merely confirms that we can "prove" what we already knew, i.e. that p + q = 1.

After random mating, new genotypes are formed at frequencies which correspond to the products of the frequencies of their respective alleles (for explanation of this, see Hardy-Weinberg frequencies).

[freq AA] = p²

[freq Aa] = 2pq

[freq aa] = q²

This is basically the mathematical definition of what is meant by "random mating" in a Hardy-Weinberg population. In a Hardy-Weinberg (i.e., infinitely large) population, the probability that a particular gamete contains a particular allele is equal to the frequency of that allele in the gene pool. The probability that any two particular alleles will come together to form a particular genotype is equal to the product of the probability of choosing one of the alleles multiplied by the probability of choosing the other allele, in other words the product of the frequencies of the two alleles. And (in an infinite population), the expected frequency of a genotype is exactly equal to the probability that that genotype will be formed from the random combination of its alleles. (The expected frequency of genotype Aa is twice pq because there are two different types of fertilization which yield the Aa genotype -- sperm A fertilizes egg a, and sperm a fertilizes egg A.)
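The reasoning about the factor of two can be sketched in Python by listing every sperm and egg combination and adding up the products of their frequencies. The function name hw_genotype_freqs and the value p = 0.25 are assumptions made for illustration.

```python
from itertools import product


def hw_genotype_freqs(p):
    """Return genotype frequencies after random union of gametes."""
    q = 1.0 - p
    gamete_freq = {"A": p, "a": q}
    freqs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Four fertilizations are possible: A+A, A+a, a+A, a+a.
    for sperm, egg in product("Aa", repeat=2):
        # Sorting puts the capital letter first, so A+a and a+A
        # are both recorded as genotype "Aa".
        genotype = "".join(sorted(sperm + egg))
        freqs[genotype] += gamete_freq[sperm] * gamete_freq[egg]
    return freqs


print(hw_genotype_freqs(0.25))
```

The Aa entry receives two contributions (pq from each kind of fertilization), which is where the 2pq comes from.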

We now recompute new allele frequencies (p' and q') in the next generation,

p' = [freq AA] + 1/2 x [freq Aa] = p² + 1/2 x 2pq = p² + pq

Substituting and simplifying,

p' = p² + pq = p x p + p x q = p x ( p + q) = p x 1 = p , or

p' = p (and, similarly, q' = q )

Hence allele frequencies do not change in a population which meets the Hardy-Weinberg assumptions. Furthermore, random matings will always produce zygotes with the same standard genotype frequencies:

Genotypes      AA     Aa     aa
Frequencies    p²     2pq    q²

These are the Hardy-Weinberg equilibrium genotype frequencies. As long as the conditions of Hardy-Weinberg equilibrium are met, allele frequencies will not change and these genotype frequencies will occur.
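The result p' = p can also be checked numerically. The following Python sketch runs the gamete-to-zygote-to-gamete cycle several times under the Hardy-Weinberg assumptions; the function name next_generation_p and the starting value 0.3 are illustrative assumptions.

```python
def next_generation_p(p):
    """One Hardy-Weinberg cycle: gametes -> zygotes -> new gametes."""
    q = 1.0 - p
    f_AA = p * p          # zygotes formed by random union of gametes
    f_Aa = 2.0 * p * q
    # No selection, mutation, migration or drift acts on the zygotes,
    # so the adults produce gametes in these same proportions.
    return f_AA + 0.5 * f_Aa


p = 0.3  # hypothetical starting frequency of allele A
for generation in range(5):
    p = next_generation_p(p)
    print(generation + 1, round(p, 12))
```

Every generation prints the same allele frequency (apart from rounding error in floating-point arithmetic).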

Therefore, in any real population, IF the genotype frequencies differ from this, or IF allele frequencies change from one generation to the next, then we can conclude that one or more of the Hardy-Weinberg assumptions are violated.
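As a rough illustration of such a comparison, the Python sketch below computes the genotype counts expected under Hardy-Weinberg proportions from a set of observed counts. The counts are invented for the example, and the function name hw_expected_counts is an assumption. In a finite sample some deviation is expected by chance, so a real comparison would also need a statistical test, which these notes do not cover.

```python
def hw_expected_counts(n_AA, n_Aa, n_aa):
    """Return expected (AA, Aa, aa) counts under Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    # Count A alleles: two per AA individual, one per Aa individual,
    # out of 2n alleles in total.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    return (p * p * n, 2 * p * q * n, q * q * n)


observed = (50, 20, 30)  # hypothetical counts, not real data
expected = hw_expected_counts(*observed)
for name, obs, exp in zip(("AA", "Aa", "aa"), observed, expected):
    print(f"{name}: observed {obs}, expected {exp:.1f}")
```

With these invented counts, p = 0.6 and the expected counts are about 36, 48 and 16, so the sample has far fewer heterozygotes than random mating would produce.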

Furthermore, we can explore quantitatively how particular departures from the Hardy-Weinberg assumptions should affect the population, and thereby determine what sets of conditions could match the observed pattern of allele and genotype frequencies.


MODELS OF SELECTION.

To model selection, we simply repeat what we did above to model the Hardy-Weinberg equilibrium, but introduce some change in genotype frequency or mating pattern along the way. In other words, we alter one of the Hardy-Weinberg assumptions. Typical introductory models assume that one genotype increases (or decreases) in frequency during the step from zygote maturation to mating. In the model, that change in frequency is selection.

Since "frequency" is always measured relative to total standing population size, it really doesn't matter whether we model selection as increasing the frequency of a favored genotype or decreasing the frequency of a disadvantaged genotype.
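This equivalence can be sketched numerically in Python. Below, one generation of selection is computed twice: once by boosting genotype aa by a factor of (1 + s), and once by leaving aa alone and reducing AA and Aa by the same factor. The names next_q, w_AA, w_Aa and w_aa (relative weights applied to each genotype) and the numerical values are assumptions made for illustration.

```python
def next_q(q, w_AA, w_Aa, w_aa):
    """Frequency of allele a after one round of selection.

    w_AA, w_Aa and w_aa are the relative weights applied to each
    genotype between zygote formation and mating.
    """
    p = 1.0 - q
    total = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * q * w_Aa + q * q * w_aa) / total


s = 0.1  # hypothetical strength of selection
q = 0.2  # hypothetical starting frequency of allele a

favored = next_q(q, 1.0, 1.0, 1.0 + s)                    # aa increased
penalized = next_q(q, 1.0 / (1 + s), 1.0 / (1 + s), 1.0)  # AA, Aa decreased
print(favored, penalized)
```

The two results agree (apart from rounding error) because dividing by the total cancels any constant factor applied to all three weights.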

Here is a step-by-step reconstruction of the example on p. 77 of our textbook.

We begin with the Hardy-Weinberg allele frequencies and genotype frequencies,

Genotypes      AA     Aa     aa
Frequencies    p²     2pq    q²

We assume that selection favors genotype aa. The degree of selection is quantified by a variable s, called the selection coefficient, such that after selection the relative proportion of aa is increased by a factor of (1 + s). That is,

[freq aa before selection] = q²
[freq aa after selection] = q² x (1 + s)

(1 + s) is a mathematical definition of the fitness of genotype aa. (More complex models may have more complex definitions of fitness.)

Furthermore, in this example we assume that the heterozygote Aa has an intermediate fitness value of (1 + hs).

[freq Aa before selection] = 2pq
[freq Aa after selection] = 2pq x (1 + hs)

The variable h describes the degree of dominance. Note that if h = 0, then the fitness of Aa is equal to that of AA. With respect to this pattern of selection, when h = 0 the a allele is effectively recessive to the A allele. Conversely, if h = 1, the fitness of Aa equals that of aa, and a is effectively dominant. Intermediate values 0 < h < 1 represent partial dominance. If h is greater than one, the condition is called "overdominance" (also called heterosis or heterozygote advantage).

Before selection, the genotype frequencies were:

Genotypes      AA     Aa     aa
Frequencies    p²     2pq    q²

Now, after selection, the relative genotype frequencies are:

Genotypes               AA     Aa                 aa
Relative frequencies    p²     2pq x (1 + hs)     q² x (1 + s)

To find the actual genotype frequencies, each relative frequency must be divided by the sum of them all. This sum of relative genotype frequencies after selection is called the mean fitness of the population, and is designated w (Sewall Wright, who introduced w as a designation for fitness, is said to have explained that w stands for "worth"). (See the definition of fitness.)

w = mean fitness = p² + 2pq x (1 + hs) + q² x (1 + s)

From this we calculate the new frequency for a as:

q' = [freq a, after selection] = { 1/2 x 2pq x (1 + hs) + q² x (1 + s) } / w

Simplifying (slightly) yields equation 4.2 in the textbook (p. 77):

q' = q x { 1 + s (q + hp) } / w
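The simplification can be checked numerically. This Python sketch computes q' both ways, directly from the genotype frequencies after selection and from the simplified equation, and prints the two side by side. The function names and the parameter values (s = 0.1, h = 0.5) are illustrative assumptions.

```python
def mean_fitness(q, s, h):
    """w = p^2 + 2pq(1 + hs) + q^2(1 + s)."""
    p = 1.0 - q
    return p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)


def next_q_from_genotypes(q, s, h):
    """q' counted from the genotype frequencies after selection."""
    p = 1.0 - q
    numerator = 0.5 * 2 * p * q * (1 + h * s) + q * q * (1 + s)
    return numerator / mean_fitness(q, s, h)


def next_q_simplified(q, s, h):
    """q' from the simplified form q{1 + s(q + hp)} / w."""
    p = 1.0 - q
    return q * (1 + s * (q + h * p)) / mean_fitness(q, s, h)


for q in (0.01, 0.3, 0.9):
    long_form = next_q_from_genotypes(q, 0.1, 0.5)
    short_form = next_q_simplified(q, 0.1, 0.5)
    print(q, round(long_form, 10), round(short_form, 10))
```

The step hidden in "simplifying" is to factor q out of the numerator and then use p + q = 1: pq(1 + hs) + q²(1 + s) = q{p + q + s(hp + q)} = q{1 + s(q + hp)}.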

Note that q' (the frequency of allele a) depends not only on s but also on q (the frequency of allele a in the previous generation).

Such models are not much fun to analyze by hand algebraically. But with a computer program it is quite straightforward to solve for allele frequencies generation after generation and observe how they behave in these particular conditions or with some other set of assumptions.
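A minimal sketch of such a program, in Python, is given here. It is not the program mentioned in these notes; the function names and the parameter values (starting frequency 0.01, s = 0.1, h = 0.5, 200 generations) are assumptions chosen for illustration.

```python
def next_q(q, s, h):
    """One generation of selection: q' = q{1 + s(q + hp)} / w."""
    p = 1.0 - q
    w = p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)
    return q * (1 + s * (q + h * p)) / w


def simulate(q0, s, h, generations):
    """Return the list of q values from generation 0 onward."""
    trajectory = [q0]
    for _ in range(generations):
        trajectory.append(next_q(trajectory[-1], s, h))
    return trajectory


# Hypothetical parameters: a rare, favored allele with intermediate dominance.
trajectory = simulate(q0=0.01, s=0.1, h=0.5, generations=200)
for generation in range(0, 201, 25):
    print(generation, round(trajectory[generation], 4))
```

Changing s, h, or the starting frequency and rerunning the loop is a quick way to experiment with the model.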

You may request from Dr. King a program that plots allele frequency change in response to selection while you experiment (i.e., play) with the parameters of selection coefficient, allele frequency, and dominance. The program, which will be returned as an e-mail attachment, should run on PCs with Windows or DOS-based operating systems.

An alternative resource is The Natural Selection Model, (from the University of Tennessee at Martin), a website which allows you to enter allele frequencies and selection coefficients and then returns with a list of allele frequencies for up to 200 generations.

Several general conclusions have been reached from analyzing such models.

Predictable rates of change in frequency. An advantageous allele will increase in frequency generation by generation.

When an allele frequency is low (less than 5%), the rate of change is slow.

When an allele frequency is high (more than 95%), the rate of change is slow.

At intermediate frequencies, the rate of change is high.

Equivalence of "positive" and "negative" selection. Selection against one allele is, necessarily, selection for another allele. Similarly, when one allele is rare, the alternative allele must be common (and vice versa).

The slow rate of change at both low and high frequency simply reflects this equivalence.

Thus, when a rare allele changes frequency at a slow rate, the common allele must also be changing at a slow rate (and vice versa).

Dominance effect. At very low (or high) frequency, the rate of increase depends on dominance. (For a graphical illustration, see here.)

When a recessive allele is at low frequency (or, equivalently, when a dominant allele is at high frequency), the recessive allele is effectively "hidden" from selection in heterozygote genotypes.

When the frequency of one allele is very low, the rate of change in allele frequency is lower if that allele is recessive rather than dominant.

Conversely (by equivalence of positive and negative selection), when the frequency of one allele is very high, the rate of change in allele frequency is lower if that allele is dominant rather than recessive.

A newly introduced, advantageous recessive allele may take a very long time to reach an appreciable frequency in a population.

The elimination of a deleterious recessive allele can take a very long time. Essentially, selection can only keep the frequency very low. The final elimination of a deleterious recessive occurs by drift (but such elimination is assured because selection keeps the frequency low).
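The dominance effect can be illustrated with a short Python sketch that counts how many generations a rare, favored allele a needs to reach a frequency of 0.5, once when it is recessive (h = 0) and once when it is dominant (h = 1). The function names, the starting frequency of 0.01, and the value s = 0.1 are assumptions made for this example.

```python
def next_q(q, s, h):
    """One generation of selection: q' = q{1 + s(q + hp)} / w."""
    p = 1.0 - q
    w = p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)
    return q * (1 + s * (q + h * p)) / w


def generations_to_reach(q0, target, s, h, max_generations=100_000):
    """Count generations until q >= target; None if never reached."""
    q = q0
    for generation in range(max_generations + 1):
        if q >= target:
            return generation
        q = next_q(q, s, h)
    return None


# Hypothetical parameters for a rare allele favored by selection.
recessive = generations_to_reach(0.01, 0.5, s=0.1, h=0.0)
dominant = generations_to_reach(0.01, 0.5, s=0.1, h=1.0)
print("recessive:", recessive, "generations")
print("dominant: ", dominant, "generations")
```

The recessive case takes many times longer, because at low frequency nearly all copies of a sit in heterozygotes, where selection cannot act on them.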

Fixation. Eventually, any allele which confers a consistent advantage will become "fixed", meaning that alternative alleles will disappear from the population. However, in large populations a deleterious recessive allele becomes "invisible" to selection at very low frequency and can persist indefinitely until eliminated by random statistical fluctuation (genetic drift).

Heterosis. Selection which favors the heterozygote over either homozygote will yield a stable equilibrium in which both alleles persist at intermediate frequencies (see Chapter 5).
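In the model used above (fitnesses 1, 1 + hs, and 1 + s), heterozygote advantage corresponds to s > 0 and h > 1. Setting q' = q in that model gives an equilibrium at q = h / (2h - 1); this expression is derived here from the recursion and is not quoted from the textbook. The Python sketch below, with assumed values s = 0.1 and h = 2, starts from two very different frequencies and shows both converging on the same value.

```python
def next_q(q, s, h):
    """One generation of selection: q' = q{1 + s(q + hp)} / w."""
    p = 1.0 - q
    w = p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)
    return q * (1 + s * (q + h * p)) / w


def iterate(q0, s, h, generations):
    """Return q after the given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_q(q, s, h)
    return q


s, h = 0.1, 2.0  # hypothetical values giving heterozygote advantage
predicted = h / (2 * h - 1)  # equilibrium derived from q' = q

for start in (0.05, 0.95):
    final = iterate(start, s, h, generations=2000)
    print(f"start {start}: final {final:.6f}, predicted {predicted:.6f}")
```

Whether the allele starts rare or common, selection pushes its frequency toward the same intermediate value, so neither allele is lost.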

Mutation. If mutation introduces deleterious alleles into a population, one may calculate an equilibrium allele frequency such that the rate at which new mutations appear is balanced by the rate at which they are eliminated by selection.

This points us toward R.A. Fisher's (1930) Fundamental Theorem of Natural Selection:

"The rate of increase in fitness of any given organism at any given time is equal to its genetic variance in fitness at that time." When genotype fitnesses depend on a single locus, natural selection acts to increase the mean fitness of a population.

Alternatively, Sewall Wright conceived a plot of mean fitness vs. gene frequency as an adaptive landscape. Selection always drives populations "uphill" on a single-locus adaptive landscape.
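The "uphill" behavior can be checked numerically for the single-locus model used in these notes. The Python sketch below records the mean fitness w in each generation and tests that it never decreases. The function names and parameter values are assumptions chosen for illustration, and the check covers only this simple model with constant genotype fitnesses.

```python
def mean_fitness(q, s, h):
    """w = p^2 + 2pq(1 + hs) + q^2(1 + s)."""
    p = 1.0 - q
    return p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)


def next_q(q, s, h):
    """One generation of selection: q' = q{1 + s(q + hp)} / w."""
    p = 1.0 - q
    return q * (1 + s * (q + h * p)) / mean_fitness(q, s, h)


def mean_fitness_path(q0, s, h, generations):
    """Return the mean fitness in each successive generation."""
    path = []
    q = q0
    for _ in range(generations):
        path.append(mean_fitness(q, s, h))
        q = next_q(q, s, h)
    return path


def never_decreases(values, tol=1e-12):
    """True if each value is at least the one before it (within tol)."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Hypothetical cases: intermediate dominance and heterozygote advantage.
for q0, s, h in ((0.01, 0.1, 0.5), (0.05, 0.1, 2.0), (0.95, 0.1, 2.0)):
    path = mean_fitness_path(q0, s, h, generations=300)
    print(q0, s, h, never_decreases(path))
```

Each line of output ends in True: the population climbs the landscape whether it is heading for fixation or for an intermediate equilibrium.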



Comments and questions: dgking@siu.edu
Department of Zoology e-mail: zoology@zoology.siu.edu
Comments and questions related to web server: webmaster@science.siu.edu

SIUC / College of Science / Zoology / Faculty / David King / ZOOL 304
URL: http://www.science.siu.edu/zoology/king/304/models.htm
Last updated: 28 January 2005 / dgk
