Hardy–Weinberg principle

In population genetics, the Hardy–Weinberg principle, also known as the Hardy–Weinberg equilibrium, model, theorem, or law, states that allele and genotype frequencies in a population will remain constant from generation to generation in the absence of other evolutionary influences. These influences include genetic drift, mate choice, assortative mating, natural selection, sexual selection, mutation, gene flow, meiotic drive, genetic hitchhiking, population bottleneck, founder effect and inbreeding.
In the simplest case of a single locus with two alleles denoted A and a with frequencies f(A) = p and f(a) = q, respectively, the expected genotype frequencies under random mating are f(AA) = p² for the AA homozygotes, f(aa) = q² for the aa homozygotes, and f(Aa) = 2pq for the heterozygotes. In the absence of selection, mutation, genetic drift, or other forces, allele frequencies p and q are constant between generations, so equilibrium is reached.
The principle is named after G. H. Hardy and Wilhelm Weinberg, who first demonstrated it mathematically. Hardy's paper was focused on debunking the then-commonly held view that a dominant allele would automatically tend to increase in frequency; confusion between dominance and selection is now less common. Today, tests for Hardy–Weinberg genotype frequencies are used primarily to test for population stratification and other forms of non-random mating.

Derivation

Consider a population of monoecious diploids, where each organism produces male and female gametes at equal frequency, and has two alleles at each gene locus. Organisms reproduce by random union of gametes. A locus in this population has two alleles, A and a, that occur with initial frequencies $f_0(\text{A}) = p$ and $f_0(\text{a}) = q$, respectively. The allele frequencies at each generation are obtained by pooling together the alleles from each genotype of the same generation according to the expected contribution from the homozygote and heterozygote genotypes, which are 1 and 1/2, respectively:

$$f_t(\text{A}) = f_t(\text{AA}) + \tfrac{1}{2} f_t(\text{Aa}) \qquad (1)$$

$$f_t(\text{a}) = f_t(\text{aa}) + \tfrac{1}{2} f_t(\text{Aa}) \qquad (2)$$

The different ways to form genotypes for the next generation can be shown in a Punnett square, where the proportion of each genotype is equal to the product of the row and column allele frequencies from the current generation.

Male gamete \ Female gamete	A (p)	a (q)
A (p)	AA (p²)	Aa (pq)
a (q)	Aa (qp)	aa (q²)

The sum of the entries is p² + 2pq + q² = 1, as the genotype frequencies must sum to one.
Note again that as p + q = 1, the binomial expansion of (p + q)² = p² + 2pq + q² = 1 gives the same relationships.
Summing the elements of the Punnett square or the binomial expansion, we obtain the expected genotype proportions among the offspring after a single generation:

$$f_1(\text{AA}) = p^2 = f_0(\text{A})^2 \qquad (3)$$

$$f_1(\text{Aa}) = pq + qp = 2pq = 2 f_0(\text{A}) f_0(\text{a}) \qquad (4)$$

$$f_1(\text{aa}) = q^2 = f_0(\text{a})^2 \qquad (5)$$

These frequencies define the Hardy–Weinberg equilibrium. Note that the genotype frequencies after the first generation need not equal the genotype frequencies from the initial generation, e.g. $f_1(\text{AA}) \neq f_0(\text{AA})$. However, the genotype frequencies for all future times will equal the Hardy–Weinberg frequencies, e.g. $f_t(\text{AA}) = f_1(\text{AA})$ for $t > 1$. This follows since the genotype frequencies of the next generation depend only on the allele frequencies of the current generation which, as calculated by equations (1) and (2), are preserved from the initial generation:

$$f_1(\text{A}) = f_1(\text{AA}) + \tfrac{1}{2} f_1(\text{Aa}) = p^2 + pq = p(p + q) = p = f_0(\text{A})$$

$$f_1(\text{a}) = f_1(\text{aa}) + \tfrac{1}{2} f_1(\text{Aa}) = q^2 + pq = q(p + q) = q = f_0(\text{a})$$
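For the more general case of dioecious diploids that reproduce by random mating of individuals, it is necessary to calculate the genotype frequencies from the nine possible matings between each parental genotype in either sex, weighted by the expected genotype contributions of each such mating. Equivalently, one considers the six unique diploid-diploid combinations:

$$[(\text{AA},\text{AA}),\ (\text{AA},\text{Aa}),\ (\text{AA},\text{aa}),\ (\text{Aa},\text{Aa}),\ (\text{Aa},\text{aa}),\ (\text{aa},\text{aa})]$$

and constructs a Punnett square for each, so as to calculate its contribution to the next generation's genotypes. These contributions are weighted according to the probability of each diploid-diploid combination, which follows a multinomial distribution with $k = 3$. For example, the probability of the mating combination (AA, aa) is $2 f_t(\text{AA}) f_t(\text{aa})$ and it can only result in the Aa genotype: $[0, 1, 0]$. Overall, the resulting genotype frequencies are calculated as:

$$\begin{aligned}
\left[f_{t+1}(\text{AA}),\, f_{t+1}(\text{Aa}),\, f_{t+1}(\text{aa})\right]
={}& f_t(\text{AA}) f_t(\text{AA}) \left[1, 0, 0\right] + 2 f_t(\text{AA}) f_t(\text{Aa}) \left[\tfrac{1}{2}, \tfrac{1}{2}, 0\right] + 2 f_t(\text{AA}) f_t(\text{aa}) \left[0, 1, 0\right] \\
&+ f_t(\text{Aa}) f_t(\text{Aa}) \left[\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}\right] + 2 f_t(\text{Aa}) f_t(\text{aa}) \left[0, \tfrac{1}{2}, \tfrac{1}{2}\right] + f_t(\text{aa}) f_t(\text{aa}) \left[0, 0, 1\right] \\
={}& \left[\left(f_t(\text{AA}) + \tfrac{f_t(\text{Aa})}{2}\right)^2,\ 2\left(f_t(\text{AA}) + \tfrac{f_t(\text{Aa})}{2}\right)\left(f_t(\text{aa}) + \tfrac{f_t(\text{Aa})}{2}\right),\ \left(f_t(\text{aa}) + \tfrac{f_t(\text{Aa})}{2}\right)^2\right] \\
={}& \left[f_t(\text{A})^2,\ 2 f_t(\text{A}) f_t(\text{a}),\ f_t(\text{a})^2\right]
\end{aligned}$$

As before, one can show that the allele frequencies at time $t+1$ equal those at time $t$, and so are constant in time. Similarly, the genotype frequencies depend only on the allele frequencies, and so, after time $t = 1$, are also constant in time.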
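The weighting of matings described above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `random_mating` and the starting frequencies are illustrative assumptions, and genotype frequencies are assumed equal in both sexes. It enumerates the nine ordered matings, maps each to one of the six unordered combinations, and accumulates the offspring genotype shares.

```python
from itertools import product

ORDER = ("AA", "Aa", "aa")

# Offspring shares [AA, Aa, aa] for each of the six unordered mating combinations.
OFFSPRING = {
    ("AA", "AA"): (1.0, 0.0, 0.0),
    ("AA", "Aa"): (0.5, 0.5, 0.0),
    ("AA", "aa"): (0.0, 1.0, 0.0),
    ("Aa", "Aa"): (0.25, 0.5, 0.25),
    ("Aa", "aa"): (0.0, 0.5, 0.5),
    ("aa", "aa"): (0.0, 0.0, 1.0),
}

def random_mating(freqs):
    """Return next-generation genotype frequencies under random mating.

    freqs maps each genotype to its frequency (assumed equal in both sexes).
    """
    out = [0.0, 0.0, 0.0]
    for mother, father in product(ORDER, repeat=2):  # nine ordered matings
        key = tuple(sorted((mother, father), key=ORDER.index))
        weight = freqs[mother] * freqs[father]
        for i, share in enumerate(OFFSPRING[key]):
            out[i] += weight * share
    return dict(zip(ORDER, out))

# Hypothetical parents that are not in Hardy-Weinberg proportions; f(A) = 0.6.
parents = {"AA": 0.5, "Aa": 0.2, "aa": 0.3}
offspring = random_mating(parents)
grandchildren = random_mating(offspring)
print(offspring)
print(grandchildren)
```

With these assumed inputs the offspring frequencies agree with p², 2pq and q² for p = 0.6, and a second round of mating leaves them unchanged up to floating-point rounding.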
If in either monoecious or dioecious organisms, either the allele or genotype proportions are initially unequal in either sex, it can be shown that constant proportions are obtained after one generation of random mating. If dioecious organisms are heterogametic and the gene locus is located on the X chromosome, it can be shown that if the allele frequencies are initially unequal in the two sexes (e.g., XX females and XY males, as in humans), f(a) in the heterogametic sex 'chases' f(a) in the homogametic sex of the previous generation, until an equilibrium is reached at the weighted average of the two initial frequencies.

Deviations from Hardy–Weinberg equilibrium

The seven assumptions underlying Hardy–Weinberg equilibrium are as follows:

organisms are diploid
only sexual reproduction occurs
generations are nonoverlapping
mating is random
population size is infinitely large
allele frequencies are equal in the sexes
there is no migration, gene flow, admixture, mutation or selection

Violations of the Hardy–Weinberg assumptions can cause deviations from expectation. How this affects the population depends on the assumptions that are violated.

Random mating. The Hardy–Weinberg principle (HWP) states that the population will have the given genotypic frequencies (Hardy–Weinberg proportions) after a single generation of random mating within the population. When the random mating assumption is violated, the population will not have Hardy–Weinberg proportions. A common cause of non-random mating is inbreeding, which causes an increase in homozygosity for all genes.

If a population violates one of the following four assumptions, the population may continue to have Hardy–Weinberg proportions each generation, but the allele frequencies will change over time.

Selection, in general, causes allele frequencies to change, often quite rapidly. While directional selection eventually leads to the loss of all alleles except the favored one, some forms of selection, such as balancing selection, lead to equilibrium without loss of alleles.
Mutation will have a very subtle effect on allele frequencies. Mutation rates are of the order 10⁻⁴ to 10⁻⁸, and the change in allele frequency will be, at most, the same order. Recurrent mutation will maintain alleles in the population, even if there is strong selection against them.
Migration genetically links two or more populations together. In general, allele frequencies will become more homogeneous among the populations. Some models for migration inherently include nonrandom mating. For those models, the Hardy–Weinberg proportions will normally not be valid.
Small population size can cause a random change in allele frequencies. This is due to a sampling effect, and is called genetic drift. Sampling effects are most important when the allele is present in a small number of copies.
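
The sampling effect behind genetic drift can be illustrated with a short simulation. This is a sketch under stated assumptions: it uses Wright–Fisher-style binomial resampling of 2N allele copies per generation, a modelling choice that the text above does not name, and the function name, population size, and seed are hypothetical.

```python
import random

def simulate_drift(p0, n_individuals, generations, seed=0):
    """Track the frequency of allele A when each generation resamples 2N copies."""
    rng = random.Random(seed)
    n_copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Each of the 2N copies in the next generation is A with probability p.
        copies_of_A = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = copies_of_A / n_copies
        trajectory.append(p)
    return trajectory

small = simulate_drift(p0=0.5, n_individuals=10, generations=50)
large = simulate_drift(p0=0.5, n_individuals=1000, generations=50)
print(small[-1], large[-1])
```

Under this scheme the allele frequency in the small population typically wanders much further from its starting value than in the large one, and an allele that reaches frequency 0 or 1 stays there.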
Sex linkage

Where the A gene is sex linked, the heterogametic sex have only one copy of the gene, while the homogametic sex have two copies. The genotype frequencies at equilibrium are p and q for the heterogametic sex but p², 2pq and q² for the homogametic sex.
For example, in humans red–green colorblindness is an X-linked recessive trait. In western European males, the trait affects about 1 in 12, whereas it affects about 1 in 200 females, very close to Hardy–Weinberg proportions.
If a population is brought together with males and females with a different allele frequency in each subpopulation, the allele frequency of the male population in the next generation will follow that of the female population because each son receives its X chromosome from its mother. The population converges on equilibrium very quickly.
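Both the colorblindness arithmetic and the convergence described above can be sketched in a few lines. The starting frequencies (0.2 in females, 0.8 in males) and the function name are hypothetical; the update rule assumes that sons inherit their X chromosome from the mother and daughters inherit one X from each parent.

```python
def x_linked_step(f_female, f_male):
    """Advance the X-linked allele frequency in each sex by one generation."""
    new_female = 0.5 * (f_female + f_male)  # one X from each parent
    new_male = f_female                     # sons get their X from the mother
    return new_female, new_male

# Hypothetical founding population with unequal allele frequencies in the sexes.
history = [(0.2, 0.8)]
for _generation in range(20):
    history.append(x_linked_step(*history[-1]))

# Females carry two thirds of the X chromosomes, hence the 2:1 weighting.
equilibrium = (2 * 0.2 + 0.8) / 3

# Colorblindness: if about 1 in 12 males is affected, q is about 1/12,
# and the expected frequency of affected females is q squared.
q = 1 / 12
expected_female_frequency = q ** 2

print(history[:4], equilibrium)
print(expected_female_frequency)
```

The difference between the sexes halves and changes sign each generation, so the frequencies oscillate towards the weighted average. The expected female frequency of q² = 1/144 (about 0.007) is of the same order as the observed 1 in 200.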

Generalizations

The simple derivation above can be generalized for more than two alleles and polyploidy.

Generalization for more than two alleles

Consider an extra allele frequency, r. The two-allele case is the binomial expansion of (p + q)², and thus the three-allele case is the trinomial expansion of (p + q + r)²:

$$(p + q + r)^2 = p^2 + q^2 + r^2 + 2pq + 2pr + 2qr$$

More generally, consider the alleles A₁,..., A_n given by the allele frequencies p₁ to p_n;

$$(p_1 + \cdots + p_n)^2$$

giving for all homozygotes:

$$f(A_i A_i) = p_i^2$$

and for all heterozygotes:

$$f(A_i A_j) = 2 p_i p_j$$

Generalization for polyploidy

The Hardy–Weinberg principle may also be generalized to polyploid systems, that is, for organisms that have more than two copies of each chromosome. Consider again only two alleles. The diploid case is the binomial expansion of:

$$(p + q)^2$$

and therefore the polyploid case is the polynomial expansion of:

$$(p + q)^c$$

where c is the ploidy, for example with tetraploid (c = 4):

Genotype	Frequency
AAAA	p⁴
AAAa	4p³q
AAaa	6p²q²
Aaaa	4pq³
aaaa	q⁴

Whether the organism is a 'true' tetraploid or an amphidiploid will determine how long it will take for the population to reach Hardy–Weinberg equilibrium.

Complete generalization

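For n distinct alleles in c-ploids, the genotype frequencies in the Hardy–Weinberg equilibrium are given by individual terms in the multinomial expansion of $(p_1 + \cdots + p_n)^c$:

$$(p_1 + \cdots + p_n)^c = \sum_{k_1 + \cdots + k_n = c} \binom{c}{k_1, \ldots, k_n}\, p_1^{k_1} \cdots p_n^{k_n}$$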
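
The multinomial terms can be enumerated directly. The sketch below is illustrative only: the function name `hw_genotype_freqs`, the allele labels, and the example frequencies are assumptions. Each genotype is represented as a sorted tuple of allele labels, and its frequency is the multinomial coefficient times the product of the allele frequencies.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod

def hw_genotype_freqs(allele_freqs, ploidy=2):
    """Map each genotype (tuple of allele labels) to its equilibrium frequency."""
    result = {}
    for genotype in combinations_with_replacement(sorted(allele_freqs), ploidy):
        counts = Counter(genotype)
        # Multinomial coefficient: ploidy! / (k_1! * ... * k_n!)
        coefficient = factorial(ploidy)
        for k in counts.values():
            coefficient //= factorial(k)
        result[genotype] = coefficient * prod(
            allele_freqs[allele] ** k for allele, k in counts.items()
        )
    return result

# Tetraploid, two alleles (hypothetical frequencies p = q = 0.5).
tetraploid = hw_genotype_freqs({"A": 0.5, "a": 0.5}, ploidy=4)

# Diploid, three alleles (hypothetical frequencies).
triallelic = hw_genotype_freqs({"A": 0.5, "B": 0.3, "C": 0.2}, ploidy=2)

print(tetraploid)
print(triallelic)
```

For the tetraploid case the coefficients are 1, 4, 6, 4, 1, matching the table for c = 4, and in every case the frequencies sum to one.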

Applications

The Hardy–Weinberg principle may be applied in two ways: either a population is assumed to be in Hardy–Weinberg proportions, in which case the genotype frequencies can be calculated, or, if the frequencies of all three genotypes are known, they can be tested for statistically significant deviations.

Application to cases of complete dominance

Suppose that the phenotypes of AA and Aa are indistinguishable, i.e., there is complete dominance. Assuming that the Hardy–Weinberg principle applies to the population, then q can still be calculated from f(aa):

$$q = \sqrt{f(\text{aa})}$$

and p can be calculated from q as p = 1 − q. Estimates of f(AA) and f(Aa) can then be derived from p² and 2pq, respectively. Note, however, that such a population cannot be tested for equilibrium using the significance tests below, because equilibrium is assumed a priori.
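
As a minimal sketch of this estimate, assuming Hardy–Weinberg proportions hold and using hypothetical counts (16 recessive phenotypes in a sample of 100) and a hypothetical function name:

```python
from math import sqrt

def estimate_from_recessive(n_recessive, n_total):
    """Estimate allele and genotype frequencies from the recessive phenotype count.

    Assumes Hardy-Weinberg proportions, so f(aa) = q**2.
    """
    q = sqrt(n_recessive / n_total)
    p = 1.0 - q
    return {"p": p, "q": q, "AA": p * p, "Aa": 2 * p * q, "aa": q * q}

# Hypothetical sample: 16 of 100 individuals show the recessive phenotype.
estimate = estimate_from_recessive(16, 100)
print(estimate)
```

With these assumed counts, q is 0.4 and p is 0.6, so the dominant phenotype class is estimated to consist of 36% AA and 48% Aa individuals.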

Significance tests for deviation

Testing deviation from the HWP is generally performed using Pearson's chi-squared test, using the observed genotype frequencies obtained from the data and the expected genotype frequencies obtained using the HWP. For systems where there are large numbers of alleles, this may result in data with many empty possible genotypes and low genotype counts, because there are often not enough individuals present in the sample to adequately represent all genotype classes. If this is the case, then the asymptotic assumption of the chi-squared distribution will no longer hold, and it may be necessary to use a form of Fisher's exact test, which requires a computer to solve. More recently, a number of Markov chain Monte Carlo (MCMC) methods of testing for deviations from HWP have been proposed.

Example $\chi^2$ test for deviation

This data is from E. B. Ford on the scarlet tiger moth, for which the phenotypes of a sample of the population were recorded. Genotype-phenotype distinction is assumed to be negligibly small. The null hypothesis is that the population is in Hardy–Weinberg proportions, and the alternative hypothesis is that the population is not in Hardy–Weinberg proportions.

Phenotype	White-spotted	Intermediate	Little spotting	Total
Number	1469	138	5	1612

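From this, allele frequencies can be calculated:

$$p = \frac{2 \times \mathrm{obs}(\text{AA}) + \mathrm{obs}(\text{Aa})}{2 \times (\mathrm{obs}(\text{AA}) + \mathrm{obs}(\text{Aa}) + \mathrm{obs}(\text{aa}))} = \frac{2 \times 1469 + 138}{2 \times 1612} = \frac{3076}{3224} = 0.954$$

and

$$q = 1 - p = 0.046$$

So the Hardy–Weinberg expectation, with n = 1612 and the unrounded value of p, is:

$$\mathrm{Exp}(\text{AA}) = p^2 n = 1467.4, \qquad \mathrm{Exp}(\text{Aa}) = 2pqn = 141.2, \qquad \mathrm{Exp}(\text{aa}) = q^2 n = 3.4$$

Pearson's chi-squared test states:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(1469 - 1467.4)^2}{1467.4} + \frac{(138 - 141.2)^2}{141.2} + \frac{(5 - 3.4)^2}{3.4} = 0.002 + 0.073 + 0.756 = 0.83$$

There is 1 degree of freedom (for a test of Hardy–Weinberg proportions, the degrees of freedom are the number of genotypes minus the number of alleles, here 3 − 2). The 5% significance level for 1 degree of freedom is 3.84, and since the χ² value is less than this, the null hypothesis that the population is in Hardy–Weinberg frequencies is not rejected.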
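
The calculation can be reproduced with a short script. This is a sketch rather than a statistical library routine: the function name is an assumption, and the critical value 3.84 is taken from the text instead of being computed.

```python
def hw_chi_squared(n_AA, n_Aa, n_aa):
    """Return (p, expected counts, chi-squared) for a biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2

# Ford's scarlet tiger moth counts from the table above.
p, expected, chi2 = hw_chi_squared(1469, 138, 5)
CRITICAL_5_PERCENT_1_DF = 3.84  # value quoted in the text

print(round(p, 3), [round(e, 1) for e in expected], round(chi2, 2))
print("reject HWE" if chi2 > CRITICAL_5_PERCENT_1_DF else "do not reject HWE")
```

Because the expected count for the rarest genotype is only about 3.4, the chi-squared approximation is rough here, which is one motivation for the exact test discussed next.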

Fisher's exact test (probability test)

Fisher's exact test can be applied to testing for Hardy–Weinberg proportions. Since the test is conditional on the allele frequencies, p and q, the problem can be viewed as testing for the proper number of heterozygotes. In this way, the hypothesis of Hardy–Weinberg proportions is rejected if the number of heterozygotes is too large or too small. The conditional probabilities for the heterozygote, given the allele frequencies, are given in Emigh as

$$\operatorname{prob}[n_{12} \mid n_1] = \frac{\binom{n}{n_{11},\, n_{12},\, n_{22}}}{\binom{2n}{n_1}}\, 2^{n_{12}}$$

where n₁₁, n₁₂, n₂₂ are the observed numbers of the three genotypes, AA, Aa, and aa, respectively, n = n₁₁ + n₁₂ + n₂₂ is the number of individuals, and n₁ is the number of A alleles, where n₁ = 2n₁₁ + n₁₂.

An example

Using one of the examples from Emigh, we can consider the case where n = 100 and n₁ = 34, so that p = n₁/(2n) = 0.17. The possible observed numbers of heterozygotes and their exact significance levels are given in Table 4.

Number of heterozygotes	Significance level
0	0.000
2	0.000
4	0.000
6	0.000
8	0.000
10	0.000
12	0.000
14	0.000
16	0.000
18	0.001
20	0.007
22	0.034
34	0.067
24	0.151
32	0.291
26	0.474
30	0.730
28	1.000

Using this table, one must look up the significance level of the test based on the observed number of heterozygotes. For example, if one observed 20 heterozygotes, the significance level for the test is 0.007. As is typical for Fisher's exact test for small samples, the gradation of significance levels is quite coarse.
However, a table like this has to be created for every experiment, since the tables are dependent on both n and p.
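
Such a table can be generated programmatically from the conditional probability of the heterozygote count. The sketch below makes two assumptions that should be read as such: it takes n = 100 individuals carrying 34 copies of the A allele, which is the case whose heterozygote counts run from 0 to 34 as tabulated above, and it defines the significance level as the total probability of all outcomes no more probable than the observed one. The function names are hypothetical.

```python
from math import comb, factorial

def het_probability(n, n_A, n_het):
    """P(n_het heterozygotes | n individuals, n_A copies of allele A)."""
    n_AA = (n_A - n_het) // 2
    n_aa = n - n_AA - n_het
    multinomial = factorial(n) // (
        factorial(n_AA) * factorial(n_het) * factorial(n_aa)
    )
    return multinomial * 2 ** n_het / comb(2 * n, n_A)

def exact_significance(n, n_A):
    """Map each feasible heterozygote count to its exact significance level."""
    # The heterozygote count has the same parity as n_A and cannot exceed
    # the number of copies of either allele.
    counts = range(n_A % 2, min(n_A, 2 * n - n_A) + 1, 2)
    probs = {h: het_probability(n, n_A, h) for h in counts}
    return {
        h: sum(v for v in probs.values() if v <= probs[h] * (1 + 1e-9))
        for h in probs
    }

levels = exact_significance(n=100, n_A=34)
for h in sorted(levels, key=levels.get):
    print(h, f"{levels[h]:.3f}")
```

Sorting by significance level reproduces the ordering of the table, and the small relative tolerance in the comparison guards against floating-point ties between equally probable outcomes.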

Equivalence tests

The equivalence tests are developed in order to establish sufficiently good agreement between the observed genotype frequencies and Hardy–Weinberg equilibrium. Let $\mathcal{M}$ denote the family of the genotype distributions under the assumption of Hardy–Weinberg equilibrium. The distance between a genotype distribution $p$ and Hardy–Weinberg equilibrium is defined by $d(p, \mathcal{M}) = \min_{q \in \mathcal{M}} d(p, q)$, where $d$ is some distance. The equivalence test problem is given by $H_0 = \{d(p, \mathcal{M}) \geq \varepsilon\}$ and $H_1 = \{d(p, \mathcal{M}) < \varepsilon\}$, where $\varepsilon > 0$ is a tolerance parameter. If the hypothesis $H_0$ can be rejected, then the population is close to Hardy–Weinberg equilibrium with a high probability. The equivalence tests for the biallelic case are developed, among others, in Wellek. The equivalence tests for the case of multiple alleles are proposed in Ostrovski.

Inbreeding coefficient

The inbreeding coefficient, F, is one minus the observed frequency of heterozygotes over that expected from Hardy–Weinberg equilibrium.

$$F = \frac{E(f(\text{Aa})) - O(f(\text{Aa}))}{E(f(\text{Aa}))} = 1 - \frac{O(f(\text{Aa}))}{E(f(\text{Aa}))}$$

where the expected value from Hardy–Weinberg equilibrium is given by

$$E(f(\text{Aa})) = 2pq$$

For example, for Ford's data above:

$$F = 1 - \frac{138}{141.2} = 0.023$$

For two alleles, the chi-squared goodness of fit test for Hardy–Weinberg proportions is equivalent to the test for inbreeding, F = 0.
The inbreeding coefficient is unstable as the expected value approaches zero, and thus not useful for rare and very common alleles. For E = 0 and O > 0, F = −∞; for E = 0 and O = 0, F is undefined.
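
A minimal sketch of this computation from genotype counts follows. The function name is an assumption, and raising an error when no heterozygotes are expected is one possible way of handling the unstable case, not a convention taken from the text.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Return F = 1 - observed / expected heterozygote frequency."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected_het = 2 * p * (1 - p)
    if expected_het == 0:
        # Monomorphic sample: no heterozygotes are expected, so F has no finite value.
        raise ValueError("F is not defined when the expected heterozygosity is zero")
    observed_het = n_Aa / n
    return 1 - observed_het / expected_het

# Ford's scarlet tiger moth counts.
F = inbreeding_coefficient(1469, 138, 5)
print(round(F, 3))
```

A positive F, as here, indicates slightly fewer heterozygotes than expected; a negative value would indicate an excess.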

History

Mendelian genetics was rediscovered in 1900. However, it remained somewhat controversial for several years as it was not then known how it could cause continuous characteristics. Udny Yule argued against Mendelism because he thought that dominant alleles would increase in the population. The American William E. Castle showed that without selection, the genotype frequencies would remain stable. Karl Pearson found one equilibrium position with values of p = q = 0.5. Reginald Punnett, unable to counter Yule's point, introduced the problem to G. H. Hardy, a British mathematician, with whom he played cricket. Hardy was a pure mathematician and held applied mathematics in some contempt; his view of biologists' use of mathematics comes across in his 1908 paper, where he describes the point as "very simple".
The principle was thus known as Hardy's law in the English-speaking world until 1943, when Curt Stern pointed out that it had first been formulated independently in 1908 by the German physician Wilhelm Weinberg. William Castle in 1903 also derived the ratios for the special case of equal allele frequencies, and it is sometimes called the Hardy–Weinberg–Castle Law.

Derivation of Hardy's equations

Hardy's statement begins with a recurrence relation for the frequencies p, 2q, and r, the proportions of the homozygous dominant, heterozygous, and homozygous recessive genotypes. These recurrence relations follow from fundamental concepts in probability, specifically independence and conditional probability. For example, consider the probability of an offspring from generation $t$ being homozygous dominant. Alleles are inherited independently from each parent. A dominant allele can be inherited from a homozygous dominant parent with probability 1, or from a heterozygous parent with probability 0.5. To represent this reasoning in an equation, let $A_t$ represent inheritance of a dominant allele from a parent. Furthermore, let $AA_{t-1}$ and $Aa_{t-1}$ represent potential parental genotypes in the preceding generation.

$$\begin{aligned}
p_t &= P(A_t, A_t) = P(A_t)^2 \\
&= \left(P(A_t \mid AA_{t-1})\, P(AA_{t-1}) + P(A_t \mid Aa_{t-1})\, P(Aa_{t-1})\right)^2 \\
&= \left((1)\, p_{t-1} + (0.5)\, 2q_{t-1}\right)^2 = (p_{t-1} + q_{t-1})^2
\end{aligned}$$

The same reasoning, applied to the other genotypes, yields the two remaining recurrence relations. Equilibrium occurs when each proportion is constant between subsequent generations. More formally, a population is at equilibrium at generation $t$ when

$$0 = p_t - p_{t-1}, \qquad 0 = q_t - q_{t-1}, \qquad 0 = r_t - r_{t-1}$$

By solving these equations, necessary and sufficient conditions for equilibrium to occur can be determined. Again, consider the frequency of homozygous dominant animals. Equilibrium implies

$$0 = p_t - p_{t-1} = p_{t-1}^2 + 2 p_{t-1} q_{t-1} + q_{t-1}^2 - p_{t-1}$$

First consider the case where $p_{t-1} = 0$, and note that it implies that $q_{t-1} = 0$ and $r_{t-1} = 1$. Now consider the remaining case, where $p_{t-1} \neq 0$. Dividing by $p_{t-1}$ gives

$$0 = p_{t-1} + 2 q_{t-1} + \frac{q_{t-1}^2}{p_{t-1}} - 1 = \frac{q_{t-1}^2}{p_{t-1}} - r_{t-1}$$

where the final equality holds because the genotype proportions must sum to one, $p_{t-1} + 2q_{t-1} + r_{t-1} = 1$. In both cases, $q_{t-1}^2 = p_{t-1} r_{t-1}$. It can be shown that the other two equilibrium conditions imply the same equation. Together, the solutions of the three equilibrium equations imply sufficiency of Hardy's condition for equilibrium. Since the condition always holds for the second generation, all succeeding generations have the same proportions.

Numerical example

Estimation of Genotype distribution

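An example computation of the genotype distribution given by Hardy's original equations is instructive. The phenotype distribution from Table 3 above (Ford's scarlet tiger moth data) will be used to compute Hardy's initial genotype distribution. Note that the p and q values used by Hardy are not the same as those used above: Hardy's p, 2q, and r are the proportions of the AA, Aa, and aa genotypes.

$$p = \frac{1469}{1612} = 0.9113, \qquad 2q = \frac{138}{1612} = 0.0856, \qquad r = \frac{5}{1612} = 0.0031$$

As checks on the distribution, compute

$$p + 2q + r = 0.9113 + 0.0856 + 0.0031 = 1.0000$$

and

$$E_0 = q^2 - pr = -0.0010$$

For the next generation, Hardy's equations give

$$q = \frac{0.0856}{2} = 0.0428$$

$$p_1 = (p + q)^2 = 0.9103, \qquad 2q_1 = 2(p + q)(q + r) = 0.0876, \qquad r_1 = (q + r)^2 = 0.0021$$

Again as checks on the distribution, compute

$$p_1 + 2q_1 + r_1 = 0.9103 + 0.0876 + 0.0021 = 1.0000$$

and

$$E_1 = q_1^2 - p_1 r_1 = 0.0000$$

which are the expected values. The reader may demonstrate that subsequent use of the second-generation values for a third generation will yield identical results.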
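
The third-generation check can be carried out with a few lines of code. This sketch assumes Hardy's p, 2q, and r are the genotype proportions of AA, Aa, and aa computed from the scarlet tiger moth counts (1469, 138, 5); the function names are hypothetical.

```python
def hardy_step(p, two_q, r):
    """Apply Hardy's recurrence to the genotype proportions (p, 2q, r)."""
    q = two_q / 2
    return (p + q) ** 2, 2 * (p + q) * (q + r), (q + r) ** 2

def hardy_excess(p, two_q, r):
    """Return q**2 - p*r, which is zero exactly when Hardy's condition holds."""
    return (two_q / 2) ** 2 - p * r

n = 1469 + 138 + 5
gen0 = (1469 / n, 138 / n, 5 / n)
gen1 = hardy_step(*gen0)
gen2 = hardy_step(*gen1)

for label, gen in (("initial", gen0), ("second", gen1), ("third", gen2)):
    print(label, [round(x, 4) for x in gen], round(hardy_excess(*gen), 4))
```

The second and third rows agree to within floating-point rounding, and q² − pr is zero from the second generation onwards.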

Estimation of Carrier frequency

The Hardy–Weinberg principle can also be used to estimate the frequency of carriers of an autosomal recessive condition in a population based on the frequency of sufferers.
Let us assume an estimated 1 in 2,500 babies are born with cystic fibrosis; this is about the frequency of homozygous individuals observed in Northern European populations. We can use the Hardy–Weinberg equations to estimate the carrier frequency, the frequency of heterozygous individuals, 2pq.

$$q^2 = \frac{1}{2500}, \qquad q = \frac{1}{50}, \qquad p = 1 - q$$

As 1/50 is small, we can take p, 1 − 1/50, to be 1.

$$2pq \approx 2 \times \frac{1}{50} = \frac{1}{25}$$

We therefore estimate the carrier rate to be 1 in 25, which is about the frequency observed in Northern European populations.
This can be simplified to the carrier frequency being about twice the square root of the birth frequency.
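
The size of the error introduced by taking p to be 1 can be checked numerically. This sketch assumes a birth frequency of 1 in 2,500 for illustration; the function name is hypothetical.

```python
from math import sqrt

def carrier_frequency(birth_frequency):
    """Return the exact heterozygote frequency 2pq implied by q**2 = birth_frequency."""
    q = sqrt(birth_frequency)
    p = 1 - q
    return 2 * p * q

birth_frequency = 1 / 2500                 # assumed frequency of affected births
exact = carrier_frequency(birth_frequency)  # 2 * (49/50) * (1/50)
approximate = 2 * sqrt(birth_frequency)     # shortcut with p taken as 1

print(round(exact, 4), round(approximate, 4))
```

The exact value is 0.0392 and the shortcut gives 0.04, so the approximation overstates the carrier frequency by about 2% in relative terms; the discrepancy grows as the condition becomes more common.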

Graphical representation

It is possible to represent the distribution of genotype frequencies for a bi-allelic locus within a population graphically using a de Finetti diagram. This uses a triangular plot to represent the distribution of the three genotype frequencies in relation to each other. It differs from many other such plots in that the direction of one of the axes has been reversed. The curved line in the diagram is the Hardy–Weinberg parabola and represents the state where alleles are in Hardy–Weinberg equilibrium. The effects of natural selection on allele frequencies can also be represented on such graphs. The de Finetti diagram was developed and used extensively by A. W. F. Edwards in his book Foundations of Mathematical Genetics.
