Fixation index

The fixation index is a measure of population differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data, such as single-nucleotide polymorphisms or microsatellites. Developed as a special case of Wright's F-statistics, it is one of the most commonly used statistics in population genetics.

Definition

Two of the most commonly used definitions for F_ST at a given locus are based on the variance of allele frequencies among populations and on the probability of identity by descent.
If $\bar{p}$ is the average frequency of an allele in the total population, $\sigma^2_S$ is the variance in the frequency of the allele among different subpopulations, weighted by the sizes of the subpopulations, and $\sigma^2_T$ is the variance of the allelic state in the total population, F_ST is defined as
$$F_{ST} = \frac{\sigma^2_S}{\sigma^2_T} = \frac{\sigma^2_S}{\bar{p}(1-\bar{p})}$$
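Wright's definition illustrates that F_ST measures the amount of genetic variance that can be explained by population structure. This can also be thought of as the fraction of total diversity that is not a consequence of the average diversity within subpopulations, where diversity is measured by the probability that two randomly selected alleles are different, namely $2p(1-p)$ for an allele at frequency $p$. If the allele frequency in the $i$th population is $p_i$ and the relative size of the $i$th population is $c_i$, so that $\bar{p} = \sum_i c_i p_i$, then
$$F_{ST} = \frac{\bar{p}(1-\bar{p}) - \sum_i c_i p_i(1-p_i)}{\bar{p}(1-\bar{p})} = \frac{2\bar{p}(1-\bar{p}) - \sum_i c_i \, 2p_i(1-p_i)}{2\bar{p}(1-\bar{p})}$$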
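As a minimal sketch (assuming a single biallelic locus and subpopulation weights $c_i$ that sum to 1), the variance form $\sigma^2_S / (\bar{p}(1-\bar{p}))$ and the diversity form $(2\bar{p}(1-\bar{p}) - \sum_i c_i \, 2p_i(1-p_i)) / (2\bar{p}(1-\bar{p}))$ can be computed side by side to check that they agree. The function name `fst_biallelic` is hypothetical.

```python
from typing import Sequence, Tuple


def fst_biallelic(p: Sequence[float], c: Sequence[float]) -> Tuple[float, float]:
    """Return F_ST for one biallelic locus, computed two ways.

    p[i]: frequency of the allele in subpopulation i
    c[i]: relative size (weight) of subpopulation i; the weights must sum to 1
    """
    if len(p) != len(c) or abs(sum(c) - 1.0) > 1e-9:
        raise ValueError("p and c must have equal length and c must sum to 1")
    p_bar = sum(ci * pi for ci, pi in zip(c, p))   # average allele frequency
    var_total = p_bar * (1.0 - p_bar)              # variance of the allelic state
    if var_total == 0.0:
        raise ValueError("locus is monomorphic in the total population")

    # Variance form: size-weighted variance of subpopulation frequencies / total variance
    var_sub = sum(ci * (pi - p_bar) ** 2 for ci, pi in zip(c, p))
    fst_variance = var_sub / var_total

    # Diversity form: share of total diversity not explained by mean within-subpopulation diversity
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = sum(ci * 2.0 * pi * (1.0 - pi) for ci, pi in zip(c, p))
    fst_diversity = (h_total - h_within) / h_total

    return fst_variance, fst_diversity


print([round(x, 6) for x in fst_biallelic([0.2, 0.8], [0.5, 0.5])])  # [0.36, 0.36]
```

The two forms coincide because $\bar{p}(1-\bar{p}) - \sum_i c_i p_i(1-p_i) = \sum_i c_i p_i^2 - \bar{p}^2 = \sigma^2_S$.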
Alternatively,
$$F_{ST} = \frac{f_0 - \bar{f}}{1 - \bar{f}}$$
where $f_0$ is the probability of identity by descent of two individuals given that the two individuals are in the same subpopulation, and $\bar{f}$ is the probability that two individuals from the total population are identical by descent. Using this definition, F_ST can be interpreted as measuring how much closer two individuals from the same subpopulation are, compared to the total population. If the mutation rate $\mu$ is small, this interpretation can be made more explicit by linking the probability of identity by descent to coalescent times: let $T_0$ and $\bar{T}$ denote the average time to coalescence for individuals from the same subpopulation and the total population, respectively, so that $f_0 \approx 1 - 2\mu T_0$ and $\bar{f} \approx 1 - 2\mu \bar{T}$. Then,
$$F_{ST} = \frac{\bar{T} - T_0}{\bar{T}}$$
This formulation has the advantage that the expected time to coalescence can easily be estimated from genetic data, which led to the development of various estimators for F_ST.
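The identity-by-descent form $F_{ST} = (f_0 - \bar{f})/(1 - \bar{f})$ and the coalescent form $F_{ST} = (\bar{T} - T_0)/\bar{T}$ can be sketched as two small functions. The link used below, $f \approx 1 - 2\mu T$ (no mutation on either of the two lineages back to their common ancestor), only holds for a small mutation rate; the function names and the numerical values of `mu`, `t0` and `t_bar` are hypothetical and chosen purely for illustration.

```python
def fst_from_ibd(f0: float, f_bar: float) -> float:
    """F_ST from the probability of identity by descent within a subpopulation (f0)
    and in the total population (f_bar)."""
    if not 0.0 <= f_bar < 1.0:
        raise ValueError("f_bar must lie in [0, 1)")
    return (f0 - f_bar) / (1.0 - f_bar)


def fst_from_coalescence(t0: float, t_bar: float) -> float:
    """F_ST from mean coalescence times within a subpopulation (t0)
    and in the total population (t_bar); assumes a small mutation rate."""
    if t_bar <= 0.0:
        raise ValueError("t_bar must be positive")
    return (t_bar - t0) / t_bar


# Hypothetical illustrative values (generations and per-generation mutation rate)
mu, t0, t_bar = 1e-8, 4_000.0, 5_000.0
# Small-mutation approximation: f ~= 1 - 2 * mu * T
f0, f_bar = 1 - 2 * mu * t0, 1 - 2 * mu * t_bar

print(fst_from_coalescence(t0, t_bar))  # 0.2
print(fst_from_ibd(f0, f_bar))          # approximately 0.2
```

Under this approximation the factor $2\mu$ cancels, which is why F_ST can be read directly off relative coalescence times.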

Estimation

In practice, none of the quantities used for the definitions can be easily measured. As a consequence, various estimators have been proposed. A particularly simple estimator applicable to DNA sequence data is:
$$F_{ST} = \frac{\pi_{\text{Between}} - \pi_{\text{Within}}}{\pi_{\text{Between}}}$$
where $\pi_{\text{Between}}$ and $\pi_{\text{Within}}$ represent the average number of pairwise differences between two individuals sampled from different subpopulations or from the same subpopulation, respectively. The average pairwise difference within a population can be calculated as the sum of the pairwise differences divided by the number of pairs. However, this estimator is biased when sample sizes are small or vary between populations. Therefore, more elaborate methods are used to compute F_ST in practice. Two of the most widely used procedures are the estimator of Weir and Cockerham and analysis of molecular variance (AMOVA). A list of implementations is available at the end of this article.
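The pairwise-difference estimator $F_{ST} = (\pi_{\text{Between}} - \pi_{\text{Within}})/\pi_{\text{Between}}$ can be sketched for two subpopulations of aligned sequences. This is the simple, biased estimator described above, not the Weir and Cockerham estimator or AMOVA; averaging $\pi_{\text{Within}}$ over the two subpopulations with equal weight is an assumption of this sketch, and all function names are hypothetical.

```python
from itertools import combinations, product
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pi_within(seqs: Sequence[str]) -> float:
    """Sum of pairwise differences within one subpopulation divided by the number of pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences per subpopulation")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def pi_between(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """Mean pairwise differences over all pairs with one sequence from each subpopulation."""
    pairs = list(product(pop1, pop2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def fst_pairwise(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """Simple estimator (pi_between - pi_within) / pi_between.
    Assumption: pi_within is the unweighted mean of the two subpopulations' values."""
    pw = (pi_within(pop1) + pi_within(pop2)) / 2.0
    pb = pi_between(pop1, pop2)
    if pb == 0:
        raise ValueError("no differences between subpopulations")
    return (pb - pw) / pb


# Toy data: hypothetical aligned sequences
pop1 = ["AAAA", "AAAT", "AAAA"]
pop2 = ["CCAA", "CCAT", "CCAA"]
print(round(fst_pairwise(pop1, pop2), 3))  # 0.727
```

With only three sequences per subpopulation, this toy example is exactly the small-sample situation in which the simple estimator is biased.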

Interpretation

This comparison of genetic variability within and between populations is frequently used in applied population genetics. Values range from 0 to 1. A value of zero implies complete panmixia, that is, that the two populations are interbreeding freely. A value of one implies that all genetic variation is explained by population structure and that the two populations do not share any genetic diversity.
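For idealized models such as Wright's finite island model, F_ST can be used to estimate migration rates. Under that model, at equilibrium between drift, migration and mutation, approximately
$$F_{ST} \approx \frac{1}{1 + 4N(m + \mu)}$$
where $N$ is the effective size of each subpopulation, $m$ is the migration rate per generation, and $\mu$ is the mutation rate per generation. When mutation is negligible relative to migration, this gives the widely used estimate $Nm \approx \frac{1}{4}\left(\frac{1}{F_{ST}} - 1\right)$.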
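A minimal sketch of the migration estimate, assuming the island-model approximation $F_{ST} \approx 1/(1 + 4N(m + \mu))$ at equilibrium with equal-sized subpopulations and symmetric migration. The function names are hypothetical, and the inversion ignores mutation, as the simplified estimate does.

```python
def fst_island(n: float, m: float, mu: float = 0.0) -> float:
    """Approximate equilibrium F_ST = 1 / (1 + 4 N (m + mu)) under the island model."""
    return 1.0 / (1.0 + 4.0 * n * (m + mu))


def nm_from_fst(fst: float) -> float:
    """Migrants per generation, Nm, implied by F_ST when mutation is ignored:
    Nm = (1 / F_ST - 1) / 4."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


print(nm_from_fst(0.2))  # 1.0
# With non-negligible mutation, the simple inversion returns N * (m + mu), not N * m
print(nm_from_fst(fst_island(100, 0.01, mu=0.001)))
```

The second call illustrates why the simplified estimate is only appropriate when $\mu \ll m$: any mutational contribution is attributed to migration.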
The interpretation of F_ST can be difficult when the data analyzed are highly polymorphic. In this case, the probability of identity by descent is very low and F_ST can have an arbitrarily low upper bound, which might lead to misinterpretation of the data. Also, strictly speaking, F_ST is not a distance in the mathematical sense, as it does not satisfy the triangle inequality.
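The low upper bound for highly polymorphic loci can be illustrated with a multi-allelic extension of the diversity form, $F_{ST} = (H_T - H_S)/H_T$, where gene diversity is $1 - \sum_k p_k^2$ (which reduces to $2p(1-p)$ for two alleles). Because $H_T$ cannot exceed 1, $F_ST = 1 - H_S/H_T \le 1 - H_S$. In the hypothetical example below, two subpopulations share no alleles at all, yet F_ST is only about 0.053 because each carries ten equally frequent alleles. Using this G_ST-style extension is an assumption of the sketch, and the function names are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def gene_diversity(freqs: Mapping[str, float]) -> float:
    """Probability that two randomly drawn alleles differ: 1 - sum(p_k^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst_multiallelic(pops: Sequence[Mapping[str, float]], weights: Sequence[float]) -> float:
    """(H_T - H_S) / H_T for one locus; pops[i] maps allele -> frequency in subpopulation i."""
    pooled: Counter = Counter()
    for w, freqs in zip(weights, pops):
        for allele, p in freqs.items():
            pooled[allele] += w * p  # allele frequency in the total population
    h_t = gene_diversity(pooled)
    if h_t == 0.0:
        raise ValueError("locus is monomorphic in the total population")
    h_s = sum(w * gene_diversity(f) for w, f in zip(weights, pops))
    return (h_t - h_s) / h_t


# Hypothetical microsatellite-like locus: no alleles shared between subpopulations
pop_a = {f"a{k}": 0.1 for k in range(10)}
pop_b = {f"b{k}": 0.1 for k in range(10)}
print(round(fst_multiallelic([pop_a, pop_b], [0.5, 0.5]), 3))  # 0.053
```

Here $H_S = 0.9$ and $H_T = 0.95$, so even complete allelic separation yields F_ST well below the bound $1 - H_S = 0.1$.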
For populations of plants which clearly belong to the same species, values of F_ST greater than 15% are considered "great" or "significant" differentiation, while values below 5% are considered "small" or "insignificant" differentiation.
For mammal populations at the level of subspecies or closely related species, typical values are of the order of 5% to 20%. F_ST between the Eurasian and North American populations of the gray wolf has been reported at 9.9%, and between red wolf and gray wolf populations at 17% to 18%. The eastern wolf, a recently recognized, highly admixed "wolf-like species", has F_ST values below 10% in comparison with both Eurasian and North American gray wolves and with the red wolf, and an even lower value when paired with the coyote.

F_ST in humans

F_ST values depend strongly on the choice of populations.
Closely related ethnic groups, such as the Danes vs. the Dutch or the French vs. the Spaniards, show values significantly below 1%, indistinguishable from panmixia.
Within Europe, the most divergent ethnic groups have been found to have values of the order of 7%.
Larger values are found when highly divergent homogeneous groups are compared: the highest such value found was close to 46%, between Mbuti and Papuans.

Autosomal genetic distances based on classical markers

In their study The History and Geography of Human Genes, Cavalli-Sforza, Menozzi and Piazza provide some of the most detailed and comprehensive estimates of genetic distances between human populations, within and across continents. Their initial database contains 76,676 gene frequencies, corresponding to 6,633 samples in different locations. By culling and pooling such samples, they restrict their analysis to 491 populations. They focus on aboriginal populations that were at their present location at the end of the 15th century, when the great European migrations began. When studying genetic differences at the world level, the number is reduced to 42 representative populations, obtained by aggregating subpopulations characterized by a high level of genetic similarity.
For these 42 populations, Cavalli-Sforza and coauthors report bilateral distances computed from 120 alleles. Among this set of 42 world populations, the greatest genetic distance observed is between Mbuti Pygmies and Papua New Guineans, where the F_ST distance is 0.4573, while the smallest genetic distance is between the Danish and the English. When considering more disaggregated data for 26 European populations, the smallest genetic distance is between the Dutch and the Danes, and the largest is between the Lapps and the Sardinians. The mean genetic distance among the 861 available pairings of the 42 selected populations was found to be 0.1338. A genetic distance of 0.1338 implies that kinship between unrelated individuals of the same ancestry, relative to the world population, is equivalent to kinship between half siblings in a randomly mating population (whose kinship coefficient is 1/8 = 0.125). This also implies that if a human from a given ancestral population has a mixed half-sibling, that human is closer genetically to an unrelated individual of their ancestral population than to their mixed half-sibling.
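A mean over "all available pairings" is an average over every unordered pair of populations, i.e. the upper triangle of a symmetric distance matrix without the diagonal. For 42 populations there are $\binom{42}{2} = 861$ such pairs, matching the count given above. The sketch below uses a small hypothetical matrix, not the Cavalli-Sforza data, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Sequence


def mean_pairwise(dist: Sequence[Sequence[float]]) -> float:
    """Mean of a symmetric distance matrix over all unordered pairs (i < j)."""
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two populations")
    pairs = list(combinations(range(n), 2))  # comb(n, 2) pairs, diagonal excluded
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


print(comb(42, 2))  # 861

# Hypothetical 3-population F_ST matrix
toy = [[0.0, 0.1, 0.2],
       [0.1, 0.0, 0.3],
       [0.2, 0.3, 0.0]]
print(round(mean_pairwise(toy), 4))  # 0.2
```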

Autosomal genetic distances based on SNPs

A 2012 study based on International HapMap Project data estimated F_ST between the three major "continental" populations of Europeans, East Asians and Sub-Saharan Africans. It reported a value close to 12% between continental populations, and values close to panmixia within continental populations.

	Europe	Sub-Saharan Africa	East Asia (Japanese)
Sub-Saharan Africa	0.153
East Asia (Japanese)	0.111	0.190
East Asia (Chinese)	0.110	0.192	0.007

	Italians	Palestinians	Swedish	Finns	Spanish	Germans	Russians	French	Greeks
Palestinians	0.0064
Swedish	0.0064-0.0090	0.0191
Finns	0.0130-0.0230		0.0050-0.0110
Spanish	0.0010-0.0050	0.0101	0.0040-0.0055	0.0110-0.0170
Germans	0.0029-0.0080	0.0136	0.0007-0.0010	0.0060-0.0130	0.0015-0.0030
Russians	0.0088-0.0120	0.0202	0.0030-0.0036	0.0060-0.0120	0.0070-0.0079	0.0030-0.0037
French	0.0030-0.0050		0.0020	0.0080-0.0150	0.0010	0.0010	0.0050
Greeks	0.0000	0.0057	0.0084		0.0035	0.0039	0.0108

Programs for calculating F_ST

Arlequin
Fstat
DnaSP

Modules for calculating F_ST

BioPerl
