Minimum evolution


Minimum evolution (ME) is a distance method used in phylogenetic modeling. Like maximum parsimony, it searches for the phylogeny with the smallest total sum of branch lengths.
The theoretical foundations of the minimum evolution criterion lie in the seminal works of Kidd and Sgaramella-Zonta and of Rzhetsky and Nei. In these frameworks, the molecular sequences of the taxa are replaced by a set of measures of their pairwise dissimilarity. A fundamental result states that if these distances are unbiased estimates of the true evolutionary distances between taxa, then the true phylogeny has an expected length shorter than that of any other phylogeny compatible with those distances.
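The result can be written compactly. The notation below is chosen here for illustration and is not taken from the cited articles: T* denotes the true phylogeny, and \hat{L}(T) the sum of the branch lengths estimated on a phylogeny T from the distance estimates.

```latex
\mathbb{E}\bigl[\hat{L}(T^{*})\bigr] \;<\; \mathbb{E}\bigl[\hat{L}(T)\bigr]
\qquad \text{for every phylogeny } T \neq T^{*} \text{ on the same taxa}
```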

Relationships and differences with maximum parsimony

There is a subtle difference between the maximum-parsimony criterion and the ME criterion. Maximum parsimony rests on an abductive heuristic, namely that the simplest evolutionary hypothesis for the taxa is more plausible than more complex ones. The ME criterion instead rests on Kidd and Sgaramella-Zonta's conjectures, which were proven true 22 years later by Rzhetsky and Nei. These mathematical results free the ME criterion from the Occam's razor principle and give it a solid theoretical and quantitative basis.

Statistical consistency

The ME criterion is known to be statistically consistent whenever the branch lengths are estimated via ordinary least squares (OLS) or via linear programming.

However, as observed in Rzhetsky and Nei's article, the phylogeny of minimum length under the OLS branch length estimation model may, in some circumstances, contain negative branch lengths, which have no biological meaning.
To overcome this drawback, Pauplin proposed replacing OLS with a different branch length estimation model, known as Balanced Minimum Evolution (BME). Richard Desper and Olivier Gascuel showed that the BME branch length estimation model ensures both the general statistical consistency of the minimum length phylogeny and the non-negativity of its branch lengths, whenever the estimated evolutionary distances between taxa satisfy the triangle inequality.
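A small worked case, constructed here and not taken from the cited articles, shows that OLS estimates are not constrained in sign. For four taxa a, b, c, d and the topology that groups a with b and c with d, the OLS estimate of the internal branch has a simple closed form. The distances used are an assumption: they are the path lengths of a tree that groups a with c and b with d, with pendant branches of 0.1 and an internal branch of 0.2.

```latex
% OLS estimate of the internal branch for the topology ab|cd
\hat{\ell}_{ab|cd} \;=\; \tfrac{1}{4}\,(d_{ac} + d_{ad} + d_{bc} + d_{bd})
                   \;-\; \tfrac{1}{2}\,(d_{ab} + d_{cd})

% Assumed distances: d_{ac} = d_{bd} = 0.2, all other pairs 0.4
\hat{\ell}_{ab|cd} \;=\; \tfrac{1}{4}\,(0.2 + 0.4 + 0.4 + 0.2) - \tfrac{1}{2}\,(0.4 + 0.4)
                   \;=\; 0.3 - 0.4 \;=\; -0.1

% Same formula on the generating topology ac|bd
\hat{\ell}_{ac|bd} \;=\; \tfrac{1}{4}\,(0.4 + 0.4 + 0.4 + 0.4) - \tfrac{1}{2}\,(0.2 + 0.2)
                   \;=\; 0.4 - 0.2 \;=\; 0.2
```

Here the negative value appears on a topology that does not match the distances. The example only illustrates that nothing in the OLS fit forces an estimate to be non-negative; it is not an instance of a minimum-length phylogeny with a negative branch.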
Le Sy Vinh and Arndt von Haeseler have shown, by means of massive and systematic simulation experiments, that the accuracy of the ME criterion under the BME branch length estimation model is by far the highest among distance methods and not inferior to that of alternative criteria based, for example, on maximum likelihood or Bayesian inference. Moreover, as shown by Daniele Catanzaro, Martin Frohn and Raffaele Pesenti, the minimum length phylogeny under the BME branch length estimation model can be interpreted as the consensus tree between concurrent minimum entropy processes encoded by a forest of n phylogenies rooted on the n analyzed taxa. This information-theoretic interpretation is conjectured to be shared by all distance methods in phylogenetics.
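Under BME, the length of a fully resolved unrooted phylogeny can be computed directly from the distances with the formula usually attributed to Pauplin: each distance d_ij is weighted by 2^(1 - t_ij), where t_ij is the number of edges on the path between taxa i and j. The following is a minimal Python sketch of that formula. The adjacency-list encoding of the tree, the function names (`bme_length`, `quartet`) and the toy distances are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations


def leaf_path_lengths(adj, leaves):
    """Count the edges on the path between every ordered pair of leaves (BFS per leaf)."""
    tau = {}
    for src in leaves:
        depth = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        for dst in leaves:
            tau[src, dst] = depth[dst]
    return tau


def bme_length(adj, d):
    """Tree length under BME: sum over leaf pairs of 2^(1 - tau_ij) * d_ij."""
    leaves = sorted(node for node in adj if len(adj[node]) == 1)
    tau = leaf_path_lengths(adj, leaves)
    return sum(2.0 ** (1 - tau[i, j]) * d[i][j] for i, j in combinations(leaves, 2))


def quartet(pair1, pair2):
    """Hypothetical helper: unrooted four-taxon tree grouping pair1 against pair2."""
    (a, b), (c, e) = pair1, pair2
    return {a: ["u"], b: ["u"], c: ["v"], e: ["v"],
            "u": [a, b, "v"], "v": [c, e, "u"]}


# Toy distances (assumed): path lengths of a tree grouping a with c and b with d,
# pendant branches 0.1 and internal branch 0.2, so its true length is 0.6.
taxa = ["a", "b", "c", "d"]
d = {x: {y: 0.0 if x == y else 0.4 for y in taxa} for x in taxa}
for x, y in (("a", "c"), ("b", "d")):
    d[x][y] = d[y][x] = 0.2

lengths = {
    "ab|cd": bme_length(quartet(("a", "b"), ("c", "d")), d),
    "ac|bd": bme_length(quartet(("a", "c"), ("b", "d")), d),
    "ad|bc": bme_length(quartet(("a", "d"), ("b", "c")), d),
}
best = min(lengths, key=lengths.get)
print(best, lengths)
```

On these distances the topology that generated them has the smallest BME length, which is the behaviour the ME criterion relies on. Enumerating all topologies, as done here for four taxa, is only practical for very small instances.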

Algorithmic aspects

The search for the shortest length phylogeny is generally carried out by means of exact approaches, as well as heuristics such as the neighbor-joining algorithm, FASTME, or other metaheuristics.
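As an illustration of the heuristic side, the following is a minimal Python sketch of neighbor joining using the standard Saitou and Nei selection criterion. The data layout (distances keyed by `frozenset` pairs), the function name `neighbor_joining` and the internal node labels are assumptions made here; this is not the implementation of any particular software package, and it favours readability over speed.

```python
from itertools import combinations


def neighbor_joining(taxa, d):
    """Build an unrooted tree by neighbor joining.

    taxa: list of taxon names; d: dict mapping frozenset({x, y}) to a distance.
    Returns a list of edges (node, node, length).
    """
    dist = dict(d)          # working copy, extended with new internal nodes
    nodes = list(taxa)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {x: sum(dist[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # Pick the pair minimising Q(i, j) = (n - 2) d_ij - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = dist[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((u, k))] = (
                    dist[frozenset((i, k))] + dist[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes.
    x, y = nodes
    edges.append((x, y, dist[frozenset((x, y))]))
    return edges


# Toy input (assumed): distances from a tree grouping a with c and b with d.
taxa = ["a", "b", "c", "d"]
d = {frozenset(p): 0.4 for p in combinations(taxa, 2)}
d[frozenset(("a", "c"))] = 0.2
d[frozenset(("b", "d"))] = 0.2

edges = neighbor_joining(taxa, d)
total_length = sum(length for _, _, length in edges)
for edge in edges:
    print(edge)
```

Each iteration recomputes the row sums and scans all pairs, so this sketch runs in cubic time in the number of taxa. Neighbor joining is greedy: it returns a single tree and does not guarantee the minimum length phylogeny on arbitrary distances.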