Algorithm selection


Algorithm selection is a meta-algorithmic technique to choose an algorithm from a portfolio on an instance-by-instance basis. It is motivated by the observation that on many practical problems, algorithms have different performances. That is, while one algorithm performs well on some instances, it performs poorly on others and vice versa for another algorithm. If we can identify when to use which algorithm, we can get the best of both worlds and improve overall performance. This is what algorithm selection aims to do. The only prerequisite for applying algorithm selection techniques is that there exists a set of complementary algorithms.

Definition

Given a portfolio of algorithms, a set of instances and a cost metric, the algorithm selection problem consists of finding a mapping from instances to algorithms such that the cost across all instances is optimized.

Examples

Boolean satisfiability problem (and other hard combinatorial problems)

A well-known application of algorithm selection is the Boolean satisfiability problem. Here, the portfolio of algorithms is a set of SAT solvers, the instances are Boolean formulas, the cost metric is for example average runtime or number of unsolved instances. So, the goal is to select a well-performing SAT solver for each individual instance. In the same way, algorithm selection can be applied to many other -hard problems. Competition-winning systems in SAT are SATzilla, 3S and CSHC

Machine learning

In machine learning, algorithm selection is better known as meta-learning. The portfolio of algorithms consists of machine learning algorithms, the instances are data sets and the cost metric is for example the error rate. So, the goal is to predict which machine learning algorithm will have a small error on each data set.

Instance features

The algorithm selection problem is mainly solved with machine learning techniques. By representing the problem instances by numerical features, algorithm selection can be seen as a multi-class classification problem by learning a mapping for a given instance.
Instance features are numerical representations of instances. For example, we can count the number of variables, clauses, average clause length for Boolean formulas, or number of samples, features, class balance for ML data sets to get an impression about their characteristics.

Static vs. probing features

We distinguish between two kinds of features:
  1. Static features are in most cases some counts and statistics. These features ranges from very cheap features to very complex features.
  2. Probing features are computed by running some analysis of algorithm behavior on an instance. These feature often cost more than simple static features.

    Feature costs

Depending on the used performance metric, feature computation can be associated with costs.
For example, if we use running time as performance metric, we include the time to compute our instance features into the performance of an algorithm selection system.
SAT solving is a concrete example, where such feature costs cannot be neglected, since instance features for CNF formulas can be either very cheap or very expensive.
It is important to take the overhead of feature computation into account in practice in such scenarios; otherwise a misleading impression of the performance of the algorithm selection approach is created. For example, if the decision which algorithm to choose can be made with perfect accuracy, but the features are the running time of the portfolio algorithms, there is no benefit to the portfolio approach. This would not be obvious if feature costs were omitted.

Approaches

Regression approach

One of the first successful algorithm selection approaches predicted the performance of each algorithm and selected the algorithm with the best predicted performance for an instance.

Clustering approach

A common assumption is that the given set of instances can be clustered into homogeneous subsets
and for each of these subsets, there is one well-performing algorithm for all instances in there.
So, the training consists of identifying the homogeneous clusters via an unsupervised clustering approach and associating an algorithm with each cluster.
A new instance is assigned to a cluster and the associated algorithm selected.
A more modern approach is cost-sensitive hierarchical clustering using supervised learning to identify the homogeneous instance subsets.

Pairwise cost-sensitive classification approach

A common approach for multi-class classification is to learn pairwise models between every pair of classes
and choose the class that was predicted most often by the pairwise models.
We can weight the instances of the pairwise prediction problem by the performance difference between the two algorithms.
This is motivated by the fact that we care most about getting predictions with large differences correct, but the penalty for an incorrect prediction is small if there is almost no performance difference.
Therefore, each instance for training a classification model vs is associated with a cost.

Requirements

The algorithm selection problem can be effectively applied under the following assumptions:
Algorithm selection is not limited to single domains but can be applied to any kind of algorithm if the above requirements are satisfied.
Application domains include:
For an extensive list of literature about algorithm selection, we refer to a literature overview.

Variants of algorithm selection

Online selection

Online algorithm selection refers to switching between different algorithms during the solving process. This is useful as a hyper-heuristic. In contrast, offline algorithm selection selects an algorithm for a given instance only once and before the solving process.

Computation of schedules

An extension of algorithm selection is the per-instance algorithm scheduling problem, in which we do not select only one solver, but we select a time budget for each algorithm on a per-instance base. This approach improves the performance of selection systems in particular if the instance features are not very informative and a wrong selection of a single solver is likely.

Selection of parallel portfolios

Given the increasing importance of parallel computation,
an extension of algorithm selection for parallel computation is parallel portfolio selection,
in which we select a subset of the algorithms to simultaneously run in a parallel portfolio.