Multi-label classification

In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to.
Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y.

Problem transformation methods

Several problem transformation methods exist for multi-label classification, and can be roughly broken down into:

Transformation into binary classification problems: the baseline approach, called the binary relevance method, amounts to independently training one binary classifier for each label. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result. Although this method of dividing the task into multiple binary tasks may resemble superficially the one-vs.-all and one-vs.-rest methods for multiclass classification, it is essentially different from both, because a single classifier under binary relevance deals with a single label, without any regard to other labels whatsoever. A classifier chain is an alternative method for transforming a multi-label classification problem into several binary classification problems. It differs from binary relevance in that labels are predicted sequentially, and the output of all previous classifiers are input as features to subsequent classifiers. Classifier chains have been applied, for instance, in HIV drug resistance prediction. Bayesian network has also been applied to optimally order classifiers in Classifier chains.
Transformation into multi-class classification problem: The label powerset transformation creates one binary classifier for every label combination present in the training set. For example, if possible labels for an example were A, B, and C, the label powerset representation of this problem is a multi-class classification problem with the classes , , , , , , . where for example denotes an example where labels A and C are present and label B is absent.
Ensemble methods: A set of multi-class classifiers can be used to create a multi-label ensemble classifier. For a given example, each classifier outputs a single class. These predictions are then combined by an ensemble method, usually a voting scheme where every class that receives a requisite percentage of votes from individual classifiers is predicted as a present label in the multi-label output. However, more complex ensemble methods exist, such as committee machines. Another variation is the random -labelsets algorithm, which uses multiple LP classifiers, each trained on a random subset of the actual labels; label prediction is then carried out by a voting scheme. A set of multi-label classifiers can be used in a similar way to create a multi-label ensemble classifier. In this case, each classifier votes once for each label it predicts rather than for a single label.
Adapted algorithms

Some classification algorithms/models have been adapted to the multi-label task, without requiring problem transformations. Examples of these incl for multi-label data.

k-nearest neighbors: the ML-kNN algorithm extends the k-NN classifier to multi-label data.
decision trees: "Clare" is an adapted C4.5 algorithm for multi-label classification; the modification involves the entropy calculations. MMC, MMDT, and SSC refined MMDT, can classify multi-labeled data based on multi-valued attributes without transforming the attributes into single-values. They are also named multi-valued and multi-labeled decision tree classification methods.
kernel methods for vector output
neural networks: BP-MLL is an adaptation of the popular back-propagation algorithm for multi-label learning.
Learning paradigms

Based on learning paradigms, the existing multi-label classification techniques can be classified into batch learning and online machine learning. Batch learning algorithms require all the data samples to be available beforehand. It trains the model using the entire training data and then predicts the test sample using the found relationship. The online learning algorithms, on the other hand, incrementally build their models in sequential iterations. In iteration t, an online algorithm receives a sample, x_t and predicts its label ŷ_t using the current model; the algorithm then receives y_t, the true label of x_t and updates its model based on the sample-label pair:.

Multi-label stream classification

s are possibly infinite sequences of data that continuously and rapidly grow over time. Multi-label stream classification is the version of multi-label classification task that takes place in data streams. It is sometimes also called online multi-label classification. The difficulties of multi-label classification are combined with difficulties of data streams.
Many MLSC methods resort to ensemble methods in order to increase their predictive performance and deal with concept drifts. Below are the most widely used ensemble methods in the literature:

Online Bagging -based methods: Observing the probability of having K many of a certain data point in a bootstrap sample is approximately Poisson for big datasets, each incoming data instance in a data stream can be weighted proportional to Poisson distribution to mimic bootstrapping in an online setting. This is called Online Bagging. Many multi-label methods that use Online Bagging are proposed in the literature, each of which utilizes different problem transformation methods. EBR, ECC, EPS, E_BRT, E_BMT, ML-Random Rules are examples of such methods.
ADWIN Bagging-based methods: Online Bagging methods for MLSC are sometimes combined with explicit concept drift detection mechanisms such as ADWIN. ADWIN keeps a variable-sized window to detect changes in the distribution of the data, and improves the ensemble by resetting the components that perform poorly when there is a drift in the incoming data. Generally, the letter 'a' is used as a subscript in the name of such ensembles to indicate the usage of ADWIN change detector. E_aBR, E_aCC, E_aHT_PS are examples of such multi-label ensembles.
GOOWE-ML-based methods: Interpreting the relevance scores of each component of the ensemble as vectors in the label space and solving a least squares problem at the end of each batch, Geometrically-Optimum Online-Weighted Ensemble for Multi-label Classification is proposed. The ensemble tries to minimize the distance between the weighted prediction of its components and the ground truth vector for each instance over a batch. Unlike Online Bagging and ADWIN Bagging, GOOWE-ML utilizes a weighted voting scheme where better performing components of the ensemble are given more weight. The GOOWE-ML ensemble grows over time, and the lowest weight component is replaced by a new component when it is full at the end of a batch. GOBR, GOCC, GOPS, GORT are the proposed GOOWE-ML-based multi-label ensembles.
Multiple Windows : Here, BR models that use a sliding window are replaced with two windows for each label, one for relevant and one for non-relevant examples. Instances are oversampled or undersampled according to a load factor that is kept between these two windows. This allows concept drifts that are independent for each label to be detected, and class-imbalance to be handled.
Statistics and evaluation metrics

Considering to be a set of labels for data sample, the extent to which a dataset is multi-label can be captured in two statistics:

Label cardinality is the average number of labels per example in the set: ;
Label density is the number of labels per sample divided by the total number of labels, averaged over the samples: where.

Evaluation metrics for multi-label classification performance are inherently different from those used in multi-class classification, due to the inherent differences of the classification problem. If denotes the true set of labels for a given sample, and the predicted set of labels, then the following metrics can be defined on that sample:

Hamming loss: the fraction of the wrong labels to the total number of labels, i.e., where is the target and is the prediction. This is a loss function, so the optimal value is zero.
The closely related Jaccard index, also called Intersection over Union in the multi-label setting, is defined as the number of correctly predicted labels divided by the union of predicted and true labels,, where and are sets of predicted labels and true labels respectively.
Precision, recall and score: precision is, recall is, and is their harmonic mean.
Exact match : is the most strict metric, indicating the percentage of samples that have all their labels classified correctly.

Cross-validation in multi-label settings is complicated by the fact that the ordinary way of stratified sampling will not work; alternative ways of approximate stratified sampling have been suggested.

Implementations and datasets

Java implementations of multi-label algorithms are available in the and software packages, both based on Weka.
The scikit-learn Python package implements some .
The binary relevance method, classifier chains and other multilabel algorithms with a lot of different base learners are implemented in the R-package
A list of commonly used multi-label data-sets is available at the .

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...

Multi-label classification

Problem transformation methods

Adapted algorithms

Learning paradigms

Multi-label stream classification

Statistics and evaluation metrics

Implementations and datasets