Learning to rank
Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The purpose of the ranking model is to rank, i.e. to produce a permutation of items in new, unseen lists in a way similar to the rankings in the training data.
Applications
In information retrieval
Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure.
Training data consists of queries and documents matched to them, together with the relevance degree of each match. It may be prepared manually by human assessors, who check results for some queries and determine the relevance of each result. It is not feasible to check the relevance of all documents, so typically a technique called pooling is used: only the top few documents retrieved by some existing ranking models are checked. Alternatively, training data may be derived automatically by analyzing clickthrough logs, query chains, or features of search engines such as Google's SearchWiki.
Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries.
Typically, users expect a search query to complete in a short time, which makes it impossible to evaluate a complex ranking model on each document in the corpus, so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, the boolean model, weighted AND, or BM25. This phase is called top-k document retrieval, and many heuristics have been proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents.
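As a rough illustration of this two-phase scheme, the following sketch assumes a fast lexical scorer and an already-trained re-ranking model; bm25_score, extract_features and reranker are hypothetical placeholders, not references to any particular library.

```python
# Illustrative two-phase retrieval sketch; bm25_score, extract_features and
# reranker are hypothetical stand-ins for a fast lexical scorer, a feature
# extractor, and a trained learning-to-rank model.
def search(query, corpus, bm25_score, extract_features, reranker, k=1000, n=10):
    # Phase 1: cheap top-k candidate retrieval with a fast model such as BM25.
    candidates = sorted(corpus, key=lambda doc: bm25_score(query, doc), reverse=True)[:k]

    # Phase 2: expensive machine-learned re-ranking of the small candidate set.
    scored = [(reranker.predict([extract_features(query, doc)])[0], doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:n]]
```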
In other areas
Learning to rank algorithms have been applied in areas other than information retrieval:
- In machine translation, for ranking a set of hypothesized translations;
- In computational biology, for ranking candidate 3-D structures in the protein structure prediction problem.
- In recommender systems for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article.
- In software engineering, learning-to-rank methods have been used for fault localization.
Feature vectors
For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, called feature vectors. Components of such vectors are called features, factors or ranking signals. They may be divided into three groups:
- Query-independent or static features — those features which depend only on the document, but not on the query. For example, PageRank or a document's length. Such features can be precomputed in offline mode during indexing. They may be used to compute a document's static quality score, which is often used to speed up search query evaluation.
- Query-dependent or dynamic features — those features which depend both on the contents of the document and on the query, such as the TF-IDF score or other non-machine-learned ranking functions.
- Query-level features or query features, which depend only on the query. For example, the number of words in a query.
Some examples of features, as used in the well-known LETOR dataset:
- TF, TF-IDF, BM25, and language modeling scores of document's zones for a given query;
- Lengths and IDF sums of document's zones;
- Document's PageRank, HITS ranks and their variants.
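For illustration, a feature vector combining the three groups above might be assembled roughly as follows; the field names and feature functions are hypothetical, not taken from any specific dataset.

```python
import math

def feature_vector(query_terms, doc, corpus_stats):
    """Toy feature vector for one query-document pair (illustrative only)."""
    doc_terms = doc["text"].split()
    doc_len = len(doc_terms)
    tf = sum(doc_terms.count(t) for t in query_terms)
    idf = sum(math.log(corpus_stats["num_docs"] / (1 + corpus_stats["df"].get(t, 0)))
              for t in query_terms)
    return [
        doc["pagerank"],    # query-independent (static) feature
        doc_len,            # query-independent (static) feature
        tf,                 # query-dependent (dynamic) feature
        tf * idf,           # query-dependent TF-IDF-style feature
        len(query_terms),   # query-level feature
    ]
```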
Evaluation measures
There are several measures which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures:
- Mean average precision (MAP);
- DCG and NDCG;
- Precision@n and NDCG@n, where "@n" denotes that the metrics are evaluated only on the top n documents (a sketch of NDCG@n computation follows this list);
- Mean reciprocal rank;
- Kendall's tau;
- Spearman's rho.
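As a concrete reference point, DCG@n and NDCG@n can be computed from graded relevance labels roughly as in the following sketch, which uses the common log2-discount, 2^rel - 1 gain formulation (other variants exist).

```python
import math

def dcg_at_n(relevances, n):
    """DCG@n for relevance grades listed in ranked order."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:n]))

def ndcg_at_n(relevances, n):
    """NDCG@n: DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0
```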
Several new evaluation metrics have recently been proposed which claim to model the user's satisfaction with search results better than the DCG metric:
- Expected reciprocal rank (ERR);
- Yandex's pfound.
Approaches
Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning-to-rank problems in his paper "Learning to Rank for Information Retrieval". He categorized them into three groups by their input representation and loss function: the pointwise, pairwise, and listwise approaches. In practice, listwise approaches often outperform pairwise and pointwise approaches. This statement was further supported by a large-scale experiment on the performance of different learning-to-rank methods on a large collection of benchmark data sets.
Pointwise approach
In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem: given a single query-document pair, predict its score. A number of existing supervised machine learning algorithms can be readily used for this purpose. Ordinal regression and classification algorithms can also be used in the pointwise approach when they are used to predict the score of a single query-document pair and the score takes a small, finite number of values.
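For example, the pointwise approach can be sketched as plain regression over query-document feature vectors; scikit-learn's gradient boosting regressor is used here only as one convenient choice, and the tiny toy data is made up for illustration.

```python
from sklearn.ensemble import GradientBoostingRegressor

# X: one feature vector per query-document pair; y: graded relevance labels.
X = [[0.8, 12, 3.1], [0.1, 45, 0.4], [0.5, 20, 1.7], [0.9, 8, 2.9]]
y = [3, 0, 1, 2]

model = GradientBoostingRegressor().fit(X, y)

# At query time, documents are ranked by their predicted scores.
scores = model.predict([[0.7, 15, 2.0], [0.2, 40, 0.3]])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```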
Pairwise approach
In this case, the learning-to-rank problem is approximated by a classification problem: learning a binary classifier that can tell which document is better in a given pair of documents. The goal is to minimize the average number of inversions in the ranking.
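One common way to realize this idea (a sketch under simplifying assumptions, not any particular published method) is to turn preference pairs within a query into classification examples on feature-vector differences and fit a binary classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pairwise_transform(X, y):
    """Build (x_i - x_j, label) examples from preference pairs within one query."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:                        # document i is preferred over j
                diffs.append(X[i] - X[j]); labels.append(1)
                diffs.append(X[j] - X[i]); labels.append(0)
    return np.array(diffs), np.array(labels)

X = np.array([[0.8, 12, 3.1], [0.1, 45, 0.4], [0.5, 20, 1.7]])  # toy feature vectors
y = np.array([2, 0, 1])                                         # relevance grades

Xp, yp = pairwise_transform(X, y)
clf = LogisticRegression().fit(Xp, yp)

# The learned weight vector induces a scoring function; documents are ranked by w.x.
scores = X @ clf.coef_.ravel()
```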
Listwise approach
These algorithms try to directly optimize the value of one of the above evaluation measures, averaged over all queries in the training data. This is difficult because most evaluation measures are not continuous functions with respect to the ranking model's parameters, so continuous approximations of or bounds on evaluation measures have to be used.
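As a minimal illustration of the listwise idea, loosely in the spirit of a ListNet-style surrogate loss rather than an exact reproduction of any published method, one can compare the softmax distribution over predicted scores with that over the relevance labels:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def listwise_loss(scores, relevances):
    """Cross-entropy between "top-one" probabilities of labels and of predicted scores."""
    p_true = softmax(relevances.astype(float))
    p_pred = softmax(scores)
    return -np.sum(p_true * np.log(p_pred + 1e-12))

scores = np.array([2.0, 0.5, 1.0])        # model scores for one query's documents
relevances = np.array([3, 0, 1])          # graded labels for the same documents
loss = listwise_loss(scores, relevances)  # smooth surrogate, differentiable in the scores
```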
List of methods
Published learning-to-rank algorithms include, among others, regularized least-squares based ranking, which was later extended to learning to rank from general preference graphs.
Note: as most supervised learning algorithms can be applied to the pointwise case, only methods specifically designed with ranking in mind are mentioned above.
History
Norbert Fuhr introduced the general idea of MLR in 1992, describing learning approaches in information retrieval as a generalization of parameter estimation; a specific variant of this approach had been published by him three years earlier. Bill Cooper proposed logistic regression for the same purpose in 1992 and used it with his Berkeley research group to train a successful ranking function for TREC. Manning et al. suggest that these early works achieved limited results in their time due to little available training data and poor machine learning techniques. Several conferences, such as NIPS, SIGIR and ICML, have held workshops devoted to the learning-to-rank problem since the mid-2000s.
Practical usage by search engines
Commercial web search engines began using machine-learned ranking systems in the 2000s. One of the first search engines to start using it was AltaVista, which launched a gradient boosting-trained ranking function in April 2003. Bing's search is said to be powered by the RankNet algorithm, which was invented at Microsoft Research in 2005.
In November 2009, the Russian search engine Yandex announced that it had significantly increased its search quality due to the deployment of a new proprietary MatrixNet algorithm, a variant of gradient boosting which uses oblivious decision trees. Yandex has also sponsored a machine-learned ranking competition, "Internet Mathematics 2009", based on its own search engine's production data. Yahoo announced a similar competition in 2010.
As of 2008, Google's Peter Norvig denied that their search engine relied exclusively on machine-learned ranking. Cuil's CEO, Tom Costello, suggests that they prefer hand-built models because these can outperform machine-learned models when measured against metrics like click-through rate or time on landing page, since machine-learned models "learn what people say they like, not what people actually like".
In January 2017, the technology was included in the open source search engine Apache Solr, making machine-learned search ranking widely accessible also for enterprise search.
Vulnerabilities
Similar to recognition applications in computer vision, recent neural network based ranking algorithms have also been found to be susceptible to covert adversarial attacks, both on the candidates and on the queries. With small perturbations imperceptible to human beings, the ranking order can be arbitrarily altered. In addition, model-agnostic transferable adversarial examples have been found to be possible, which enables black-box adversarial attacks on deep ranking systems without requiring access to their underlying implementations. Conversely, the robustness of such ranking systems can be improved via adversarial defenses such as the Madry defense.
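As a toy illustration of such a perturbation, the following sketch applies an FGSM-style signed-gradient step against a made-up neural scorer; it is not a reproduction of any specific published attack.

```python
# Toy FGSM-style perturbation of a candidate embedding under a made-up neural
# scorer; purely illustrative, the model and step size are arbitrary.
import torch

torch.manual_seed(0)
scorer = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))

query = torch.randn(8)                          # fixed query embedding
candidate = torch.randn(8, requires_grad=True)  # candidate embedding to perturb

score = scorer(query * candidate).squeeze()     # toy interaction-based relevance score
score.backward()

epsilon = 0.05                                  # small perturbation budget
perturbed = candidate - epsilon * candidate.grad.sign()  # signed-gradient step that lowers the score
print(score.item(), scorer(query * perturbed).item())
```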