Lazy learning


In machine learning, lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to in eager learning, where the system tries to generalize the training data before receiving queries.
The primary motivation for employing lazy learning, as in the K-nearest neighbors algorithm, used by online recommendation systems is that the data set is continuously updated with new entries. Because of the continuous update, the "training data" would be rendered obsolete in a relatively short time especially in areas like books and movies, where new best-sellers or hit movies/music are published/released continuously. Therefore, one cannot really talk of a "training phase".
Lazy classifiers are most useful for large, continuously changing datasets with few attributes that are commonly queried. Specifically, even if a large set of attributes exist - for example, books have a year of publication, author/s, publisher, title, edition, ISBN, selling price, etc. - recommendation queries rely on far fewer attributes - e.g., purchase or viewing co-occurrence data, and user ratings of items purchased/viewed.

Advantages

The main advantage gained in employing a lazy learning method is that the target function will be approximated locally, such as in the k-nearest neighbor algorithm. Because the target function is approximated locally for each query to the system, lazy learning systems can simultaneously solve multiple problems and deal successfully with changes in the problem domain. It is said that the advantage of this system is achieved if the predictions using a single training set are only developed for few objects. This can be demonstrated in the case of the k-NN technique, which is instance-based and function is only estimated locally.

Disadvantages

Theoretical disadvantages with lazy learning include:
There are standard techniques to improve re-computation efficiency so that a particular answer is not recomputed unless the data that impact this answer has changed. In other words, the stored answers are updated incrementally.
This approach, used by large e-commerce or media sites, has long been used in the Entrez portal of the National Center for Biotechnology Information to precompute similarities between the different items in its large datasets: biological sequences, 3-D protein structures, published-article abstracts, etc. Because "find similar" queries are asked so frequently, the NCBI uses highly parallel hardware to perform nightly recomputation. The recomputation is performed only for new entries in the datasets against each other and against existing entries: the similarity between two existing entries need not be recomputed.

Examples of Lazy Learning Methods