Multimedia information retrieval

Multimedia information retrieval is a research discipline of computer science that aims at extracting semantic information from multimedia data sources. Data sources include directly perceivable media such as audio, image and video, indirectly perceivable sources such as text, semantic descriptions, biosignals as well as not perceivable sources such as bioinformation, stock prices, etc. The methodology of MMIR can be organized in three groups:

Methods for the summarization of media content. The result of feature extraction is a description.
Methods for the filtering of media descriptions
Methods for the categorization of media descriptions into classes.
Feature extraction methods

Feature extraction is motivated by the sheer size of multimedia objects as well as their redundancy and, possibly, noisiness. Generally, two possible goals can be achieved by feature extraction:

Summarization of media content. Methods for summarization include in the audio domain, for example, mel-frequency cepstral coefficients, Zero Crossings Rate, Short-Time Energy. In the visual domain, color histograms such as the MPEG-7 Scalable Color Descriptor can be used for summarization.
Detection of patterns by auto-correlation and/or cross-correlation. Patterns are recurring media chunks that can either be detected by comparing chunks over the media dimensions or comparing media chunks to templates. Typical methods include Linear Predictive Coding in the audio/biosignal domain, texture description in the visual domain and n-grams in text information retrieval.
Merging and filtering methods

Multimedia Information Retrieval implies that multiple channels are employed for the understanding of media content. Each of this channels is described by media-specific feature transformations. The resulting descriptions have to be merged to one description per media object. Merging can be performed by simple concatenation if the descriptions are of fixed size. Variable-sized descriptions – as they frequently occur in motion description – have to be normalized to a fixed length first.
Frequently used methods for description filtering include factor analysis, singular value decomposition and the extraction and testing of statistical moments. Advanced concepts such as the Kalman filter are used for merging of descriptions.

Categorization methods

Generally, all forms of machine learning can be employed for the categorization of multimedia descriptions though some methods are more frequently used in one area than another. For example, hidden Markov models are state-of-the-art in speech recognition, while dynamic time warping – a semantically related method – is state-of-the-art in gene sequence alignment. The list of applicable classifiers includes the following:

Metric approaches
Nearest Neighbor methods
Risk Minimization
Density-based Methods
Neural Networks
Heuristics

The selection of the best classifier for a given problem can be performed automatically, for example, using the Weka Data Miner.

Open problems

The quality of MMIR Systems depends heavily on the quality of the training data. Discriminative descriptions can be extracted from media sources in various forms. Machine learning provides categorization methods for all types of data. However, the classifier can only be as good as the given training data. On the other hand, it requires considerable effort to provide class labels for large databases. The future success of MMIR will depend on the provision of such data. The annual TRECVID competition is currently one of the most relevant sources of high-quality ground truth.

Related areas

MMIR provides an overview over methods employed in the areas of information retrieval. Methods of one area are adapted and employed on other types of media. Multimedia content is merged before the classification is performed. MMIR methods are, therefore, usually reused from other areas such as:

The International Journal of Multimedia Information Retrieval documents the development of MMIR as a research discipline that is independent of these areas. See also Handbook of Multimedia Information Retrieval for a complete overview over this research discipline.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...

Multimedia information retrieval

Feature extraction methods

Merging and filtering methods

Categorization methods

Open problems

Related areas