Optical music recognition

Optical music recognition (OMR) is a field of research that investigates how to computationally read music notation in documents. The goal of OMR is to teach the computer to read and interpret sheet music and produce a machine-readable version of the written music score. Once captured digitally, the music can be saved in commonly used file formats, e.g. MIDI and MusicXML.
In the past it has, misleadingly, also been called Music OCR. Due to significant differences between the two tasks, this term should no longer be used.

History

Optical music recognition of printed sheet music started in the late 1960s at MIT when the first image scanners became affordable for research institutes. Due to the limited memory of early computers, the first attempts were limited to only a few measures of music.
In 1984, a Japanese research group from Waseda University developed a specialized robot, called WABOT, which was capable of reading the music sheet in front of it and accompanying a singer on an electric organ.
Early research in OMR was conducted by Ichiro Fujinaga, Nicholas Carter, Kia Ng, David Bainbridge, and Tim Bell. These researchers developed many of the techniques that are still being used today.
The first commercial OMR application, MIDISCAN, was released in 1991 by Musitek Corporation.
The availability of smartphones with good cameras and sufficient computational power paved the way for mobile solutions where the user takes a picture with the smartphone and the device processes the image directly.

Relation to other fields

Optical music recognition relates to other fields of research, including computer vision, document analysis, and music information retrieval. It is relevant for practicing musicians and composers, who could use OMR systems to enter music into the computer and thus ease the process of composing, transcribing, and editing music. In a library, an OMR system could make music scores searchable, and for musicologists it would make it possible to conduct quantitative musicological studies at scale.

OMR vs. OCR

Optical music recognition has frequently been compared to optical character recognition (OCR). The biggest difference is that music notation is a featural writing system. This means that while the alphabet consists of well-defined primitives, it is their configuration (how they are placed and arranged on the staff) that determines the semantics and how the notation should be interpreted.
The second major distinction is that while an OCR system does not go beyond recognizing letters and words, an OMR system is also expected to recover the semantics of music: the user expects the vertical position of a note to be translated into a pitch by applying the rules of music notation. There is no proper equivalent in text recognition. By analogy, recovering the music from an image of a music sheet can be as challenging as recovering the HTML source code from the screenshot of a website.
The third difference comes from the character set used. Although writing systems like Chinese have extraordinarily complex character sets, the primitives for OMR span a much greater range of sizes, from tiny elements such as a dot to large elements such as a brace that can potentially span an entire page. Some symbols, like slurs, have a nearly unrestricted appearance: they are only defined as more-or-less smooth curves that may be interrupted anywhere.
Finally, music notation involves ubiquitous two-dimensional spatial relationships, whereas text can be read as a one-dimensional stream of information, once the baseline is established.
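
The pitch-recovery step mentioned above, translating a note's vertical position into a pitch, can be illustrated with a small sketch. It is not taken from any particular OMR system: the function name, the position convention (diatonic steps counted from the bottom staff line), and the restriction to treble and bass clefs without accidentals or key signatures are all assumptions made for illustration.

```python
# Hypothetical helper, not part of any OMR library: maps a notehead's
# staff position to a pitch name for a given clef. Accidentals and key
# signatures are deliberately ignored.

STEP_NAMES = ["C", "D", "E", "F", "G", "A", "B"]

# Diatonic index (octave * 7 + step) of the pitch on the bottom staff line.
BOTTOM_LINE = {
    "treble": 4 * 7 + 2,  # E4
    "bass": 2 * 7 + 4,    # G2
}


def pitch_from_staff_position(position: int, clef: str = "treble") -> str:
    """Return the pitch name for a staff position.

    `position` counts diatonic steps from the bottom staff line:
    0 is the bottom line, 1 the first space, 2 the second line, and so on.
    Negative values lie below the staff.
    """
    diatonic = BOTTOM_LINE[clef] + position
    octave, step = divmod(diatonic, 7)
    return f"{STEP_NAMES[step]}{octave}"


if __name__ == "__main__":
    # The same vertical position reads differently under different clefs.
    print(pitch_from_staff_position(2, "treble"))  # second line, treble clef
    print(pitch_from_staff_position(2, "bass"))    # second line, bass clef
```

The same notehead position yields a different pitch depending on the clef, which is one reason why recognizing the symbols alone is not sufficient.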

Approaches to OMR

The process of recognizing music scores is typically broken down into smaller steps that are handled with specialized pattern recognition algorithms.
Many competing approaches have been proposed with most of them sharing a pipeline architecture, where each step in this pipeline performs a certain operation, such as detecting and removing staff lines before moving on to the next stage. A common problem with that approach is that errors and artifacts that were made in one stage are propagated through the system and can heavily affect the performance. For example, if the staff line detection stage fails to correctly identify the existence of the music staffs, subsequent steps will probably ignore that region of the image, leading to missing information in the output.
Optical music recognition is frequently underestimated due to the seemingly easy nature of the problem: if provided with a perfect scan of typeset music, the visual recognition can be solved with a sequence of fairly simple algorithms, such as projections and template matching. However, the process gets significantly harder for poor scans or handwritten music, which many systems fail to recognize altogether. Even if all symbols were detected perfectly, it would still be challenging to recover the musical semantics due to ambiguities and frequent violations of the rules of music notation. Donald Byrd and Jakob Simonsen argue that OMR is difficult because modern music notation is extremely complex.
Donald Byrd also collected a number of interesting and extreme examples of music notation that demonstrate its sheer complexity.
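
As an illustration of the "fairly simple algorithms" that suffice on a perfect scan, the following is a minimal sketch of staff-line detection by horizontal projection. It assumes a clean, binarized, and deskewed image given as rows of 0/1 values (1 = black pixel); the function name and the `min_fill` threshold are illustrative choices, not part of any published system.

```python
# Minimal sketch of staff-line detection by horizontal projection.
# Assumes a non-empty, binarized, deskewed image as a list of rows of
# 0/1 values (1 = black pixel). Name and threshold are illustrative.

def find_staff_line_rows(image, min_fill=0.8):
    """Return (start_row, end_row) runs of rows that are mostly black."""
    width = len(image[0])
    # Horizontal projection: number of black pixels in each row.
    projection = [sum(row) for row in image]

    runs = []
    start = None
    for y, count in enumerate(projection):
        is_line = count >= min_fill * width
        if is_line and start is None:
            start = y                      # a new run of line rows begins
        elif not is_line and start is not None:
            runs.append((start, y - 1))    # the current run has ended
            start = None
    if start is not None:
        runs.append((start, len(image) - 1))
    return runs


if __name__ == "__main__":
    width = 20
    image = []
    # A toy "staff": five one-pixel lines, each followed by two blank rows.
    for _ in range(5):
        image.append([1] * width)
        image.append([0] * width)
        image.append([0] * width)
    # A small blob standing in for a notehead; too short to count as a line.
    image[4][5:8] = [1, 1, 1]
    print(find_staff_line_rows(image))
```

On skewed, curved, or noisy input the row sums no longer peak cleanly, which is one way the difficulties with poor scans and handwritten music show up in practice.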

Outputs of OMR systems

Typical applications for OMR systems include the creation of an audible version of the music score. A common way to create such a version is by generating a MIDI file, which can be synthesised into an audio file. MIDI files, though, are not capable of storing engraving information or enharmonic spelling.
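
The loss of enharmonic spelling can be made concrete with a short sketch. It uses the common convention that middle C (C4) is MIDI note 60; the function name and the `(step, alter, octave)` representation are assumptions chosen for illustration.

```python
# Illustrative only: converts a spelled pitch to a MIDI note number,
# using the common convention that middle C (C4) is note 60.

SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(step: str, alter: int, octave: int) -> int:
    """Return the MIDI note number for a spelled pitch.

    `alter` is the chromatic alteration: +1 for a sharp, -1 for a flat.
    """
    return 12 * (octave + 1) + SEMITONES[step] + alter


if __name__ == "__main__":
    c_sharp = midi_number("C", +1, 4)
    d_flat = midi_number("D", -1, 4)
    # Both spellings collapse to the same number, so the original
    # spelling cannot be recovered from the note number alone.
    print(c_sharp, d_flat, c_sharp == d_flat)
```

Because C-sharp and D-flat map to the same note number, a format that stores only that number cannot tell the two spellings apart.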
If the music scores are recognized with the goal of human readability, the structured encoding has to be recovered, which includes precise information on the layout and engraving. Suitable formats to store this information include MEI and MusicXML.
Apart from those two applications, it might also be interesting to just extract metadata from the image or enable searching. In contrast to the first two applications, a lower level of comprehension of the music score might be sufficient to perform these tasks.

General framework (2001)

In 2001, David Bainbridge and Tim Bell published their work on the challenges of OMR, where they reviewed previous research and extracted a general framework for OMR. Their framework has been used by many systems developed after 2001. The framework has four distinct stages with a heavy emphasis on the visual detection of objects. They noticed that the reconstruction of the musical semantics was often omitted from published articles because the operations used were specific to the output format.

Refined framework (2012)

In 2012, Ana Rebelo et al. surveyed techniques for optical music recognition. They categorized the published research and refined the OMR pipeline into four stages: preprocessing, music symbol recognition, musical notation reconstruction, and final representation construction. This framework became the de facto standard for OMR and is still being used today. For each stage, they give an overview of techniques that are used to tackle that problem. This publication was the most cited paper on OMR research as of 2019.
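
The four-stage structure can be sketched as a chain of functions, each consuming the output of the previous one. This is a structural sketch only: the stage names follow the text, but the data types and the toy stage bodies are placeholders invented for illustration and do not perform real recognition.

```python
# Structural sketch only: stage names follow the four-stage framework;
# the data types and stage bodies are hypothetical placeholders.
from dataclasses import dataclass
from typing import List


@dataclass
class Symbol:
    kind: str  # e.g. "notehead" or "clef"
    x: int
    y: int


def preprocess(image: List[List[int]]) -> List[List[int]]:
    # Stands in for binarization, deskewing, and noise removal.
    return [[1 if px else 0 for px in row] for row in image]


def recognize_symbols(image: List[List[int]]) -> List[Symbol]:
    # Placeholder: treats every black pixel as a "notehead".
    return [Symbol("notehead", x, y)
            for y, row in enumerate(image)
            for x, px in enumerate(row) if px]


def reconstruct_notation(symbols: List[Symbol]) -> List[Symbol]:
    # Placeholder: orders symbols left to right, then top to bottom.
    return sorted(symbols, key=lambda s: (s.x, s.y))


def build_representation(symbols: List[Symbol]) -> str:
    # Stands in for export to a format such as MusicXML, MEI, or MIDI.
    return " ".join(f"{s.kind}@({s.x},{s.y})" for s in symbols)


def run_pipeline(image: List[List[int]]) -> str:
    # Each stage sees only the previous stage's output, so anything
    # lost or misdetected early cannot be recovered later.
    binary = preprocess(image)
    symbols = recognize_symbols(binary)
    notation = reconstruct_notation(symbols)
    return build_representation(notation)


if __name__ == "__main__":
    print(run_pipeline([[0, 1], [1, 0]]))
```

The strict hand-off between stages is what makes such pipelines easy to reason about, and also why errors made in an early stage propagate to the final output.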

Deep learning (since 2016)

With the advent of deep learning, many computer vision problems have shifted from imperative programming with hand-crafted heuristics and feature engineering towards machine learning. In optical music recognition, the staff processing stage, the music object detection stage, as well as the music notation reconstruction stage have seen successful attempts to solve them with deep learning.
Even completely new approaches have been proposed, including solving OMR in an end-to-end fashion with sequence-to-sequence models that take an image of a music score and directly produce the recognized music in a simplified format.

Notable scientific projects

Staff removal challenge

For systems that were developed before 2016, staff detection and removal posed a significant obstacle. A scientific competition was organized to improve the state of the art and advance the field. Due to excellent results and modern techniques that made the staff removal stage obsolete, this competition was discontinued.
However, the freely available CVC-MUSCIMA dataset that was developed for this challenge is still highly relevant for OMR research as it contains 1000 high-quality images of handwritten music scores, transcribed by 50 different musicians. It has been further extended into the MUSCIMA++ dataset, which contains detailed annotations for 140 out of 1000 pages.

SIMSSA

The Single Interface for Music Score Searching and Analysis project is probably the largest project that attempts to teach computers to recognize musical scores and make them accessible. Several sub-projects have already been successfully completed, including the Liber Usualis and Cantus Ultimus.

TROMPA

Towards Richer Online Music Public-domain Archives is an international research project, sponsored by the European Union, that investigates how to make public-domain digital music resources more accessible.

Datasets

The development of OMR systems benefits from test datasets of sufficient size and diversity to ensure the system being developed works under various conditions. However, due to legal reasons and potential copyright violations, it is challenging to compile and publish such a dataset. The most notable datasets for OMR are referenced and summarized by the OMR Datasets project and include the CVC-MUSCIMA, MUSCIMA++, DeepScores, PrIMuS, HOMUS, and SEILS datasets, as well as the Universal Music Symbol Collection.

Software

Academic and open-source software

Many OMR projects have been realized in academia, but only a few of them reached a mature state and were successfully deployed to users. These systems are:

Aruspix
Audiveris
CANTOR
Gamera
DMOS
OpenOMR
Rodan

Commercial software

Most of the commercial desktop applications that were developed in the last 20 years have been shut down again due to the lack of commercial success, leaving only a few vendors that are still developing, maintaining, and selling OMR products.
Some of these products claim extremely high recognition rates with up to 100% accuracy but fail to disclose how those numbers were obtained, making it nearly impossible to verify them and compare different OMR systems.
Apart from the desktop applications, a range of mobile applications have emerged as well, but these received mixed reviews on the Google Play store and were probably discontinued. A range of OMR apps can also be found for iPhone and iPad devices in the Apple App Store.

capella-scan
ForteScan Light by Fortenotation (now ScanScore)
MIDI-Connections Scan by MIDI-Connections
MP Scan by Braeburn. Uses SharpEye SDK.
NoteScan bundled with Nightingale
OMeR Add-on for Harmony Assistant and Melody Assistant: Myriad Software
PDFtoMusic
PhotoScore by Neuratron. The Light version of PhotoScore is used in Sibelius. PhotoScore uses the SharpEye SDK.
PlayScore by Organum Limited.
Scorscan by npcImaging. Based on SightReader
SharpEye by Visiv
VivaldiScan
SmartScore by Musitek. Formerly packaged as "MIDISCAN".
ScanScore
