Maximally stable extremal regions

In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.

Terms and definitions

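An image $I$ is a mapping $I : D \subset \mathbb{Z}^2 \to S$. Extremal regions are well defined on images if:

$S$ is totally ordered.
An adjacency relation $A \subset D \times D$ is defined.

A region $Q$ is a contiguous subset of $D$. Note that under this definition the region can contain "holes".
The region boundary is $\partial Q = \{ q \in D \setminus Q : \exists p \in Q : qAp \}$, which means the boundary $\partial Q$ of $Q$ is the set of pixels adjacent to at least one pixel of $Q$ but not belonging to $Q$. Again, in the case of regions with "holes", the region boundary is not required to be a connected subset of $D$.
An extremal region $Q \subset D$ is a region such that either $I(p) > I(q)$ for all $p \in Q, q \in \partial Q$ (maximum intensity region) or $I(p) < I(q)$ for all $p \in Q, q \in \partial Q$ (minimum intensity region). Because $S$ is totally ordered, we can reformulate these conditions as $\min_{p \in Q} I(p) > \max_{q \in \partial Q} I(q)$ for a maximum intensity region and $\max_{p \in Q} I(p) < \min_{q \in \partial Q} I(q)$ for a minimum intensity region, respectively. In this form we can use the notion of a threshold intensity value which separates the region and its boundary.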
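Maximally stable extremal region: let $Q_1, \dots, Q_{i-1}, Q_i, \dots$ be a sequence of nested extremal regions ($Q_i \subset Q_{i+1}$). Extremal region $Q_{i^*}$ is maximally stable if and only if $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ has a local minimum at $i^*$, where $|\cdot|$ denotes cardinality. $\Delta$ is a parameter of the method. In practice, the subscript $i$ is not just a sequential index; $Q_i$ means "region thresholded by intensity value $i$", which implies $i \in S$ and $\Delta \in S$, where $S$ is the set of intensity values.
The equation checks for regions that remain stable over a certain number of thresholds. If a region $Q_{i+\Delta}$ is not significantly larger than a region $Q_{i-\Delta}$, region $Q_i$ is taken as a maximally stable region.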
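As an illustration of the stability criterion, the following minimal sketch computes $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ from the areas of a nested region sequence and reports the thresholds at which it has a strict local minimum. The function names and the area values are hypothetical and chosen only for the example; since the regions are nested, the size of the set difference is assumed to equal the difference of the two areas.

```python
def stability(areas, delta):
    """Return {i: q(i)} with q(i) = (|Q_{i+delta}| - |Q_{i-delta}|) / |Q_i|.

    areas[i] is the pixel count of the nested region Q_i at threshold i.
    For nested regions the size of the set difference equals the
    difference of the two areas.
    """
    q = {}
    for i in range(delta, len(areas) - delta):
        q[i] = (areas[i + delta] - areas[i - delta]) / areas[i]
    return q


def maximally_stable(areas, delta):
    """Thresholds i at which q(i) has a strict local minimum."""
    q = stability(areas, delta)
    return [
        i for i in q
        if (i - 1) in q and (i + 1) in q
        and q[i] < q[i - 1] and q[i] < q[i + 1]
    ]


# Hypothetical area curve: fast growth, a plateau, then fast growth again.
areas = [10, 20, 40, 50, 51, 52, 53, 80, 160, 320]
print(maximally_stable(areas, delta=1))  # -> [5]
```

The plateau in the area curve (50, 51, 52, 53) is where the region barely changes with the threshold, so the minimum of $q$ falls inside it.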
The concept can be explained more simply by thresholding. All the pixels below a given threshold are 'black' and all those above or equal to it are 'white'. Given a source image, if a sequence of thresholded result images is generated where each image corresponds to an increasing threshold t, first a white image would be seen, then 'black' spots corresponding to local intensity minima would appear and grow larger. These 'black' spots eventually merge, until the whole image is black. The set of all connected components in the sequence is the set of all extremal regions. In that sense, the concept of MSER is linked to that of the component tree of the image. The component tree indeed provides an easy way of implementing MSER.
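The thresholding view can be sketched directly. The example below is a naive illustration, not an efficient MSER implementation: for each threshold it labels the connected 'black' components (pixels with intensity below the threshold) using a breadth-first search. The function name, the use of 4-adjacency, and the small test image are assumptions made for the example.

```python
from collections import deque


def black_components(image, t):
    """Connected components (4-adjacency) of pixels with intensity < t."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or image[r][c] >= t:
                continue
            # Breadth-first search from an unvisited black pixel.
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and image[ny][nx] < t):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components


# Hypothetical image: two dark spots joined by a mid-grey pixel.
image = [
    [9, 9, 9, 9, 9],
    [9, 1, 5, 2, 9],
    [9, 1, 9, 2, 9],
    [9, 9, 9, 9, 9],
]
for t in (1, 2, 3, 6, 10):
    print(t, sorted(len(comp) for comp in black_components(image, t)))
```

As the threshold rises, the image starts all white (no components), two dark spots appear, they merge through the mid-grey pixel, and finally the whole image becomes one black component.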

Extremal regions

Extremal regions in this context have two important properties. The set is closed under:

continuous transformation of image coordinates. This means it is affine invariant, and it does not matter if the image is warped or skewed.
monotonic transformation of image intensities. The approach is of course still sensitive to natural lighting effects such as changes in daylight or moving shadows.

Advantages of MSER

Because the regions are defined exclusively by the intensity function in the region and the outer border, they have several key characteristics that make them useful. Over a large range of thresholds, the local binarization is stable in certain regions, which have the properties listed below.

Invariance to affine transformation of image intensities
Covariance to adjacency preserving transformation on the image domain
Stability: only regions whose support is nearly the same over a range of thresholds are selected.
Multi-scale detection: without any smoothing involved, both fine and large structures are detected.

Note, however, that detection of MSERs in a scale pyramid improves repeatability and the number of correspondences across scale changes.

The set of all extremal regions can be enumerated in worst-case $O(n)$ time, where $n$ is the number of pixels in the image.

Comparison to other region detectors

In Mikolajczyk et al., six region detectors are studied. A summary of MSER performance in comparison to the other five follows.

Region density – in comparison to the others, MSER offers the most variety, detecting about 2600 regions for a textured blur scene and 230 for a light-changed scene; variety is generally considered to be good. MSER also had a repeatability of 92% for this test.
Region size – MSER tended to detect many small regions, as opposed to large regions, which are more likely to be occluded or to not cover a planar part of the scene, although large regions may be slightly easier to match.
Viewpoint change – MSER outperforms the five other region detectors in both the original images and those with repeated texture motifs.
Scale change – MSER comes in second, after the Hessian-affine detector, under a scale change and in-plane rotation.
Blur – MSER proved to be the most sensitive to this type of change in the image, which is the only area in which this type of detection is lacking.

Note, however, that this evaluation did not make use of multi-resolution detection, which has been shown to improve repeatability under blur.

Light change – MSER showed the highest repeatability score for this type of scene, with all the other detectors showing good robustness as well.

MSER consistently achieved the highest score in many of the tests, indicating that it is a reliable region detector.

Implementation

The original algorithm of Matas et al. is $O(n \log(\log(n)))$ in the number $n$ of pixels. It proceeds by first sorting the pixels by intensity. This takes $O(n)$ time, using BINSORT. After sorting, pixels are marked in the image, and the list of growing and merging connected components and their areas is maintained using the union-find algorithm. This takes $O(n \log(\log(n)))$ time. In practice these steps are very fast. During this process, the area of each connected component as a function of intensity is stored, producing a data structure. A merge of two components is viewed as the termination of existence of the smaller component and an insertion of all pixels of the smaller component into the larger one. Among the extremal regions, the 'maximally stable' ones are those corresponding to thresholds where the relative area change as a function of relative change of threshold is at a local minimum, i.e. the MSERs are the parts of the image where local binarization is stable over a large range of thresholds.
The component tree is the set of all connected components of the thresholds of the image, ordered by inclusion. Efficient algorithms exist for computing it, so this structure offers an easy way of implementing MSER.
More recently, Nister and Stewenius have proposed a truly worst-case $O(n)$ method, which is also much faster in practice. This algorithm is similar to the one of Ph. Salembier et al.
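The sort-then-merge procedure described for the original algorithm can be sketched as follows. This is a simplified illustration, not the authors' implementation: it bin-sorts pixels by intensity, adds them in increasing order, merges 4-adjacent components with a union-find structure, and records the area of the component containing one chosen pixel after each intensity level. The names `bin_sort_pixels`, `UnionFind`, `area_history` and the `seed` parameter are hypothetical, and integer intensities in `range(levels)` are assumed.

```python
def bin_sort_pixels(image, levels=256):
    """Bin (counting) sort of pixel coordinates by integer intensity."""
    bins = [[] for _ in range(levels)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            bins[value].append((r, c))
    return bins


class UnionFind:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        self.parent[x] = x
        self.size[x] = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        # The smaller component ceases to exist; its pixels join the larger.
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def area_history(image, seed, levels=256):
    """Area of the component containing `seed` after each level is added.

    Entry t is the area of the component of pixels with intensity <= t
    that contains `seed` (0 while `seed` has not been added yet).
    """
    uf = UnionFind()
    history = []
    for level_pixels in bin_sort_pixels(image, levels):
        for (r, c) in level_pixels:
            uf.add((r, c))
            for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if n in uf.parent:  # neighbour already marked in the image
                    uf.union((r, c), n)
        history.append(uf.size[uf.find(seed)] if seed in uf.parent else 0)
    return history


# Hypothetical image with intensities 0..9.
image = [
    [9, 9, 9, 9, 9],
    [9, 1, 5, 2, 9],
    [9, 1, 9, 2, 9],
    [9, 9, 9, 9, 9],
]
print(area_history(image, seed=(1, 1), levels=10))
# -> [0, 2, 2, 2, 2, 5, 5, 5, 5, 20]
```

The resulting area-versus-intensity curve is the quantity whose relative change is examined when selecting the maximally stable regions; a full implementation would keep such a history for every component rather than for a single seed pixel.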

Robust wide-baseline algorithm

The purpose of this algorithm is to match MSERs to establish correspondence points between images. First, MSER regions are computed on the intensity image and on the inverted image. Measurement regions are selected at multiple scales: the actual region, and the convex hull of the region scaled by 1.5x, 2x, and 3x. Matching is accomplished in a robust manner, so it is better to increase the distinctiveness of large regions without being severely affected by clutter or non-planarity of the region's pre-image. A measurement taken from an almost planar patch of the scene with a stable invariant description is called a 'good measurement'. Unstable ones, or those on non-planar surfaces or discontinuities, are called 'corrupted measurements'. The robust similarity is computed as follows:
For each measurement $M_A^i$ on region $A$, the $k$ regions $B_1, \dots, B_k$ from the other image whose corresponding i-th measurements $M_{B_1}^i, \dots, M_{B_k}^i$ are nearest to $M_A^i$ are found, and a vote is cast suggesting correspondence of $A$ and each of $B_1, \dots, B_k$. Votes are summed over all measurements, and using probability analysis, 'good measurements' can be picked out, as the 'corrupted measurements' will likely spread their votes randomly. By applying RANSAC to the centers of gravity of the regions, a rough epipolar geometry can be computed. An affine transformation between pairs of potentially corresponding regions is computed, and correspondences define it up to a rotation, which is then determined by epipolar lines. The regions are then filtered, and the ones with correlation of their transformed images above a threshold are chosen. RANSAC is applied again with a narrower threshold, and the final epipolar geometry is estimated by the eight-point algorithm.
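The voting step alone can be illustrated with a small sketch. It assumes, purely for simplicity, that each measurement is a single scalar and that distance is the absolute difference; in practice measurements are invariant descriptors with their own distance. The function name `vote_correspondences` and the sample values are hypothetical, and the later RANSAC and correlation stages are not shown.

```python
from collections import Counter


def vote_correspondences(meas_a, meas_b, k=1):
    """Cast votes for region correspondences.

    meas_a, meas_b: dict mapping region id -> list of scalar measurements,
    where index i in every list refers to the same kind of measurement.
    For each region A and each measurement index i, the k regions of the
    other image whose i-th measurement is nearest to A's get one vote each.
    """
    votes = Counter()
    for a, measurements in meas_a.items():
        for i, m in enumerate(measurements):
            nearest = sorted(meas_b, key=lambda b: abs(meas_b[b][i] - m))[:k]
            for b in nearest:
                votes[(a, b)] += 1
    return votes


# Hypothetical measurements: three per region.
meas_a = {"A": [1.0, 5.0, 9.0]}
meas_b = {
    "B1": [1.1, 5.2, 0.0],
    "B2": [4.0, 4.9, 9.1],
    "B3": [7.0, 0.0, 3.0],
}
votes = vote_correspondences(meas_a, meas_b, k=1)
print(votes.most_common(1))  # -> [(('A', 'B2'), 2)]
```

Here one measurement of A happens to be closest to B1, but the other two agree on B2, so the summed votes favour the pair (A, B2).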

Use in text detection

The MSER algorithm has been used in text detection by Chen, who combined MSER with Canny edges. Canny edges are used to help cope with the weakness of MSER to blur. MSER is first applied to the image in question to determine the character regions. To enhance the MSER regions, any pixels outside the boundaries formed by Canny edges are removed. The separation provided by the edges greatly increases the usability of MSER in the extraction of blurred text.
An alternative use of MSER in text detection is the work by Shi using a graph model. This method again applies MSER to the image to generate preliminary regions. These are then used to construct a graph model based on the position distance and color distance between the MSERs, each of which is treated as a node. Next, the nodes are separated into foreground and background using cost functions. One cost function relates the distance from the node to the foreground and background. The other penalizes nodes for being significantly different from their neighbors. When these are minimized, the graph is cut to separate the text nodes from the non-text nodes.
To enable text detection in a general scene, Neumann uses the MSER algorithm in a variety of projections. In addition to the greyscale intensity projection, he uses the red, blue, and green color channels to detect text regions that are distinct in color but not necessarily distinct in greyscale intensity. This method allows for detection of more text than using only the MSER+ and MSER- detections on the intensity image and its inverse.

Extensions and adaptations

The MSER algorithm has been adapted to colour images by replacing thresholding of the intensity function with agglomerative clustering based on colour gradients.
The MSER algorithm can be used to detect regions based on color as opposed to intensity. This is done by Chavez by creating an intensity function for red, green, and blue in the HSV color space. The MSER algorithm is then run five times: once over each of the three color pseudo-intensities, and then over the greyscale intensities using the standard MSER+ and MSER- functions.
The MSER algorithm can be used to track colour objects by performing MSER detection on the Mahalanobis distance to a colour distribution.
By detecting MSERs at multiple resolutions, robustness to blur and scale change can be improved.
Other applications
