Outline of object recognition

The following outline is provided as an overview of and topical guide to object recognition:
Object recognition - technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the image of an object may vary with viewpoint, size and scale, or translation and rotation. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems. Many approaches to the task have been implemented over multiple decades.

Approaches based on CAD-like object models

Edge detection
Primal sketch
Marr, Mohan and Nevatia
Lowe
Olivier Faugeras
Recognition by parts
Generalized cylinders
Geons
Dickinson, Forsyth and Ponce
Appearance-based methods
Use example images of the objects to perform recognition
Objects look different under varying conditions:
* Changes in lighting or color
* Changes in viewing direction
* Changes in size / shape
A single exemplar is unlikely to succeed reliably. However, it is impossible to represent all appearances of an object.
Edge matching
Uses edge detection techniques, such as Canny edge detection, to find edges.
Changes in lighting and color usually don't have much effect on image edges
Strategy:
# Detect edges in template and image
# Compare edge images to find the template
# Must consider range of possible template positions
Measurements:
* Good – count the number of overlapping edges. Not robust to changes in shape
* Better – count the number of template edge pixels within some distance of an edge in the search image
* Best – determine probability distribution of distance to nearest edge in search image. Estimate likelihood of each template position generating image
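The "better" measurement above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the representation of edges as lists of (x, y) pixel coordinates, and the `max_dist` parameter are all assumptions made for the example.

```python
from math import hypot

def edge_match_score(template_edges, image_edges, offset, max_dist=1.0):
    """Count template edge pixels lying within max_dist of some image edge pixel.

    template_edges, image_edges: lists of (x, y) edge pixel coordinates.
    offset: (dx, dy), the candidate template position in the image.
    """
    dx, dy = offset
    score = 0
    for tx, ty in template_edges:
        px, py = tx + dx, ty + dy
        # Brute-force nearest-edge check; a distance transform would be faster.
        if any(hypot(px - ix, py - iy) <= max_dist for ix, iy in image_edges):
            score += 1
    return score

def best_offset(template_edges, image_edges, offsets, max_dist=1.0):
    """Try every candidate template position and keep the highest score."""
    return max(offsets,
               key=lambda o: edge_match_score(template_edges, image_edges, o, max_dist))
```

Setting `max_dist=0` reduces this to the "good" measurement, which counts only exactly overlapping edge pixels.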
Divide-and-Conquer search
Strategy:
* Consider all positions as a set
* Determine lower bound on score at best position in cell
* If bound is too large, prune cell
* If bound is not too large, divide cell into subcells and try each subcell recursively
* Process stops when cell is “small enough”
Unlike multi-resolution search, this technique is guaranteed to find all matches that meet the criterion
Finding the Bound:
* To find the lower bound on the best score, look at score for the template position represented by the center of the cell
* Subtract maximum change from the “center” position for any other position in cell
Complexities arise from determining bounds on distance
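A minimal sketch of the search, under simplifying assumptions that are not in the outline: the only pose parameter is an integer 2D translation, the score is a summed nearest-edge distance (lower is better), and cells are rectangles of translations. Because shifting the template by r changes each nearest-edge distance by at most r, the score at the cell center minus the maximum possible change bounds the best score in the cell.

```python
from math import hypot

def chamfer_cost(template, image, dx, dy):
    """Sum over template points of the distance to the nearest image edge point."""
    return sum(min(hypot(tx + dx - ix, ty + dy - iy) for ix, iy in image)
               for tx, ty in template)

def search(template, image, cell, threshold):
    """Return all integer translations in cell whose cost is <= threshold.

    cell = (x0, y0, x1, y1) with inclusive integer bounds (an assumed layout).
    """
    x0, y0, x1, y1 = cell
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Largest distance from the cell center to any position in the cell.
    radius = hypot(x1 - cx, y1 - cy)
    # No position in the cell can cost less than this bound.
    lower_bound = chamfer_cost(template, image, cx, cy) - len(template) * radius
    if lower_bound > threshold:
        return []  # prune: the whole cell is too poor a match
    if x0 == x1 and y0 == y1:
        return [(x0, y0)]  # single position: the bound is the exact cost
    # Otherwise split the cell along its longer side and recurse.
    if x1 - x0 >= y1 - y0:
        mid = (x0 + x1) // 2
        halves = [(x0, y0, mid, y1), (mid + 1, y0, x1, y1)]
    else:
        mid = (y0 + y1) // 2
        halves = [(x0, y0, x1, mid), (x0, mid + 1, x1, y1)]
    return [hit for half in halves for hit in search(template, image, half, threshold)]
```

Here "small enough" means a single translation. A cell is discarded only when its bound rules out every position in it, so no translation that meets the threshold is missed.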
Greyscale matching
Edges are robust to illumination changes, but they throw away a lot of information
Must compute pixel distance as a function of both pixel position and pixel intensity
Can be applied to color also
Gradient matching
Another way to be robust to illumination changes without throwing away as much information is to compare image gradients
Matching is performed like matching greyscale images
Simple alternative: Use correlation
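As an illustration of the correlation alternative, the sketch below uses normalized cross-correlation, one common form of correlation that is unaffected by a uniform gain and offset in intensity. The choice of the normalized variant, the function names and the list-of-rows image layout are assumptions for the example.

```python
from math import sqrt

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat patch: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def match_template(image, template):
    """Slide the template over the image; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best = (-2.0, None)  # NCC is never below -1
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = ncc(patch, flat_t)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

The same routine can be run on gradient images instead of raw intensities.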
Histograms of receptive field responses
Avoids explicit point correspondences
Relations between different image points implicitly coded in the receptive field responses
Swain and Ballard, Schiele and Crowley, Linde and Lindeberg
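A minimal sketch of histogram comparison, assuming scalar responses (for example intensities or filter outputs) that fall inside a known range. Histogram intersection is used as the similarity measure here; the function names and binning scheme are illustrative choices.

```python
def histogram(values, bins, lo, hi):
    """Normalized histogram of scalar responses assumed to lie in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[idx] += 1
    total = sum(counts) or 1  # avoid dividing by zero on empty input
    return [c / total for c in counts]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

The histogram discards where each response occurred, so two images can be compared without matching individual points.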
Large modelbases
One approach to efficiently searching the database for a specific image is to use eigenvectors of the templates
Modelbases are a collection of geometric models of the objects that should be recognised
Feature-based methods
A search is used to find feasible matches between object features and image features.
The primary constraint is that a single position of the object must account for all of the feasible matches.
These methods extract features from the objects to be recognized and the images to be searched, such as:
* surface patches
* corners
* linear edges
Interpretation trees
One method for finding feasible matches is to search through a tree.
Each node in the tree represents a set of matches.
* Root node represents empty set
* Each other node is the union of the matches in the parent node and one additional match.
* Wildcard is used for features with no match
Nodes are “pruned” when the set of matches is infeasible.
* A pruned node has no children
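The tree search can be sketched as a depth-first traversal. The feasibility test used here (matched pairs must preserve inter-point distances within a tolerance) is only one possible constraint, chosen for illustration; the names and the use of `None` as the wildcard are likewise assumptions.

```python
from math import dist

WILDCARD = None  # marks an image feature that is left unmatched

def consistent(matches, image_pts, model_pts, tol):
    """A match set is feasible if every matched pair preserves distances."""
    real = [(i, m) for i, m in enumerate(matches) if m is not WILDCARD]
    for a in range(len(real)):
        for b in range(a + 1, len(real)):
            (i1, m1), (i2, m2) = real[a], real[b]
            if m1 == m2:
                return False  # one model feature cannot explain two image features
            d_img = dist(image_pts[i1], image_pts[i2])
            d_mod = dist(model_pts[m1], model_pts[m2])
            if abs(d_img - d_mod) > tol:
                return False
    return True

def interpretations(image_pts, model_pts, tol=0.1, matches=()):
    """Yield every feasible leaf; a node at depth d assigns image features 0..d-1."""
    if not consistent(matches, image_pts, model_pts, tol):
        return  # pruned: no children of this node are explored
    if len(matches) == len(image_pts):
        yield matches
        return
    for choice in list(range(len(model_pts))) + [WILDCARD]:
        yield from interpretations(image_pts, model_pts, tol, matches + (choice,))
```

A reasonable way to pick a result is to take the leaf with the fewest wildcards, for example `min(interpretations(image, model), key=lambda m: m.count(None))`.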
Historically significant and still used, but less commonly
Hypothesize and test
General Idea:
* Hypothesize a correspondence between a collection of image features and a collection of object features
* Then use this to generate a hypothesis about the projection from the object coordinate frame to the image frame
* Use this projection hypothesis to generate a rendering of the object. This step is usually known as backprojection
* Compare the rendering to the image, and, if the two are sufficiently similar, accept the hypothesis
Obtaining Hypothesis:
* There are a variety of different ways of generating hypotheses.
* When camera intrinsic parameters are known, the hypothesis is equivalent to a hypothetical position and orientation – pose – for the object.
* Utilize geometric constraints
* Construct a correspondence for small sets of object features to every correctly sized subset of image points.
Three basic approaches:
* Obtaining Hypotheses by Pose Consistency
* Obtaining Hypotheses by Pose Clustering
* Obtaining Hypotheses by Using Invariants
Expensive search that is also redundant, but can be improved using randomization and/or grouping
* Randomization
** Examining small sets of image features until likelihood of missing object becomes small
** For each set of image features, all possible matching sets of model features must be considered.
** Formula:
**: $(1 - W^c)^k = Z$
*** W = the fraction of image points that are “good”
*** c = the number of correspondences necessary
*** k = the number of trials
*** Z = the probability of every trial using one (or more) incorrect correspondences
* Grouping
** If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined
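For the randomization step, reading the relation between W, c, k and Z as (1 - W^c)^k = Z, the number of trials needed for a target failure probability follows by taking logarithms. The sketch below assumes 0 < W < 1 and 0 < Z < 1; the function names are illustrative.

```python
from math import ceil, log

def failure_probability(W, c, k):
    """Probability that all k trials include at least one bad correspondence."""
    return (1 - W ** c) ** k

def trials_needed(W, c, Z):
    """Smallest k with (1 - W**c)**k <= Z, assuming 0 < W < 1 and 0 < Z < 1."""
    return ceil(log(Z) / log(1 - W ** c))
```

For example, with half of the image points good (W = 0.5), three correspondences per hypothesis (c = 3) and a target failure probability Z = 0.01, `trials_needed(0.5, 3, 0.01)` gives 35.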
Pose consistency
Also called Alignment, since the object is being aligned to the image
Correspondences between image features and model features are not independent – Geometric constraints
A small number of correspondences yields the object position – the others must be consistent with this
General Idea:
* If we hypothesize a match between a sufficiently large group of image features and a sufficiently large group of object features, then we can recover the missing camera parameters from this hypothesis
Strategy:
* Generate hypotheses using small number of correspondences
* Project other model features into image and verify additional correspondences
Use the smallest number of correspondences necessary to achieve discrete object poses
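A minimal sketch of alignment, assuming a 2D similarity transform (rotation, uniform scale, translation) so that two point correspondences fix the pose. Points are represented as complex numbers and the pose as `image = a * model + b`; this representation and all names are assumptions for the example.

```python
def similarity_from_two(m0, m1, i0, i1):
    """Solve image = a * model + b from two correspondences (complex points)."""
    a = (i1 - i0) / (m1 - m0)
    b = i0 - a * m0
    return a, b

def verify(model, image, a, b, tol=0.5):
    """Count model points whose projection lands within tol of an image point."""
    return sum(1 for m in model if any(abs(a * m + b - p) <= tol for p in image))

def align(model, image, tol=0.5):
    """Hypothesize poses from the first two model points, then verify the rest."""
    best = (0, None)
    for i0 in image:
        for i1 in image:
            if i0 == i1:
                continue
            a, b = similarity_from_two(model[0], model[1], i0, i1)
            support = verify(model, image, a, b, tol)
            if support > best[0]:
                best = (support, (a, b))
    return best
```

Each hypothesis comes from only two correspondences; the remaining model points serve as the consistency check.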
Pose clustering
General Idea:
* Each object leads to many correct sets of correspondences, each of which has the same pose
* Vote on pose. Use an accumulator array that represents pose space for each object
* This is essentially a Hough transform
Strategy:
* For each object, set up an accumulator array that represents pose space – each element in the accumulator array corresponds to a “bucket” in pose space.
* Then take each image frame group, and hypothesize a correspondence between it and every frame group on every object
* For each of these correspondences, determine pose parameters and make an entry in the accumulator array for the current object at the pose value.
* If there are large numbers of votes in any object's accumulator array, this can be interpreted as evidence for the presence of that object at that pose.
* The evidence can be checked using a verification method
Note that this method uses sets of correspondences, rather than individual correspondences
* Implementation is easier, since each set yields a small number of possible object poses.
Improvement
* The noise resistance of this method can be improved by not counting votes for objects at poses where the vote is obviously unreliable
: For example, votes can be discarded in cases where, if the object were at that pose, the object frame group would be invisible.
* These improvements are sufficient to yield working systems
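The voting step can be sketched in a deliberately reduced setting: the pose is a 2D translation only, so a single point correspondence already determines a pose and plays the role of a frame group. That simplification, the bucket size and the names are assumptions for the example.

```python
from collections import Counter

def vote_translation(model_pts, image_pts, bucket=1.0):
    """Accumulate one vote per hypothesized correspondence in pose space.

    Pose is a translation (dx, dy), quantized into buckets of the given size.
    """
    votes = Counter()
    for mx, my in model_pts:
        for ix, iy in image_pts:
            key = (round((ix - mx) / bucket), round((iy - my) / bucket))
            votes[key] += 1
    return votes
```

Correct correspondences all vote for the same bucket, while incorrect ones scatter, so `votes.most_common(1)` gives the best-supported pose, which would then be passed to a verification step.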
Invariance
There are geometric properties that are invariant to camera transformations
Most easily developed for images of planar objects, but can be applied to other cases as well
Geometric hashing
An algorithm that uses geometric invariants to vote for object hypotheses
Similar to pose clustering, but instead of voting on pose, we now vote on geometry
A technique originally developed for matching geometric features against a database of such features
Widely used for pattern-matching, CAD/CAM, and medical imaging.
It is difficult to choose the size of the buckets
It is hard to be sure what “enough” means. Therefore, there may be some danger that the table will get clogged.
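A minimal sketch of geometric hashing for 2D point sets, assuming invariance to similarity transforms. Each point is expressed in the coordinate frame of an ordered basis pair (b0, b1), so its coordinates do not change when the object is rotated, scaled or translated. Points are complex numbers; the `step` bucket size and all names are assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations

def invariant_key(p, b0, b1, step=0.25):
    """Quantized coordinates of p in the frame where b0 -> 0 and b1 -> 1."""
    z = (p - b0) / (b1 - b0)
    return (round(z.real / step), round(z.imag / step))

def build_table(models, step=0.25):
    """models: dict mapping a name to a list of distinct complex points."""
    table = defaultdict(list)
    for name, pts in models.items():
        for b0, b1 in permutations(pts, 2):
            for p in pts:
                if p != b0 and p != b1:
                    table[invariant_key(p, b0, b1, step)].append((name, b0, b1))
    return table

def recognize(table, scene, step=0.25):
    """Return (votes, model_name) for the best-supported (model, basis) entry."""
    best = (0, None)
    for b0, b1 in permutations(scene, 2):
        votes = Counter()
        for p in scene:
            if p != b0 and p != b1:
                for entry in table.get(invariant_key(p, b0, b1, step), ()):
                    votes[entry] += 1
        if votes:
            entry, count = votes.most_common(1)[0]
            if count > best[0]:
                best = (count, entry[0])
    return best
```

The `step` parameter is the bucket size: too small and noisy points miss their bucket, too large and unrelated entries pile up in the same one.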
Scale-invariant feature transform (SIFT)
Keypoints of objects are first extracted from a set of reference images and stored in a database
An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors.
Lowe
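The matching step can be sketched as a nearest-neighbour search over descriptor vectors. This is a toy illustration with short vectors, not SIFT itself; the `ratio` parameter implements a nearest-to-second-nearest distance check, which is an addition not described in the outline above, and all names are assumptions.

```python
from math import dist

def match_features(query, database, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor.

    query: list of descriptor vectors.
    database: list of (label, vector) pairs from the reference images.
    A match is dropped when the nearest neighbour is not clearly closer
    than the second nearest (distance ratio above `ratio`).
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted(database, key=lambda item: dist(q, item[1]))
        best = ranked[0]
        if len(ranked) > 1 and dist(q, best[1]) > ratio * dist(q, ranked[1][1]):
            continue  # ambiguous: two candidates are almost equally close
        matches.append((qi, best[0]))
    return matches
```

A linear scan is used for clarity; large databases would normally use an approximate nearest-neighbour index instead.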
Speeded Up Robust Features (SURF)
A robust image detector & descriptor
The standard version is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT
Based on sums of approximated 2D Haar wavelet responses, and makes efficient use of integral images.
Bay et al.
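The role of integral images can be sketched as follows: once the integral image is built, the sum over any axis-aligned box costs four lookups, so a box-shaped Haar-like response costs the same at every scale. This is an illustration of the idea only, not the SURF detector; the function names and the list-of-rows layout are assumptions.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img over rows < r and columns < c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img rows top..bottom-1 and columns left..right-1 in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_x(ii, top, left, size):
    """Horizontal Haar-like response: right half minus left half of a square box."""
    half = size // 2
    left_sum = box_sum(ii, top, left, top + size, left + half)
    right_sum = box_sum(ii, top, left + half, top + size, left + size)
    return right_sum - left_sum
```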
Bag of words representations

Genetic algorithm

Genetic algorithms can operate without prior knowledge of a given dataset and can develop recognition procedures without human intervention. A recent project achieved 100 percent accuracy on the benchmark motorbike, face, airplane and car image datasets from Caltech and 99.4 percent accuracy on fish species image datasets.

Other approaches

3D reconstruction
3D object recognition
Biologically inspired object recognition
Artificial neural networks and deep learning, especially convolutional neural networks
Context
Explicit and implicit 3D object models
Fast indexing
Global scene representations
Gradient histograms
Stochastic grammars
Intraclass transfer learning
Object categorization from image search
Reflectance
Shape-from-shading
Template matching
Texture
Topic models
Unsupervised learning
Window-based detection
Deformable Part Model
Bingham distribution
Applications

Object recognition methods have the following applications:
