Natural scene perception

Natural scene perception refers to the process by which an agent visually takes in and interprets scenes that it typically encounters in natural modes of operation. This process has been modeled in several different ways that are guided by different concepts.

Debate over role of attention

One major dividing line between theories that explain natural scene perception is the role of attention. Some theories maintain the need for focused attention, while others claim that focused attention is not involved.
Focused attention played a partial role in early models of natural scene perception. Such models involved two stages of visual processing. According to these models, the first stage is attention-free and registers low-level features such as brightness gradients, motion and orientation in parallel. The second stage requires focused attention: it registers high-level object descriptions, has limited capacity and operates serially. These models were empirically informed by studies demonstrating change blindness, inattentional blindness and the attentional blink. Such studies show that when one's focused visual attention is engaged by a task, significant changes in one's environment that are not directly pertinent to the task can escape awareness. It was generally thought that natural scene perception was similarly susceptible to change blindness, inattentional blindness and the attentional blink, and that these phenomena occurred because engaging in a task diverts attentional resources that would otherwise be used for natural scene perception.

Evidence against the need for focused attention

The attention-free hypothesis soon emerged to challenge early models. Its initial basis was the finding that in visual search, basic visual features of objects immediately and automatically pop out to the person doing the search. Further experiments seemed to support this: Potter showed that high-order representations can be accessed rapidly from natural scenes presented at rates of up to 10 per second. Additionally, Thorpe, Fize & Marlot discovered that humans and other primates can categorize natural images rapidly and accurately even after brief exposures. The basic idea in these studies is that exposure to each individual scene is too brief for attentional processes to occur, yet observers are still able to interpret and categorize the scenes.
Weaker versions of the attention-free hypothesis target specific components of the natural scene perception process rather than the process as a whole. Kihara & Takeda limit their claim to the integration of spatial frequency-based information in natural scenes, which they argue is attention-free. This claim is based on their study, which used attention-demanding tasks to examine participants' ability to accurately categorize images that were filtered to have a wide range of spatial frequencies. The logic behind the experiment was that if integration of visual information across spatial frequencies is preattentive, then attention-demanding tasks should not affect performance in the categorization task. This was indeed found to be the case.

More recent evidence reasserting the need for focused attention

A recent study by Cohen, Alvarez & Nakayama calls into question the validity of evidence supporting the attention-free hypothesis. They found that participants did display inattentional blindness while doing certain kinds of multiple-object tracking (MOT) and rapid serial visual presentation (RSVP) tasks. Furthermore, Cohen et al. found that participants' natural scene perception was impaired under dual-task conditions, but that this dual-task impairment happened only when participants' primary task was sufficiently demanding. The authors concluded that previous studies showing no need for focused attention did not use tasks that were demanding enough to fully engage attention.
In the Cohen et al. study, the MOT task involved viewing eight black moving discs presented against a changing background that consisted of randomly colored checkerboard masks. Four of these discs were picked out, and participants were instructed to track them. The RSVP task involved viewing a stream of letters and digits presented against a series of changing checkerboards, and counting the number of times a digit was presented. In both experiments, the critical trial involved a natural scene suddenly replacing the second-to-last checkerboard. Immediately afterwards, participants were asked whether they had noticed anything different and were presented with six questions to determine whether they had categorized the scene. The dual-task condition involved participants performing the MOT task described above and a scene-classification task simultaneously. The authors varied the difficulty of the MOT task by increasing or decreasing the speed of the moving discs.
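The structure of the RSVP task described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' experimental code: the function names, the stream length (n_items), the digit probability (digit_prob) and the representation of each frame as a (character, background) pair are all assumptions made for the example.

```python
import random
import string


def make_rsvp_trial(n_items=12, digit_prob=0.25, critical=False, seed=None):
    """Build one hypothetical RSVP trial as a list of (character, background) frames.

    n_items and digit_prob are illustrative assumptions, not values from the study.
    """
    if n_items < 2:
        raise ValueError("n_items must be at least 2")
    rng = random.Random(seed)
    frames = []
    for _ in range(n_items):
        # Each frame shows either a digit (to be counted) or a letter (distractor).
        if rng.random() < digit_prob:
            char = rng.choice(string.digits)
        else:
            char = rng.choice(string.ascii_uppercase)
        frames.append((char, "checkerboard"))
    if critical:
        # On the critical trial, a natural scene replaces the
        # second-to-last checkerboard background.
        char, _ = frames[-2]
        frames[-2] = (char, "natural_scene")
    return frames


def count_digits(frames):
    """Correct answer for the primary task: how many digits were shown."""
    return sum(char.isdigit() for char, _ in frames)
```

A critical trial built this way contains exactly one scene frame, and the digit count gives the answer against which the participant's primary-task response would be scored.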

Models

These are some of the models that have been proposed for the purpose of explaining natural scene perception.

Evans & Treisman's hypothesis

Evans & Treisman proposed a hypothesis that humans rapidly detect disjunctive sets of unbound features of target categories in a parallel manner, and then use these features to discriminate between scenes that do or do not contain the target without necessarily fully identifying it. An example of such a feature would be outstretched wings that can be used to tell whether or not a bird is in a picture, even before the system has identified an object as a bird. Evans & Treisman propose that natural scene perception involves a first pass through the visual processing hierarchy up to the nodes in a visual identification network, and then optional revisiting of earlier levels for more detailed analysis. During the 'first pass' stage, the system forms a global representation of the natural scene that includes the layout of global boundaries and potential objects. During the 'revisiting' stage, focused attention is employed to select local objects of interest in a serial manner, and then bind their features to their representations.
This hypothesis is consistent with the results of their study in which participants were instructed to detect animal targets in RSVP sequences, and then report their identities and locations. While participants were able to detect the targets in most trials, they were often subsequently unable to identify or localize them. Furthermore, when two targets were presented in quick succession, participants displayed a significant attentional blink when required to identify the targets, but the attentional blink was mostly eliminated among participants who were only required to detect them. Evans & Treisman explain these results with the hypothesis that the attentional blink occurs because the identification stage requires attentional resources, while the detection stage does not.

Ultra-rapid visual categorization

Ultra-rapid visual categorization is a model proposing an automatic feedforward mechanism that forms high-level object representations in parallel without focused attention. In this model, the mechanism cannot be sped up by training. Evidence for a feedforward mechanism can be found in studies that have shown that many neurons are already highly selective at the beginning of a visual response, thus suggesting that feedback mechanisms are not required for response selectivity to increase. Furthermore, recent fMRI and ERP studies have shown that masked visual stimuli that participants do not consciously perceive can significantly modulate activity in the motor system, thus suggesting somewhat sophisticated visual processing.
VanRullen ran simulations showing that the feedforward propagation of one wave of spikes through high-level neurons, generated in response to a stimulus, could be enough for crude recognition and categorization that occurs in 150 ms or less.
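The idea that a single feedforward wave of spikes can support crude categorization can be illustrated with a toy example. The sketch below is not VanRullen's simulation. It assumes a rank-order style code, which the text above does not specify: each input fires at most once, stronger inputs fire earlier, and a downstream unit weights each spike less the later it arrives. The function names, the decay value and the two category templates are hypothetical.

```python
def spike_order(intensities):
    """Return input indices in order of first spike (stronger input fires earlier)."""
    return sorted(range(len(intensities)), key=lambda i: -intensities[i])


def rank_order_response(order, weights, decay=0.8):
    """Activation of one downstream unit after a single wave of spikes.

    The spike arriving at rank r contributes weights[i] * decay**r, so early
    spikes dominate. decay is an assumed illustrative parameter.
    """
    return sum(weights[i] * decay ** rank for rank, i in enumerate(order))


def categorize(intensities, templates, decay=0.8):
    """Pick the category whose unit responds most strongly to one spike wave."""
    order = spike_order(intensities)
    scores = {
        label: rank_order_response(order, weights, decay)
        for label, weights in templates.items()
    }
    return max(scores, key=scores.get)


# Hypothetical category templates over four input channels.
templates = {
    "animal": [1.0, 0.5, 0.0, 0.0],
    "vehicle": [0.0, 0.0, 0.5, 1.0],
}
label = categorize([0.9, 0.6, 0.2, 0.1], templates)
```

In this sketch no feedback or iteration is needed: the decision is available as soon as the first wave has propagated, which is the property the feedforward account relies on.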

Neural-object file theory

Xu & Chun propose the neural-object file theory, which posits that the human visual system initially selects a fixed number of roughly four objects from a crowded scene based on their spatial information before encoding their details. Under this framework, object individuation is generally controlled by the inferior intraparietal sulcus (IPS), while object identification involves the superior IPS and higher-level visual areas. At the object individuation stage, object representations are coarse and contain minimal feature information. However, once these object representations have been 'set up' during the object individuation stage, they can be elaborated on over time during the object identification stage, during which additional featural and identity information is received.
The neural-object file theory deals with the issue of attention by proposing two different processing systems. One of them tracks the overall hierarchical structure of the visual display and is attention-free, while the other processes current objects of attentional selection. The current hypothesis is that the parahippocampal place area plays a role in shifting visual attention to different parts of a scene and incorporating information from multiple frames in order to form an integrated representation of the scene.
The separation between object individuation and identification in the neural-object file theory is supported by evidence such as that from Xu & Chun's fMRI study. In this study, they examined posterior brain mechanisms that support visual short-term memory. The fMRI results showed that representations in the inferior IPS were fixed to roughly four objects regardless of object complexity, but representations in the superior IPS and lateral occipital complex varied according to complexity.
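The contrast between a fixed individuation limit and a complexity-dependent identification limit can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not a model taken from Xu & Chun: the slot limit of four follows the "roughly four objects" figure above, while feature_budget and the integer complexity scale are hypothetical values chosen only to show the pattern.

```python
def individuate(n_objects, slot_limit=4):
    """Number of objects selected by location, independent of their complexity."""
    return min(n_objects, slot_limit)


def identify(n_objects, complexity, slot_limit=4, feature_budget=8):
    """Number of selected objects whose details can be encoded.

    complexity is a positive integer cost per object, and feature_budget is an
    assumed total capacity. Both are illustrative, not measured quantities.
    """
    if complexity < 1:
        raise ValueError("complexity must be a positive integer")
    selected = individuate(n_objects, slot_limit)
    return min(selected, feature_budget // complexity)
```

With six objects in the display, individuate returns 4 whether the objects are simple or complex, whereas identify returns 4 for complexity 1 and 2 for complexity 4, mirroring the dissociation reported between the inferior IPS and the superior IPS and lateral occipital complex.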

Natural scene statistics
