HIGH-LEVEL SCENE PERCEPTION
Related papers
Attention and Scene Understanding
Neurobiology of Attention, 2005
This paper presents a simplified, introductory view of how visual attention may contribute to and integrate within the broader framework of visual scene understanding. Several key components are identified which cooperate with attention during the analysis of complex dynamic visual inputs, namely rapid computation of scene gist and layout, localized object recognition and tracking at attended locations, working memory that holds a representation of currently relevant targets, and long-term memory of known world entities and their inter-relationships. Evidence from neurobiology and psychophysics is provided to support the proposed architecture.
Perceptual effects of scene context on object identification
Psychological Research, 1990
In a number of studies the context provided by a real-world scene has been claimed to have a mandatory, perceptual effect on the identification of individual objects in such a scene. This claim has provided a basis for challenging widely accepted data-driven models of visual perception in order to advocate alternative models with an outspoken top-down character. The present paper offers a review of the evidence to demonstrate that the observed scene-context effects may be the product of post-perceptual and task-dependent guessing strategies. A new research paradigm providing an on-line measure of genuine perceptual effects of context on object identification is proposed. First-fixation durations for objects incidentally fixated during the free exploration of real-world scenes are shown to increase when the objects are improbable in the scene or violate certain aspects of their typical spatial appearance in it. These effects of contextual violations are shown to emerge only at later stages of scene exploration, contrary to the notion of schema-driven scene perception effective from the very first scene fixation. In addition, evidence is reported in support of the existence of a facilitatory component in scene-context effects. This is taken to indicate that the context directly affects the ease of perceptual object processing and does not merely serve as a framework for checking the plausibility of the output of perceptual processes. Finally, our findings are situated against other contrasting results. Some future research questions are highlighted.
Consciousness and Cognition, 2008
Eye movements were recorded during the display of two images of a real-world scene that were inspected to determine whether they were the same or not (a comparative visual search task). In the displays where the pictures were different, one object had been changed, and this object was sometimes taken from another scene and was incongruent with the gist. The experiment established that incongruous objects attract eye fixations earlier than the congruous counterparts, but that this effect is not apparent until the picture has been displayed for several seconds. By controlling the visual saliency of the objects the experiment eliminates the possibility that the incongruency effect is dependent upon the conspicuity of the changed objects. A model of scene perception is suggested whereby attention is unnecessary for the partial recognition of an object that delivers sufficient information about its visual characteristics for the viewer to know that the object is improbable in that particular scene, and in which full identification requires foveal inspection.
The influence of scene context on object recognition is independent of attentional focus
Frontiers in Psychology, 2013
Humans can quickly and accurately recognize objects within briefly presented natural scenes. Previous work has provided evidence that scene context contributes to this process, demonstrating improved naming of objects that were presented in semantically consistent scenes (e.g., a sandcastle on a beach) relative to semantically inconsistent scenes (e.g., a sandcastle on a football field). The current study was aimed at investigating which processes underlie the scene consistency effect. Specifically, we tested: (1) whether the effect is due to increased visual feature and/or shape overlap for consistent relative to inconsistent scene-object pairs; and (2) whether the effect is mediated by attention to the background scene. Experiment 1 replicated the scene consistency effect of a previous report. Using a new, carefully controlled stimulus set, Experiment 2 showed that the scene consistency effect could not be explained by low-level feature or shape overlap between scenes and target objects. Experiments 3a and 3b investigated whether focused attention modulates the scene consistency effect. By using a location cueing manipulation, participants were correctly informed about the location of the target object on a proportion of trials, allowing focused attention to be deployed toward the target object. Importantly, the effect of scene consistency on target object recognition was independent of spatial attention, and was observed both when attention was focused on the target object and when attention was focused on the background scene. These results indicate that a semantically consistent scene context benefits object recognition independently of the focus of attention. We suggest that the scene consistency effect is primarily driven by global scene properties, or "scene gist," that can be processed with minimal attentional resources.
Eye Movements and Visual Encoding During Scene Perception
Psychological Science, 2009
The amount of time viewers could process a scene during eye fixations was varied by a mask that appeared at a certain point in each eye fixation. The scene did not reappear until the viewer made an eye movement. The main finding in the studies was that in order to normally process a scene, viewers needed to see the scene for at least 150 ms during each eye fixation. This result is surprising because viewers can extract the gist of a scene from a brief 40- to 100-ms exposure. It also stands in marked contrast to reading, as readers need only to view the words in the text for 50 to 60 ms to read normally. Thus, although the same neural mechanisms control eye movements in scene perception and reading, the cognitive processes associated with each task drive processing in different ways. The neural mechanisms that underlie oculomotor activity do not vary as a function of the task viewers engage in; there is not one oculomotor system for looking at scenes, another for visual search, and another for reading. Eye movements are essential in these tasks because the eyes must be placed on the part of the scene or text viewers want to process in detail in foveal vision (Henderson, 2003; Rayner, 1998, in press). Does the oculomotor system react in the same way to stimuli in these different tasks? In the present studies, we utilized a gaze-contingent display change paradigm (Henderson &
Attention–memory interactions in scene perception
Spatial Vision, 2006
The perception of natural scenes relies on the integration of pre-existing knowledge with the immediate results of attentional processing, and what can be remembered from a scene depends in turn on how that scene is perceived and understood. However, there are conflicting results in the literature as to whether people are more likely to remember those objects that are consistent with the scene or those that are not. Moreover, whether any discrepancy between the likelihood of remembering schema-consistent or schema-inconsistent objects should be attributed to the schematic effects on attention or on memory remains unclear. To address this issue, the current study attempted to directly manipulate attention allocation by requiring participants to (i) look at schema-consistent objects, (ii) look at schema-inconsistent objects, or (iii) share attention equally across both. Regardless of the differential allocation of attention or object fixation, schema-consistent objects were better recalled whereas recognition was independent of schema-consistency, but depended on task instruction. These results suggest that attention is important both for remembering low-level object properties, and information whose retrieval is not supported by the currently active schema. Specific knowledge of the scenes being viewed can result in the recall of non-fixated objects, but without such knowledge attention is required to encode sufficient detail for subsequent recognition. Our results demonstrate therefore that attention is not critical for the retrieval of objects that are consistent with a scene's schematic content.
The Role of Fixation Position in Detecting Scene Changes Across Saccades
Psychological Science, 1999
Target objects presented within color images of naturalistic scenes were deleted or rotated during a saccade to or from the target object or to a control region of the scene. Despite instructions to memorize the details of the scenes and to monitor for object changes, viewers frequently failed to notice the changes. However, the failure to detect change was mediated by three other important factors: First, accuracy generally increased as the distance between the changing region and the fixation immediately before or after the change decreased. Second, changes were sometimes initially missed, but subsequently noticed when the changed region was later refixated. Third, when an object disappeared from a scene, detection of that disappearance was greatly improved when the deletion occurred during the saccade toward that object. These results suggest that fixation position and saccade direction play an important role in determining whether changes will be detected. It appears that more i...
Visual scene memory and the guidance of saccadic eye movements
Vision Research, 2001
An unresolved question is how much information can be remembered from visual scenes when they are inspected by saccadic eye movements. Subjects used saccadic eye movements to scan a computer-generated scene, and afterwards, recalled as many objects as they could. Scene memory was quite good: it improved with display duration, it persisted over time long after the display was removed, and it continued to accumulate with additional viewings of the same display (Melcher, D. The persistence of memory for scenes. Nature 412, 401). The occurrence of saccadic eye movements was important to ensure good recall performance, even though subjects often recalled non-fixated objects. Inter-saccadic intervals increased with display duration, showing an influence of duration on global scanning strategy. The choice of saccadic target was predicted by a Random Selection with Distance Weighting (RSDW) model, in which the target for each saccade is selected at random from all available objects, weighted according to distance from fixation, regardless of which objects had previously been fixated. The results show that the visual memory that was reflected in the recall reports was not utilized for the immediate decision about where to look in the scene. Visual memory can be excellent, but it is not always reflected in oculomotor measures, perhaps because the cost of rapid on-line memory retrieval is too great.
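The RSDW idea described in this abstract can be illustrated with a short simulation. The sketch below is only a reading of the verbal description, not the authors' implementation: the abstract does not give the weighting function, so the exponential fall-off with distance, the `scale` parameter, the exclusion of the currently fixated object, and the function names `rsdw_next_target` and `simulate_scanpath` are all assumptions made for illustration.

```python
import math
import random


def rsdw_next_target(current, objects, scale, rng):
    """Pick the next saccade target at random, weighted by distance.

    Assumptions (not from the abstract): weight = exp(-distance / scale),
    and the currently fixated object is not a candidate for the next saccade.
    Requires at least two objects.
    """
    here = objects[current]
    candidates = [name for name in objects if name != current]
    weights = [
        math.exp(-math.dist(here, objects[name]) / scale)
        for name in candidates
    ]
    # Fixation history is deliberately ignored: objects visited earlier
    # stay in the candidate pool, so the model is memoryless.
    return rng.choices(candidates, weights=weights, k=1)[0]


def simulate_scanpath(objects, start, n_saccades, scale=5.0, seed=0):
    """Return the sequence of object names fixated after `start`."""
    rng = random.Random(seed)
    current = start
    path = []
    for _ in range(n_saccades):
        current = rsdw_next_target(current, objects, scale, rng)
        path.append(current)
    return path


if __name__ == "__main__":
    # Hypothetical object layout in arbitrary screen units.
    scene = {"cup": (0.0, 0.0), "lamp": (3.0, 4.0), "book": (10.0, 0.0)}
    print(simulate_scanpath(scene, start="cup", n_saccades=10))
```

Because the selection never consults which objects were already fixated, simulated scan paths revisit objects freely, which is the property the abstract contrasts with the good recall performance.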