D. Kersten - Academia.edu
Papers by D. Kersten
How do observers recognize faces despite dramatic image variations that arise from changes in illumination? This paper examines 1) whether face recognition is sensitive to illumination direction, and 2) whether cast shadows improve performance by providing information about illumination, or hinder performance by introducing spurious edges. In Experiment 1, observers judged whether 2 sequentially-presented faces, illuminated from the same or different directions, were the same or different individuals. Cast shadows were present for half of the observers. Performance was impaired by a change in the illumination direction and by the presence of shadows. In Experiment 2, observers learned to name 8 faces under one illumination direction (left/right) and one cast-shadow condition (present/absent); they were later tested under novel illumination and shadow conditions. Performance declined for unfamiliar illumination directions, but not for unfamiliar shadow conditions. The finding that fa...
Advances in neural information …, 2004
This paper compares the ability of human observers to detect target image curves with that of an ideal observer. The target curves are sampled from a generative model which specifies (probabilistically) the geometry and local intensity properties of the curve. The ideal observer performs Bayesian inference on the generative model using MAP estimation. Varying the probability model for the curve geometry enables us to investigate whether human performance is best for target curves that obey specific shape statistics, in particular those observed on natural shapes. Experiments are performed with data on both rectangular and hexagonal lattices. Our results show that human observers' performance approaches that of the ideal observer and is, in general, closest to the ideal for conditions where the target curve tends to be straight or similar to natural statistics on curves. This suggests a bias of human observers towards straight curves and natural statistics.
Results for Clumping and No Clumping are summarized in figure (4E & H). The average reward difference, Δr = r_ideal − r_human, results for Clumping and No Clumping are summarized in figure (4F & I). Both performance measures give consistent results for the Clumping data, suggesting that humans are best when detecting the straightest lines (P(straight) … 0). But the situation is more complicated for the No Clumping case, where human observers show preferences for P(straight) … 0 or P(straight) … 0.
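MAP estimation of a curve under a geometry prior can be illustrated with a Viterbi-style dynamic program. The sketch below is only an illustration under simplifying assumptions of our own, not the model used in the paper: the curve occupies one row per column of a rectangular lattice, the hypothetical parameter `p_straight` is the prior probability of continuing straight (otherwise the curve steps one row up or down), and on-curve and off-curve pixel intensities are equal-variance Gaussians with assumed means `mu_on` and `mu_off`.

```python
import math
from typing import List, Sequence


def log_likelihood_ratio(x: float, mu_on: float, mu_off: float, sigma: float) -> float:
    """log N(x; mu_on, sigma) - log N(x; mu_off, sigma) for equal-variance Gaussians."""
    return ((x - mu_off) ** 2 - (x - mu_on) ** 2) / (2.0 * sigma ** 2)


def map_curve(image: Sequence[Sequence[float]], p_straight: float,
              mu_on: float = 1.0, mu_off: float = 0.0, sigma: float = 1.0) -> List[int]:
    """Return the MAP row index of the curve in every column (Viterbi recursion)."""
    if not 0.0 < p_straight < 1.0:
        raise ValueError("p_straight must lie strictly between 0 and 1")
    rows, cols = len(image), len(image[0])
    # Hypothetical geometry prior: stay in the same row with probability p_straight,
    # otherwise step one row up or down with equal probability.
    log_turn = math.log((1.0 - p_straight) / 2.0)
    log_step = {0: math.log(p_straight), 1: log_turn, -1: log_turn}

    def llr(r: int, c: int) -> float:
        return log_likelihood_ratio(image[r][c], mu_on, mu_off, sigma)

    # score[r] = best log-posterior (up to a constant) of a partial curve ending in row r.
    score = [llr(r, 0) for r in range(rows)]
    back: List[List[int]] = []
    for c in range(1, cols):
        new_score: List[float] = []
        pointers: List[int] = []
        for r in range(rows):
            best_prev, best_val = r, -math.inf
            for step, log_p in log_step.items():
                prev = r - step
                if 0 <= prev < rows and score[prev] + log_p > best_val:
                    best_prev, best_val = prev, score[prev] + log_p
            new_score.append(best_val + llr(r, c))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final row.
    r = max(range(rows), key=lambda i: score[i])
    path = [r]
    for pointers in reversed(back):
        r = pointers[r]
        path.append(r)
    return path[::-1]


# A noise-free 5 x 6 image with a bright horizontal curve in row 2.
image = [[1.0 if r == 2 else 0.0 for _ in range(6)] for r in range(5)]
print(map_curve(image, p_straight=0.8))  # [2, 2, 2, 2, 2, 2]
```

Lowering `p_straight` makes turning paths cheaper under the prior, which is one way to mimic "varying the probability model for the curve geometry" in a toy setting.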
Workshop on Statistical and …, 1999
A full Bayesian approach to vision requires consideration of potential interactions between all the variables in both the scene and image. A complete model of the interactions, however, would seem computationally intractable because of the large dimensionality of image measurements and scene properties. As a consequence, both experimental studies and theoretical models of human vision have relied on an assumption of modularity in which a particular scene property, such as object depth, is estimated from a restricted set of image measurements, such as image size. The computational problem is not hopeless, however, and can be surmounted by restricting the task and taking advantage of the statistical structure of the problem. In a Bayesian context, modularity falls out of the conditional independencies in the joint distribution of scenes and images p(S, I). By conditioning the joint distribution with respect to particular inference tasks, further modularity is possible while preserving optimal cue combination. We illustrate the problem of modularity and cue combination for the perception of depth from two highly disparate cues, cast shadow position and image size. While strong modularity would suggest ad hoc or no cue combination, we find that the performance of human subjects is better predicted by near-optimal cue combination.
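A common textbook form of optimal cue combination, assuming each cue yields an independent Gaussian likelihood for depth and the prior is flat, is a precision-weighted average of the single-cue estimates. The sketch below shows that standard rule with made-up numbers; it is not necessarily the model fitted in this paper, and the cue values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def combine_cues(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse independent Gaussian cue estimates (mean, sd) under a flat prior.

    Returns the posterior mean (a precision-weighted average of the cue means)
    and the posterior standard deviation.
    """
    precisions = [1.0 / sd ** 2 for _, sd in estimates]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, estimates)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical depth estimates (arbitrary units): shadow position is the more reliable cue.
shadow_cue = (2.0, 0.5)
size_cue = (3.0, 1.0)
depth, depth_sd = combine_cues([shadow_cue, size_cue])
print(round(depth, 3), round(depth_sd, 3))  # 2.2 0.447
```

The combined estimate lies closer to the more reliable cue, and its standard deviation is smaller than that of either cue alone, which is the signature usually used to distinguish near-optimal combination from strong modularity.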
PLoS Computational Biology, 2010
Perception is fundamentally underconstrained because different combinations of object properties can generate the same sensory information. To disambiguate sensory information into estimates of scene properties, our brains incorporate prior knowledge and additional "auxiliary" (i.e., not directly relevant to the desired scene property) sensory information to constrain perceptual interpretations. For example, knowing the distance to an object helps in perceiving its size. The literature contains few demonstrations of the use of prior knowledge and auxiliary information in combined visual and haptic disambiguation and almost no examination of haptic disambiguation of vision beyond "bistable" stimuli. Previous studies have reported that humans integrate multiple unambiguous sensations to perceive single, continuous object properties, like size or position. Here we test whether humans use visual and haptic information, individually and jointly, to disambiguate size from distance. We presented participants with a ball moving in depth with a changing diameter. Because no unambiguous distance information is available under monocular viewing, participants rely on prior assumptions about the ball's distance to disambiguate their size percept. Presenting auxiliary binocular and/or haptic distance information augments participants' prior distance assumptions and improves their size judgment accuracy, though binocular cues were trusted more than haptic cues. Our results suggest that both visual and haptic distance information disambiguate size perception, and we interpret these results in the context of probabilistic perceptual reasoning.
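The size-distance ambiguity can be made concrete: a ball of diameter S at distance D subtends the visual angle θ = 2·atan(S / 2D), so θ alone cannot fix S. The sketch below, under our own simplifying assumptions, fuses a Gaussian prior on distance with Gaussian auxiliary distance cues and then reads off size from the fused distance. All numbers are hypothetical, "binocular trusted more than haptic" is encoded simply as a smaller standard deviation, and plugging in the posterior mean distance is a simplification of full probabilistic inference over size.

```python
import math
from typing import Sequence, Tuple


def fuse_gaussians(cues: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Product of independent Gaussians (mean, sd): precision-weighted mean and sd."""
    precisions = [1.0 / sd ** 2 for _, sd in cues]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, cues)) / total
    return mean, math.sqrt(1.0 / total)


def size_from_angle(theta: float, distance: float) -> float:
    """Physical diameter that subtends visual angle theta (radians) at the given distance."""
    return 2.0 * distance * math.tan(theta / 2.0)


# Hypothetical scene: a 0.10 m ball at 1.0 m.
theta = 2.0 * math.atan(0.10 / (2.0 * 1.0))
prior = (0.5, 0.3)        # assumed prior belief about distance (m): mean, sd
binocular = (1.0, 0.05)   # assumed reliable binocular distance cue
haptic = (1.0, 0.15)      # assumed, less reliable haptic distance cue

for label, cues in [("prior only", [prior]),
                    ("prior + haptic", [prior, haptic]),
                    ("prior + binocular + haptic", [prior, binocular, haptic])]:
    d_hat, _ = fuse_gaussians(cues)
    print(f"{label}: distance {d_hat:.3f} m, size {size_from_angle(theta, d_hat):.3f} m")
```

With only the (biased) prior, the ball is judged too small; each added distance cue pulls the distance estimate, and hence the size estimate, toward the true value.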
Pattern Recognition, 2011
We propose a method for rapidly classifying surface reflectance directly from the output of spatiotemporal filters applied to an image sequence of rotating objects. Using image data from only a single frame, we compute histograms of image velocities and classify these as being generated by a specular or a diffusely reflecting object. Exploiting characteristics of material-specific image velocities, we show that our classification approach can predict the reflectance of novel 3D objects, as well as human perception.
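The histogram-then-classify step can be sketched as follows. This is a toy illustration, not the paper's classifier: it skips the spatiotemporal filtering and takes velocity magnitudes as given, and it uses a nearest-prototype rule with L1 distance. The bin edges and the two prototype histograms (speeds clustered near the object's own motion for diffuse surfaces, a broad spread for specular ones) are hypothetical values chosen for illustration.

```python
import bisect
from typing import Dict, List, Sequence


def velocity_histogram(speeds: Sequence[float], bin_edges: Sequence[float]) -> List[float]:
    """Normalized histogram of image-velocity magnitudes.

    Bins are [edge_i, edge_{i+1}); values outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for s in speeds:
        i = bisect.bisect_right(bin_edges, s) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    n = sum(counts)
    return [c / n for c in counts] if n else [0.0] * len(counts)


def classify_reflectance(hist: Sequence[float], prototypes: Dict[str, Sequence[float]]) -> str:
    """Nearest-prototype label under the L1 distance between histograms."""
    return min(prototypes,
               key=lambda label: sum(abs(a - b) for a, b in zip(hist, prototypes[label])))


edges = [0.0, 1.0, 2.0, 3.0, 4.0]           # speed bins in pixels/frame (assumed)
prototypes = {
    "diffuse": [0.1, 0.8, 0.1, 0.0],        # speeds cluster near the object's own motion
    "specular": [0.25, 0.25, 0.25, 0.25],   # reflections slide across the surface: broad spread
}
print(classify_reflectance(velocity_histogram([1.2, 1.5, 1.1, 1.8, 0.5], edges), prototypes))  # diffuse
```

In practice the prototypes would be estimated from labeled training sequences rather than written by hand.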
Current Biology, 2011
Many critical perceptual judgments, from telling whether fruit is ripe to determining whether the ground is slippery, involve estimating the material properties of surfaces. Very...
Direction dependent occipital and parietal activity during the perception of optic flows simulating eccentric headings.
Perception, 1997
Phenomenally strong visual illusions are described in which the motion of an object's cast shadow determines the perceived 3-D trajectory of the object. Simply adjusting the motion of a shadow is sufficient to induce dramatically different apparent trajectories of the object casting the shadow. Psychophysical results obtained with the use of 3-D graphics are reported which show that: (i) the information provided by the motion of an object's shadow overrides other strong sources of information and perceptual biases, such as the assumption of constant object size and a general viewpoint; (ii) the natural constraint of shadow darkness plays a role in the interpretation of a moving image patch as a shadow, but under some conditions even unnatural light shadows can induce apparent motion in depth of an object; (iii) when shadow motion is caused by a moving light source, the visual system incorrectly interprets the shadow motion as consistent with a moving object, rather than a moving...
Philosophical Transactions of the Royal Society B: Biological Sciences, 1997
The central problems of vision are often divided into object identification and localization. Object identification, at least at fine levels of discrimination, may require the application of top-down knowledge to resolve ambiguous image information. Utilizing top-down knowledge, however, may require the initial rapid access of abstract object categories based on low-level image cues. Does object localization require a different set of operating principles than object identification or is category determination also part of the perception of depth and spatial layout? Three-dimensional graphics movies of objects and their cast shadows are used to argue that identifying perceptual categories is important for determining the relative depths of objects. Processes that can identify the causal class (e.g. the kind of material) that generates the image data can provide information to determine the spatial relationships between surfaces. Changes in the blurriness of an edge may be characteri...
It is well known that the human visual system can reconstruct depth from simple random-dot displays given motion information. This fact has lent support to the notion that structure-from-stereo and structure-from-motion systems rely on low-level primitives or tokens, such as edges, derived from image intensities. In contrast, the judgment of surface attributes such as transparency or opacity is often considered to be a higher-level visual process that would make use of low-level stereo or motion information and perhaps attention or later recognition to tease apart the transparent from the opaque parts. This is exemplified by the lack of computational studies dealing with transparency, compared with the at least limited success of a number of algorithms that solve structure from motion or stereo. In this study, we describe a new illusion and some results that question the above view by showing that depth from transparency and opacity can override the rigidity bias in perceiving depth from motion. This supports the idea that the brain's computation of the surface material attribute of transparency may have to be done either before or in parallel with the computation of structure from motion.
Poster presented at the Annual Meeting of The Association for Research in Vision and Ophthalmology, Jun 2, 2006
Purpose: How do observers recognize objects despite dramatic image variations that arise from changes in illumination? Some evidence suggests that changes in illumination direction influence object recognition (Kersten et al., ARVO 1995). We examine whether illumination dependency extends to face recognition. A corollary issue is whether cast shadows improve performance by providing information about light source direction, or hinder performance by introducing spurious edges that must be discounted prior to ...
Journal of Vision
We measured perceptual judgments of category, material attributes, affordances, and similarity to investigate the perceptual dimensions underlying the visual representation of a broad class of natural dynamic flows (sea waves, smoke, and windblown foliage). The dynamic flows were looped 3-s movies windowed with circular apertures of two sizes to manipulate the level of spatial context. In low levels of spatial context (smaller apertures), human observers' judgments of material attributes and affordances were inaccurate, with estimates biased toward assumptions that the flows resulted from objects that were rigid, "pick-up-able," and not penetrable. The similarity arrangements showed dynamic flow clusters based partly on material, but dominated by color appearance. In high levels of spatial context (large apertures), observers reliably estimated material categories and their attributes. The similarity arrangements were based primarily on categories related to external, physical causes. Representational similarity analysis suggests that while shallow dimensions like color sometimes account for inferences of physical causes in the low-context condition, shallow dimensions cannot fully account for these inferences in the high-context condition. For the current broad data set of dynamic flows, the perceptual dimensions that best account for the similarity arrangements in the high-context condition are related to the intermolecular bond strength of a material's underlying physical structure. These arrangements are also best related to affordances that underlie common motor activities.
Thus, the visual system appears to use an efficient strategy to resolve flow ambiguity; vision will sometimes rely on local, image-based, statistical properties that can support reliable inference of external physical causes, and other times it uses deeper causal knowledge to interpret and use flow information to the extent that it is useful for everyday action decisions.
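Representational similarity analysis typically compares an observed dissimilarity matrix with the matrices predicted by candidate dimensions (e.g., color versus a physical-cause dimension). The sketch below implements one common variant, the Spearman rank correlation between the upper triangles of two matrices; whether the paper used exactly this variant is not stated in the abstract, and the 3-item matrices and model names are hypothetical.

```python
import math
from typing import List, Sequence


def upper_triangle(rdm: Sequence[Sequence[float]]) -> List[float]:
    """Entries above the diagonal of a symmetric dissimilarity matrix."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rsa_spearman(rdm_a: Sequence[Sequence[float]], rdm_b: Sequence[Sequence[float]]) -> float:
    """Spearman correlation between the upper triangles of two dissimilarity matrices."""
    return pearson(average_ranks(upper_triangle(rdm_a)), average_ranks(upper_triangle(rdm_b)))


# Hypothetical 3-item example: observer dissimilarities vs. two candidate model dimensions.
observer = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
model_a = [[0, 2, 5], [2, 0, 3], [5, 3, 0]]   # same rank order as the observer
model_b = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]   # reversed rank order
print(rsa_spearman(observer, model_a), rsa_spearman(observer, model_b))  # 1.0 -1.0
```

Rank correlation is a usual choice here because only the ordering of dissimilarities, not their scale, is assumed to be comparable between observers and models.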
The light reaching the eye confounds the proportion of light reflected from surfaces in the environment with their illumination. To achieve constancy in perceived surface reflectance (lightness) across variations in illumination, the visual system must infer the relative contribution of reflectance to the incoming luminance signals. Previous studies have shown that contour and stereo cues to surface shape can affect the lightness of sawtooth luminance profiles. Here, we investigated whether cues to surface shape provided solely by motion (via the kinetic depth effect) can similarly influence lightness. Human observers judged the relative brightness of patches contained within abutting surfaces with identical luminance ramps. We found that the reported brightness differences were significantly lower when the kinetic depth effect supported the impression of curved surfaces, compared to similar conditions without the kinetic depth effect. This demonstrates the capacity of the visual sy...