Jordan, T.R., & Bevan, K.M. (1997). Seeing and hearing rotated faces: Influences of facial orientation on visual and audiovisual speech recognition. Journal of Experimental Psychology: Human Perception and Performance, 23, 388–403.
Related papers
Psychophysics of the McGurk and other audiovisual speech integration effects
Journal of Experimental Psychology: Human Perception and Performance, 2011
When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of types of perceptual responses to 384 different stimuli from four talkers. The measures included the mutual information between the presented acoustic signal and the acoustic signal originally recorded with the presented video, and the correlation between the presented acoustic and video stimuli. In Experiment 1, open-set perceptual responses were obtained for acoustic /bA/ or /lA/ dubbed to video /bA, dA, gA, vA, zA, lA, wA, ðA/. The talker, the video syllable, and the acoustic syllable all significantly influenced the type of response. In Experiment 2, the best predictors of response-category proportions were a subset of the physical stimulus measures, which accounted for between 17% and 52% of the variance in the perceptual response-category proportions. That audiovisual stimulus relationships can account for response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships.
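To make the logic of these stimulus measures concrete, here is a minimal Python sketch of the analysis pattern the abstract describes: estimate mutual information and correlation between audio and video feature tracks, then regress response-category proportions on those measures. All signals, the histogram-based `mutual_information` helper, and the response proportions are invented placeholders under stated assumptions, not the study's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information between two signals (nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

n_stimuli, n_frames = 384, 200  # 384 dubbed tokens, hypothetical frame count
mi = np.empty(n_stimuli)
corr = np.empty(n_stimuli)
for i in range(n_stimuli):
    presented = rng.normal(size=n_frames)                         # presented audio track
    recorded = presented + rng.normal(scale=1.0, size=n_frames)   # audio recorded with the video
    video = presented + rng.normal(scale=2.0, size=n_frames)      # video feature track
    mi[i] = mutual_information(presented, recorded)
    corr[i] = np.corrcoef(presented, video)[0, 1]

# Hypothetical proportion of fusion (McGurk) responses per stimulus,
# constructed to depend weakly on the physical measures.
fusion = 0.3 + 0.2 * corr + 0.1 * mi + rng.normal(scale=0.05, size=n_stimuli)

# Ordinary least squares: predict response proportions from the measures.
X = np.column_stack([np.ones(n_stimuli), mi, corr])
beta, *_ = np.linalg.lstsq(X, fusion, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((fusion - pred) ** 2) / np.sum((fusion - fusion.mean()) ** 2)
print(f"variance accounted for (R^2): {r2:.2f}")
```

The R² printed here plays the same role as the 17%–52% figures in the abstract: how much of the spread in response-category proportions the physical measures capture.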
Visual Cues in Speech Perception
This report presents a summarised view of research papers on visual cues in speech perception, especially the work of Harry McGurk and the McGurk effect. The report presents no original research by the author and does not attempt to cover the whole field; it is a summary of some well-known scientific reports in this area. It includes a broader chapter on research related to the McGurk effect, i.e., research on speech perception carried out within the last twenty years. Some of the summarised articles were published quite recently and therefore cover some of the most recent studies of speech perception.
Visual recalibration of auditory speech identification: A McGurk aftereffect
Psychological Science, 2003
Aftereffects indicative of cross-modal recalibration have been observed after exposure to spatially incongruent inputs from different sensory modalities, but they have not so far been demonstrated for identity incongruence. We show that exposure to incongruent audiovisual speech (producing the well-known McGurk effect) can recalibrate auditory speech identification. In Experiment 1, exposure to an ambiguous sound intermediate between /aba/ and /ada/ dubbed onto a video of a face articulating either /aba/ or /ada/ increased the proportion of /aba/ or /ada/ responses, respectively, during subsequent sound identification trials. Experiment 2 demonstrated either the same recalibration effect or the opposite one (fewer /aba/ or /ada/ responses, revealing selective speech adaptation), depending on whether the ambiguous sound or a congruent nonambiguous one was used during exposure. In separate forced-choice identification trials, the bimodal stimulus pairs producing these contrasting effects were identically categorized, which makes a role of postperceptual factors in the generation of the effects unlikely.
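The contrast between recalibration and selective adaptation comes down to opposite shifts in post-exposure response proportions. The sketch below illustrates that bookkeeping; the response lists and proportions are invented for illustration only, not the study's data.

```python
from collections import Counter

def aba_proportion(responses):
    """Proportion of /aba/ responses across post-exposure identification trials."""
    counts = Counter(responses)
    return counts["aba"] / len(responses)

# Exposure to an AMBIGUOUS sound dubbed onto an /aba/ face is expected to
# INCREASE subsequent /aba/ reports (recalibration) ...
after_ambiguous_exposure = ["aba"] * 7 + ["ada"] * 3
# ... while exposure to a clear /aba/ sound is expected to DECREASE them
# (selective speech adaptation).
after_clear_exposure = ["aba"] * 3 + ["ada"] * 7

print("recalibration:", aba_proportion(after_ambiguous_exposure))  # 0.7
print("adaptation:  ", aba_proportion(after_clear_exposure))       # 0.3
```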
Brain activity during audiovisual speech perception: an fMRI study of the McGurk effect
Neuroreport, 2003
fMRI was used to assess the relationship between brain activation and the degree of audiovisual integration of speech information during a phoneme categorization task. Twelve subjects heard a speaker say the syllable /aba/ paired either with video of the speaker saying the same consonant or a different one (/ava/). In order to manipulate the degree of audiovisual integration, the audio was either synchronous or ±400 ms out of phase with the visual stimulus. Subjects reported whether they heard the consonant /b/ or another consonant; fewer /b/ responses when the audio and visual stimuli were mismatched indicated higher levels of visual influence on speech perception (McGurk effect). Active brain regions during presentation of the incongruent stimuli included the superior temporal and inferior frontal gyri, as well as extrastriate, premotor and posterior parietal cortex. A regression analysis related the strength of the McGurk effect to levels of brain activation. Paradoxically, higher numbers of /b/ responses were positively correlated with activation in the left occipito-temporal junction, an area often associated with processing visual motion. This activation suggests that auditory information modulates visual processing to affect perception.
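The regression step described here can be illustrated with a short sketch: relate each subject's /b/-response count (fewer /b/ responses indicating a stronger McGurk effect) to a per-subject activation estimate. The twelve values are synthetic placeholders constructed to mirror the reported positive correlation, not the fMRI data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_subjects = 12
# Hypothetical per-subject counts of /b/ responses to incongruent stimuli.
b_responses = rng.integers(2, 18, size=n_subjects)
# Hypothetical occipito-temporal activation, made to rise with /b/ responses
# so the sketch reproduces the direction of the reported correlation.
activation = 0.05 * b_responses + rng.normal(scale=0.1, size=n_subjects)

result = stats.linregress(b_responses, activation)
print(f"slope={result.slope:.3f}, r={result.rvalue:.2f}, p={result.pvalue:.3f}")
```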
Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect
Perception & Psychophysics, 1991

Studies of the McGurk effect have shown that when discrepant phonetic information is delivered to the auditory and visual modalities, the information is combined into a new percept not originally presented to either modality. In typical experiments, the auditory and visual speech signals are generated by the same talker. The present experiment examined whether a discrepancy in the gender of the talker between the auditory and visual signals would influence the magnitude of the McGurk effect. A male talker's voice was dubbed onto a videotape containing a female talker's face, and vice versa. The gender-incongruent videotapes were compared with gender-congruent videotapes, in which a male talker's voice was dubbed onto a male face and a female talker's voice was dubbed onto a female face. Even though there was a clear incompatibility in talker characteristics between the auditory and visual signals on the incongruent videotapes, the magnitude of the McGurk effect did not differ significantly between the incongruent and congruent videotapes. The results indicate that the mechanism for integrating speech information from the auditory and visual modalities is not disrupted by a gender incompatibility, even when it is perceptually apparent. The findings are compatible with the theoretical notion that information about the voice characteristics of the talker is extracted and used to normalize the speech signal at an early stage of phonetic processing, prior to the integration of the auditory and visual information.
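The key comparison here is a test of McGurk-effect magnitude between gender-congruent and gender-incongruent dubbings. Below is a minimal sketch of that comparison using a paired t-test; the test choice is an assumption for illustration (the authors may have used a different analysis), and the per-subject fusion proportions are invented.

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject proportions of fused (McGurk) responses under
# the two dubbing conditions; values are placeholders, not the study's data.
congruent = np.array([0.62, 0.55, 0.71, 0.48, 0.66, 0.59, 0.64, 0.52])
incongruent = np.array([0.60, 0.57, 0.69, 0.50, 0.63, 0.61, 0.62, 0.54])

t, p = stats.ttest_rel(congruent, incongruent)
print(f"t={t:.2f}, p={p:.3f}")  # a large p-value means no evidence of a difference
```

A nonsignificant result of this kind is what the abstract reports: gender incongruence, although perceptually obvious, leaves the integration mechanism intact.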