Visual Influences on Perception of Speech and Nonspeech Vocal-Tract Events
Related papers
Changes in the McGurk Effect across phonetic contexts
2010
To investigate the processes underlying audiovisual speech perception, the McGurk illusion was examined across a range of phonetic contexts. Two major changes were found. First, the frequency of illusory /g/ fusion percepts increased relative to the frequency of illusory /d/ fusion percepts as vowel context was shifted from /i/ to / / to /u/. This trend could not be explained by biases present in perception of the unimodal visual stimuli. However, the change found in the McGurk fusion effect across vowel environments did correspond systematically with changes in second formant frequency patterns across contexts. Second, the order of consonants in illusory combination percepts was found to depend on syllable type. This may be due to differences across syllable contexts in the time courses of inputs from the two modalities, as delaying the auditory track of a vowel-consonant stimulus resulted in a change in the order of consonants perceived. Taken together, these results suggest that the speech perception system either fuses audiovisual inputs into a visually compatible percept with a similar second formant pattern to that of the acoustic stimulus or interleaves the information from different modalities, at a phonemic or subphonemic level, based on their relative arrival times.
Visual Cues in Speech Perception
This report presents a summarised view of some research papers on visual cues in speech perception, especially the work of Harry McGurk and the McGurk effect. The report does not present any research done by the author, nor does it pretend to cover the whole research area. It is merely a summary of some well-known science reports in this area. The report includes a broader chapter on research done with consideration to the McGurk effect, i.e. research on speech perception done within the last twenty years. Some of the articles that are summarised in the report were presented quite recently and therefore cover some of the most recent studies done in speech perception.
Brain Sciences
In the McGurk effect, perception of a spoken consonant is altered when an auditory (A) syllable is presented with an incongruent visual (V) syllable (e.g., A/pa/V/ka/ is often heard as /ka/ or /ta/). The McGurk effect provides a measure for visual influence on speech perception, becoming stronger the lower the proportion of auditory correct responses. Cross-language effects are studied to understand processing differences between one’s own and foreign languages. Regarding the McGurk effect, it has sometimes been found to be stronger with foreign speakers. However, other studies have shown the opposite, or no difference between languages. Most studies have compared English with other languages. We investigated cross-language effects with native Finnish and Japanese speakers and listeners. Both groups of listeners had 49 participants. The stimuli (/ka/, /pa/, /ta/) were uttered by two female and male Finnish and Japanese speakers and presented in A, V and AV modality, including a McGu...
A comparison of the McGurk effect for spoken and sung syllables
Attention, Perception, & Psychophysics, 2010
The importance of visual cues in speech perception is illustrated by the McGurk effect, whereby a speaker's facial movements affect speech perception. The goal of the present study was to evaluate whether the McGurk effect is also observed for sung syllables. Participants heard and saw sung instances of the syllables /ba/ and /ga/ and then judged the syllable they perceived. Audio-visual stimuli were congruent or incongruent (e.g., auditory /ba/ presented with visual /ga/). The stimuli were presented as spoken, sung in an ascending and descending triad (C E G G E C), and sung in an ascending and descending triad that returned to a semitone above the tonic (C E G G E C♯). Results revealed no differences in the proportion of fusion responses between spoken and sung conditions, confirming that cross-modal phonemic information is integrated similarly in speech and song.
A comparison of the McGurk effect in speech and song
The importance of visual cues in speech perception is illustrated by the McGurk effect, whereby a speaker's facial movements affect speech perception. The goal of the present study was to evaluate whether the McGurk effect is also observed for sung syllables. Participants heard and saw sung instances of the syllables /ba/ and /ga/ and then judged the syllable they perceived. Audio-visual stimuli were congruent or incongruent (e.g., auditory /ba/ presented with visual /ga/). The stimuli were presented as spoken, sung in an ascending and descending triad (C E G G E C), and sung in an ascending and descending triad that returned to a semitone above the tonic (C E G G E C♯). Results revealed no differences in the proportion of fusion responses between spoken and sung conditions, confirming that cross-modal phonemic information is integrated similarly in speech and song.
When a /Bi/g/Gi/g becomes a /Di/g: Explorations of the McGurk effect in speech perception
Australian Journal of Psychology
Although the McGurk Effect is a well-researched illusory phenomenon arising from discrepant auditory and visual speech information, little is known about the influence of lexical processes on this phenomenon. Thus, we investigated the McGurk Effect using three-letter consonant-vowel-consonant real word and pseudoword pairs with an audiovisual discrepancy positioned at either stimulus onset or offset. The results demonstrated that the frequency of illusions was similar for real words and pseudowords when the discrepancy was at stimulus onset but was significantly lower for real words when the audiovisual discrepancy was positioned at stimulus offset. Positioning of audiovisual discrepancy was not important for accurate auditory perception of pseudowords. These results suggest that the McGurk illusion is the result of audiovisual integration that occurs early in perception prior to word identification and that these early audiovisual integrative processes are modulated by lexical knowledge.