Concurrent vowel identification. I. Effects of relative amplitude and F0 difference

The Journal of the Acoustical Society of America, 1997

Subjects identified concurrent synthetic vowel pairs that differed in relative amplitude and fundamental frequency (F0). Subjects were allowed to report one or two vowels for each stimulus, rather than forced to report two vowels as was the case in previously reported experiments of the same type. At all relative amplitudes, identification was better at a fundamental frequency difference (ΔF0) of 6% than at 0%, but the effect was larger when the target vowel amplitude was below that of the competing vowel (−10 or −20 dB). The existence of a ΔF0 effect when the target is weak relative to the competing vowel is interpreted as evidence that segregation occurs according to a mechanism of cancellation based on the harmonic structure of the competing vowel. Enhancement of the target based on its own harmonic structure is unlikely, given the difficulty of estimating the fundamental frequency of a weak target. Details of the pattern of identification as a function of amplitude and vowel pair were found to be incompatible with a current model of vowel segregation.
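
The cancellation mechanism invoked here can be illustrated with a minimal sketch: a single-delay comb filter tuned to the competing vowel's period nulls that vowel's harmonics, leaving a residue dominated by the (possibly much weaker) target. This is only an illustration of the idea, not the authors' model; the function names and the five-harmonic toy stimuli are assumptions.

```python
import numpy as np

def cancel_competitor(mixture, f0_competitor, sample_rate):
    """Comb (notch) filter tuned to the competing vowel's period:
    y[n] = x[n] - x[n - T], with T = one period of the competitor's F0.
    Components at multiples of f0_competitor are nulled, while the target's
    harmonics (at a different F0) pass with nonzero gain."""
    period = int(round(sample_rate / f0_competitor))
    delayed = np.concatenate([np.zeros(period), mixture[:-period]])
    return mixture - delayed

# Toy mixture: a dominant 100-Hz "competitor" plus a 20-dB weaker 106-Hz "target"
# (a 6% Delta-F0, as in the experiment), each built from five harmonics.
sr = 16000
t = np.arange(0, 0.2, 1.0 / sr)
competitor = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 6))
target = 0.1 * sum(np.sin(2 * np.pi * 106 * k * t) for k in range(1, 6))
mixture = competitor + target

residue = cancel_competitor(mixture, f0_competitor=100.0, sample_rate=sr)
# After the first period (where the delayed copy is still zero), the residue is
# dominated by the weak target, which is how a Delta-F0 benefit can survive even
# when the target sits 10-20 dB below the competitor.
print(np.std(mixture[160:]), np.std(residue[160:]))  # 160 samples = one competitor period
```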

The perceptual segregation of simultaneous vowels with harmonic, shifted, or random components

Attention, Perception, & Psychophysics, 1993

This experiment was an investigation of the ability of listeners to identify the constituents of double vowels (pairs of synthetic vowels, presented concurrently and binaurally). Three variables were manipulated: (1) the size of the difference in F0 between the constituents (0, 1/2, and 6 semitones); (2) the frequency relations among the sinusoids making up the constituents: harmonic, shifted (spaced equally in frequency but not integer multiples of the F0), and random; and (3) the relationship between the F0 contours imposed on the constituents: steady state, gliding in parallel, or gliding in opposite directions. It was assumed that, in the case of the gliding contours, the harmonics of each vowel would "trace out" their spectral envelope and potentially improve the definition of the formant locations. It was also assumed that the application of different F0 contours would introduce differences in the direction of harmonic movement (common fate), thus aiding the perceptual segregation of the two vowels. The major findings were the following: (1) For harmonic constituents, a difference in F0 leads to improved identification performance. Neither tracing nor common-fate differences add to the effect of pitch differences. (2) For shifted constituents, a difference between the spacing of the constituents also leads to improved performance. Formant tracing and common fate contribute some further improvement. (3) For random constituents, tracing does not contribute, but common fate does.
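
A sketch of how the three constituent types might be generated helps clarify the manipulation. The shift size and the jitter used for the "random" components below are illustrative assumptions, not the paper's parameter values.

```python
import numpy as np

def component_frequencies(f0, n_components, kind, shift_hz=25.0, rng=None):
    """Component frequencies for the three constituent types described above.
    - 'harmonic': integer multiples of f0.
    - 'shifted':  equal spacing of f0, but offset so components are not
                  integer multiples of f0.
    - 'random':   irregularly spaced components (here: harmonic frequencies
                  perturbed by random jitter -- an assumed scheme)."""
    rng = rng or np.random.default_rng(0)
    harmonics = f0 * np.arange(1, n_components + 1)
    if kind == "harmonic":
        return harmonics
    if kind == "shifted":
        return harmonics + shift_hz
    if kind == "random":
        return harmonics + rng.uniform(-0.5, 0.5, n_components) * f0
    raise ValueError(kind)

def synthesize(freqs, amps, duration=0.2, sr=16000):
    """Sum of sinusoids at the given frequencies; amps would normally follow a
    vowel's formant envelope (flat here for brevity)."""
    t = np.arange(0, duration, 1.0 / sr)
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs, amps))

freqs = component_frequencies(100.0, 30, "shifted")
vowel = synthesize(freqs, amps=np.ones(len(freqs)))
```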

Identification of concurrent harmonic and inharmonic vowels: A test of the theory of harmonic cancellation and enhancement

The Journal of the Acoustical Society of America, 1995

The improvement of identification accuracy of concurrent vowels with differences in fundamental frequency (ΔF0) is usually attributed to mechanisms that exploit harmonic structure. To decide whether identification is aided primarily by selecting the target vowel on the basis of its harmonic structure ("harmonic enhancement") or removing the interfering vowel on the basis of its harmonic structure ("harmonic cancellation"), pairs of synthetic vowels, each of which was either harmonic or inharmonic, were presented to listeners for identification. Responses for each vowel were scored according to the vowel's harmonicity and that of the vowel that accompanied it. For a given target, identification was better by about 3% for a harmonic ground unless the target was also harmonic with the same F0. This supports the cancellation hypothesis. Identification was worse for harmonic than for inharmonic targets by 3%-8%. This does not support the enhancement hypothesis. When both vowels were harmonic, identification was better by about 6% when the F0's differed by 1/2 semitone, consistent with previous experiments. Results are interpreted in terms of harmonic enhancement and harmonic cancellation, and alternative explanations such as waveform interaction are considered.
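
The two hypotheses map naturally onto two filter-based sketches: enhancement passes energy at the target's harmonics (and so needs the target's F0), whereas cancellation notches out the ground's harmonics (and so needs only the ground's F0). The single-delay comb form below is an assumed simplification for illustration, not the paper's model.

```python
import numpy as np

def harmonic_enhancement(x, target_f0, sr):
    """Reinforce harmonics of the *target*: y[n] = x[n] + x[n - T_target].
    Requires an estimate of the target's F0."""
    T = int(round(sr / target_f0))
    return x + np.concatenate([np.zeros(T), x[:-T]])

def harmonic_cancellation(x, ground_f0, sr):
    """Remove harmonics of the *ground* (interferer): y[n] = x[n] - x[n - T_ground].
    Requires only the interferer's F0."""
    T = int(round(sr / ground_f0))
    return x - np.concatenate([np.zeros(T), x[:-T]])
```

The asymmetry in what each filter must know about the stimulus is what makes scoring responses by the harmonicity of target and ground diagnostic between the two accounts.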

Mechanisms for the Perception of Concurrent Vowels

Previous research has shown that the perceptual segregation of concurrent vowels is improved when there is a difference in fundamental frequency (F0) between them. Two theories, termed F0-guided segregation and glottal pulse asynchrony (GPA), have been advanced to explain this "∆F0 effect". A previous study found no consistent effect of GPA. However, it is argued that this may have been because the auditory system uses both strategies, in which case a common F0 may cause the two vowels to be heard as one regardless of GPA. To overcome this potentially confounding influence, vowels with irregularly timed glottal pulses (and thus no well-defined F0) were used to investigate the role of GPA. The results show that GPA still has no significant effect on recognition rates. Remarkably, however, these irregularly excited vowels gave recognition rates that were equal to or significantly greater than their periodic counterparts, suggesting that F0-guided segregation is not required to explain the ∆F0 effect.
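
A minimal sketch of the "irregularly timed glottal pulses" manipulation: a pulse train whose inter-pulse intervals are jittered around a nominal period so that no well-defined F0 remains. The uniform-jitter scheme and its range are assumptions; the paper's exact excitation method may differ.

```python
import numpy as np

def pulse_train(f0, duration, sr, jitter=0.0, rng=None):
    """Glottal-pulse train whose inter-pulse intervals are drawn around 1/f0.
    jitter = 0 gives a periodic source (well-defined F0); jitter > 0 gives
    irregularly timed pulses with no well-defined F0."""
    rng = rng or np.random.default_rng(1)
    mean_period = 1.0 / f0
    times, t = [], 0.0
    while t < duration:
        times.append(t)
        t += mean_period * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    signal = np.zeros(int(duration * sr))
    for pulse_time in times:
        idx = int(pulse_time * sr)
        if idx < len(signal):
            signal[idx] = 1.0
    return signal  # excitation, to be shaped by a formant (vocal-tract) filter

periodic = pulse_train(100.0, 0.2, 16000, jitter=0.0)
irregular = pulse_train(100.0, 0.2, 16000, jitter=0.5)
```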

A note on hidden factors in vowel perception experiments

The Journal of the Acoustical Society of America, 1990

The article "Static, dynamic, and relational properties in vowel perception," by Terrance M. Nearey [J. Acoust. Soc. Am. 85, 2088–2113 (1989)], contains a report on a vowel perception experiment that reproduces the previous finding that the perceptual identification of vowels is influenced both by extrinsic, contextual factors and by intrinsic factors such as F0. The effect of F0 is, however, interpreted to be smaller than claimed in previous research. This appears to be due to an incorrect analysis of the role of context. It is shown how the results are to be interpreted taking into account how the listener's expectations are affected by a preceding context. The results are compatible with predictions based on a relational theory of speech perception for cases in which intrinsic and extrinsic factors do not counteract each other.

Contextual Effects In Vowel Perception II: Evidence for Two Processing Mechanisms

Perception & Psychophysics, 1980

Recent experiments have indicated that contrast effects can be obtained with vowels by anchoring a test series with one of the endpoint vowels. These contextual effects cannot be attributed to feature detector fatigue or to the induction of an overt response bias. In the present studies, anchored ABX discrimination functions and signal detection analyses of identification data (before and after anchoring) for an [i]-[I] vowel series were used to demonstrate that [i] and [I] anchoring produce contrast effects by affecting different perceptual mechanisms. The effects of [i] anchoring were to increase within-[i] category sensitivity, while [I] anchoring shifted criterion placements. When vowels were placed in CVC syllables to reduce available auditory memory, there was a significant decrease in the size of the [I]-anchor contrast effects. The magnitude of the [i]-anchor effect was unaffected by the reduction in vowel information available in auditory memory. These results suggest that [i] and [I] anchors affect mechanisms at different levels of processing. The [i] anchoring results may reflect normalization processes in speech perception that operate at an early level of perceptual processing, while the [I] anchoring results represent changes in response criterion mediated by auditory memory for vowel information.
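
The distinction the signal-detection analysis draws, changes in sensitivity versus shifts in criterion, corresponds to the standard indices d′ and c. A minimal sketch with made-up hit and false-alarm rates (the rates below are not the paper's data):

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Standard signal-detection indices for an identification task:
    d' (sensitivity) and c (criterion placement). An anchor that sharpens
    within-category sensitivity should raise d'; one that merely biases
    responses should shift c while leaving d' roughly unchanged."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# Illustrative (made-up) rates:
print(dprime_and_criterion(0.80, 0.30))  # baseline
print(dprime_and_criterion(0.93, 0.07))  # increased sensitivity: d' roughly doubles, c near baseline
print(dprime_and_criterion(0.60, 0.14))  # criterion shift: c moves, d' close to baseline
```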

Categorial discrimination of vowels produced in syllable context and in isolation

1985

An innovative experimental paradigm that avoids certain problems of response bias in speech perception studies is presented. The paradigm was tested in a replication of an important finding in the perception of American English vowels. The problem was the relative identifiability of vowels in different syllable contexts: /t/-vowel-/t/ (TVT) and isolated vowels (V). The traditional ABX discrimination procedure was converted to a categorial discrimination task by having the three stimuli on each trial spoken by different people. This task requires a match according to vowel category, not acoustic identity. The technique eliminates the response-alternative problems of keyword identification tasks. Although overall error rates were low, the original findings were replicated: Listeners were more accurate when discriminating some vowels in TVT than in V syllables. Results are interpreted as support for a theory that considers dynamic acoustic information important for vowel perception.
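
A sketch of how a categorial ABX trial might be assembled: A, B, and X come from different talkers, so X can only be matched to A or B by vowel category. The inventory, talker labels, and context codes below are hypothetical placeholders, not the study's materials.

```python
import random

# Hypothetical inventory: recordings indexed by (talker, vowel, context).
VOWELS = ["i", "I", "e", "E", "ae"]          # placeholder vowel categories
TALKERS = ["talker1", "talker2", "talker3", "talker4"]
rng = random.Random(0)

def make_categorial_abx_trial(context):
    """Build one trial. Because A, B, and X are spoken by different talkers,
    matching X by acoustic identity is impossible; only a category match works."""
    vowel_a, vowel_b = rng.sample(VOWELS, 2)
    talker_a, talker_b, talker_x = rng.sample(TALKERS, 3)
    x_vowel = rng.choice([vowel_a, vowel_b])
    return {
        "A": (talker_a, vowel_a, context),
        "B": (talker_b, vowel_b, context),
        "X": (talker_x, x_vowel, context),
        "correct_response": "A" if x_vowel == vowel_a else "B",
    }

print(make_categorial_abx_trial("TVT"))  # /t/-vowel-/t/ context
print(make_categorial_abx_trial("V"))    # isolated-vowel context
```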

Concurrent Vowel Identification and Speech Perception in Noise in Individuals With Cochlear Hearing Loss

Acta Acustica united with Acustica, 2013

Previous studies have indicated that individuals with cochlear hearing loss perform poorly in concurrent vowel identification tasks. This indicates that individuals with cochlear hearing loss do not use F0 cues to segregate two acoustic streams as much as normal hearing individuals do. However, it is not known which vowel features (place, tongue height, etc.) are better transmitted when the F0s of two concurrently presented vowels are varied. Moreover, the contribution of stream segregation abilities to the understanding of speech in the presence of a competing signal is also unclear. Hence, this study aimed to evaluate the relationship between concurrent vowel identification scores and speech perception abilities in noise in individuals with cochlear hearing loss and compared that to normal hearing individuals. Specifically, we measured the identification of the vowels /e/, /i/, /o/ and /u/ when the vowel /a/ was presented simultaneously as a competing stimulus in 14 individuals with normal hearing and 15 individuals with cochlear hearing loss. Vowel identification scores were measured in four conditions: with a 0, 1, 2, or 4 semitone difference between the fundamental frequencies of the concurrent vowels. The signal-to-noise ratio required to identify 50% of presented speech (SNR-50) in the presence of four-talker babble was also measured using a standardized sentence list. Furthermore, the relationship between concurrent vowel identification and SNR-50 was evaluated. The results of the present study showed that individuals with cochlear hearing loss had poorer concurrent vowel identification scores than the normal hearing group, especially in the 0 semitone difference condition. There were significant correlations between concurrent vowel identification scores and SNR-50, indicating that differences in F0 are one of the most robust cues for stream segregation.
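
SNR-50 can be read off a psychometric function relating SNR to proportion correct. The logistic grid-search fit and the data points below are purely illustrative assumptions; the study itself used a standardized sentence-list procedure whose details are not reproduced here.

```python
import numpy as np

def snr50(snrs, proportion_correct):
    """Estimate SNR-50 (the SNR giving 50% correct) by fitting a logistic
    psychometric function p(snr) = 1 / (1 + exp(-(snr - m) / s)) with a
    simple grid search over midpoint m and slope s."""
    snrs = np.asarray(snrs, float)
    pc = np.asarray(proportion_correct, float)
    best = (np.inf, None)
    for m in np.arange(snrs.min() - 5, snrs.max() + 5, 0.1):   # midpoint = SNR-50
        for s in np.arange(0.5, 6.0, 0.1):                      # slope parameter
            pred = 1.0 / (1.0 + np.exp(-(snrs - m) / s))
            err = np.sum((pred - pc) ** 2)
            if err < best[0]:
                best = (err, m)
    return best[1]

# Illustrative (made-up) data: proportion of sentences correct at each SNR (dB).
print(snr50([-10, -5, 0, 5, 10], [0.05, 0.20, 0.55, 0.85, 0.98]))
```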

Pitches of concurrent vowels

The Journal of the Acoustical Society of America, 1994

When two vowels are presented simultaneously, listeners can report their phonemic identities more accurately if their fundamental frequencies (F0's) are different rather than the same. If the F0 difference (ΔF0) is large, listeners hear two vowels on different pitches; if the ΔF0 is small, the vowels are identified less accurately and they do not evoke different pitches. The present study used a matching task to obtain judgments of the pitches evoked by "double vowels" created from pairwise combinations of steady-state synthetic vowels /i/, /Ä/, /u/, /,/, and /Ñ/. One F0 was always 100 Hz; the other F0 was either 0, 0.25, 0.5, 1, 2, or 4 semitones higher. Experienced listeners adjusted the F0 of a tone complex to assign pitch matches to 50-ms or 200-ms double vowels. For ΔF0's up to two semitones, listeners' matches formed a single cluster in the frequency region spanned by the two F0's. When the ΔF0 was 4 semitones, the matches generally formed two clusters close to the F0 of each vowel, suggesting that listeners perceive two distinct pitches when the ΔF0 is 4 semitones but only one clear pitch (possibly accompanied by one or more weaker pitches) with smaller ΔF0's. When the duration was reduced from 200 ms to 50 ms, only a subset of the vowel pairs with a ΔF0 of 4 semitones produced a bimodal distribution of matches. In general, 50-ms stimuli were matched less consistently than their 200-ms counterparts, indicating that the pitches of concurrent vowels emerge less clearly when the stimuli are brief. Comparisons of pitch and vowel identification data revealed a moderate correlation between match intervals (defined as the absolute frequency difference between first and second pitch matches) and identification accuracy for the 200-ms stimuli with the largest ΔF0 of 4 semitones. The link between match intervals and vowel identification was weak or absent in conditions where the stimuli evoked only one pitch.
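
For reference, the ΔF0 conditions convert from semitones to Hz by the standard equal-tempered relation f = f_ref · 2^(semitones/12); a minimal check against the 100-Hz reference used above:

```python
def semitones_to_hz(f_ref, semitones):
    """Frequency that lies the given number of semitones above f_ref."""
    return f_ref * 2 ** (semitones / 12)

for st in (0, 0.25, 0.5, 1, 2, 4):
    print(st, round(semitones_to_hz(100.0, st), 2))
# 0 -> 100.0, 0.25 -> 101.45, 0.5 -> 102.93, 1 -> 105.95, 2 -> 112.25, 4 -> 125.99
```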

Perceiving vowels from uniform spectra: Phonetic exploration of an auditory aftereffect

Perception & Psychophysics, 1984

A carefully spoken vowel can generally be identified from the pattern of peaks and valleys in the envelope of its short-term power spectrum, and such patterning is usually necessary for the identification of the vowel. The present experiments demonstrate that segments of sound with uniform spectra, devoid of peaks and valleys, can be identified reliably as vowels under certain circumstances. In Experiment 1, 1,000 msec of a segment whose spectrum contained peaks in place of valleys and vice versa (i.e., the complement of a vowel) preceded a 25-msec spectral amplitude transition, during which the valleys became filled, leading into a 250-msec segment with a uniform spectrum. The segment with the uniform spectrum was identified as the vowel whose complement had preceded it. Experiment 2 showed that this effect was eliminated if the duration of the complement was less than 150 msec, if more than 500 msec of silence separated the uniform spectrum from the complement, or if the uniform spectrum and the complement were presented to different ears. This third result and comparisons with parameters of auditory aftereffects obtained by others with nonspeech stimuli suggest that the effect is rooted in peripheral adaptation processes and that central processes responsible for selective attention and perceptual grouping play only a minor role at most. Experiment 3 demonstrated that valleys in the spectral structure of a complement need be only 2 dB deep to generate the effect. The effect should therefore serve to enhance changes in spectral structure in natural speech and to alleviate the consequences of uneven frequency responses in communication channels.
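
A toy sketch of the complement construction and the proposed adaptation account: inverting a vowel's spectral envelope about its mid-level turns peaks into valleys, and if channels adapt in proportion to the preceding complement, the response to a subsequent uniform spectrum is vowel-shaped. The inversion rule, channel count, and adaptation strength below are assumptions for illustration only.

```python
import numpy as np

def complement_spectrum(vowel_envelope_db):
    """Invert a spectral envelope (dB per channel) about its mid-level, so
    peaks become valleys and valleys become peaks."""
    v = np.asarray(vowel_envelope_db, float)
    return v.max() + v.min() - v

# Toy 8-channel envelope with two "formant" peaks, its complement, and a
# uniform spectrum. If channels adapt in proportion to the preceding
# complement level, the effective response to the uniform spectrum is
# vowel-shaped, which is the aftereffect described above.
vowel = np.array([0, 10, 2, 0, 8, 1, 0, 0], float)
complement = complement_spectrum(vowel)
uniform = np.full_like(vowel, complement.max())
adapted_response = uniform - 0.5 * complement   # 0.5 = assumed adaptation strength
print(np.round(adapted_response, 1))            # peaks where the vowel had formants
```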