Mechanisms for the Perception of Concurrent Vowels
Related papers
The perceptual segregation of simultaneous vowels with harmonic, shifted, or random components
Attention, Perception, & Psychophysics, 1993
This experiment was an investigation of the ability of listeners to identify the constituents of double vowels (pairs of synthetic vowels, presented concurrently and binaurally). Three variables were manipulated: (1) the size of the difference in F0 between the constituents (0, 1/2, and 6 semitones); (2) the frequency relations among the sinusoids making up the constituents: harmonic, shifted (spaced equally in frequency but not integer multiples of the F0), and random; and (3) the relationship between the F0 contours imposed on the constituents: steady state, gliding in parallel, or gliding in opposite directions. It was assumed that, in the case of the gliding contours, the harmonics of each vowel would "trace out" their spectral envelope and potentially improve the definition of the formant locations. It was also assumed that the application of different F0 contours would introduce differences in the direction of harmonic movement (common fate), thus aiding the perceptual segregation of the two vowels. The major findings were the following: (1) For harmonic constituents, a difference in F0 leads to improved identification performance. Neither tracing nor common-fate differences add to the effect of pitch differences. (2) For shifted constituents, a difference between the spacing of the constituents also leads to improved performance. Formant tracing and common fate contribute some further improvement. (3) For random constituents, tracing does not contribute, but common fate does.
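The three frequency relations named in (2) can be illustrated with a minimal sketch. The function name, the 0.25 shift fraction, and the rule that jitters each random component within its own F0-wide band are assumptions made for illustration; they are not the synthesis parameters reported in the study.

```python
import random


def component_frequencies(f0, n, kind, shift_fraction=0.25, seed=0):
    """Return n sinusoid frequencies (Hz) for one vowel constituent.

    kind:
      "harmonic" -> integer multiples of f0
      "shifted"  -> spaced f0 apart but offset, so not integer multiples of f0
      "random"   -> one frequency drawn uniformly within each f0-wide band
    shift_fraction and seed are hypothetical illustration values.
    """
    if kind == "harmonic":
        return [k * f0 for k in range(1, n + 1)]
    if kind == "shifted":
        return [(k + shift_fraction) * f0 for k in range(1, n + 1)]
    if kind == "random":
        rng = random.Random(seed)
        return [(k + rng.uniform(-0.5, 0.5)) * f0 for k in range(1, n + 1)]
    raise ValueError("kind must be 'harmonic', 'shifted', or 'random'")


if __name__ == "__main__":
    for kind in ("harmonic", "shifted", "random"):
        print(kind, [round(f, 1) for f in component_frequencies(100.0, 5, kind)])
```

In this sketch the shifted set keeps the regular F0 spacing of the harmonic set while losing the common fundamental, and the random set loses both.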
Concurrent vowel identification. I. Effects of relative amplitude and F0 difference
The Journal of the Acoustical Society of America, 1997
Subjects identified concurrent synthetic vowel pairs that differed in relative amplitude and fundamental frequency (F0). Subjects were allowed to report one or two vowels for each stimulus, rather than forced to report two vowels as was the case in previously reported experiments of the same type. At all relative amplitudes, identification was better at a fundamental frequency difference (ΔF0) of 6% than at 0%, but the effect was larger when the target vowel amplitude was below that of the competing vowel (−10 or −20 dB). The existence of a ΔF0 effect when the target is weak relative to the competing vowel is interpreted as evidence that segregation occurs according to a mechanism of cancellation based on the harmonic structure of the competing vowel. Enhancement of the target based on its own harmonic structure is unlikely, given the difficulty of estimating the fundamental frequency of a weak target. Details of the pattern of identification as a function of amplitude and vowel pair were found to be incompatible with a current model of vowel segregation.
Attention, Perception, & Psychophysics, 1989
In the experiments reported here, we attempted to find out more about how the auditory system is able to separate two simultaneous harmonic sounds. Previous research (Halikia & Bregman, 1984a; Scheffers, 1983a) had indicated that a difference in fundamental frequency (F0) between two simultaneous vowel sounds improves their separate identification. In the present experiments, we looked at the effect of F0s that changed as a function of time. In Experiment 1, pairs of unfiltered or filtered pulse trains were used. Some were steady-state, and others had gliding F0s; different F0 separations were also used. The subjects had to indicate whether they had heard one or two sounds. The results showed that increased F0 differences and gliding F0s facilitated the perceptual separation of simultaneous sounds. In Experiments 2 and 3, simultaneous synthesized vowels were used on frequency contours that were steady-state, gliding in parallel (parallel glides), or gliding in opposite directions (crossing glides). The results showed that crossing glides led to significantly better vowel identification than did steady-state F0s. Also, in certain cases, crossing glides were more effective than parallel glides. The superior effect of the crossing glides could be due to the common frequency modulation of the harmonics within each component of the vowel pair and the consequent decorrelation of the harmonics between the two simultaneous vowels.
Journal of The Acoustical Society of America, 1994
When two vowels are presented simultaneously, listeners can report their phonemic identities more accurately if their fundamental frequencies (F0s) are different rather than the same. If the F0 difference (ΔF0) is large, listeners hear two vowels on different pitches; if the ΔF0 is small, the vowels are identified less accurately and they do not evoke different pitches. The present study used a matching task to obtain judgments of the pitches evoked by "double vowels" created from pairwise combinations of steady-state synthetic vowels /i/, /ɑ/, /u/, /æ/, and /ɝ/. One F0 was always 100 Hz; the other F0 was either 0, 0.25, 0.5, 1, 2, or 4 semitones higher. Experienced listeners adjusted the F0 of a tone complex to assign pitch matches to 50-ms or 200-ms double vowels. For ΔF0s up to two semitones, listeners' matches formed a single cluster in the frequency region spanned by the two F0s. When the ΔF0 was 4 semitones, the matches generally formed two clusters close to the F0 of each vowel, suggesting that listeners perceive two distinct pitches when the ΔF0 is 4 semitones but only one clear pitch (possibly accompanied by one or more weaker pitches) with smaller ΔF0s. When the duration was reduced from 200 ms to 50 ms, only a subset of the vowel pairs with a ΔF0 of 4 semitones produced a bimodal distribution of matches. In general, 50-ms stimuli were matched less consistently than their 200-ms counterparts, indicating that the pitches of concurrent vowels emerge less clearly when the stimuli are brief. Comparisons of pitch and vowel identification data revealed a moderate correlation between match intervals (defined as the absolute frequency difference between first and second pitch matches) and identification accuracy for the 200-ms stimuli with the largest ΔF0 of 4 semitones. The link between match intervals and vowel identification was weak or absent in conditions where the stimuli evoked only one pitch.
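The semitone offsets listed above map to frequencies through the equal-tempered relation f = F0 · 2^(s/12). The sketch below applies that standard relation to the 100-Hz reference; the function name is a hypothetical helper, and the abstract does not state how the stimuli were actually computed.

```python
def shift_semitones(f0_hz, semitones):
    """Frequency (Hz) lying `semitones` equal-tempered semitones above f0_hz."""
    return f0_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    reference_hz = 100.0  # the fixed F0 reported in the abstract
    for s in (0, 0.25, 0.5, 1, 2, 4):
        higher = shift_semitones(reference_hz, s)
        percent = 100.0 * (higher / reference_hz - 1.0)
        print(f"{s:>4} semitones -> {higher:7.2f} Hz ({percent:4.1f}% above 100 Hz)")
```

Under this relation, one semitone is a difference of a little under 6%, and 4 semitones above 100 Hz is close to 126 Hz.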
Perceiving vowels from uniform spectra: Phonetic exploration of an auditory aftereffect
Perception & Psychophysics, 1984
A carefully spoken vowel can generally be identified from the pattern of peaks and valleys in the envelope of its short-term power spectrum, and such patterning is usually necessary for the identification of the vowel. The present experiments demonstrate that segments of sound with uniform spectra, devoid of peaks and valleys, can be identified reliably as vowels under certain circumstances. In Experiment 1, 1,000 msec of a segment whose spectrum contained peaks in place of valleys and vice versa (i.e., the complement of a vowel) preceded a 25-msec spectral amplitude transition, during which the valleys became filled, leading into a 250-msec segment with a uniform spectrum. The segment with the uniform spectrum was identified as the vowel whose complement had preceded it. Experiment 2 showed that this effect was eliminated if the duration of the complement was less than 150 msec, if more than 500 msec of silence separated the uniform spectrum from the complement, or if the uniform spectrum and the complement were presented to different ears. This third result and comparisons with parameters of auditory aftereffects obtained by others with nonspeech stimuli suggest that the effect is rooted in peripheral adaptation processes and that central processes responsible for selective attention and perceptual grouping play only a minor role at most. Experiment 3 demonstrated that valleys in the spectral structure of a complement need be only 2 dB deep to generate the effect. The effect should therefore serve to enhance changes in spectral structure in natural speech and to alleviate the consequences of uneven frequency responses in communication channels.
The Journal of the Acoustical Society of America, 1995
The improvement of identification accuracy of concurrent vowels with differences in fundamental frequency (ΔF0) is usually attributed to mechanisms that exploit harmonic structure. To decide whether identification is aided primarily by selecting the target vowel on the basis of its harmonic structure ("harmonic enhancement") or removing the interfering vowel on the basis of its harmonic structure ("harmonic cancellation"), pairs of synthetic vowels, each of which was either harmonic or inharmonic, were presented to listeners for identification. Responses for each vowel were scored according to the vowel's harmonicity and that of the vowel that accompanied it. For a given target, identification was better by about 3% for a harmonic ground unless the target was also harmonic with the same F0. This supports the cancellation hypothesis. Identification was worse for harmonic than for inharmonic targets by 3%-8%. This does not support the enhancement hypothesis. When both vowels were harmonic, identification was better by about 6% when the F0s differed by 1/2 semitone, consistent with previous experiments. Results are interpreted in terms of harmonic enhancement and harmonic cancellation, and alternative explanations such as waveform interaction are considered.
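One simple way to picture harmonic cancellation is a time-domain comb filter that subtracts the signal delayed by one period of the interfering sound. The sketch below is only an illustration of that idea under assumed values (8-kHz sampling rate, a 100-Hz interferer, a 125-Hz target, three equal-amplitude harmonics each); it is not the model or the stimuli used in the study, and all function names are hypothetical.

```python
import math


def harmonic_tone(f0, n_harmonics, fs, n_samples):
    """Sum of equal-amplitude sine harmonics of f0, sampled at fs Hz."""
    return [
        sum(math.sin(2.0 * math.pi * k * f0 * t / fs) for k in range(1, n_harmonics + 1))
        for t in range(n_samples)
    ]


def cancel(signal, period):
    """Comb filter y[t] = x[t] - x[t - period]; removes anything periodic at `period` samples."""
    return [signal[t] - signal[t - period] for t in range(period, len(signal))]


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    fs = 8000                      # assumed sampling rate (Hz)
    interferer_period = fs // 100  # 80 samples for the assumed 100-Hz interferer
    interferer = harmonic_tone(100.0, 3, fs, 800)
    target = harmonic_tone(125.0, 3, fs, 800)
    mixture = [a + b for a, b in zip(interferer, target)]
    print("interferer after cancellation:", rms(cancel(interferer, interferer_period)))
    print("target after cancellation:    ", rms(cancel(target, interferer_period)))
    print("mixture after cancellation:   ", rms(cancel(mixture, interferer_period)))
```

Because the filter depends only on the period of the interferer, it needs no estimate of the target's F0. The filter also reshapes the target spectrum, since target components that fall on multiples of the interferer's F0 are removed along with it.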
Vowel recognition at fundamental frequencies up to 1 kHz reveals point vowels as acoustic landmarks
The Journal of the Acoustical Society of America, 2017
The phonological function of vowels can be maintained at fundamental frequencies (fo) up to 880 Hz [Friedrichs et al. (2015). J. Acoust. Soc. Am. 138, EL36-EL42]. Here, we test the influence of talker variability and multiple response options on vowel recognition at high fo values. The stimuli (n=264) consisted of eight isolated vowels (/i y e ø ɛ a o u/) produced by three female native German talkers at eleven fo values within a range of 220-1046 Hz. In a closed-set identification task, 21 listeners were presented excised 700-ms vowel nuclei with quasi-flat fo contours and resonance trajectories. The results show that listeners can identify the point vowels /i a u/ at fo values up to almost 1 kHz, with a significant decrease for the vowels /y ɛ/ and a drop to chance level for the vowels /e ø o/ towards the upper fo values. Auditory excitation patterns reveal highly differentiable representations for /i a u/ that can be used as landmarks for vowel category perception at high fo values. These results suggest that theories of vowel perception based on overall spectral shape will provide a fuller account of vowel perception than those based solely on formant frequency patterns.