The influence of different timbre attributes on the perceptual segregation of complex-tone sequences

Effects of time intervals and tone durations on auditory stream segregation

Perception & Psychophysics, 2000

Adult listeners rated the difficulty of hearing a single coherent stream in a sequence of high (H) and low (L) tones that alternated in a repetitive galloping pattern (HLH-HLH-HLH ...). They could hear the gallop when the sequence was perceived as a single stream, but when it segregated into two substreams, they heard H-H-... in one stream and L-L-... in the other. The onset-to-onset time of the tones, their duration, the interstimulus interval (ISI) between tones of the same frequency, and the frequency separation between H and L tones were varied. Subjects' ratings on a 7-point scale showed that the well-known effect of speed in increasing stream segregation is primarily due to its effect on the ISI between tones in the same frequency region. This has implications for several theories of streaming.

When a sequence of tones alternating between two frequency ranges is speeded up, the tendency for the high and low tones to form separate auditory streams increases. Bregman (1990) proposed that tones group by their proximity on a frequency-by-time surface. An increase in speed brings the tones closer together in time but does not reduce their frequency separations. Consecutive tones of the same frequency therefore move closer together on the frequency-by-time surface, while tones of different frequencies remain almost as far apart as they were before. This new proximity favors grouping a tone with the next one in the same frequency range even if the two tones are not consecutive, because the alternative grouping (with the tone that comes right after it but is of a different frequency) requires grouping across a longer distance. Temporal distance is thus very important. But what is the best way to measure it? The effect of speed could be due to a change in any of the four types of time intervals shown in Figure 1, all of which become shorter when the speed is increased.
(Note: SOA means stimulus onset asynchrony, i.e., onset-to-onset time, and ISI means interstimulus interval, i.e., offset-to-onset time.) (1) SOA for consecutive tones in the same frequency range (SOA-within). Note that in Figure 1 there are two different intervals of this type, one for each frequency, since the low tones occur less frequently than the high ones in the galloping pattern. (2) ISI for consecutive tones in the same frequency range (ISI-within). Again, there are two different intervals of this type, since the low tones occur less frequently than the high ones.

Support from the Natural Sciences and Engineering Research Council of Canada (Experiment 1) and NIMH (Experiment 2) is gratefully acknowledged. We are also grateful for Lisa Weaver's assistance. Correspondence concerning this article should be addressed to A.
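To make the within-frequency intervals concrete, here is a minimal Python sketch that derives them from the basic onset-to-onset time and the tone duration. It assumes that each HLH- cycle consists of four equal slots of one SOA each (H, L, H, and a silent slot); that layout is an assumption, not a detail stated above. The across-frequency fields (`soa_across`, `isi_across`) are included only for comparison, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GallopIntervals:
    """Time intervals (ms) in an HLH-HLH-... galloping sequence."""
    soa_within_high: float  # onset-to-onset, H to next H
    soa_within_low: float   # onset-to-onset, L to next L
    isi_within_high: float  # offset-to-onset, H to next H
    isi_within_low: float   # offset-to-onset, L to next L
    soa_across: float       # onset-to-onset between neighbouring H and L tones
    isi_across: float       # offset-to-onset between neighbouring H and L tones


def gallop_intervals(soa_ms: float, duration_ms: float) -> GallopIntervals:
    """Derive within- and across-frequency intervals from the basic tone SOA.

    Assumption: each cycle has four equal slots of length soa_ms
    (H, L, H, silent '-'), so H tones recur every 2 slots and L tones every 4.
    """
    if not 0 < duration_ms <= soa_ms:
        raise ValueError("tone duration must be positive and no longer than the SOA")
    soa_high = 2 * soa_ms
    soa_low = 4 * soa_ms
    return GallopIntervals(
        soa_within_high=soa_high,
        soa_within_low=soa_low,
        isi_within_high=soa_high - duration_ms,  # ISI = SOA - tone duration
        isi_within_low=soa_low - duration_ms,
        soa_across=soa_ms,
        isi_across=soa_ms - duration_ms,
    )


# Speeding up the sequence (shorter SOA) with fixed 40 ms tones
for soa in (150, 100, 60):
    g = gallop_intervals(soa, 40)
    print(soa, g.isi_within_high, g.isi_within_low)
```

With the SOA fixed at 100 ms, lengthening the tones from 40 to 80 ms shortens `isi_within_high` from 160 to 120 ms while leaving every SOA unchanged. Varying duration and SOA independently in this way is what allows SOA-based and ISI-based accounts of the speed effect to be pulled apart.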

Accenting and detection of timing variations in tone sequences: Different kinds of accents have different effects

Perception & Psychophysics, 2001

The effect of intensity and pitch accents on the perception of timing was examined in two experiments using a signal detection procedure. Analyses of sensitivity and response bias revealed opposite effects of intensity and pitch accents under similar conditions. Time intervals preceding intensity accents were perceived as longer, but time intervals preceding pitch accents were perceived as shorter. These results showed that listeners found it easier to detect timing variations that were contrary to expectations, as compared with variations that were consistent with expectations. In the present case, listeners should have expected shorter time intervals before intensity accents and longer intervals before pitch accents. The fact that the effects were observed with stimuli that had minimal musical structure demonstrated the contribution of psychoacoustic factors to such phenomena.
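Sensitivity and response bias in a yes/no detection task are commonly summarized with the equal-variance Gaussian indices d' and c. The abstract does not say which indices the authors computed, so the following Python sketch only illustrates the standard formulas, d' = z(H) − z(F) and c = −[z(H) + z(F)]/2, using the standard library's `statistics.NormalDist`; the function name is hypothetical.

```python
from statistics import NormalDist


def sdt_indices(hit_rate: float, fa_rate: float) -> tuple:
    """Return (d_prime, c) under the equal-variance Gaussian yes/no model."""
    for p in (hit_rate, fa_rate):
        # z(0) and z(1) are infinite, so extreme rates must be corrected beforehand
        if not 0.0 < p < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa           # sensitivity
    c = -0.5 * (z_hit + z_fa)        # criterion: negative = liberal, positive = conservative
    return d_prime, c


d, c = sdt_indices(0.9, 0.5)
print(round(d, 3), round(c, 3))
```

A negative c indicates a liberal bias (a tendency to report a timing change). Hit or false-alarm rates of exactly 0 or 1 need a correction, for example a 1/(2N) adjustment, before the z-transform, which is why the function rejects them.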

Effects of similarity in bandwidth on the auditory sequential streaming of two-tone complexes

Perception, 1999

We investigated the perceptual grouping of sequentially presented sounds, known as auditory stream segregation. It is well established that sounds heard as more similar in quality, or timbre, are more likely to be grouped into the same auditory stream. However, it is often unclear exactly what acoustic factors determine timbre. In this study, we presented various sequences of simple sounds, each comprising two frequency components (two-tone complexes), and measured their perceptual grouping. We varied only one parameter between trials, the intercomponent separation for some of the complexes, and examined the effects on stream segregation. Four hypotheses are presented that might predict the extent of streaming. Specifically, least streaming might be expected when the sounds were most similar in either (1) the frequency regions in which they have energy (maximum spectral overlap), (2) their auditory bandwidths, (3) their relative bandwidths, or (4) the rate at which the two components beat together (intermodulation rate). It was found that least streaming occurred when sounds were most similar in either their auditory or their relative bandwidths. Although these two hypotheses could not be distinguished, the results were clearly different from those predicted by hypotheses (1) and (4). The implications for models of stream segregation are discussed.
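The four hypotheses correspond to different ways of measuring how similar two two-tone complexes are. The Python sketch below computes one plausible version of each measure. The specific definitions (the Glasberg and Moore (1990) ERB-number scale for auditory bandwidth, component separation divided by the geometric center frequency for relative bandwidth, and the frequency difference for the intermodulation rate) are assumptions for illustration, not definitions taken from the study; the function names are hypothetical.

```python
import math


def erb_number(f_hz: float) -> float:
    """ERB-number (Cam) scale of Glasberg & Moore (1990)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def complex_features(f1_hz: float, f2_hz: float) -> dict:
    """Candidate similarity dimensions for a two-tone complex (illustrative definitions)."""
    lo, hi = sorted((f1_hz, f2_hz))
    return {
        "band": (lo, hi),                                      # region containing energy
        "auditory_bw_cams": erb_number(hi) - erb_number(lo),   # separation on the ERB scale
        "relative_bw": (hi - lo) / math.sqrt(lo * hi),         # separation / geometric center
        "intermod_rate_hz": hi - lo,                           # beat rate of the two components
    }


def spectral_overlap(a: dict, b: dict) -> float:
    """Shared bandwidth divided by the total range spanned by both complexes (0 = disjoint)."""
    (a_lo, a_hi), (b_lo, b_hi) = a["band"], b["band"]
    shared = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    span = max(a_hi, b_hi) - min(a_lo, b_lo)
    return shared / span if span > 0 else 1.0


a = complex_features(500.0, 1000.0)
b = complex_features(1000.0, 2000.0)
print(a["relative_bw"], b["relative_bw"], spectral_overlap(a, b))
```

For example, complexes of 500 + 1000 Hz and 1000 + 2000 Hz have identical relative bandwidths but no spectral overlap and different intermodulation rates, so stimuli of this kind can separate the hypotheses.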

Effects of two acoustic continua on the within-category perceptual structure of tones

The present study investigated the effects of two acoustic continua on the within-category perceptual structure of Putonghua Tone 2 and Tone 3. These two tones were simulated with tokens varying along two acoustic continua of the F0 contour: the timing of the F0 turning point and the extent of the F0 fall. Three different syllable durations were tested, with voice quality held constant. Multidimensional scaling analyses were applied to investigate the relative influence of phonetic identification and category goodness on the perceptual dissimilarity of the synthesized tonal tokens. The results revealed that Tone 3 has a later F0 turning point and a greater F0 fall than Tone 2, which confirms earlier findings. The new finding is that the perceptual representations of these two tone categories differ in their internal structure. Best tokens are dispersed within categories and are usually not unique. The perceptual space involving Tone 2 tokens shrinks, but that involving Tone 3 tokens does not. Goodness ratings contribute significantly to the dissimilarity scaling across Tone 2 tokens but not across Tone 3 tokens.
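The multidimensional scaling step turns a matrix of pairwise perceptual dissimilarities into a low-dimensional map of the tokens. The abstract does not state which MDS variant was used, so the Python sketch below shows classical (Torgerson) metric MDS as one example: double-center the squared dissimilarities, then take the leading eigenvectors, found here by power iteration with deflation so that only the standard library is needed. The function name and parameters are hypothetical.

```python
import math
import random


def classical_mds(d, k=2, iters=200, seed=0):
    """Classical (Torgerson) metric MDS of an n x n symmetric dissimilarity matrix.

    Returns n points in k dimensions whose Euclidean distances approximate d.
    Power iteration assumes the leading eigenvalues are positive, which holds
    for Euclidean-like data; non-positive dimensions are left at 0.
    """
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in sq]  # row means (= column means, d is symmetric)
    grand = sum(mean) / n
    # Double centering: B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-12:
            break  # no further positive (Euclidean) dimensions
        for i in range(n):
            coords[i][dim] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next-largest eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]  # three points at 0, 1 and 3 on a line
print(classical_mds(line, k=1))
```

Applied to the three collinear points above, it recovers their spacing up to reflection and translation. Real dissimilarity ratings are rarely perfectly Euclidean, in which case only the dimensions with positive eigenvalues are filled in.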

Frequency discrimination of complex tones; assessing the role of component resolvability and temporal fine structure

The Journal of the Acoustical Society of America, 2006

Thresholds for discriminating the fundamental frequency (F0) of a complex tone, F0DLs, are small when low harmonics are present, but increase when the number of the lowest harmonic, N, is above eight. To assess whether the relatively small F0DLs for N in the range 8-10 are based on (partly) resolved harmonics or on temporal fine structure information, F0DLs were measured as a function of N for tones with three successive harmonics which were added either in cosine or alternating phase. The center frequency was 2000 Hz, and N was varied by changing the mean F0. A background noise was used to mask combination tones. The value of F0 was roved across trials to force subjects to make within-trial comparisons. N was roved by ±1 for every stimulus, to prevent subjects from using excitation pattern cues. F0DLs were not influenced by component phase for N = 6 or 7, but were smaller for cosine than for alternating phase once N exceeded 7, suggesting that temporal fine structure plays a role in this range. When the center frequency was increased to 5000 Hz, performance was much worse for low N, suggesting that phase locking is important for obtaining low F0DLs with resolved harmonics.
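The phase manipulation is easiest to see in the waveform. The following Python sketch generates three successive harmonics N, N+1 and N+2 with harmonic N+1 placed at the center frequency; that placement is an assumption about how "center frequency" was defined. For alternating phase it uses one common convention (odd-numbered harmonics in cosine phase, even-numbered in sine phase). It omits the background noise, onset ramps, and F0/N roving described above, and the function name is hypothetical.

```python
import math


def three_harmonic_complex(n_lowest, center_hz=2000.0, phase="cosine",
                           fs_hz=48000, dur_s=0.2):
    """Harmonics N, N+1, N+2 of an F0 chosen so that harmonic N+1 sits at center_hz.

    phase="cosine":      all components in cosine phase.
    phase="alternating": odd-numbered harmonics in cosine phase, even-numbered in sine phase.
    Returns (f0_hz, samples) with unit amplitude per component; no ramps, noise or roving.
    """
    if phase not in ("cosine", "alternating"):
        raise ValueError("phase must be 'cosine' or 'alternating'")
    f0 = center_hz / (n_lowest + 1)
    samples = []
    for s in range(int(fs_hz * dur_s)):
        t = s / fs_hz
        x = 0.0
        for h in (n_lowest, n_lowest + 1, n_lowest + 2):
            arg = 2.0 * math.pi * h * f0 * t
            use_sine = phase == "alternating" and h % 2 == 0
            x += math.sin(arg) if use_sine else math.cos(arg)
        samples.append(x)
    return f0, samples


f0, cos_sig = three_harmonic_complex(8)
_, alt_sig = three_harmonic_complex(8, phase="alternating")
print(f0, max(map(abs, cos_sig)), max(map(abs, alt_sig)))
```

In cosine phase the three components peak together once per F0 period (the first sample equals 3), whereas in alternating phase the envelope repeats at twice the F0 rate and its peaks stay below √5 ≈ 2.24, even though the power spectrum is the same in both cases.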

The perceptual segregation of simultaneous auditory signals: Pulse train segregation and vowel segregation

Attention, Perception, & Psychophysics, 1989

In the experiments reported here, we attempted to find out more about how the auditory system is able to separate two simultaneous harmonic sounds. Previous research (Halikia & Bregman, 1984a; Scheffers, 1983a) had indicated that a difference in fundamental frequency (F0) between two simultaneous vowel sounds improves their separate identification. In the present experiments, we looked at the effect of F0s that changed as a function of time. In Experiment 1, pairs of unfiltered or filtered pulse trains were used. Some were steady-state, and others had gliding F0s; different F0 separations were also used. The subjects had to indicate whether they had heard one or two sounds. The results showed that increased F0 differences and gliding F0s facilitated the perceptual separation of simultaneous sounds. In Experiments 2 and 3, simultaneous synthesized vowels were presented on F0 contours that were steady-state, gliding in parallel (parallel glides), or gliding in opposite directions (crossing glides). The results showed that crossing glides led to significantly better vowel identification than did steady-state F0s. Also, in certain cases, crossing glides were more effective than parallel glides. The superior effect of the crossing glides could be due to the common frequency modulation of the harmonics within each component of the vowel pair and the consequent decorrelation of the harmonics between the two simultaneous vowels.
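Gliding-F0 pulse trains can be specified by their pulse onset times. The Python sketch below assumes a glide that is linear in Hz (the abstract does not give the glide shape) and places a pulse each time the accumulated phase of the instantaneous F0 completes a cycle; a crossing-glide pair is then one upward and one downward glide over the same range. The function name is hypothetical, and filtering and waveform synthesis are left out.

```python
import math


def glide_pulse_times(f0_start_hz, f0_end_hz, dur_s):
    """Onset times (s) of a pulse train whose F0 glides linearly from start to end.

    Instantaneous F0: f(t) = f0_start + (f0_end - f0_start) * t / dur_s.
    A pulse occurs whenever the accumulated phase phi(t) = a*t^2 + b*t reaches an integer.
    """
    if f0_start_hz <= 0 or f0_end_hz <= 0 or dur_s <= 0:
        raise ValueError("F0 values and duration must be positive")
    a = (f0_end_hz - f0_start_hz) / (2.0 * dur_s)
    b = f0_start_hz
    total_cycles = (f0_start_hz + f0_end_hz) * dur_s / 2.0  # phi(dur_s)
    times = []
    for k in range(int(math.floor(total_cycles)) + 1):
        # Smaller positive root of a*t^2 + b*t - k = 0, written so it also works when a == 0
        t = 2.0 * k / (b + math.sqrt(b * b + 4.0 * a * k))
        if t < dur_s:
            times.append(t)
    return times


up = glide_pulse_times(100.0, 150.0, 0.5)    # rising F0
down = glide_pulse_times(150.0, 100.0, 0.5)  # falling F0: together, a crossing-glide pair
print(len(up), len(down))
```

Writing the root as 2k / (b + √(b² + 4ak)) avoids a division by zero for steady-state trains (a = 0) and the loss of precision that the textbook quadratic formula suffers when the glide is shallow.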

Effect of harmonic rank on sequential sound segregation

Hearing Research, 2018

The ability to segregate sounds from different sound sources is thought to depend on the perceptual salience of differences between the sounds, such as differences in frequency or fundamental frequency (F0). F0 discrimination of complex tones is better for tones with low harmonics than for tones that only contain high harmonics, suggesting greater pitch salience for the former. This leads to the expectation that the sequential stream segregation (streaming) of complex tones should be better for tones with low harmonics than for tones with only high harmonics. However, previous studies have produced conflicting results on this point. The goals of this study were to determine the effect of harmonic rank on streaming and to establish whether streaming is related to F0 discrimination. Thirteen young normal-hearing participants were tested. Streaming was assessed for pure tones and complex tones containing harmonics with various ranks using sequences of ABA triplets, where ...

The Effect of Instrumental Timbre on Interval Discrimination

PLoS ONE, 2013

We tested non-musicians and musicians in an auditory psychophysical experiment to assess the effects of timbre manipulation on pitch-interval discrimination. Both groups were asked to indicate the larger of two intervals formed by four sequentially presented pitches; the second or fourth stimulus within a trial was either a sinusoidal (or "pure"), flute, piano, or synthetic-voice tone, while the remaining three stimuli were all pure tones. The interval-discrimination tasks were administered parametrically to assess performance across varying pitch distances between intervals ("interval-differences"). Irrespective of timbre, musicians displayed a steady improvement across interval-differences, while non-musicians only demonstrated enhanced interval discrimination at an interval-difference of 100 cents (one semitone in Western music). Surprisingly, the best discrimination performance across both groups was observed with pure-tone intervals, followed by intervals containing a piano tone. More specifically, we observed that: 1) timbre changes within a trial affect interval discrimination; and 2) the broad spectral characteristics of an instrumental timbre may influence perceived pitch or interval magnitude and make interval discrimination more difficult.
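Interval sizes and interval-differences of this kind are expressed in cents, where 1200 cents is an octave and 100 cents an equal-tempered semitone. This short Python sketch shows the conversion; the function names are hypothetical and the frequencies in the example are arbitrary.

```python
import math


def cents(f_low_hz: float, f_high_hz: float) -> float:
    """Size of the interval from f_low_hz to f_high_hz in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


def interval_difference(interval_a, interval_b) -> float:
    """Size of interval_b minus size of interval_a, each given as (f_low, f_high), in cents."""
    return cents(*interval_b) - cents(*interval_a)


major_third = (440.0, 440.0 * 2 ** (4 / 12))
perfect_fourth = (440.0, 440.0 * 2 ** (5 / 12))
print(round(interval_difference(major_third, perfect_fourth), 6))  # prints 100.0
```

Because cents are logarithmic, the same interval-difference can be built from any starting frequency, which is what lets interval size be varied independently of absolute pitch.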

Differential processing of terminal tone parts within structured and non-structured tones

Neuroscience Letters, 2007

Recent studies utilizing the mismatch negativity (MMN) event-related potential (ERP) revealed that when a repetitive sequence of sinusoidal tones is presented, the occasional insertion of a short deviation into some of the tones elicits an MMN only if the deviation occurs during the initial 300 ms of the tone, not beyond. In contrast, deviations occurring in speech sounds elicit MMN even beyond 300 ms. We conducted two experiments to resolve this conflict. We hypothesised that an additional transient within an otherwise unstructured tone may overcome this limitation. First, we tested for MMN to a deviance at the terminal part of a 650 ms tone that did or did not contain a gap. MMN was obtained only when the tone included the gap. Second, we compared the gap condition with two noise conditions, in which the gap was replaced by modulated white noise. The noise conditions differed in the salience of the perceived interruption of the tone. In all three conditions, MMN was elicited. These results demonstrate that structuring a sinusoidal tone by a gap or a noise interval is sufficient to regain MMN. It is suggested that the introduction of an additional transient triggers a new integration window, overcoming the temporal constraints of automatic tone representation. This resolves the seeming contradiction between MMN studies using tonal and speech sounds.
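MMN is conventionally measured from the difference wave obtained by subtracting the averaged ERP to standards from the averaged ERP to deviants. The Python sketch below shows that computation on plain lists of samples. The epoch format, sampling rate, and measurement window are hypothetical, and for a late deviance such as one at the terminal part of a 650 ms tone, the window would have to be placed relative to the deviance onset rather than the tone onset.

```python
from statistics import fmean


def average_erp(epochs):
    """Point-by-point average over a list of equal-length epochs (e.g. in microvolts)."""
    n = len(epochs[0])
    if any(len(e) != n for e in epochs):
        raise ValueError("all epochs must have the same length")
    return [fmean(e[i] for e in epochs) for i in range(n)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard difference wave; an MMN appears as a negative deflection."""
    dev = average_erp(deviant_epochs)
    std = average_erp(standard_epochs)
    return [d - s for d, s in zip(dev, std)]


def mean_amplitude(wave, fs_hz, t0_s, start_s, end_s):
    """Mean of `wave` over [start_s, end_s), where sample 0 corresponds to time t0_s."""
    i0 = round((start_s - t0_s) * fs_hz)
    i1 = round((end_s - t0_s) * fs_hz)
    return fmean(wave[i0:i1])


deviants = [[0.0, -1.0, -2.0, 0.0], [0.0, -3.0, -2.0, 0.0]]
standards = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
diff = difference_wave(deviants, standards)
print(diff, mean_amplitude(diff, fs_hz=1000, t0_s=0.0, start_s=0.001, end_s=0.003))
```

Averaging before subtracting, rather than subtracting trial by trial, is what allows the deviant and standard conditions to contain different numbers of epochs, as they normally do in an oddball sequence.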