Songbirds tune their vocal tract to the fundamental frequency of their song

Tobias Riede et al. Proc Natl Acad Sci U S A. 2006.

Abstract

In human speech, the sound generated by the larynx is modified by articulatory movements of the upper vocal tract, which acts as a variable resonant filter concentrating energy near particular frequencies, or formants, essential in speech recognition. Despite its potential importance in vocal communication, little is known about the presence of tunable vocal tract filters in other vertebrates. The tonal quality of much birdsong, in which upper harmonics have relatively little energy, depends on filtering of the vocal source, but the nature of this filter is controversial. Current hypotheses treat the songbird vocal tract as a rigid tube with a resonance that is modulated by the end-correction of a variable beak opening. Through x-ray cinematography of singing birds, we show that birdsong is accompanied by cyclical movements of the hyoid skeleton and changes in the diameter of the cranial end of the esophagus that maintain an inverse relationship between the volume of the oropharyngeal cavity and esophagus and the song's fundamental frequency. A computational acoustic model indicates that this song-related motor pattern tunes the major resonance of the oropharyngeal-esophageal cavity to actively track the song's fundamental frequency.

Conflict of interest statement

No conflicts declared.

Figures

Fig. 1.

Each song syllable is accompanied by coordinated movements of the larynx and cornua that maintain an inverse relationship between the size of the OEC and the song’s _f_0. (A) Lateral view of cardinal showing the dorsoventral movement (LV) of the larynx from the middle of the second cervical vertebra and its craniocaudal movement (LH) from the dorsal edge of the beak-skull transition. (B) Ventrodorsal view showing distance between lateral cornua of hyoid apparatus (Cornua). (C) Movements of larynx, cornua, and beak during upward sweeping syllable 1. (D) Laryngeal movements during syllables 3 and 4. When _f_0 was less than ≈2 kHz, beak gape was usually too small to measure on the fluoroscopic images and, although slightly open, was recorded as zero. The peak _f_0 of syllables 1 and 4, but not of syllables 2 or 3, was nearly always accompanied by an increased gape. Frequencies of _f_0 are superimposed (black) on LV and Cornua distances. LV and LH, syllables 1, 3, and 4 from bird 345; Cornua, syllable 1 from bird 407.

Fig. 2.

Relationship between _f_0 and dimensions of the supralaryngeal vocal tract for syllables 1–4. Movement of larynx (LV) and cornua (Cornua) is plotted as percent of their maximum movement during the cycle associated with each syllable. Linear regression lines with 95% confidence interval are shown. Data for some relationships are fitted better by exponential regression lines. Beak gape is plotted as mean + 1 SD of values in successive 500-Hz bins. n = number of data points (number of syllables). Each graph is based on a single bird and is consistent with corresponding data from other individuals. The clumping of LV and Cornua data points in the upper left corner of plots for syllable 1 reflects the fact that _f_0 was low for most of the syllable’s duration so most video frames occurred during this portion of the syllable when hyoid movement was near its maximum. Because of its short duration, syllable 4 was captured by only two or three video frames. Only syllables that contained three video frames are included in the data and this only happened if the first frame coincided with the initial high _f_0 and the last frame was near the end of the down-sweep, giving the appearance of a “floor” or “ceiling” effect in some of the plots. The open triangle in LV movement of syllable 2 is not included in the regression line. This data point is from the last video frame of syllable 2, which was followed by a different syllable type starting at 3 kHz. We hypothesize that the bird began to increase the volume of its oropharyngeal cavity before the end of syllable 2, in preparation for the lower initial _f_0 of the next syllable.
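The binning step described in this caption (beak gape summarized as a mean and SD within successive 500-Hz bins of _f_0) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `bin_gape_by_f0`, the use of the sample standard deviation, and the gape units are assumptions, and the demo values are placeholders rather than measurements from the study.

```python
from collections import defaultdict
from statistics import mean, stdev

BIN_WIDTH_HZ = 500  # bin width stated in the caption


def bin_gape_by_f0(samples, bin_width_hz=BIN_WIDTH_HZ):
    """Group (f0_hz, gape) pairs into successive frequency bins.

    Returns {bin_start_hz: (mean_gape, sd_gape)}. The SD is the sample
    standard deviation (an assumption); it is 0.0 for single-sample bins.
    """
    bins = defaultdict(list)
    for f0_hz, gape in samples:
        bin_start = int(f0_hz // bin_width_hz) * bin_width_hz
        bins[bin_start].append(gape)
    return {
        start: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for start, values in sorted(bins.items())
    }


# Placeholder values for illustration only; NOT data from the study.
# Gape is in arbitrary units because the caption does not state units.
demo = [(1600, 0.0), (1900, 0.0), (3100, 2.0), (3400, 4.0), (5200, 6.0)]
summary = bin_gape_by_f0(demo)
for bin_start, (gape_mean, gape_sd) in summary.items():
    print(f"{bin_start}-{bin_start + BIN_WIDTH_HZ} Hz: mean={gape_mean:.2f}, sd={gape_sd:.2f}")
```

Each video frame contributes one (_f_0, gape) pair, so bins covering a long low-frequency portion of a syllable would hold more samples than bins covering a brief high-frequency peak.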

Fig. 3.

Three-dimensional reconstructions of the OEC during syllable 1 and predicted vocal tract resonance curves at various vocal tract volumes. (A) At the beginning of syllable 1 (≈1.5 kHz), the OEC extends into the cranial end of the esophagus and attains a volume of 2 ml. (B) At the end of syllable 1 (≈5 kHz), the esophagus has collapsed and the volume of the cavity is reduced to 0.6 ml. We selected this frequently produced syllable because its long duration provided a more detailed measure of the relationship between vocal tract shape and _f_0 than did shorter syllables. (C) Predicted resonance curves for the OEC of syllable 1 at volumes of 2 ml with small beak gape (solid purple), 1.2 ml with intermediate beak gape (solid dark blue), and 0.6 ml with wide beak gape (solid light blue). In other syllables, vocal tract resonance (dashed curves) could track _f_0 between 5 and 9 kHz by further decreasing cavity volume to 0.5 (green), 0.4, 0.3, or 0.2 (red) ml while holding other parameters, including a wide beak gape, glottal opening, and tracheal length, constant. Arrows indicate tracheal resonances.
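As a rough plausibility check on the inverse relationship between cavity volume and resonance, the sketch below treats the OEC as an idealized Helmholtz resonator with fixed neck geometry, for which resonance frequency scales as 1/sqrt(V). This is a swapped-in textbook simplification, not the computational acoustic model used in the paper, and the function name `helmholtz_frequency_ratio` is hypothetical. Only the volumes (2 ml, 0.6 ml) and approximate _f_0 values (1.5 kHz, 5 kHz) come from the caption.

```python
import math


def helmholtz_frequency_ratio(volume_start_ml, volume_end_ml):
    """Ratio f_end / f_start for an ideal Helmholtz resonator.

    Assumes the neck (opening) geometry stays fixed, so that the resonance
    frequency is proportional to 1 / sqrt(cavity volume).
    """
    return math.sqrt(volume_start_ml / volume_end_ml)


# Volumes and approximate f0 values for syllable 1, taken from the caption.
volume_only_ratio = helmholtz_frequency_ratio(2.0, 0.6)
observed_f0_ratio = 5.0 / 1.5

print(f"resonance shift from volume change alone: x{volume_only_ratio:.2f}")  # x1.83
print(f"f0 shift across syllable 1:               x{observed_f0_ratio:.2f}")  # x3.33
```

Under this simplified assumption, shrinking the cavity from 2 ml to 0.6 ml raises the resonance by a smaller factor than the rise in _f_0. The sketch holds the opening fixed, whereas the caption pairs each volume with a different beak gape, so the comparison only shows that volume is one of several parameters a fuller model would have to vary.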

Fig. 4.

Sound levels of first, second, and third harmonics in an upward sweeping cardinal syllable. (A) Predicted resonances of a trachea (modeled as a stopped tube 44 mm long) superimposed on a schematic song syllable. (B) Sound level of first, second, and third harmonic, measured every 30 ms during a 310-ms-long FM upsweeping syllable type 2. Assuming the major resonance of the vocal tract tracks the _f_0, these curves plot the peak sound level of a series of resonance curves such as those shown in Fig. 3C. The _f_0 is 23 and 33 dB, respectively, above the mean levels of 2_f_0 and 3_f_0. Allowing for assumed differences in source level (see text), this observation suggests the OEC primary resonance may be responsible for ≈11–14 dB of the sound level difference between the _f_0 and its higher harmonics. Tracheal formants are associated with peaks in 3_f_0 near valleys in 2_f_0, as expected for the quarter wavelength resonance of a stopped tube and predicted by the model.
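The quarter-wavelength behavior mentioned in this caption can be illustrated with the textbook formula for an ideal tube closed at one end, f_n = (2n - 1) c / (4L). The sketch below applies it to the 44-mm tracheal length from the caption. The speed of sound (350 m/s) is an assumed round value not given in the source, the function name `stopped_tube_resonances` is hypothetical, and the result is an idealization rather than the paper's model output.

```python
SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value; not stated in the source
TRACHEA_LENGTH_M = 0.044        # 44 mm, from the caption


def stopped_tube_resonances(length_m, speed=SPEED_OF_SOUND_M_PER_S, count=3):
    """First `count` resonances of an ideal tube closed at one end.

    Uses f_n = (2n - 1) * c / (4 * L) for n = 1, 2, 3, ...
    """
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


resonances = stopped_tube_resonances(TRACHEA_LENGTH_M)
f1 = resonances[0]
for n, freq in enumerate(resonances, start=1):
    print(f"resonance {n}: {freq:.0f} Hz")
```

Because the resonances of such a tube fall at odd multiples of the lowest one, a fundamental near that lowest resonance places 3_f_0 on the next resonance while 2_f_0 falls midway between the two. That pattern matches the caption's description of peaks in 3_f_0 near valleys in 2_f_0, within the limits of the assumed sound speed.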
