Speech Acoustics Research Papers - Academia.edu

We quantified the recovery of voice following a 2-hour vocal loading exercise (oral reading). Eighty-six adult participants tracked their voice recovery using short vocal tasks and perceptual ratings after an initial vocal loading exercise and for the following 2 days. Short-term recovery was apparent, with 90% recovery within 4 to 6 hours and full recovery at 12 to 18 hours. Recovery was shown to be similar to a dermal wound healing trajectory. The new recovery trajectory highlighted by the vocal loading exercise in the current study is called a vocal recovery trajectory. By comparing vocal fatigue to dermal wound healing, this trajectory is parallel to a chronic wound healing trajectory (as opposed to an acute wound healing trajectory). This parallel suggests that vocal fatigue from the daily use of the voice could be treated as a chronic wound, with the healing and repair mechanisms in a state of constant repair. In addition, there is likely a vocal fatigue threshold at which poi...

We investigate the use of the Riemannian optimization method over the flag manifold in subspace ICA problems such as independent subspace analysis (ISA) and complex ICA. In the ISA experiment, we use the Riemannian approach over the flag manifold together ...

The introduction, in the late 1970s, of the first digital spectrograph (DSP Sonograph) by Kay Elemetrics improved the possibilities of spectroacoustic voice analysis in the clinical field. Since the release, in 1993, of the advanced Multi Dimensional Voice Program (MDVP) system, it has been possible to analyse 33 quantitative voice parameters which, in turn, allow evaluation of fundamental frequency, amplitude and spectral energy balance, and of the presence of any sonority gap and diplophony. Despite its potential, this system is not yet widely used, partly because of the lack of a standard procedure. Indeed, there are still only a few case reports in the literature that consider prescriptive aspects of both procedure and analysis. This study aims to provide the results of amplitude perturbation parameter analysis in euphonic adult patients. In our opinion, these are the most significant parameters in determining the severity of a phonation ...

Purpose: To determine what research evidence exists to support the use of voice measures in the clinical assessment of patients with voice disorders. Method: The American Speech-Language-Hearing Association (ASHA) National Center for Evidence-Based Practice in Communication Disorders staff searched 29 databases for peer-reviewed English-language articles between January 1930 and April 2009 that included key words pertaining to objective and subjective voice measures, voice disorders, and diagnostic accuracy. The identified articles were systematically assessed by an ASHA-appointed committee employing a modification of the critical appraisal of diagnostic evidence rating system. Results: One hundred articles met the search criteria. The majority of studies investigated acoustic measures (60%) and focused on how well a test method identified the presence or absence of a voice disorder (78%). Only 17 of the 100 articles were judged to contain adequate evidence for the measures studied to ...

For the assessment of speech privacy and security, a uniformly weighted 1/3-octave-band signal-to-noise ratio, clipped to -32 dB, has been found to be a good indicator of speech intelligibility, cadence and audibility. A parameter required in the determination of the ratio for a listener in an adjoining room to the speech source is the average sound transmission loss of the separating construction. An estimation model for the average sound transmission loss of steel stud partitions has been developed and will be discussed.
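
The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the abstract does not list the 1/3-octave bands used, so the function takes whatever band levels it is given. "Clipped to -32 dB" is read here as a lower floor on each band's signal-to-noise value, and the received speech level is approximated as the source level minus the transmission loss, ignoring receiving-room absorption. The function names are hypothetical.

```python
def received_speech_levels(source_db, transmission_loss_db):
    """Approximate per-band speech level in the adjoining room.

    Assumption: received level = source level - transmission loss,
    with no correction for partition area or room absorption.
    """
    if len(source_db) != len(transmission_loss_db):
        raise ValueError("band lists must have the same length")
    return [s - tl for s, tl in zip(source_db, transmission_loss_db)]


def uniform_clipped_snr(speech_db, noise_db, floor_db=-32.0):
    """Uniformly weighted band signal-to-noise ratio with a lower clip.

    Each band's SNR (speech level minus noise level, in dB) is floored
    at floor_db, then all bands are averaged with equal weight.
    """
    if not speech_db or len(speech_db) != len(noise_db):
        raise ValueError("need two non-empty band lists of equal length")
    clipped = [max(s - n, floor_db) for s, n in zip(speech_db, noise_db)]
    return sum(clipped) / len(clipped)
```

For example, with two bands whose raw signal-to-noise values are 10 dB and -50 dB, the second band is floored at -32 dB and the uniformly weighted result is -11 dB.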

Cast shadows cause serious problems in the functionality of vision-based applications, such as video surveillance, traffic monitoring and various other applications. Accurate detection and removal of cast shadows is a challenging task. Common shadow detection techniques ...

The study is part of a series of studies which examine the acoustic correlates of lexical stress in several typologically different languages, in three speech styles: spontaneous speech, phrase reading, and wordlist reading. This study focuses on Czech, a language with stress fixed on the first syllable of a prosodic word, with no contrastive function at the level of individual words. The acoustic parameters examined here are F0-level, F0-variation, Duration, Sound Pressure Level (SPL), and Spectral Emphasis. Values for over 6,000 vowels were analyzed. Unlike the other languages examined so far, lexical stress in Czech is not manifested by clear prominence markings on the first, stressed syllable: the stressed syllable is not higher in F0, is not realized with greater F0 variation, and is not longer; nor does it have a higher SPL or higher Spectral Emphasis. There are slight but insignificant tendencies pointing to a delayed rise, that is, to higher values of some of the acoustic parameters on the second, post-stressed syllable. Since lexical stress does not serve a contrastive function in Czech, the absence of acoustic marking on the stressed syllable is not surprising.

Previous research has shown that the weighting of, or attention to, acoustic cues at the level of the segment changes over the course of development (Nittrouer & Miller, 1997; Nittrouer, Manning & Meyer, 1993). In this paper we examined changes over the course of development in weighting of acoustic cues at the suprasegmental level. Specifically, we tested English-learning 4-month-olds’ performance on a clause segmentation task when each of three acoustic cues to clausal units was neutralized and contrasted it with performance on a Baseline condition where no cues were manipulated. Comparison with the reported performance of 6-month-olds on the same task (Seidl, 2007) reveals that 4-month-olds weight prosodic cues to clausal boundaries differently than 6-month-olds, relying more heavily on all three correlates of clausal boundaries (pause, pitch and vowel duration) than 6-month-olds do, who rely primarily on pitch. We interpret this as evidence that 4-month-olds use a holistic processing strategy, while 6-month-olds may already be able to attend separately to isolated cues in the input stream and may, furthermore, be able to exploit a language-specific cue weighting. Thus, in a way similar to that in other cognitive domains, infants begin as holistic auditory scene processors and are only later able to process individual auditory cues.

This study broaches in a novel way the analysis of cognitive impairment characteristic of the early stages of Alzheimer's Disease (AD). Specifically, we attempt to determine the acoustic speech parameters that are sensitive to the onset of the disease, and their association with the language deficit characteristic of AD. Speech analysis was carried out on 21 elderly patients with AD using Praat software, which analyzes the acoustic components of speech. The data obtained were subjected to stepwise regression, using the overall scores obtained in the test as the criterion variable, and the scores on the frequency, amplitude and periodicity variables as predictors of performance. We found that the percentage of voiceless segments explains a significant portion of the variance in the overall scores obtained in the neuropsychological test. This component seems to be related mainly to the patient's ability in phonological fluency. This finding could permit the creation of a diagn...

In this article, we consider the binary partitioned approach with pattern index information, propose a neural network array (NNA) that performs the pattern recognition task by combining the binary partitioned approach with decision trees, and verify that the NNA can reduce not only the computation cost of training and recognition but also the classification error rate. Speaker identification with the radial basis function neural network array (RBFNNA) is discussed in detail as an application of the NNA.

In the meeting case scenario, audio is often recorded using Multiple Distant Microphones (MDM) in a non-intrusive manner. Typically, beamforming is performed in order to obtain a single enhanced signal out of the multiple channels. This paper investigates the use of mutual information for selecting the channel subset that produces the lowest error in a diarization system. Conventional

Irony, as "quotation" and "fencing game," consists of an interactive script, grounded on a focal event "trigger," in which the dialogic comment shows the ironist's intention through an antiphrastic process and syncoding to "hit" the victim of the irony (blame by praise or praise by blame). Through acoustic analysis of the suprasegmental profiles of standard phrases inserted into inductors expressly composed and read by 50 naive subjects, the presence and nature of significant differences between sarcastic and kind irony in low- and high-context utterances (contextualization effect--Experiments 1 and 2) have been verified. It has also been observed that, where more "specific weight" is given to the linguistic stream (corrective irony hypothesis), a markedness of suprasegmental features emerges (correctivity effect--Experiment 3). Finally, comparison between sarcastic irony and blame and between kind irony and praise shows that the...

Non-word (NW) repetition in children with specific language impairment (SLI) is a skill related to, but genetically separate from, grammatical ability. Prosodic structure of the syllables may bridge the gap between these two abilities. A NW repetition task was compared in a group of 15 preschool Italian children with SLI (ranging in age from 3;11 to 5;8) and 15 younger typically developing children (aged from 2;11 to 3;7) matched for mean length of utterance (TD-MLU). Grammatical ability was tested through a probe for direct-object clitic pronouns, which is one of the most useful clinical markers in the Italian language. In NW repetition, children with SLI deleted more syllables than the TD-MLU children. The omission of weak syllables in a pre-stress position was a significant predictor of the omission of clitic pronouns. The present study shows that the link between grammar and NW repetition is due to a prosodic characteristic that is more universally challenging in children with SLI.

Speckle is a granular noise that inherently exists in all types of coherent imaging systems. The presence of speckle in an image reduces the resolution of the image and the detectability of the target. Many speckle reduction algorithms assume speckle noise is multiplicative. We instead model the speckle according to the exact physical process of coherent image formation. Thus, the

Vocal intensity is studied as a function of fundamental frequency and lung pressure. A combination of analytical and empirical models is used to predict sound pressure levels (SPL) from glottal waveforms of five professional tenors and twenty-five normal control subjects. The glottal waveforms were obtained by inverse filtering the mouth flow. Empirical models describe features of the glottal flow waveform (peak flow, peak flow derivative, open quotient, and speed quotient) in terms of lung pressure and phonation threshold pressure, a key variable that incorporates the Fo dependence of many of the features of the glottal flow. The analytical model describes the contributions of the vocal tract to SPL. Results show that SPL increases with Fo at a rate of 8-9 dB/octave, provided that lung pressure is raised in proportion to phonation threshold pressure. The SPL also increases at a rate of 8-9 dB per doubling of excess pressure over threshold, a new quantity that assumes considerable importance in vocal intensity calculations. For the same excess pressure over threshold, the professional tenors produced 10-12 dB greater intensity than the male nonsingers, primarily because their peak airflow was much higher for the same pressure. A simple set of rules is devised for predicting SPL from source waveforms.
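
The "per doubling" rule reported above can be written as a short calculation. The following Python sketch assumes only what the abstract states, that SPL rises by a fixed number of decibels each time the excess pressure over threshold doubles. The default slope of 8.5 dB is simply the midpoint of the reported 8-9 dB range, and the function names are hypothetical. It is not the paper's full rule set, which is not given in the abstract.

```python
import math


def excess_pressure(lung_pressure, threshold_pressure):
    """Excess pressure over threshold: lung pressure minus phonation
    threshold pressure (both in the same unit, e.g. kPa)."""
    return lung_pressure - threshold_pressure


def spl_change_db(excess_before, excess_after, db_per_doubling=8.5):
    """SPL change implied by a change in excess pressure.

    Assumption: a constant slope of db_per_doubling dB for every
    doubling of excess pressure (8.5 is the midpoint of the 8-9 dB
    range in the abstract, chosen for illustration only).
    """
    if excess_before <= 0 or excess_after <= 0:
        raise ValueError("excess pressures must be positive")
    return db_per_doubling * math.log2(excess_after / excess_before)
```

Under these assumptions, doubling the excess pressure adds 8.5 dB and quadrupling it adds 17 dB, while halving it subtracts 8.5 dB.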