Hugo Quené | Utrecht University
Papers by Hugo Quené
2nd International Conference on Spoken Language Processing (ICSLP 1992)
HAL (Le Centre pour la Communication Scientifique Directe), Oct 1, 2021
International audience
Conference of the International Speech Communication Association, 1998
Nederlandse Taalkunde, 2021
In Moroccan Dutch, /s/ has been claimed to be pronounced as retracted [s] (towards /ʃ/) in certain consonant clusters. Recently, retracted s-pronunciation has also been attested in endogenous Dutch. We tested empirically whether Moroccan Dutch [s] is indeed more retracted than endogenous Dutch [s] in relevant clusters. Additionally, we tested whether the inter-speaker variation of /s/ is smaller between Moroccan Dutch speakers than between endogenous Dutch speakers, as expected if retraction of /s/ were used as an identity marker in in-group conversations in Moroccan Dutch. The [s] realizations of 21 young, male Moroccan Dutch and 21 endogenous Dutch speakers were analyzed. Analyses of the spectral centre of gravity (CoG) show that both groups of speakers had more retracted pronunciations of [s] in typically retracting contexts than in typically non-retracting contexts. However, Moroccan Dutch speakers had higher CoG in both contexts than endogenous Dutch speakers, refuting the str...
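The spectral centre of gravity used in such analyses is, in essence, the power-weighted mean frequency of the fricative's spectrum. The R sketch below illustrates that computation only; the function and variable names are hypothetical placeholders, and the published analyses were of course carried out with dedicated phonetics software rather than this toy code.

```r
# Illustrative sketch only: spectral centre of gravity (CoG) as the
# power-weighted mean frequency of a fricative segment.
# `samples` (a numeric vector with one [s] segment) and `fs` (sampling
# rate in Hz) are hypothetical placeholders.
hann_window <- function(n) 0.5 - 0.5 * cos(2 * pi * (0:(n - 1)) / (n - 1))

spectral_cog <- function(samples, fs) {
  n     <- length(samples)
  power <- Mod(fft(samples * hann_window(n)))[1:(n %/% 2)]^2   # power spectrum
  freq  <- (0:(n %/% 2 - 1)) * fs / n                          # frequency axis (Hz)
  sum(freq * power) / sum(power)                               # weighted mean frequency
}

# Example with white noise standing in for a real fricative:
spectral_cog(rnorm(4410), fs = 44100)
```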
Informatie, 1989
[Translated from Dutch:] Prosodic markers, such as pausing and accentuation, help listeners to understand speech. In this article we examine to what extent it is possible to insert such markers (for pauses and accents) into texts. To that end we use the prosodic sentence structure, which we try to recover by means of minimal syntactic analysis. We then discuss the errors that this algorithm still makes, and the prospects for resolving those errors.
The Journal of the Acoustical Society of America
It has been known for a long time, and for a wide variety of languages, that vowel fundamental frequency (F0) following voiceless obstruents tends to be significantly higher than F0 following voiced obstruents. There has been a long-standing debate about the cause of this phenomenon. Some evidence in previous work is more compatible with an articulatory account of this effect, while other evidence supports the auditory enhancement account. This paper investigates these consonant-related F0 perturbations in Dutch after initial fricatives (/v, f/) and stops (/b, p/), as compared to after the nasal /m/. Dutch is particularly interesting because it is a “true voicing” language, and because its fricatives are currently undergoing a process of devoicing. Results show that F0 was raised after voiceless obstruents, but largely unaffected after voiced obstruents. Fricative voicing in /v/ and F0 level tend to covary: the less voicing in /v/, the higher F0 at onset. There was no trace of an active gesture to explicitly low...
In this paper we describe two experiments exploring possible reasons for earlier conflicting results concerning the so-called word-onset effect in interactional segmental speech errors. Experiment 1 elicits errors in pairs of CVC real words with the SLIP technique. No word-onset effect is found. Experiment 2 is a tongue-twister experiment with lists of four disyllabic words. A significant word-onset effect is found. The conflicting results are not resolved. We also found that intervocalic consonants hardly ever interact with initial and final consonants, and that a shared stress pattern between words is a major factor in generating interactional errors.
The within-word and within-utterance time course of internal and external self-monitoring is investigated in a four-word tongue-twister experiment eliciting interactional word-initial and word-medial segmental errors and their repairs. It is found that the detection rate for both internal and external self-monitoring decreases from early to late, both within words and within utterances. Also, offset-to-repair times are more often 0 ms in initial than in medial consonants.
This workshop will introduce the R programming environment for statistical analysis. Unlike SPSS, which is procedure-oriented (commands are verbs, e.g. “compute”), R is object-oriented (objects are nouns, e.g. “factor”). In this workshop, we will try to ease the learning curve of using R for your data analysis. Experience with statistical software is NOT required! We will use data simulation as well as real data sets, to explore topics like t-tests, chi-square tests, and logistic regression. We will also show how R produces publication-quality figures. If time permits, we will also explore how to extend R with your own routines for analyzing and/or plotting data. You are encouraged to bring your own data set, if accompanied by a “codebook” file specifying variables (columns), data codes, etc. (Both files must be in plain ASCII.)
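As an impression of what such a session covers, here is a minimal, self-contained R sketch using simulated data; the variable names and effect sizes are invented for illustration and are not taken from any workshop materials.

```r
# Simulated data set standing in for participants' own data.
set.seed(1)
d <- data.frame(
  group   = factor(rep(c("control", "treatment"), each = 50)),
  rt      = c(rnorm(50, mean = 520, sd = 60), rnorm(50, mean = 480, sd = 60)),
  correct = rbinom(100, size = 1, prob = rep(c(0.70, 0.85), each = 50))
)

t.test(rt ~ group, data = d)                                  # t-test on reaction times
chisq.test(table(d$group, d$correct))                         # chi-square test on accuracy
summary(glm(correct ~ group, data = d, family = binomial))    # logistic regression

boxplot(rt ~ group, data = d, ylab = "RT (ms)")               # a basic publication-style figure
```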
Experimental linguistics and phonetics, 2014
The Journal of the Acoustical Society of America, 2014
Does an audible frown or smile affect speech comprehension? Previous research suggests that a spoken word is recognized faster if its audible affect (frown or smile) matches its semantic valence. In the present study, listeners' task was to evaluate the valence of spoken affective sentences. Formants were raised or lowered using LPC to convey an audible smile or frown gesture co-produced with the stimulus speech. A crucial factor was the talker's perspective in the event being described verbally, in either first or third person. With first-person sentences, listeners may relate the talker's affective state (simulated by formant shift) to the valence of the utterance. For example, in “I have received a prize,” a smiling articulation is congruent with the talker having experienced a happy event. However, with third-person sentences (“he has received a prize”), listeners cannot relate the talker's affective state to the described event. (In this example, the talker'...
Language Learning & Language Teaching, 2010
Analysis and Synthesis of Speech
In many text-to-speech systems, sentence prosody is derived by string-oriented rules, which are often rather ad hoc and linguistically unsound. By contrast, a system which employs a syntactic parser allows for highly general and computationally simple procedures defining the mapping between syntax and prosody. The program described in this paper, PROS2, is a first attempt towards a "next generation". First, the input sentence is parsed by applying phrase structure rules. Next, the syntactic representation is mapped onto a metrical tree, augmented with "focus" markers. The metrical tree provides an abstract characterization of accent and phrasing. While the program, at the time of writing, is still at the prototype stage, it holds considerable promise for further development and can be expanded in several directions.
The Journal of the Acoustical Society of America, 2012
Previous reports on the relationship between clear speech acoustic changes and the clear speech intelligibility benefit for vowels have used an “extreme groups” design, comparing talkers who produced a large clear speech benefit to talkers who produced little or no clear speech benefit. In Ferguson and Kewley-Port (2007), 12 talkers from the Ferguson Clear Speech Database (Ferguson, 2004) were assigned to groups based on the vowel identification performance of young normal-hearing listeners, while Ferguson (2010) chose 20 talkers based on the performance of elderly hearing-impaired listeners. The present investigation employs mixed-effects models to examine relationships among acoustic and perceptual data obtained for vowels produced by all 41 talkers of the Ferguson database. Acoustic data for the 1640 vowel tokens (41 talkers × 10 vowels × 2 tokens × 2 speaking styles) include vowel duration, vowel space, and several different measures of dynamic formant movement. Perceptua...
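A hedged sketch of the kind of mixed-effects model referred to above is given below; the data frame and variable names (vowel_data, intelligibility, duration, vowel_space, style, talker, vowel) are hypothetical stand-ins and do not reproduce the actual coding of the Ferguson database.

```r
# Hypothetical mixed-effects model: a perceptual outcome predicted from
# acoustic measures, with crossed random intercepts for talkers and vowels.
library(lme4)

m <- lmer(intelligibility ~ duration + vowel_space + style +
            (1 | talker) + (1 | vowel),
          data = vowel_data)   # vowel_data is an assumed, illustrative data frame
summary(m)
```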
English World-Wide, 2013
It has been asserted that a common European variety of English is currently emerging. This so-called “European English” is claimed to be the result of convergence among non-native English speakers, and to reflect a gradual abandonment of Inner Circle norms, which are deemed to be increasingly irrelevant to non-native speakers’ communicative needs. Evidence is so far lacking that Europeans judge each other’s proficiency in English by anything other than native-speaker standards — particularly as regards pronunciation. Nonetheless, it would be interesting to establish whether European non-native speakers of English demonstrated convergence when evaluating the pronunciation of fellow Europeans, and in this respect deviated significantly not only from Inner Circle English native speakers but also from non-European judges. To investigate this possibility, a large-scale Internet survey was carried out in which different groups of users of English (native and non-native, European and non-E...
Without intonation, this causal relation might be interpreted in multiple ways. The speaker might reason, “There would be no parking either on one of those sides”. This conclusion is then based on the argument, “It's street sweeping day, or something”. This causal relation is interpreted as such by the speaker herself. We can describe such a causal relation as relatively subjective (cf. e.g. Pander Maat & Sanders 2001, Pander Maat & Degand 2001a, Pit 2003, Verhagen 2005a).
Linguistics in the Netherlands, 1989
1. Introduction. In recent years, considerable progress has been made in the construction of linguistic components of text-to-speech conversion systems for Dutch, especially in the domains of grapheme-to-phoneme conversion, both rule-based and lexicon-based (Kerkhoff, Wester, and Boves 1984; Berendsen, Langeweg, and Van Leeuwen 1986; Daelemans 1987; Lammens 1987; Baart and Heemskerk 1988). Furthermore, F0-contours can be automatically generated on the basis of the grammar of Dutch intonation ('t Han ...
This tutorial will introduce the R programming environment for statistical analysis. Unlike SPSS, which is procedure-oriented (commands are verbs, e.g. “compute”), R is object-oriented (objects are nouns, e.g. “factor”). In this tutorial, we will try to ease the learning curve of using R for your data analysis. Experience with statistical software is NOT required! We will use data simulation as well as real data sets, to explore topics like t-tests, χ2 tests, and regression. We will also show how R produces publication-quality figures.
This workshop will introduce the R programming environment for statistical analysis. Unlike SPSS, which is procedure-oriented (commands are verbs, e.g. “compute”), R is object-oriented (objects are nouns, e.g. “factor”). In this workshop, we will try to ease the learning curve of using R for your data analysis. Experience with statistical software is NOT required! We will use data simulation as well as real data sets, to explore topics like t-tests, chi-square tests, and logistic regression.
We test some predictions derived from a computational implementation (Hartsuiker & Kolk, 2001) of the dual perceptual loop theory of self-monitoring (Levelt, 1989). The main predictions are the following: (1) distributions of error-to-cutoff times for both word-initial and word-medial consonant errors are truncated close to 0 ms, with the virtual peak of the truncated distributions corresponding to 0 ms; (2) distributions of cutoff-to-repair times for both initial and medial consonant errors are censored at 0 ms, with no difference between initial and medial consonant errors in the estimated peaks; (3) within both spoken words and spoken utterances, the rate of error detection decreases from earlier to later. These predictions were tested in a four-word tongue-twister experiment eliciting interactional segmental speech errors in initial and medial consonants in two-syllable CVCVC Dutch words.
Results show that (1) error-to-cutoff times are indeed truncated close to 0 ms, but the peak of the distribution does not correspond to 0 ms; rather, it lies at the low end of the distribution. This implies that, after internal error detection, interrupting speech takes more time than initiating it. (2) Cutoff-to-repair times are censored at 0 ms, but the relative number of cutoff-to-repair times of 0 ms is significantly lower for medial than for initial errors. This means that, on average, repairing takes longer for medial than for initial errors. (3) The rate of error detection is lower for medial than for initial consonant errors and decreases from the first to the fourth word in the utterance. This suggests that selective attention for self-monitoring decreases from earlier to later, both within words and within utterances. These findings confirm some major properties of the Hartsuiker and Kolk implementation of Levelt’s model, but also show that the model can be improved on in its quantitative details.
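The distinction between distributions that are truncated at 0 ms and distributions that are censored at 0 ms can be made concrete with a small simulation, shown below. This is purely illustrative R code with invented parameters, not the authors' analysis.

```r
# Truncation vs. censoring at 0 ms, with simulated latent latencies.
set.seed(2)
latent <- rnorm(10000, mean = 150, sd = 200)   # hypothetical latent latencies (ms)

truncated <- latent[latent >= 0]   # truncated at 0 ms: sub-zero values never observed
censored  <- pmax(latent, 0)       # censored at 0 ms: sub-zero values observed as exactly 0

mean(censored == 0)                # proportion of observations piling up at exactly 0 ms
hist(truncated, breaks = 50, main = "Truncated at 0 ms", xlab = "Latency (ms)")
hist(censored,  breaks = 50, main = "Censored at 0 ms",  xlab = "Latency (ms)")
```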
Smiling while talking can be perceived not only visually but also audibly. Several acoustic-phonetic properties have been found to cue smiling in the acoustic signal. The aim of this study was to validate properties associated with smiled speech using a natural video corpus. The realisations of monophthongs of the same words spoken with and without a visible smile were compared. The results show a significant increase of intensity (for all words), of F2 (for words with the round vowel /o:/) and of F0 (for all words except the backchannel marker ja) in the smiled condition.
Consonants in word onsets are involved in interactional speech errors more often than other consonants. This has been explained from the process of speech preparation, from the higher degree of activation of initial versus other consonants, or from phonotactic constraints on speech errors. Here we report a tongue-twister experiment showing (a) that words in each other’s immediate context produce more interactional errors if the words share their stress patterns than if they do not, and (b) a considerable and highly significant word-onset effect that cannot be explained from phonotactic constraints on speech errors. The latter effect is explained as a frequency effect.
Disfluencies, such as uh and uhm, are known to help the listener in speech comprehension. For instance, disfluencies may elicit prediction of less accessible referents and may trigger listeners’ attention to the following word. However, recent work suggests differential processing of disfluencies in native and non-native speech. The current study investigated whether the beneficial effects of disfluencies on listeners’ attention are modulated by the (non-)native identity of the speaker. Using the Change Detection Paradigm, we investigated listeners’ recall accuracy for words presented in disfluent and fluent contexts, in native and non-native speech. We observed beneficial effects of both native and non-native disfluencies on listeners’ recall accuracy, suggesting that native and non-native disfluencies trigger listeners’ attention in a similar fashion.
In this paper we describe two experiments exploring possible reasons for earlier conflicting results concerning the so-called word-onset effect in interactional segmental speech errors. Experiment 1 elicits errors in pairs of CVC real words with the SLIP technique. No word-onset effect is found. Experiment 2 is a tongue-twister experiment with lists of four disyllabic words. A significant word-onset effect is found. The conflicting results are not resolved. We also found that intervocalic consonants hardly ever interact with initial and final consonants, and that a shared stress pattern between words is a major factor in generating interactional errors.
Listeners understand a spoken sentence faster if the talker’s phonetic affect is congruent with the sentence meaning. This only applies, however, to 1st-person sentences (I...). In 3rd-person sentences (He/She...), phonetic-semantic congruence has no effect, i.e., listeners do not use phonetic affect to predict sentence meaning. Pragmatic and phonetic cues are combined immediately in sentence comprehension.
Most segmental speech errors probably are articulatory blends of competing segments. Perceptual consequences were studied in listeners' reactions to misspoken segments. 291 speech fragments containing misspoken initial consonants plus 291 correct control fragments, all stemming from earlier SLIP experiments, were presented for identification to listeners. Results show that misidentifications (i.e. deviations from an earlier auditory transcription) are rare (3%), but reaction times to correctly identified fragments systematically reflect differences between correct controls, undetected, early detected and late detected speech errors, leading to the following speculative conclusions: (1) segmental errors begin their life in inner speech as full substitutions, and competition with correct target segments often is slightly delayed; (2) in early interruptions speech is initiated before competing target segments are activated, but then rapidly interrupted after error detection; (3) late detected errors reflect conflict-based monitoring of articulation or monitoring of overt speech.
Smiling during talking yields speech with higher formants, and hence larger formant dispersion. Previous studies have shown that motor resonance during perception of words related to smiling can activate muscles responsible for the smiling action. If word perception causes smiling activation for such smile-related words, then this motor resonance may occur also during production, resulting in larger formant dispersion in these smile-related words. This paper reports on formant measurements from tokens of the Corpus of Spoken Dutch. Formant values of smile-related word tokens were compared to semantically different but phonetically similar word tokens. Results suggest that formant dispersion is indeed larger in smile-related words than in control words, although the predicted difference was observed only for female speakers. These findings suggest that motor resonance originating from a word’s meaning may affect the articulatory and acoustic realization of affective spoken words. Female speakers tend to produce the word smile with a smile.
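Formant dispersion is commonly operationalised as the average spacing between adjacent formant frequencies; the R sketch below shows that computation on invented formant values, and the measure used in the study itself may have been operationalised differently.

```r
# Formant dispersion as the mean distance (Hz) between neighbouring formants.
formant_dispersion <- function(formants) {
  mean(diff(sort(formants)))
}

# Invented example values for F1-F4 of a single vowel token:
formant_dispersion(c(F1 = 500, F2 = 1500, F3 = 2500, F4 = 3600))   # about 1033 Hz
```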
Dutch language users often insert an inflectional schwa after an adverb in certain grammatical constructions. The main hypothesis here is that this insertion, which is often ungrammatical, is driven by speakers' tendency towards regular speech rhythm, which overrides the fine grammatical nuances conveyed by absence of inflection. This rhythmicity hypothesis was investigated in a huge text corpus, viz. all web pages written in Dutch.
XVIth Congress of Phonetic …, Jan 1, 2007
Dominant models of speech production have employed linguistic components with a high degree of functional specialization, positing that phonological processing occurs only after semantic and syntactic processing has been completed. It has also been claimed, however, that processing at the phonological level can affect syntactic structure. The experimental results presented here show effects on Dutch syntactic structure that appear to be related to separate factors of definiteness and prosodic weight.
1. Introduction. In recent years, considerable progress has been made in the construction of linguistic components of text-to-speech conversion systems for Dutch, especially in the domains of grapheme-to-phoneme conversion, both rule-based and lexicon-based (Kerkhoff, Wester, and Boves 1984; Berendsen, Langeweg, and Van Leeuwen 1986; Daelemans 1987; Lammens 1987; Baart and Heemskerk 1988).
Words in connected speech are often assimilated to subsequent words. Some property of that upcoming word may then be determined in advance; these advance assimilatory cues may facilitate perception of that word. A gating experiment was conducted in Dutch, studying anticipatory voice assimilation between plosives, in 24 two-word combinations. In Dutch, voicing in a word-final plosive can only be caused by anticipatory assimilation to the next, voiced initial plosive, e.g. “rie [db] lint”.
Several studies (Gibson, Pearlmutter, Canseco-Gonzalez & Hickok 1996, Walter & Hemforth 1998, Wijnen 1998) have shown that attaching a relative clause to the second NP in a threefold ambiguous context (NP1-prep-NP2-prep-NP3-RC) is strongly dispreferred. Gibson et al. (1996) argue that RC attachment is determined by two competing structural parsing principles, Predicate Proximity and Recency Preference.
If speakers repeat a phrase, their speech tends to be highly rhythmical. In a similar task without repetition, no such rhythmicity was found. This study investigates whether stress predictability, varied between tasks, affects speech rhythm. Results show that a regular speech rhythm emerges in repeated phrases (with highly predictable stress pattern), but not in other tasks without repetition (medium and low predictability of stress).
In a classical SLIP task spoonerisms are elicited with either a lexical or a nonlexical outcome. If the frequency of a particular class of responses is affected by the lexicality of the expected spoonerisms, this indicates that many such responses have replaced elicited spoonerisms in inner speech. This is shown in early interrupted speech errors and in completed speech errors that deviate from the elicited spoonerisms. Keywords: speech errors, lexical bias, feedback, self-monitoring, inner speech.
Even when words are normally assimilated in connected speech, listeners can recognise them easily. Word recognition may be so robust against assimilation because listeners expect certain assimilation phenomena (given speech rate and style). From this hypothesis, we predict that listeners have more difficulty in recognising (unexpectedly) un-assimilated words than assimilated ones. This difficulty would be even greater when listening to fast speech.
The ESCA Workshop on Speech Synthesis, Jan 1, 1991
A phrase-level comparison of three transcriptions shows 8% deviation between naturally produced and theoretically predicted accentuations, which is mainly due to variation in pragmatic factors. Deviations between these two templates and the output of an accentuation algorithm are only slightly larger (9% and 11%). Hence, part of the latter deviations may fall within the limits of freedom for pragmatic factors. Some remaining errors could be solved by improving the 'given'/'new' distinction.
Quantitative Research Cheat Sheet