Antje Schweitzer - Profile on Academia.edu

Papers by Antje Schweitzer

ICPhS, 2011

We present a method for investigating the temporal alignment of intonation events by parametrizing F0 contours. Results for three German single-speaker corpora and one American English multi-speaker corpus show that the speakers generally avoid placing peaks in syllable onsets. We suggest that this is a quantal effect [9] which results from the fact that syllable onsets are boundaries in tonal production.

Automatic speech understanding and speech synthesis, two of the major speech processing applications, impose strikingly different constraints and requirements on prosodic models. The prevalent models of prosody and intonation fail to offer a unified solution to these conflicting constraints. As a consequence, prosodic models have been applied only occasionally in end-to-end automatic speech understanding systems; in contrast, they have been applied extensively in speech synthesis systems. In this paper we discuss the reasons for this state of affairs as well as possible strategies to overcome the shortcomings of the use of prosodic modelling in automatic speech processing.

Akustische Korrelate wahrgenommener Persönlichkeitsmerkmale und Stimmattraktivität [Acoustic correlates of perceived personality traits and voice attractiveness]

TUDpress, Dresden, 2017

Prosodic Theory and Practice, 2022

This paper describes current and future contents of the GECO (GErman COnversations) database and promotes investigating the role of attention in phonetics research. GECO is freely available for non-commercial use. It consists of conversations of high audio quality between female subjects, together with results of personality tests of each participant, and participants’ ratings of each other and of the conversation. To our knowledge it is currently the largest German database of this type. This corpus will be doubled in size by adding more dialogs in the next two years, and these new speech data will be complemented by results of several attention tests. Some of these tests will follow established test paradigms, but we also suggest a new, less artificial paradigm for testing attention. We describe the existing GECO corpus as well as the future additions including the proposed test in this paper.

This thesis explores perception and production of prosody by way of corpus experiments. Following Dogil and Möbius (2001), I suggest applying Guenther and Perkell's speech production model for the segmental domain (Guenther 1995; Guenther et al. 1998; Perkell et al. 2001) to the prosodic domain. I suggest that Guenther and Perkell's model is compatible with exemplar theory (e.g. Lacerda 1995; Goldinger 1996, 1997, 1998; Johnson 1997; Pierrehumbert 2001, 2003), and that the target regions can be derived in an exemplar-theoretic fashion: they are implicitly derived from the range of values that is observed for stored exemplars in the relevant dimensions. I assume that the prosodic categories are the categories posited by GToBI(S) (Mayer 1995) in adaptation of the Tone Sequence Model (Pierrehumbert 1980) to German. As for the dimensions of the target regions pertaining to these categories, I suggest a measure of local speech rate, viz. duration z-scores, as the temporal dimension, a...

The acoustic properties of word stress have been explored in a number of studies. However, there is little research on German word stress, and even less on its realization in spontaneous speech. This paper tests whether parameters that have been found to implement word stress in mostly laboratory speech are also employed in a corpus of German spontaneous speech. Specifically, we consider spectral tilt, syllable duration and pitch. While the results for syllable duration conform to the prevalent finding that stressed syllables have a longer duration, we find no significant effect of pitch. For spectral tilt, however, we observe contradictory results, depending on the way we quantify tilt.

This contribution presents an approach to automatic prosodic annotation which emphasizes the linguistic motivation and perceptual relevance of the features used for classifying the prosodic categories. The analyses and experiments presented here were conducted on a 2.5-hour German news-like corpus which had been manually annotated using GToBI(S) (Mayer, 1995). GToBI(S) is an adaptation of American English ToBI (Silverman et al., 1992; Beckman and Ayers, 1994) to German.

We present GRAIN (German RAdio INterviews) as part of the SFB732 Silver Standard Collection. GRAIN contains German radio interviews and is annotated on multiple linguistic layers. The data has been processed with state-of-the-art tools for text and speech and therefore represents a resource for text-based linguistic research as well as speech science. While there is a gold standard part with manual annotations, the (much larger) silver standard part (which is growing as the radio station releases more interviews) relies completely on automatic annotations. We explicitly release different versions of annotations for the same layers (e.g. morpho-syntax) with the aim of combining and comparing multiple layers in order to derive confidence estimations for the annotations. Therefore, parts of the data where the output of several tools matches can be considered clear-cut cases, while mismatches hint at areas of interest which are potentially challenging or where rare phenomena can be found.

Speech Prosody 2018, 2018

Pitch accent detection often makes use of both acoustic and lexical features based on the fact that pitch accents tend to correlate with certain words. In this paper, we extend a pitch accent detector that involves a convolutional neural network to include word embeddings, which are state-of-the-art vector representations of words. We examine the effect these features have on within-corpus and cross-corpus experiments on three English datasets. The results show that while word embeddings can improve the performance in corpus-dependent experiments, they also have the potential to make generalization to unseen data more challenging.

Speech Prosody 2018, 2018

In the present study, a corpus of short German sentences collected in a shadowing task was examined with respect to pitch accent realization. The pitch accents were parameterized with the PaIntE model, which describes the f0 contour of intonation events concerning their height, slope, and temporal alignment. Convergence was quantified as decrease in Euclidean distance, and hence increase in similarity, between the PaIntE parameter vectors. This was assessed for three stimulus types: natural speech, diphone-based speech synthesis, or hidden Markov model (HMM) based speech synthesis. The factors tested in the analysis were experimental phase (was the sentence uttered before or while shadowing the model), accent type (a distinction was made between prenuclear and nuclear pitch accents), and sex of speaker and shadowed model. For the natural and HMM stimuli, Euclidean distance decreased in the shadowing task. This convergence effect did not depend on the accent type. However, prenuclear pitch accents showed generally lower values in Euclidean distance than nuclear pitch accents. Whether the sex of the speaker and the shadowed model matched did not explain any variance in the data. For the diphone stimuli, no convergence of pitch accents was observed.

Interspeech 2017, 2017

This study investigates how acoustic and lexical properties of spontaneous speech in dialogs affect perceived social attractiveness in terms of speaker likeability, friendliness, competence, and self-confidence. We analyze a database of longer spontaneous dialogs between German female speakers and the mutual ratings that dialog partners assigned to one another after every conversation. Thus the ratings reflect long-term impressions based on dialog behavior. Using linear mixed models, we investigate both classical acoustic-prosodic and lexical parameters as well as parameters that capture the degree of speakers' adaptation, or "convergence", of these parameters to each other. Specifically, we find that likeability is correlated with the speaker's lexical convergence as well as with her convergence in f0 peak height. Friendliness is significantly related to variation in intensity. For competence, the proportion of positive words in the dialog, variation in shimmer, and overall phonetic convergence are significant correlates. Finally, self-confidence is related to several prosodic, phonetic, and lexical adaptation parameters. In some cases, the effect depends on whether interlocutors also had eye contact during their conversation. Taken together, these findings provide evidence that in addition to classical parameters, convergence parameters play an important role in the mutual perception of social attractiveness.

Interspeech 2017, 2017

In this paper we look at convergence and divergence in intonation in the context of social qualities. Specifically, we examine pitch accent realisations in the GECO corpus of German conversations. Pitch accents are represented as 6-dimensional vectors where each dimension corresponds to a characteristic of the accent's shape. Convergence/divergence is then measured by calculating the distance between pitch accent realisations of conversational partners. A decrease of distance values over time indicates convergence, an increase divergence. The corpus comprises dialogue sessions in two modalities: partners either saw each other during the conversation or not. Linear mixed model analyses show convergence as well as divergence effects in the realisations of H*L accents. This convergence/divergence is strongly related to the modality and to how much speakers like their partners: generally, seeing the partner comes with divergence, whereas when the dialogue partners cannot see each other, there is convergence. The effect varies, however, depending on the extent to which a speaker likes their partner. Less liking entails a greater change in the realisations over time: stronger divergence when partners could see each other, and stronger convergence when they could not.
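
The distance-over-time idea in this abstract can be sketched roughly as follows. This is only an illustration under assumptions that are not taken from the paper: the distance is Euclidean, each accent of one speaker is already paired with one accent of the partner, and the trend is a plain least-squares slope rather than the linear mixed models the authors report. The function names `euclidean`, `distance_trend`, and `classify_trend` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_trend(times, accents_a, accents_b):
    """Least-squares slope of partner distance over time.

    times      -- time stamp of each accent pair (assumed pairing)
    accents_a  -- 6-dimensional vectors of speaker A, one per time stamp
    accents_b  -- 6-dimensional vectors of speaker B, one per time stamp
    """
    dists = [euclidean(a, b) for a, b in zip(accents_a, accents_b)]
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(dists) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, dists))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def classify_trend(slope, tol=1e-9):
    """Decreasing distance means convergence, increasing means divergence."""
    if slope < -tol:
        return "convergence"
    if slope > tol:
        return "divergence"
    return "no change"
```

A negative slope from `distance_trend` corresponds to the "decrease of distance values over time" described above. A simple slope ignores speaker and session effects, which is why it should not be read as a substitute for the mixed-model analysis.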

Interspeech 2016, 2016

We motivate and test an exemplar-theoretic view of phonetic convergence, in which convergence effects arise because exemplars just perceived in a conversation are stored in a speaker's memory, and used subsequently in speech production. Most exemplar models assume that production targets are established using stored exemplars, taking into account their frequency- and recency-influenced level of activation. Thus, convergence effects are expected to arise because the exemplars just perceived from a partner have a comparatively high activation. However, in the case of frequent exemplars, this effect should be countered by the high frequency of already stored, older exemplars. We test this assumption by examining speech rate convergence in spontaneous speech by female German speakers. We fit two linear mixed models, calculating speech rate on the basis of either infrequent or frequent syllables, and predict a speaker's speech rate in a phrase by the partner's speech rate in the preceding phrase. As anticipated, we find a significant main effect indicating convergence only for the infrequent syllables. We also find an unexpected significant interaction of the partner's speech rate and the speaker's assessment of the partner in terms of likeability, indicating divergence, but again, only for the infrequent case.

Interspeech 2016, 2016

We investigate tone recognition in Vietnamese across gender and dialects. In addition to well-known parameters such as single fundamental frequency (F0) values and energy features, we explore the impact of harmonicity on recognition accuracy, as well as that of the PaIntE parameters, which quantify the shape of the F0 contour over complete syllables instead of providing more local single values. Using these new features for tone recognition in the GlobalPhone database, we observe significant improvements of approx. 1% in recognition accuracy when adding harmonicity, and of another approx. 4% when adding the PaIntE parameters. Furthermore, we analyze the influence of gender and dialect on recognition accuracy. The results show that it is easier to recognize tones for female than for male speakers, and easier for the Northern dialect than for the Southern dialect. Moreover, we achieve reasonable results testing models across gender, while the performance drops strongly when testing across dialects.

Prosody and Meaning

We propose that words and short phrases in English can also be stored with intonation contours they frequently occur with, along with their discourse meanings. We present a corpus study investigating intonational collocations, i.e. pairings of lexical phrases and accent contours, in a corpus of conversational speech, Switchboard. We developed a novel method to identify similar accent contours, using automatic pitch parameterisation (Möhler and Conkie, 1998) and clustering. We found that intonational collocations are widespread, accounting for up to 34% of all tokens, and 76% of frequent lexical types in our data. This shows prima facie evidence that word-contour pairs can be stored. A qualitative analysis of frequent intonational collocations showed they are used with very specific discourse meanings, consistent with our claim that they are stored. For each, lower frequency collocates with related discourse meanings could be identified, consistent with the exemplar-theoretic prediction that these discourse meanings can spread by analogy. Finally, we present the results of a perception experiment showing that frequent words collocated with a particular accent type, and lower frequency words related to the frequent collocates, are judged to sound more natural than low frequency, unrelated collocates of that accent type. This is consistent with the claim that frequency-based collocation is part of grammar, and affects language expectations (cf. Bybee and Eddington, 2006).

An exemplar-based hybrid model of phonetic adaptation

Exploring the relationship between intonation and the lexicon: Evidence for lexicalised storage of intonation

Speech Communication, 2015

In Germanic languages like English and German, intonation is usually thought to be ‘post-lexical’. That is, it is usually assumed that the choice of intonation contour and the form of the realised contour itself are largely independent of the words used. We present three corpus experiments which show clear evidence of lexical storage of intonation, contrary to these assumptions. Specifically, in each experiment, we show that distributional properties of words affect the prosodic realisation of those words, including accent and boundary placement, and the shape of pitch accents. The first experiment looks at the frequency of occurrence of a given word with a particular pitch accent type and its effect on the shape of accents on that word. We found that the more frequently a word and an accent type appear together, the greater the amplitude of the accent. The second experiment investigates the effect of both the absolute and relative frequency of occurrence of a given word with a particular accent type and their effect on the variability of the shape of these accents. We found that while absolute frequency increases the variability in pitch accent shape, relative frequency reduces it. The final experiment looks at the effect of the relative frequency of a word in its lexical (trigram) context on both variability in its prosodic context and on accent shape variability. We found that both kinds of prosodic variability decrease as the relative frequency of the word in its lexical context increases. We argue that all of these findings are expected within an exemplar approach assuming storage of tonal information with lexical items, and discuss the implications of this for the production and mental representation of intonation.
