Uwe Reichel | Hungarian Academy of Sciences

Papers by Uwe Reichel

Removing micromelody from fundamental frequency contours

In this paper we describe a new method to diminish microprosodic components of fundamental frequency contours by applying weight functions linked to microprosodically classified phone combinations. For vowel segments in obstruent environments our algorithm outperforms standard smoothing algorithms like Moving-Average filtering, Savitzky-Golay filtering or MOMEL in diminishing F0 variations related to microprosodic factors while retaining significant differences related to macroprosody. Index Terms: microprosody, smoothing, intonation

Automatic correction of part-of-speech corpora

In this study a simple method for automatic correction of part-of-speech corpora is presented, which works as follows: Initially, two or more already available part-of-speech taggers are applied to the data. Then a sample of differing outputs is taken to train a classifier to predict for each difference which of the taggers (if any) delivered the correct output. As classifiers we employed instance-based learning, a C4.5 decision tree and a Bayesian classifier. Their performances ranged from 59.1 % to 67.3 %. Training on the automatically corrected data finally led to significant improvements in tagger performance.

Multi-Tier Annotations in the Verbmobil Corpus

Language Resources and Evaluation, 2002

In very large and diverse scientific projects, where groups as different as linguists and engineers with different intentions work on the same signal data or its orthographic transcript and annotate new valuable information, it will not be easy to build a homogeneous corpus. We will describe how this can be achieved, considering the fact that some of these annotations have ...

A grammar of intonational units in German digit numbers

Improving text-based prediction of German phrase level

Automated Morphological Segmentation and Evaluation

Language Resources and Evaluation, 2000

In this paper we introduce (i) a new method for morphological segmentation of part-of-speech labelled German words and (ii) some measures related to the MDL principle for evaluation of morphological segmentations. The segmentation algorithm is capable of discovering hierarchical structure and retrieving new morphemes. It achieved 75 % recall and 99 % precision. Regarding MDL based evaluation, ...

Parameterization of F0 register and discontinuity to predict prosodic boundary strength in Hungarian spontaneous speech

This study addresses the question of how to parameterize (1) aspects of fundamental frequency (F0) register, i.e. time-varying F0 level and range within prosodic phrases, and (2) F0 discontinuities at prosodic boundaries in order to predict perceived prosodic boundary strength in Hungarian spontaneous speech. For F0 register stylization we propose a new fitting procedure for base-, mid-, and toplines that does not require error-prone local peak and valley detection and is robust against disturbing influences of high pitch accents and boundary tones. From these linear stylizations we extracted features which reflect F0 boundary discontinuities and fitted stepwise linear regression and regression tree models to predict perceived boundary strength. In a ten-fold cross-validation the mean correlation between predictions and human judgments reaches up to 0.8.

Syllable cut and energy contours in vowels: a comparative study of German and Hungarian

Syllable cut is said to be a phonologically distinctive feature in some languages where the difference in vowel quantity is accompanied by a difference in vowel quality, as in German. There have been several attempts to find the corresponding phonetic correlates for syllable cut, of which the energy measurements of vowels by Spiekermann (2000) proved appropriate for explaining the difference between long, i.e. smoothly cut, and short, i.e. abruptly cut, vowels: in smoothly cut vowels, a larger number of peaks was counted in the energy contour, these peaks were located further back than in abruptly cut segments, and the overall energy was more constant throughout the entire nucleus. On this basis, we intended to compare German as a syllable cut language and Hungarian, where the feature was not expected to be relevant. However, the phonetic correlates of syllable cut found in this study do not entirely confirm Spiekermann's results. It seems that the energy features of vowels are more stro...

Comparing human and machine vowel classification

In this study we compare human ability to identify vowels with a machine learning approach. A perception experiment for 14 Hungarian vowels in isolation and embedded in a carrier word was conducted, and a C4.5 decision tree was trained on the same material. A comparison between the identification results of the subjects and the classifier showed that in three of four conditions (isolated vowel quantity and identity, embedded vowel identity) the performance of the classifier was superior and in one condition (embedded vowel quantity) equal to the subjects' performance. This outcome can be explained by perceptual limits of the subjects and by stimulus properties. The classifier's performance was significantly weakened by replacing the continuous spectral information by binary 3-Bark thresholds as proposed in phonetic literature [8]. Parts of the resulting decision trees can be interpreted phonetically, which could qualify this classifier as a tool for phonetic resea...

Parameterization and automatic labeling of Hungarian intonation

In Hungarian intonation research the goal of a common framework developed by Varga (2002; [1]) is to categorize the intonation within the domain of accent groups by character contours. We propose a linear parameterization of a subset of these contours derived from polynomial stylization. These parameters were used to train classification trees and support vector machines for contour prediction. Parameter extraction and training were carried out on the original F0 contours of spontaneous speech data as well as on three differently normalized variants suppressing fundamental frequency level and range effects. The highest accuracies were obtained for classification trees and F0 residuals after midline subtraction, but the overall performances were rather poor. Nevertheless, a significant improvement of the results was achieved by a Hidden Markov model to predict the correct label sequence from the partly erroneous classification output.

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

We present a Markov part-of-speech tagger for which the P(w|t) emission probabilities of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string suffixes of w are cut off at the local maxima of the Normalized Backward Successor Variety. This procedure allows for the derivation of linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language independent. Basic Markov model part-of-speech taggers are significantly outperformed by our model.

Syllable cut and energy contour: a contrastive study of German and Hungarian

Syllable cut is said to be a phonologically distinctive feature in some languages where the difference in vowel quantity is accompanied by a difference in vowel quality, as in German. There have been several attempts to find the corresponding phonetic correlates for syllable cut, of which the energy measurements of vowels by Spiekermann proved appropriate for explaining the difference between long and short vowels. On this basis, we intended to compare German as a syllable cut language and Hungarian, where the feature was not expected to be relevant. However, the phonetic correlates of syllable cut found in this study do not entirely confirm Spiekermann's results. It seems that the energy features of vowels are more strongly connected to their duration than to their quality.

Syllable cut and energy contour in vowels: a comparative study on German and Hungarian

Syllable cut is said to be a phonologically distinctive feature in some languages where the difference in vowel quantity is accompanied by a difference in vowel quality, as in German. There have been several attempts to find the corresponding phonetic correlates for syllable cut, of which the energy measurements of vowels by Spiekermann proved appropriate for explaining the difference between long, i.e. smoothly cut, and short, i.e. abruptly cut, vowels: in smoothly cut vowels, a larger number of peaks was counted in the energy contour, these peaks were located further back than in abruptly cut segments, and the overall energy was more constant throughout the entire nucleus. On this basis, we intended to compare German as a syllable cut language and Hungarian, where the feature was not expected to be relevant. However, the phonetic correlates of syllable cut found in this study do not entirely confirm Spiekermann's results. It seems that the energy features of vowels are more strongly connected to their duration than to their quality.

Is Hungarian Losing the Vowel Quantity Distinction

Proceedings of the …, 2008

Standard Hungarian contains 14 vowels: [i, i:, y, y:, u, u:, ø, ø:, o, o:, E, e:, 6, a:]. There are two competing views on the role that quantity plays in the system. According to the phonetic concept, the vowel system includes nine vowels, as there are nine different vowel ...
