Phonetic Analysis and the Automatic Segmentation and Labeling of Speech Sounds

On the automatic segmentation of speech signals

ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing

For large-vocabulary and continuous speech recognition, the subword-unit-based approach is a viable alternative to the whole-word-unit-based approach. For preparing a large inventory of subword units, automatic segmentation is preferable to manual segmentation, as it substantially reduces the work associated with the generation of templates and gives more consistent results. In this paper we discuss some methods for automatically segmenting speech into phonetic units. Three different approaches are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach. An evaluation of the performance of the automatic segmentation methods is given.
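
As an illustration of the second approach, boundaries can be hypothesised wherever the spectral distance between adjacent analysis frames peaks. The sketch below is our own minimal rendering of that idea, not the paper's algorithm: the function name, frame length, distance measure, and thresholding rule are all illustrative choices.

```python
import numpy as np

def spectral_change_boundaries(signal, frame_len=200, threshold=2.0):
    """Hypothesise phone boundaries at peaks in the spectral distance
    between adjacent fixed-length, non-overlapping frames."""
    n_frames = len(signal) // frame_len
    # Log-magnitude spectrum of each frame (small floor avoids log(0)).
    spectra = np.array([
        np.log(np.abs(np.fft.rfft(signal[i * frame_len:(i + 1) * frame_len])) + 1e-8)
        for i in range(n_frames)
    ])
    # Euclidean distance between consecutive frame spectra.
    dist = np.linalg.norm(np.diff(spectra, axis=0), axis=1)
    # Flag changes more than `threshold` standard deviations above the mean.
    peaks = np.where(dist > dist.mean() + threshold * dist.std())[0]
    # Convert frame indices to sample positions of the hypothesised boundaries.
    return (peaks + 1) * frame_len
```

On a toy signal made of two concatenated tones, the single large spectral jump at the junction is the only detected boundary; real speech would of course need smoothing and peak-picking beyond this sketch.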

Automatic phonetic segmentation using boundary models

Interspeech 2013, 2013

This study attempts to improve automatic phonetic segmentation within the HMM framework. Experiments were conducted to investigate the use of phone boundary models, the use of precise phonetic segmentation for training HMMs, and the difference between context-dependent and context-independent phone models in terms of forced alignment performance. Results show that the combination of special one-state phone boundary models and monophone HMMs can significantly improve forced alignment accuracy. HMM-based forced alignment systems can also benefit from using precise phonetic segmentation for training HMMs. Context-dependent phone models are not better than context-independent models when combined with phone boundary models. The proposed system achieves 93.92% agreement (of phone boundaries) within 20 ms compared to manual segmentation on the TIMIT corpus. This is the best reported result on TIMIT to our knowledge.

Automatic pitch-synchronous phonetic segmentation

2008

This paper deals with an HMM-based automatic phonetic segmentation (APS) system and proposes to increase its performance by employing a pitch-synchronous (PS) coding scheme. Such a coding scheme uses different frames of speech throughout voiced and unvoiced speech regions and thus enables better modelling of each individual phone. The PS coding scheme is shown to outperform the traditionally utilised pitch-asynchronous (PA) coding scheme for two corpora of Czech speech (one female and one male), both in the case of a baseline (non-refined) APS and in the case of a CART-refined APS. Better results were observed for each of the voicing-dependent boundary types (unvoiced-unvoiced, unvoiced-voiced, voiced-unvoiced and voiced-voiced).
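
One reading of such a coding scheme (a sketch of the general idea, not the paper's exact algorithm; the function and its parameters are our own) is to cut one frame per pitch period between consecutive pitch marks in voiced regions, and to fall back to fixed-length frames elsewhere:

```python
def pitch_synchronous_frames(n_samples, pitch_marks, fixed_len=160):
    """Return (start, end) sample ranges: one frame per pitch period
    inside the voiced region spanned by `pitch_marks`, fixed-length
    frames before and after it. Small gaps may remain at voicing
    transitions; a real system would handle these explicitly."""
    frames = []
    pos = 0
    marks = sorted(pitch_marks)
    for a, b in zip(marks, marks[1:]):
        # Fixed-length frames up to the start of the voiced region.
        while pos + fixed_len <= a:
            frames.append((pos, pos + fixed_len))
            pos += fixed_len
        frames.append((a, b))   # one pitch-period frame
        pos = b
    # Fixed-length frames after the last pitch mark.
    while pos + fixed_len <= n_samples:
        frames.append((pos, pos + fixed_len))
        pos += fixed_len
    return frames
```

With pitch marks at samples 400, 480, 560, and 640 in a 1000-sample signal, the voiced stretch yields three 80-sample pitch-period frames flanked by fixed 100-sample frames.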

Phoneme segmentation of speech

18th International Conference on Pattern Recognition (ICPR'06), 2006

In most approaches to speech recognition, the speech signals are segmented using constant-time segmentation, for example into 25 ms blocks. Constant-time segmentation risks losing information about the phonemes: different sounds may be merged into single blocks, and individual phonemes may be lost completely.
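
The effect is easy to demonstrate: with 25 ms frames, a 15 ms stop burst can never occupy a frame of its own. The toy example below (durations and labels are illustrative, not corpus data) shows the burst always sharing a frame with a neighbouring vowel:

```python
import numpy as np

SR = 16000                      # samples per second
FRAME = int(0.025 * SR)         # 25 ms -> 400 samples

# Toy "utterance": a 60 ms vowel, a 15 ms burst, another 60 ms vowel.
segments = [("vowel1", 0.060), ("burst", 0.015), ("vowel2", 0.060)]
labels = np.concatenate([np.full(int(d * SR), i)
                         for i, (_, d) in enumerate(segments)])

# Constant-time segmentation: which phonemes fall in each 25 ms frame?
n_frames = len(labels) // FRAME
frame_contents = [set(labels[f * FRAME:(f + 1) * FRAME].tolist())
                  for f in range(n_frames)]
for f, present in enumerate(frame_contents):
    print(f, sorted(present))
```

The 15 ms burst (label 1) appears only inside a frame it shares with the first vowel; no frame contains it alone, which is exactly the merging the abstract warns about.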

Acoustic Segmentation and Analysis

This study will analyse the formants and phonological characteristics of individual phonemes in a 10-second utterance through spectrographic analysis. Phonological attributes vary widely from speaker to speaker, so this study will also examine how this particular speaker's phonological attributes resemble and differ from standardized phonological descriptions of the phonemes. Additionally, the study will discuss the difficulties faced while examining notable aspects of the spectrogram.

Phonetic segmentation using multiple speech features

International Journal of Speech Technology, 2008

I. Mporas, T. Ganchev, N. Fakotakis. Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, Rion-Patras 26500, Greece.

A Minimum Boundary Error Framework for Automatic Phonetic Segmentation

Lecture Notes in Computer Science, 2006

This paper presents a novel framework for HMM-based automatic phonetic segmentation that improves the accuracy of placing phone boundaries. In the framework, both training and segmentation approaches are proposed according to the minimum boundary error (MBE) criterion, which tries to minimize the expected boundary errors over a set of possible phonetic alignments. This framework is inspired by the recently proposed minimum phone error (MPE) training approach and the minimum Bayes risk decoding algorithm for automatic speech recognition. To evaluate the proposed MBE framework, we conduct automatic phonetic segmentation experiments on the TIMIT acoustic-phonetic continuous speech corpus. MBE segmentation with MBE-trained models can identify 80.53% of human-labeled phone boundaries within a tolerance of 10 ms, compared to 71.10% identified by conventional ML segmentation with ML-trained models. Moreover, by using the MBE framework, only 7.15% of automatically labeled phone boundaries have errors larger than 20 ms.
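
In schematic form (our notation, following the general minimum Bayes risk decoding literature rather than the paper's exact equations), MBE segmentation picks the alignment with the lowest expected boundary error:

```latex
\hat{s} = \arg\min_{s} \sum_{s'} P(s' \mid O)\, \mathrm{BE}(s, s')
```

where O is the acoustic observation sequence, s' ranges over candidate phonetic alignments, P(s' | O) is the posterior probability of each alignment, and BE(s, s') measures the boundary placement error between two alignments.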

Language independent automatic speech segmentation into phoneme-like units on the base of acoustic distinctive features

There are special topics in cognitive infocommunications where the processing of continuous speech is necessary. These topics often require the segmentation of the speech signal into phoneme-sized units. This kind of segmentation is necessary when the desired behavior depends on speech timing, such as rhythm or the position of voiced sounds (emotion or mood detection, language learning, acoustic feature visualization). Segmentation systems based on acoustic-phonetic knowledge of speech can be realized in a language-independent way. In this paper we introduce a language-independent solution based on the segmentation of continuous speech into 9 broad phonetic classes. Classification and segmentation were performed using hidden Markov models. Three databases were used to evaluate the segmentation systems: the Hungarian MRBA, German KIEL, and English TIMIT databases. An average recognition accuracy of 80% was obtained.

The Segmentation and Labelling of Speech Databases

Introduction

Segmentation is the division of a speech file into non-overlapping sections corresponding to physical or linguistic units. Labelling is the assignment of physical or linguistic labels to these units. Both segmentation and labelling form a major part of current work in linguistic databases.

1.1.1 Segmental transcription

The term `transcription' may be used to refer to the representation of a text or an utterance as a string of symbols, without any linkage to the acoustic representation of the utterance. This was the pattern followed by speech and text corpus work during the 1980s, such as the prosodically-transcribed Spoken English Corpus (Knowles et al. 1995). These corpora did not link the symbolic representation with the physical acoustic waveform, and hence were not fully machine-readable. A recent project, MARSEC (Roach et al. 1993), has generated these links for the Spoken English Corpus such that it is now a