Recognition of numbers and strings of numbers by using demisyllables: one speaker experiment
Related papers
A segment-based statistical speech recognition system for isolated/continuous number recognition
1999
This paper presents an overview of the "AMOR" segment-based speech recognition system developed at the Research Group on Artificial Intelligence of the Hungarian Academy of Sciences. We present the preprocessing method, the features extracted from its output, and how the input signal is segmented based on those features. We also describe the two types of evaluation functions we applied for phoneme recognition, namely C4.5 and an instance-based learning technique. In our system, recognizing words from a vocabulary amounts to a special search in a hypothesis space; we present how this search space is handled and how the search is performed. Our results demonstrate that for small vocabularies we obtained acceptable recognition results on the database used. It remains a matter of further investigation how far these methods can be extended to large vocabulary speech recognition.
1995
We present the development and characteristics of a basic ASR system for isolated digits in Spanish, used over the telephone line. Initially we will introduce our first idea, a basic discrete system, and then we will see the improvements we made to increase the recognition rate at a low CPU cost (always considering its practical implementation as a real time system). The most remarkable advances were obtained with: 1) Semicontinuous modelling. It is a more precise modelling, although more time consuming. 2) End-pointing with a Neural network. 3) One pass decoding with noise models. The intention of both 2 and 3 is to alleviate the effects of a wrong end-pointing. 4) Parametrization using perceptual filters in frequency and filtering in the time domain (RASTA-PLP). We wanted to decrease the effect of telephonic noise in our system.
Perceptual Experiment on Number Production for Speaker Identification
2001
Acoustic parameters of the nine Korean numbers were analyzed by Praat, a speech analysis software, and synthesized by SenSynPPC, a Klatt formant synthesizer. The overall intensity, pitch and formant values of the numbers were modified dynamically by a step of 1 dB, 1 Hz and 2.5% respectively. The study explored the sensitivity of listeners to changes in the three acoustic parameters. Twelve male and female subjects listened to 390 pairs of synthesized numbers and judged whether the given pair sounded the same or different. Results showed that subjects perceived the same sound quality within the range of 6.6 dB of intensity variation, 10.5 Hz of pitch variation and 5.9% of the first three formant variation. The male and female groups showed almost the same perceptual ranges. Also, an asymmetrical structure of high and low boundary was observed. The ranges may be applicable to the development of a speaker identification system while the method of synthesis modification may apply to it...
High performance telephone bandwidth speaker independent continuous digit recognition
… Speech Recognition and …, 2001
The development of a high-performance telephone-bandwidth speaker-independent connected digit recognizer for Italian is described. The CSLU Speech Toolkit was used to develop and implement the hybrid ANN/HMM system, which is trained on context-dependent categories to account for coarticulatory variation. Various front-end processing methods and system architectures were compared; with the best features (MFCC with CMS + ∆) and network (a 4-layer fully connected feed-forward network), the system reached 98.92% word recognition accuracy and 92.62% sentence recognition accuracy on a test set of the FIELD continuous digits recognition task.
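The "CMS + ∆" front end named in this abstract can be illustrated with a minimal sketch. It assumes the MFCC frames are already computed and given as lists of floats, and it uses a simple two-point difference for the delta features; the abstract does not state the delta window the authors used, so that choice and the function names are illustrative assumptions only.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the utterance from every frame."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


def delta(frames):
    """Two-point symmetric difference; edge frames are replicated (an assumption)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_deltas(frames):
    """Return mean-normalized static coefficients with their deltas appended."""
    normalized = cepstral_mean_subtraction(frames)
    return [c + d for c, d in zip(normalized, delta(normalized))]


# Toy input: three frames of two coefficients each.
features = add_deltas([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Mean subtraction removes a constant channel offset in the cepstral domain, which is why it is commonly paired with telephone speech; the appended deltas give the classifier a local estimate of how each coefficient is changing over time.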
IEEE Transactions on Signal Processing, 1992
A recognition system for connected digits, which uses a statistical classifier to identify words in speaker-independent continuous speech, is described. To identify words in continuous speech for unspecified speakers, the system uses the multiple similarity method, one of the statistical pattern recognition techniques. For evaluating word strings, the system uses a new scoring method which is independent of the number of words in the strings. It is derived from the a posteriori probability that a subinterval corresponds to a correct word position, given a word similarity value. The system evaluates a word string using two different procedures. The first is a dynamic programming procedure and the second is a procedure based on a parallel search algorithm. The second algorithm is designed to reduce computational cost by using a word boundary hypothesizer based on spectral change features. Three kinds of experiments were performed to evaluate the algorithms: experiments on the contextual effect of the training data set, on validation of the search algorithm, and on a large set of unspecified speakers comprising 40 males and 40 females. For connected digits (unknown word lengths test), the string recognition rates were 90.1%-95.1% for two, three, and four connected digits, with equivalent word (digit) rates of 97.4%-98.4%.
Performance Analysis of Spoken Arabic Digits Recognition Techniques, JEST 2012
A performance evaluation of sound recognition techniques for recognizing some spoken Arabic words, namely the digits from zero to nine, is presented. A main characteristic of the Arabic digits is that all of them except zero are polysyllabic words. The performance analysis is based on different features of phonetic isolated Arabic digits. The main aim of this paper is to compare, analyze, and discuss the outcomes of spoken Arabic digit recognition systems based on three kinds of recognition features: Yule-Walker spectrum features, Walsh spectrum features, and Mel frequency cepstral coefficient (MFCC) features. The MFCC-based recognition system achieves the best average correct recognition, while the Yule-Walker-based recognition system achieves the worst.
On Recognition of Spoken Bengali Numerals
This paper presents a method for recognizing isolated spoken Bengali numerals. Noisy audio samples are used as input in this study. Mel frequency cepstral coefficients (MFCC) are used to extract features from the audio samples. Vector quantization is applied to reduce the dimension of the feature vectors and to generate a vector codebook for the numerals. Classification is based on dynamic time warping (DTW) and a minimum distance classifier using the Euclidean distance measure. Both speaker-dependent and speaker-independent situations are considered when checking accuracy. Results show the limitations of the standard MFCC-based speech processing approach in the speaker-independent spoken digit recognition scenario in the presence of noise.
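The combination of dynamic time warping with a Euclidean minimum-distance decision described in this abstract can be sketched as follows. This is a generic textbook DTW under stated assumptions, not the authors' implementation: the vector quantization step is omitted, feature frames are plain lists of floats, and the template dictionary and function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Allowed steps: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical one-dimensional templates standing in for MFCC sequences.
templates = {"one": [[0.0], [1.0], [2.0]], "two": [[5.0], [5.0], [5.0]]}
label = classify([[0.0], [0.9], [1.1], [2.1]], templates)
```

Because the warping path may repeat frames, two utterances of the same numeral spoken at different speeds can still align with low cost, which is the property that makes DTW suitable for isolated-word template matching.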
Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
In this paper we present the results obtained when evaluating the Natural Numbers Recognizer of Telefónica I+D on some particular dialects of Spanish from Spain and America. The evaluation was made on two different data sets corresponding to two different situations. The first set includes dialects of Spanish from Spain that were considered in the training and design of our baseline system, and the second set corresponds to Argentinian Spanish, which was not used to train the original system. Because we are interested in a system usable by a wide range of users, we tested the ability of MAP (maximum a posteriori) techniques to adapt the original HMMs so that they represent all the dialects. The experimental results show that our recognizer can be used in applications spread over a great number of Spanish-speaking countries.
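As a rough illustration of what MAP (maximum a posteriori) adaptation of an HMM does, the sketch below applies the standard MAP update to a single Gaussian mean: the adapted mean is a weighted average of the prior mean and the new dialect data. The abstract does not say which parameters were adapted or how the prior weight was set, so the function name, the `tau` value, and the toy data are assumptions.

```python
def map_adapt_mean(prior_mean, frames, weights, tau=10.0):
    """Standard MAP update of a Gaussian mean.

    prior_mean: mean vector of the original (baseline) model
    frames:     adaptation feature vectors
    weights:    occupation probability of this Gaussian for each frame
    tau:        prior weight (hypothetical value; larger keeps the mean
                closer to the original model)
    """
    occupancy = sum(weights)
    dim = len(prior_mean)
    weighted_sum = [
        sum(w * f[k] for w, f in zip(weights, frames)) for k in range(dim)
    ]
    return [
        (tau * prior_mean[k] + weighted_sum[k]) / (tau + occupancy)
        for k in range(dim)
    ]


# With two frames at 2.0 and tau = 2, the mean moves halfway from 0.0 to 2.0.
adapted = map_adapt_mean([0.0], [[2.0], [2.0]], [1.0, 1.0], tau=2.0)
```

With little adaptation data the result stays close to the original model, and with more data it approaches the sample mean of the new dialect, which is why this kind of update suits adapting to a dialect that was absent from the training set.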
Pronunciation Modeling in Hungarian Number Recognition
… European Conference on …, 2001
In Hungarian, as to some extent in many other languages, a large percentage of words and phrases can be pronounced in several different but correct ways. Introducing pronunciation alternatives for individual vocabulary elements may improve the efficiency of recognition. In connected word recognition tasks, however, the modeling of inter-word phonetic changes has greater significance. In this paper we introduce a rule-based method for the automatic generation of pronunciation alternatives. It is used first for isolated words and is then extended to handle cross-word phonological changes in recognition networks, using a special approach suited to the Hungarian language. The method is evaluated in connected number recognition tests.
Multilingual number transcription for text-to-speech conversion
2013
This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription, an important aspect of the text normalization problem. The main goal is a language-independent text normalization module based on data rather than on expert rules. The paper proposes a general architecture based on statistical machine translation techniques, composed of three main modules: a tokenizer that splits the text input into a token graph, a phrase-based translation module for token translation, and a post-processing module that removes some tokens. This architecture has been evaluated for number transcription in several languages: English, Spanish and Romanian.