Modulation and performance of synchronous demodulation for speech signal detection and dialect intelligibility

Chapter I: Introduction to Audio and Speech Signal Processing

The development of highly efficient digital signal processors has made it possible to implement high-performance signal processing algorithms that solve a large number of practical problems in several engineering fields: in telecommunications, where very efficient algorithms have been developed for storage, transmission, and interference reduction; in the audio field, where signal processing algorithms have been developed for the enhancement, restoration, and copyright protection of audio material; and in the medical field, where signal processing algorithms have been used efficiently to develop hearing aid systems and speech restoration systems for alaryngeal speech. This chapter presents an overview of some successful audio and speech signal processing algorithms, several of which are analyzed in more detail in the accompanying chapters of this book.

Analysis of Different Aspects of Speech Signal Using Delta Modulation Technique

Speech is an analog signal. The maximum frequency component present in the human voice can be taken as approximately 3 to 4 kHz. Many modern applications require the speech signal to be processed in the digital domain, so the digital processing of analog speech is highly important. Delta modulation (DM) is a scheme that can be used to obtain a digital representation of an analog speech signal. In this work, the speech signal is modulated and demodulated using this very simple technique. Several other aspects, such as silence removal, signal-to-noise ratio (SNR) calculation, and power spectral density (PSD) analysis, have also been carried out.
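The encode/decode loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the 8 kHz sampling rate, the step size of 0.05 and the 100 Hz test tone are assumptions, chosen so that the step exceeds the largest per-sample change of the input (the usual condition for avoiding slope overload). The decoder output is compared with the input without the reconstruction low-pass filter that a real DM receiver would use.

```python
import math


def dm_encode(x, step):
    """Linear (fixed-step) delta modulation: one bit per input sample.

    The encoder keeps a staircase approximation of the input and emits 1
    when the input is at or above the approximation, 0 otherwise; the
    approximation then moves up or down by one step.
    """
    bits = []
    approx = 0.0
    for sample in x:
        bit = 1 if sample >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits


def dm_decode(bits, step):
    """Rebuild the same staircase by integrating +step / -step per bit.

    A practical decoder would follow this with a low-pass filter to smooth
    the staircase; that filter is omitted here.
    """
    out = []
    approx = 0.0
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out


def snr_db(reference, estimate):
    """SNR of an estimate against a reference signal, in dB."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    fs = 8000.0
    # 100 Hz test tone standing in for speech; its largest sample-to-sample
    # change (0.5 * 2*pi*100/8000, about 0.039) is below the step size, so
    # the staircase can follow it without slope overload.
    x = [0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(8000)]
    bits = dm_encode(x, step=0.05)
    y = dm_decode(bits, step=0.05)
    print(f"bit rate: {fs:.0f} bit/s, SNR: {snr_db(x, y):.1f} dB")
```

Raising the tone frequency to, say, 300 Hz with the same step makes the staircase fall behind the input, which is the slope-overload error that motivates adaptive delta modulation.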

Recognition of Speech Enhanced by Blind Compensation for Artifacts of Single-Sideband Demodulation

This paper concerns the automatic recognition of speech that has been distorted by frequency shifting introduced by a transmitter-receiver frequency mismatch in communications systems using single-sideband (SSB) modulation. The degradation in recognition accuracy depends on both the frequency shift induced by mistuned SSB and the additive noise, with a reduction in SNR making the degradation produced by mistuned SSB more pronounced. We consider the performance of a method for detecting the frequency shifts introduced by SSB; the shifts can easily be corrected if they are identified correctly. The proposed method provides accurate estimates of SSB-induced frequency shifts over a wide range of SNRs if at least about 80 seconds of speech is available. The algorithm almost completely ameliorates the effects of mistuned SSB even for utterances shorter than 10 seconds, and signal restoration is expected to improve for longer utterances.
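Because mistuned SSB translates every component of the received audio by the same offset, a known offset can be undone by one opposite frequency shift. The sketch below assumes the shift has already been estimated (the estimation method is the paper's contribution and is not reproduced here) and applies the correction through the analytic signal. The direct O(N²) DFT, the 160-sample frame and the 100 Hz offset are illustrative choices only.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a direct O(N^2) DFT.

    Keeps DC (and the Nyquist bin for even N), doubles positive-frequency
    bins and zeros negative-frequency bins before the inverse DFT.
    """
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0
    half = (n + 1) // 2  # bins 1 .. half-1 are strictly positive frequencies
    for k in range(1, half):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0  # Nyquist bin
    shaped = [s * w for s, w in zip(spectrum, weights)]
    return [sum(shaped[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def shift_frequency(x, delta_hz, fs):
    """Translate every spectral component of real signal x by delta_hz.

    Multiplying the analytic signal by exp(j*2*pi*delta*t/fs) moves all
    positive-frequency content by delta_hz; its real part is the shifted
    real signal. Mistuned SSB adds such a shift; applying -shift undoes it.
    """
    z = analytic_signal(x)
    return [(z[t] * cmath.exp(2j * math.pi * delta_hz * t / fs)).real
            for t in range(len(x))]


if __name__ == "__main__":
    fs, n = 8000.0, 160
    voiced = [math.cos(2 * math.pi * 500 * t / fs) + 0.5 * math.cos(2 * math.pi * 1000 * t / fs)
              for t in range(n)]
    mistuned = shift_frequency(voiced, 100.0, fs)     # components now at 600 and 1100 Hz
    restored = shift_frequency(mistuned, -100.0, fs)  # assumes the shift was estimated correctly
    print(max(abs(a - b) for a, b in zip(voiced, restored)))
```

After the +100 Hz shift, the components at 500 and 1000 Hz move to 600 and 1100 Hz and are no longer harmonically related, which illustrates why mistuning disturbs recognition. A practical system would process long signals with an FIR Hilbert transformer or overlapping frames; the single circular DFT here is exact only because the test tones fall on DFT bins.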

Digital Signal Representation of Speech Signal

2014

Delta modulation is a waveform coding technique that reduces the data rate to a large extent in data communication; the problem encountered in delta modulation is slope-overload error, which is inherent in the system. For the signal to have good fidelity, the slope-overload error needs to be as small as possible. Hence, adaptive techniques need to be applied to delta modulation to reduce noise. Adaptive delta modulation reduces the slope-overload error and at the same time increases the dynamic range and tracking capability of fixed-step-size delta modulation. The adaptive algorithm adjusts the step size (within a range of step sizes) to the power level of the signal and thus enhances the dynamic range of the coding system. This paper discusses experiments carried out with quantization delta modulation and adaptive delta modulation and compares the improvements each provides.
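A minimal sketch of adaptive delta modulation follows. The source does not specify its adaptation rule, so this uses one common choice, assumed here: multiply the step by a constant `k` when consecutive bits agree (the staircase is falling behind) and divide by `k` when they alternate, clamping the result to the range [`step_min`, `step_max`]. Setting `k = 1` turns it back into fixed-step DM, which gives a convenient comparison; all parameter values are illustrative.

```python
import math


def _next_step(step, bit, prev_bit, k, step_min, step_max):
    """Grow the step by k on repeated bits (slope overload), shrink it by k
    on alternating bits (granular noise), and clamp to [step_min, step_max]."""
    if prev_bit is not None:
        step = step * k if bit == prev_bit else step / k
    return min(max(step, step_min), step_max)


def adm_encode(x, step_min=0.01, step_max=0.5, k=1.5):
    bits, approx, step, prev = [], 0.0, step_min, None
    for sample in x:
        bit = 1 if sample >= approx else 0
        step = _next_step(step, bit, prev, k, step_min, step_max)
        approx += step if bit else -step
        bits.append(bit)
        prev = bit
    return bits


def adm_decode(bits, step_min=0.01, step_max=0.5, k=1.5):
    # The decoder sees the same bit sequence, so it repeats the encoder's
    # step-size decisions exactly; no step size needs to be transmitted.
    out, approx, step, prev = [], 0.0, step_min, None
    for bit in bits:
        step = _next_step(step, bit, prev, k, step_min, step_max)
        approx += step if bit else -step
        out.append(approx)
        prev = bit
    return out


def snr_db(reference, estimate):
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    fs = 8000.0
    # 300 Hz tone: per-sample change up to about 0.12, far above step_min = 0.01
    x = [0.5 * math.sin(2 * math.pi * 300 * n / fs) for n in range(8000)]
    fixed = adm_decode(adm_encode(x, k=1.0), k=1.0)  # k = 1 keeps the step at step_min
    adaptive = adm_decode(adm_encode(x))
    print(f"fixed-step DM: {snr_db(x, fixed):.1f} dB, ADM: {snr_db(x, adaptive):.1f} dB")
```

The fixed-step coder cannot keep up with this tone (slope overload), while the adaptive step grows during steep segments and shrinks again where the input is flat.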

Speech signal processing in ASR&TTS algorithms

Facta universitatis - series: Electronics and Energetics, 2003

Automatic Speech Recognition (ASR) aims to train computers to understand human speech. Text-To-Speech (TTS) synthesis, on the other hand, aims to teach computers to read any text. These tasks are considered very complex because of the great variability of the speech signal, but the wide range of possible ASR and TTS applications is a challenge for many researchers all over the world. Speech signal processing on the PC was first performed on specialized hardware (a Sound Blaster or a Computer Telephony Integration (CTI) card) and later on the CPU. Audio cards perform analog-to-digital and digital-to-analog conversion, speech signal coding, automatic gain control, adaptive channel equalization, echo cancellation, voice activity detection, etc. ASR and TTS algorithms are CPU-based. They include specific speech signal processing that requires a great deal of computation, which is why CPUs could not run ASR and TTS in real time until several years ago. Now a Pentium IV PC is able to run several ASR and TTS algorithms simultaneously in acceptable time. Faster R&D and progress in both ASR and TTS are enabled by faster CPUs and larger memory, as well as by quality sound cards and CD writers. Our experience with speech signal modeling and processing in our ASR and TTS algorithms is presented in this paper.

II SIGNAL PROCESSING IN AlfaNumASR

A. Speech database for ASR

Large speech databases are necessary for R&D in ASR. They usually contain words, phrases, and sentences uttered by several hundred or, sometimes, several thousand speakers. The utterances are recorded via a telephone line using a CTI card, in a speech studio, or sometimes in an office environment. After a speech database has been recorded, every recording should be listened to and a correct phonetic transcription created. To make this job easier and more comfortable, we have created a special software tool that is part of our C++ Signal Processing Library, slib [1].
Slib is based on the idea of separating a DSP system into blocks with predefined functions, inputs, and outputs. FIFO buffers sit between the blocks, and their size is chosen optimally according to the features of the CPU and the available cache memory. The library can be used for common DSP operations such as FIR and IIR filtering, windowing, and the FFT, as well as for more complex signal processing such as ASR and TTS. For example, front-end processing for ASR simply requires block processing of speech frames. Blocks for extracting the most commonly used speech features have already been implemented, so a programmer only has to choose and connect the desired blocks, and the speech features are extracted and made available for ASR. Slib is available in open-source form at [10].
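To make the block-and-FIFO idea concrete, here is a minimal Python sketch of such a chain for an ASR front end. It is not slib's actual C++ API: the class and function names (`Block`, `Framer`, `HammingWindow`, `LogEnergy`, `Sink`, `connect`, `run_chain`) are hypothetical, the frame size of 200 samples (25 ms at 8 kHz) and hop of 80 samples are illustrative, and the FIFOs are unbounded Python deques rather than cache-sized buffers.

```python
from collections import deque
import math


class Block:
    """Hypothetical DSP block: consumes items from its input FIFO and pushes
    zero or more results per item into the next block's input FIFO."""

    def __init__(self):
        self.fifo = deque()
        self.downstream = None

    def process(self, item):
        raise NotImplementedError

    def run(self):
        while self.fifo:
            for out in self.process(self.fifo.popleft()):
                self.downstream.fifo.append(out)


class Framer(Block):
    """Groups samples into frames of `size` samples advancing by `hop`."""

    def __init__(self, size, hop):
        super().__init__()
        self.size, self.hop, self.buf = size, hop, []

    def process(self, sample):
        self.buf.append(sample)
        if len(self.buf) < self.size:
            return []
        frame = list(self.buf)
        del self.buf[:self.hop]  # keep the overlap for the next frame
        return [frame]


class HammingWindow(Block):
    def __init__(self, size):
        super().__init__()
        self.w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]

    def process(self, frame):
        return [[s * w for s, w in zip(frame, self.w)]]


class LogEnergy(Block):
    def process(self, frame):
        return [10.0 * math.log10(sum(s * s for s in frame) + 1e-12)]


class Sink(Block):
    """End of the chain: stores results instead of forwarding them."""

    def __init__(self):
        super().__init__()
        self.items = []

    def process(self, item):
        self.items.append(item)
        return []


def connect(*blocks):
    for upstream, downstream in zip(blocks, blocks[1:]):
        upstream.downstream = downstream
    return list(blocks)


def run_chain(chain, samples):
    chain[0].fifo.extend(samples)
    for block in chain:  # batch mode: drain each FIFO in order
        block.run()
    return chain[-1].items


if __name__ == "__main__":
    fs = 8000
    samples = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 10)]
    chain = connect(Framer(200, 80), HammingWindow(200), LogEnergy(), Sink())
    print(run_chain(chain, samples))
```

Adding a feature only means writing another `Block` subclass and inserting it into `connect(...)`; in a streaming setting the `run()` calls would simply be repeated as new samples arrive.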

Automatic Speech Recognition Incorporating Modulation Domain Enhancement

IRJET, 2022

A clean and clear speech signal depends on the amount of noise present in it, and there are many ways to obtain a speech signal free of noise. In this paper, we study emotion recognition using a KNN filter (Wiener filter), the FFT, and mel-cepstral methods. Phones and cell phones are often used in noisy environments such as cars, airports, roads, trains, and stations. Therefore, we attempt to eliminate the noise using a spectral subtraction method. The main purpose of this paper is to reduce the background noise in a speech signal for real-time devices; this is called speech enhancement. Background noise degrades speech in a variety of situations. Applications such as mobile communication have been studied extensively in recent years, and they require speech enhancement. The purpose of speech enhancement is to remove noise from noisy speech and to improve aspects such as speech quality or intelligibility. It is often difficult to remove the background noise without distorting the speech; therefore, the output of a speech enhancement system involves a trade-off between speech distortion and noise reduction. Other techniques, such as Wiener filtering, wavelet-based methods, and dynamic filtering, exist, and spectral subtraction remains a useful method. For spectral subtraction, the noise spectrum must be estimated and subtracted from the noisy speech spectrum. This approach rests on three assumptions: the noise is additive to the speech signal, the noise is uncorrelated with the speech, and only a single channel is available. In this paper, we use spectral subtraction to improve noise-degraded speech. The method was tested on real speech data in the MATLAB environment, using real speech signals obtained from a website for the various tests. Finally, we discuss how the noise is reduced on the basis of the average noise level and the noise spectrum.
In general, single-channel systems are built on varied speech data and unwanted noise, and they must work in difficult situations where no prior information about the noise is available. Such methods often assume that the noise is stationary while speech is active. They usually assume that the noise remains unchanged during speech activity, but in reality, when the noise is nonstationary, the performance of speech enhancement is greatly reduced.
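The core of the spectral subtraction described above (estimate the noise magnitude spectrum, subtract it from the noisy spectrum, keep the noisy phase) can be sketched as follows. This is not the paper's MATLAB code: the assumption that the first few frames are noise-only, the non-overlapping rectangular frames, the 64-sample frame length and the spectral floor of 0.02 are illustrative choices, and a direct DFT keeps the example free of dependencies.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectral_subtraction(noisy, frame_len=64, noise_frames=5, floor=0.02):
    """Magnitude spectral subtraction on non-overlapping rectangular frames.

    Assumes the first `noise_frames` frames contain noise only; their mean
    magnitude spectrum is the noise estimate. Each frame keeps its noisy
    phase, and subtracted magnitudes are floored at `floor` times the noisy
    magnitude so they never go negative. Trailing samples that do not fill a
    whole frame are dropped.
    """
    frames = [noisy[i:i + frame_len]
              for i in range(0, len(noisy) - frame_len + 1, frame_len)]
    spectra = [dft(f) for f in frames]
    noise_mag = [sum(abs(s[k]) for s in spectra[:noise_frames]) / noise_frames
                 for k in range(frame_len)]
    enhanced = []
    for s in spectra:
        cleaned = [cmath.rect(max(abs(x) - noise_mag[k], floor * abs(x)), cmath.phase(x))
                   for k, x in enumerate(s)]
        enhanced.extend(idft(cleaned))
    return enhanced


def snr_db(reference, estimate):
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    frame_len = 64
    # Five noise-only frames, then five frames of a tone standing in for speech.
    clean = [0.0] * (5 * frame_len) + [0.5 * math.cos(2 * math.pi * 8 * t / frame_len)
                                       for t in range(5 * frame_len)]
    noisy = [c + rng.gauss(0.0, 0.1) for c in clean]
    enhanced = spectral_subtraction(noisy, frame_len)
    print(f"SNR before: {snr_db(clean, noisy):.1f} dB, after: {snr_db(clean, enhanced):.1f} dB")
```

Practical implementations normally use windowed, overlapping frames with overlap-add and update the noise estimate during speech pauses, which matters for the nonstationary noise discussed above; the floor trades residual musical noise against speech distortion.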

AM-Demodulation of Speech Spectra and Its Application to Noise Robust Speech Recognition

In this paper, a novel algorithm that resembles amplitude demodulation in the frequency domain is introduced, and its application to automatic speech recognition (ASR) is studied. Speech production can be regarded as the result of amplitude modulation (AM), with the source (excitation) spectrum as the carrier and the vocal tract transfer function (VTTF) as the modulating signal. From this point of view, the VTTF can be recovered by amplitude demodulation. Amplitude demodulation of the speech spectrum is achieved by a novel nonlinear technique that effectively performs envelope detection by using the amplitudes of the harmonics and discarding inter-harmonic valleys. The technique is robust to noise because low-energy frequency bands are discarded. The same principle is used to reshape the detected envelope. The algorithm is then used to build an ASR feature extraction module, and it is shown to outperform MFCCs in the presence of additive noise. ...
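The envelope-detection idea (follow the harmonic peaks, ignore the valleys between them) can be illustrated with the simplified sketch below. It is not the paper's nonlinear technique or its envelope-reshaping step: the fundamental frequency is assumed known and positive, each harmonic's amplitude is taken as the spectral maximum within half a harmonic spacing, and plain linear interpolation joins the peaks.

```python
def harmonic_envelope(mag, bin_hz, f0):
    """Spectral envelope through harmonic peaks, ignoring inter-harmonic valleys.

    mag    : magnitude spectrum, one value per bin (bin i is at i * bin_hz Hz)
    bin_hz : frequency spacing of the bins in Hz
    f0     : fundamental frequency in Hz, assumed known and > 0
    For each harmonic h * f0, the largest magnitude within half a harmonic
    spacing is kept; the envelope is linearly interpolated between those
    peaks and held constant before the first and after the last one.
    """
    n = len(mag)
    half = 0.5 * f0 / bin_hz  # half the harmonic spacing, in bins
    peaks = []  # (bin index, magnitude) with strictly increasing bin index
    h = 1
    while h * f0 < (n - 1) * bin_hz:
        centre = h * f0 / bin_hz
        lo = max(0, int(round(centre - half)))
        hi = min(n - 1, int(round(centre + half)))
        best = max(range(lo, hi + 1), key=lambda i: mag[i])
        if not peaks or best > peaks[-1][0]:
            peaks.append((best, mag[best]))
        h += 1
    if not peaks:
        return list(mag)
    env = []
    seg = 0
    for i in range(n):
        if i <= peaks[0][0]:
            env.append(peaks[0][1])
        elif i >= peaks[-1][0]:
            env.append(peaks[-1][1])
        else:
            while peaks[seg + 1][0] < i:  # advance to the segment containing i
                seg += 1
            (i0, a0), (i1, a1) = peaks[seg], peaks[seg + 1]
            env.append(a0 + (a1 - a0) * (i - i0) / (i1 - i0))
    return env


if __name__ == "__main__":
    # Toy spectrum: 10 Hz bins, F0 = 100 Hz, harmonics sampling a sloping
    # envelope, valleys at a low level (as if filled by noise).
    true_env = [1.0 - i / 200.0 for i in range(101)]
    mag = [true_env[i] if i % 10 == 0 and 10 <= i <= 90 else 0.01 for i in range(101)]
    env = harmonic_envelope(mag, bin_hz=10.0, f0=100.0)
    print([round(v, 3) for v in env[10:31:5]])
```

Because valley bins never enter the result, raising them (as additive noise would) leaves the envelope unchanged as long as each harmonic peak stays the largest value in its search window, which mirrors the noise-robustness argument in the abstract.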

Principles of Speech Coding

Principles of Speech Coding, 2010

2.1 Introduction to LTI Systems
2.1.1 Linearity
2.1.2 Time Invariance
2.1.3 Representation Using Impulse Response
2.1.4 Representation of Any Continuous-Time (CT) Signal
2.1.5 Convolution
2.1.6 Differential Equation Models
2.2 Review of Digital Signal Processing
2.2.1 Sampling
2.2.2 Shifted Unit Pulse: δ(n - k)
2.2.3 Representation of Any DT Signal
2.2.4 Introduction to Z Transforms
2.2.5 Fourier Transform, Discrete Fourier Transform
2.2.6 Digital Filter Structures
2.3 Review of Stochastic Signal Processing
2.3.1 Power Spectral Density
2.4 Response of a Linear System to a Stochastic Process Input
2.5 Windowing
2.6 AR Models for Speech Signals, Yule-Walker Equations
2.7 Short-Term Frequency (or Fourier) Transform and Cepstrum
2.7.1 Short-Term Frequency Transform (STFT)
2.7.2 The Cepstrum
2.8 Periodograms
2.9 Spectral Envelope Determination for Speech Signals
2.10 Voiced/Unvoiced Classification of Speech Signals
2.10.1 Time-Domain Methods
2.10.1.1 Periodic Similarity
2.10.1.2 Frame Energy
2.10.1.3 Pre-Emphasized Energy Ratio
2.10.1.4 Low- to Full-Band Energy Ratio
2.10.1.5 Zero Crossing
2.10.1.6 Prediction Gain
2.10.1.7 Peakiness of Speech
2.10.1.8 Spectrum Tilt
2.10.2 Frequency-Domain Methods
2.10.3 Voiced/Unvoiced Decision Making
2.11 Pitch Period Estimation Methods
2.12 Summary
Exercise Problems
References
Bibliography
3 Sampling Theory
3.1 ...
4.10 ITU G.711 μ-Law and A-Law PCM Standards
4.10.1 Conversion between Linear and Companded Codes
4.10.1.1 Linear to μ-Law Conversion
4.10.1.2 μ-Law to Linear Code Conversion
4.10.1.3 Linear to A-Law Conversion
4.10.1.4 A-Law to Linear Conversion
4.11 Optimum Quantization
4.11.1 Closed Form Solution for the Optimum Companding Characteristics
4.11.2 Lloyd-Max Quantizer
4.12 Adaptive Quantization

Speech/Data discrimination in Communication systems

This paper proposes an algorithm that discriminates between speech and data in a multiplexed input signal. Commercial communication networks may use a single voice-band channel to transmit both speech and data, and, for optimum utilization of the channel, the pauses in the voice signal are also used. At the receiver, speech and data must be extracted separately so that the information can be sent to the respective users. For this to happen with the least error, the type of signal must be identified reliably, and a speech/data discriminator solves this problem. The algorithm may also be useful in the analysis of intercepted signals, where speech/data discrimination can determine whether the communication channel carries data or voice. After discrimination, voice is sent to the voice codec and data to the data decoder for extraction of the information. In this paper we proposed a simple and low com...

Efficiency of the energy contained in modulators in the Arabic vowels recognition

International Journal of Electrical and Computer Engineering (IJECE), 2021

The speech signal has many acoustic properties that may contribute differently to spoken word recognition. Vowel characterization is an important part of studying the acoustic characteristics or behavior of speech in different contexts. This study focuses on the modulator characteristics of three Arabic vowels: we propose a new approach to characterizing the Arabic vowels /a/, /i/, and /u/ based on the energy contained in the speech modulators. The coherent subband demodulation method, related to the spectral center of gravity (COG), was used to calculate the energy of the speech modulators. The results show that the modulator energy helps characterize the Arabic vowels /a/, /i/, and /u/, with recognition rates ranging from 86% to 100%.
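A minimal sketch of the two steps named above follows, assuming the subband has already been isolated by a band-pass filter (not shown). The carrier is estimated as the power-weighted mean frequency (COG) of the subband spectrum, f_c = Σ f_k P(f_k) / Σ P(f_k), and the modulator is the low-passed product of the subband signal with exp(-j2πf_c n/fs). The moving-average low-pass filter and its length, the subband limits, and the use of summed squared magnitude as modulator energy are assumptions for illustration, not details taken from the paper.

```python
import cmath
import math


def spectral_cog(x, fs, f_lo, f_hi):
    """Power-weighted mean frequency of x within [f_lo, f_hi] Hz (direct DFT).

    Assumes the band contains some energy (otherwise the denominator is 0).
    """
    n = len(x)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.floor(f_hi * n / fs)
    num = den = 0.0
    for k in range(k_lo, k_hi + 1):
        X = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(X) ** 2
        num += (k * fs / n) * p
        den += p
    return num / den


def coherent_modulator(x, fs, fc, avg_len):
    """Demodulate with the complex carrier exp(-j*2*pi*fc*n/fs), then low-pass
    with a moving average (only windows fully inside the signal). The factor 2
    restores the amplitude lost when a real carrier is split into positive and
    negative frequencies."""
    y = [x[t] * cmath.exp(-2j * math.pi * fc * t / fs) for t in range(len(x))]
    return [2.0 * sum(y[t:t + avg_len]) / avg_len for t in range(len(y) - avg_len + 1)]


def modulator_energy(m):
    return sum(abs(v) ** 2 for v in m)


if __name__ == "__main__":
    fs, n = 8000.0, 800
    # Toy subband signal: a 1000 Hz carrier with a 20 Hz amplitude modulator.
    x = [(1 + 0.5 * math.cos(2 * math.pi * 20 * t / fs)) * math.cos(2 * math.pi * 1000 * t / fs)
         for t in range(n)]
    fc = spectral_cog(x, fs, 800.0, 1200.0)
    m = coherent_modulator(x, fs, fc, avg_len=40)  # 40-tap average has a null at 2*fc = 2000 Hz
    print(f"COG carrier: {fc:.1f} Hz, modulator energy: {modulator_energy(m):.1f}")
```

In this toy case the symmetric sidebands put the COG exactly on the 1000 Hz carrier, and the recovered modulator swings between roughly 0.5 and 1.5, matching the 1 + 0.5 cos(...) envelope up to the slight attenuation of the moving average.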