Restoration of the voice timbre in telephone networks based on both voice and line properties

Correction of the voice timbre distortions in telephone networks: method and evaluation

Speech Communication, 2004

In a telephone link, the voice timbre is impaired by spectral distortions generated by the analog parts of the link. Our purpose is to restore a timbre as close as possible to that of the original voice of the speaker, using a blind equalizer centralized in the network, which compensates for the spectral distortions. We propose a spectral equalization algorithm that matches the long-term spectrum of the processed signal to a reference spectrum within a limited frequency band (200-3150 Hz). Subjective evaluations show a satisfactory restoration of the timbre of the speakers, within the limits of the chosen equalization band. However, the A-law quantization of the equalizer's output samples induces a disturbing noise at the receiving end. A subjective evaluation shows that speakers' voices with corrected timbre, even with quantization noise, are preferred to the same voices at the output of a link without timbre correction (and without noise). In order to make the reference spectrum more appropriate to the various speakers' voices, we classify the voices according to their long-term spectra and use a specific reference spectrum for each class. This decreases the spectral distortion induced by the equalizer, which a subjective test shows is perceived as a significant improvement of the timbre correction.
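The spectrum-matching idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-bin power-spectrum representation, the unity gain outside the band, and the `floor` guard are all assumptions made for illustration. The long-term spectrum is taken as the average of per-frame power spectra, and the equalizer magnitude in each bin of the 200-3150 Hz band is the square root of the ratio of reference power to long-term power.

```python
import math

def long_term_spectrum(frame_psds):
    """Average per-frame power spectra (equal-length lists) bin by bin."""
    n_frames = len(frame_psds)
    n_bins = len(frame_psds[0])
    return [sum(frame[k] for frame in frame_psds) / n_frames for k in range(n_bins)]

def equalizer_gains(freqs_hz, long_term_psd, reference_psd,
                    band=(200.0, 3150.0), floor=1e-12):
    """Magnitude gains that map the long-term spectrum onto the reference.

    Inside the band the gain is sqrt(reference / long_term); outside it the
    gain is left at 1.0 (an assumption: the abstract only says the matching
    is restricted to the band).
    """
    lo, hi = band
    gains = []
    for f, p, r in zip(freqs_hz, long_term_psd, reference_psd):
        if lo <= f <= hi:
            gains.append(math.sqrt(r / max(p, floor)))  # floor avoids division by zero
        else:
            gains.append(1.0)
    return gains

# Toy data (hypothetical): five frequency bins, two frames.
freqs = [100.0, 500.0, 1000.0, 3000.0, 3500.0]
frames = [[1.0, 3.0, 1.0, 0.25, 1.0],
          [1.0, 5.0, 1.0, 0.25, 1.0]]
lts = long_term_spectrum(frames)
gains = equalizer_gains(freqs, lts, reference_psd=[1.0] * 5)
```

In this toy case the 500 Hz bin, which carries four times the reference power, is attenuated, the 3000 Hz bin is boosted, and the bins at 100 Hz and 3500 Hz are left untouched because they lie outside the equalization band.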

Correction of the voice timbre distortions on telephone network

In a telephone link, the voice timbre is affected by the loss of low-frequency components and by distortions due to the analog lines. We first analyze how the quantization noise limits the restoration of the timbre. Within this limitation, an equalization method inspired by cepstral subtraction is then proposed to correct the timbre and is validated by experimental results.
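To give a feel for the quantization noise mentioned here, the sketch below applies 8-bit companded quantization to single samples. It assumes the quantizer is the A-law one named in the preceding abstract, and it uses the continuous A-law curve with A = 87.6 followed by a uniform quantizer in the compressed domain, not the segmented G.711 encoding. The function names are hypothetical and the equalizer itself is not modeled.

```python
import math

A = 87.6  # standard A-law compression parameter

def alaw_compress(x):
    """Continuous A-law compression of a sample in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))                    # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)

def alaw_expand(y):
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)

def quantize_alaw(x, bits=8):
    """Compress, quantize uniformly to `bits` bits, then expand."""
    levels = 2 ** (bits - 1)                      # quantization cells per polarity
    y = alaw_compress(max(-1.0, min(1.0, x)))     # clip to the valid input range
    idx = min(int(abs(y) * levels), levels - 1)   # cell index, clipped at full scale
    yq = math.copysign((idx + 0.5) / levels, x)   # reconstruct at the cell centre
    return alaw_expand(yq)

# Relative quantization error at a few sample amplitudes (hypothetical values).
samples = [0.05, 0.1, 0.5, 0.9, -0.3]
rel_errors = [abs(quantize_alaw(x) - x) / abs(x) for x in samples]
```

On the logarithmic segment the relative error stays within a few percent whatever the amplitude, so the absolute noise scales with the signal level. Under these assumptions, any spectral region the equalizer boosts is re-quantized with proportionally larger noise.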

Fast method of channel equalisation for speech signals and its implementation on a DSP

1999

Blind equalisation of a speech signal that has been passed through a linear filter can be achieved by estimating the poles of the signal and separating the stationary poles due to the filter from the time-varying poles due to the speech. However, identification of the position of the stationary poles, conventionally done by pole clustering, is unreliable and slow. A new algorithm for the identification of stationary poles is presented which is more accurate and faster than clustering.

Training issues and channel equalization techniques for the construction of telephone acoustic models using a high-quality speech corpus

IEEE Transactions on Speech and Audio Processing, 1994

We describe an approach for the estimation of acoustic-phonetic models that will be used in a hidden Markov model (HMM) recognizer operating over the telephone. We explore two complementary techniques for developing telephone acoustic models. The first technique presents two new channel compensation algorithms. Experimental results on the Wall Street Journal corpus show no significant improvement over sentence-based cepstral-mean removal. The second technique uses an existing "high-quality" speech corpus to train acoustic models that are appropriate for the Switchboard Credit Card task over long-distance telephone lines. Experimental results show that cross-database acoustic training yields performance similar to that of conventional task-dependent acoustic training.
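Sentence-based cepstral-mean removal, the baseline named in this abstract, is simple enough to sketch. The version below is illustrative only: the function name and the list-of-lists frame layout are assumptions. It relies on the standard property that a fixed linear channel adds a constant vector to every cepstral frame, so subtracting the per-sentence mean cancels that constant.

```python
def cepstral_mean_removal(frames):
    """Subtract the per-sentence mean from each cepstral frame.

    `frames` is a list of equal-length cepstral vectors for one sentence.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]

# Toy check (hypothetical values): the same sentence seen through a channel
# that adds a constant cepstral offset gives the same normalized features.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
channel_offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_removal(clean)
normalized_distorted = cepstral_mean_removal(distorted)
```

The normalization also removes the speaker's own long-term cepstral average, which is one reason it is usually treated as a baseline and not as a complete channel model.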

A Simplified Two-Stage Equalizer with a Reduced Number of Multiplications for Data Transmission over Voiceband Telephone Links

IEEE Journal on Selected Areas in Communications, 1984

The paper presents an equalization scheme based on the two-stage equalizer originally introduced by Proakis. Mostly additions are required for implementation of the equalizer structure and the adaptation algorithm. The equalizer can be applied to data transmission systems which use four-phase modulation, especially in 1200/2400 bit/s modems, thus replacing the fixed equalizer recommended by CCITT. The results presented in the paper deal with the analog and digital implementation of the proposed equalizer. Its performance is compared to the performance of the conventional transversal equalizer, a decision-feedback equalizer, an ideal linear canceller, and an ideal QPSK system.

A Signal Envelop Criterion for Passiv Voice Quality Analizing

2017

The paper discusses problems connected with tools for non-intrusive evaluation of VoIP quality by signal waveform analysis. Its aim is to present new models for objective, non-intrusive prediction of voice quality in IP networks and to illustrate their application to voice quality monitoring and control in VoIP networks. The method detects impairments of audio quality as perceived by humans. It makes it possible to see the quality of a VoIP connection at a glance and warns when quality deteriorates. This gives the option to troubleshoot the VoIP network before users are affected by VoIP-specific connection problems (echo, noise, or breaks in the conversation). The signal waveform envelope distortion is reviewed, and practical questions of its numerical implementation are discussed. Several examples of how the criterion can be used are given.

A Proposed Improvement Equalizer for Telephone and Mobile Circuit Channels

In the transmission of digital data at a relatively high rate over a band-limited channel, it is normally necessary to employ an equalizer at the receiver to correct the signal distortion introduced by the channel. Intersymbol interference (ISI) leads to a large error probability if it is not suppressed, and equalization is one of the possible techniques for coping with it. Maximum Likelihood Sequence Estimation (MLSE), implemented with the Viterbi algorithm, is the optimal equalizer for this ISI problem since it minimizes the sequence error rate. This estimator involves considerable equipment complexity, especially when detecting a multilevel digital signal with a large alphabet and/or operating over a channel with a long impulse response. This creates a need for detection algorithms with reduced complexity that do not sacrifice performance. The aim of this work is to study the various ways to remove the ISI, concentrating on the decision-base...
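The MLSE detector mentioned in this abstract can be sketched for the smallest interesting case. The example assumes binary symbols (+1/-1), a known two-tap channel, a known symbol preceding the block, and a squared-error branch metric (appropriate for white Gaussian noise). The function name and the test values are hypothetical and are not taken from the paper.

```python
def mlse_viterbi(received, h, symbols=(-1.0, 1.0), initial=1.0):
    """Viterbi search for the symbol sequence s minimizing
    sum_k (r[k] - h[0]*s[k] - h[1]*s[k-1])**2, with s[-1] = `initial`.

    The trellis state is the previous symbol, so there is one state per symbol.
    `initial` must be one of `symbols`.
    """
    h0, h1 = h
    inf = float("inf")
    metrics = {s: (0.0 if s == initial else inf) for s in symbols}
    paths = {s: [] for s in symbols}
    for r in received:
        new_metrics, new_paths = {}, {}
        for cur in symbols:
            best_prev, best_metric = None, inf
            for prev in symbols:
                m = metrics[prev] + (r - h0 * cur - h1 * prev) ** 2
                if m < best_metric:
                    best_prev, best_metric = prev, m
            new_metrics[cur] = best_metric
            new_paths[cur] = paths[best_prev] + [cur]  # extend the survivor path
        metrics, paths = new_metrics, new_paths
    best_final = min(symbols, key=lambda s: metrics[s])
    return paths[best_final]

# Toy block: symbols sent through h = (1.0, 0.5) with a small disturbance added.
sent = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
received = [1.4, -0.6, -1.3, 0.7, 1.6, -0.4]
detected = mlse_viterbi(received, h=(1.0, 0.5))
```

The number of trellis states is the alphabet size raised to the channel memory, which is the complexity growth the abstract refers to for large alphabets and long impulse responses.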

Experiments in voice quality modification of natural speech signals: the spectral approach

1998

Voice quality is currently a key issue in speech synthesis research. The lack of realistic intra-speaker voice quality variation is an important source of concern for concatenation-based synthesis methods. A challenging problem is to reproduce the voice quality changes that occur in natural speech when the vocal effort varies. A new method for voice quality modification is presented. It takes advantage of a spectral theory for voice source signal representation. An algorithm based on periodic-aperiodic decomposition and spectral processing (using the short-term Fourier transform) is described. The use of adaptive inverse filtering in this framework is also discussed. Applications of this algorithm may include pre-processing of speech corpora, modification of voice quality parameters together with intonation in synthesis, and voice transformation. Some experiments are reported, showing convincing voice quality modifications for various speakers.

Acoustic Noise Reduction for Mobile Telephony

DSP World Spring Design Conference, 2000

In the context of mobile telephony, speech signals are often corrupted by surrounding acoustic noise such as engine, traffic, and wind noise, as well as by system-introduced noise such as quantization, handoff, and channel interference. This in turn has an adverse effect on the perceived quality and intelligibility of speech as well as on the performance of speech processing algorithms throughout the network, such as speech coding and recognition. Wireless telephony by definition has a lower speech quality than wireline, due to the speech coding process. If, in addition, the cellular system encodes a noisy signal prior to its transmission, then further degradation may occur, since coders rely on a model for the clean signal which is not suitable for the noisy signal. Similarly, speech recognition systems will degrade drastically in noisy environments, due to differences between testing and training conditions.

The aim of acoustic noise reduction is to minimize the effect of noise on the performance of voice communication systems. This means improving the perceived quality to the human listener as well as providing a more appropriate signal for estimating crucial speech parameters such as spectral content, pitch, voicing, and others. The quality of speech signals is a subjective measure which reflects the way the signal is perceived by listeners. It can be expressed in terms of how pleasant speech sounds to the human ear. Intelligibility, on the other hand, is an objective measure of the amount of information which can be extracted by listeners from the given signal [20]. Intelligibility is important in situations (such as military use or emergencies) where the content of the message is critical.

1.2 Technical Challenge

Noise reduction is an ancient art, but is still far from a perfect science. Conceptually, it can be viewed as the combination of the classical problem of signal estimation, coupled with psychoacoustic aspects that account for the characteristics of the speech signal and the peculiarities of human hearing. The challenge is that the latter aspect is less understood, thus preventing one from formulating the problem in a way that leads to a globally optimum solution. As a result, a number of suboptimal solutions based on mathematically tractable distortion measures or on some properties of the auditory system have been proposed.

1.3 Structure of this Paper

The paper is structured as follows: Section 2 gives the business perspective on noise reduction (NR). Section 3 provides a brief background on speech properties. The common approaches to NR are described in Section 4. The system aspects of using NR in a cellular system are discussed in Sections 5 and 6. An overview of some commercial implementations is given in Section 7.

2.0 Noise Reduction: A Business Perspective

2.1 Business Rationale

• Market needs: speech quality is gaining increasing importance in the context of Personal Communication Services, as greater consumer acceptance is being sought by advertising these services as being as reliable in service and quality as their wireline counterparts. Analysts predict that service functionality, which includes voice quality, will eventually become more important than price as a differentiator in wireless services.
• Competitive importance: telcos and manufacturers seek various proprietary solutions to improve the end-to-end voice quality on their networks in an attempt to differentiate their products from others.
• Strategic importance: voice processing technologies, such as speech enhancement or echo cancellation, are an integral part of voice communication network equipment. Ownership of this technology, as opposed to reliance on acquired solutions, is almost a necessity to minimize exposure given the strategic importance of these features.

2.2 Target Markets

Wireless telephony constitutes a large potential market for noise reduction. The technology, however, is also applicable to other communication contexts where ambient noise needs to be removed. The target market includes, but is not limited to, the following applications:

• Cellular phone hands-free cradles: to reduce the ambient road, engine, and background-speaker noise heard by the far-end listener.
• International long-distance telephony: to improve the intelligibility on low-quality international voice circuits where old analog technologies cause static and other switching noise on these communications.
• Voice-activated phone dialing: to increase the voice recognition hit rates when background noise is present.
• Internet telephony: to improve voice quality in noisy office environments, particularly where the input microphone is placed somewhere far from the speaker (e.g., on the computer terminal).
• Teleconferencing: to remove the effect of interfering speakers or background noise (chair movements, etc.) in hands-free teleconferencing.