Immediate recognition of signals in simultaneous transmission of voice and high speed data

Fast On-Line Speech/Voiceband-Data Discrimination for Statistical Multiplexing of Data with Telephone Conversations

IEEE Transactions on Communications, 1986

This paper concerns the design of a speech/data discriminator with the speed and accuracy required for statistical multiplexing of speech and high-speed baseband data on a single telephone line. The structure proposed for the discriminator combines the information from two parallel processes. The first process, even though slow, accurately recognizes the speech/data nature of the signal. This nature is determined by statistical pattern classification applied to simple zero-crossing-type parameters extracted from the signal filtered in a PCM representation. The second process is temporally accurate in detecting transition events, from speech to data and vice versa. This detection is provided by special parameters which supply tentative transition markers. This paper applies these processes to the case of 9600 bit/s modems and describes a complete discrimination algorithm which is suitable for microprocessor technology. The simulation results provided indicate a very low error rate and demonstrate that transitions are detected exactly or within one or two sampling intervals.
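
As a rough illustration of the kind of zero-crossing-type parameter the abstract mentions, the sketch below computes a per-frame zero-crossing rate. The function names, the 8 kHz sample rate, the 160-sample frame and the test tones are assumptions made for this example; the paper's actual parameters, filtering and classifier are not reproduced here.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def sine_frame(freq_hz, sample_rate=8000, n=160):
    """Hypothetical test signal: one frame of a pure tone (assumed 8 kHz sampling)."""
    # The small phase offset keeps samples from landing exactly on zero.
    return [
        math.sin(2 * math.pi * freq_hz * k / sample_rate + 0.1)
        for k in range(n)
    ]

if __name__ == "__main__":
    low = zero_crossing_rate(sine_frame(200))    # low-frequency tone
    high = zero_crossing_rate(sine_frame(1800))  # tone nearer a modem carrier band
    print(f"200 Hz: {low:.3f}, 1800 Hz: {high:.3f}")
```

For a pure tone the rate is close to twice the frequency divided by the sample rate, so a feature of this kind separates signals whose energy sits in different parts of the voiceband. A real discriminator would feed several such parameters to a classifier rather than compare a single value.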

Speech over VoIP Networks: Advanced Signal Processing and System Implementation

IEEE Circuits and Systems Magazine, 2012

Speech communication using the Voice over Internet Protocol (VoIP) is very common today. The underlying network channel may be the public switched telephone network (PSTN channel), satellite channels or cellular wireless channels to name a few. The packetization of speech and its transmission through packet switched networks, however, introduce numerous impairments such as delay, jitter, packet loss and decoder clock offset, which degrade the quality of the speech. We present an overview of the challenges and a description of the advanced signal processing algorithms used to combat these impairments and render the perceived quality of a VoIP conversation to be as good as that of the existing telephone system. We also present an example of a speech coder designed for packet-switched networks and discuss the possibilities for hardware implementations.

TELEPHONY APPLICATIONS WITH SPEECH RECOGNITION

In this paper, we present and describe several computer telephony applications using speech recognition. These applications were developed under a research project carried out in collaboration with Portugal Telecom. Two possibilities have been explored in developing speech recognition applications. In the first, speech recognition was implemented in software only. In the second, we used hardware equipped with DSPs. Both possibilities support Word Spotting and Barge-in. In addition to the telephone applications, the generic tools that were built to develop those applications are also presented.

Comparison Between Silence Substitution and Packet Repetition for Real-Time Speech Communications

Real-time speech communication over packet-switched networks requires low-delay packet loss concealment (PLC) methods. There are several PLC methods used in IP telephony to cope with packet losses. Two commonly used methods are silence substitution and packet (waveform) repetition. We compare these methods according to the rate-distortion criterion by introducing a penalty for packet (waveform) repetition. This analysis allows a fair comparison between the two methods. We also compare the results with those of ITU-T's E-model and find them to be in agreement.
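
The two concealment methods named in the abstract can be sketched as follows. This is a minimal illustration only: the function name `conceal`, the frame representation (a list of sample lists, with `None` marking a lost packet) and the fallback to silence when no earlier packet exists are assumptions of this example, and the paper's rate-distortion analysis is not implemented.

```python
def conceal(frames, frame_len, method="repetition"):
    """Fill lost frames (None) by silence substitution or packet repetition."""
    if method not in ("silence", "repetition"):
        raise ValueError("method must be 'silence' or 'repetition'")
    out = []
    last_good = None
    for frame in frames:
        if frame is not None:
            last_good = list(frame)
            out.append(last_good)
        elif method == "repetition" and last_good is not None:
            # Packet repetition: replay the most recent received frame.
            out.append(list(last_good))
        else:
            # Silence substitution (also the assumed fallback when
            # nothing has been received yet).
            out.append([0] * frame_len)
    return out

if __name__ == "__main__":
    received = [[3, 4], None, [5, 6]]
    print(conceal(received, 2, "silence"))
    print(conceal(received, 2, "repetition"))
```

Both methods add no algorithmic delay because they use only past frames, which is why they suit real-time use.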

Automatic Speech Recognition for VoIP with Packet Loss Concealment

Procedia Computer Science, 2018

This paper proposes a packet loss concealment (PLC) technique to increase the robustness of automatic speech recognition (ASR) for speech coded with the G.729 codec over the Voice over Internet Protocol (VoIP). Many of the standard ITU-T CELP-based speech coders, such as G.723.1, G.728, and G.729, model speech reproduction in their decoders. These decoders have enough state information to integrate PLC algorithms directly in the decoder, and such algorithms are specified as part of their standards; this work relies in particular on the PLC of ITU-T G.711 Appendix I. Speech is transmitted with optimized source and channel codes, and the channel is simulated by a two-state Markov model of packet loss. The objective of the PLC based on ITU-T G.711 Appendix I is to generate a synthetic speech signal that covers missing data or lost packets in a received bit stream for the ASR application, i.e., to minimize the word error rate.
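
A two-state Markov loss channel of the kind the abstract refers to can be sketched as below. The function name, the transition probabilities and the convention that every packet sent in the "bad" state is lost are assumptions for illustration; the abstract does not give the model's parameters.

```python
import random

def simulate_loss(n_packets, p_good_to_bad, p_bad_to_good, seed=0):
    """Return a list of booleans, True where a packet is lost.

    Assumed convention: the chain starts in the good state, and a packet
    is lost exactly when the chain is in the bad state.
    """
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n_packets):
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        lost.append(bad)
    return lost

if __name__ == "__main__":
    pattern = simulate_loss(100_000, p_good_to_bad=0.05, p_bad_to_good=0.5)
    print(f"observed loss rate: {sum(pattern) / len(pattern):.3f}")
```

The long-run loss rate of such a chain is p_good_to_bad / (p_good_to_bad + p_bad_to_good), and the mean burst length is 1 / p_bad_to_good, so the two parameters set the loss rate and its burstiness separately.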

How to Detect Speech in Telephone Dialogue Systems

Proceedings of EURASIP Conference on Digital Signal Processing for Multimedia Communications and Services ECMCS, 2001

In practical telephone dialogue applications there are many problems with speech detection. This paper discusses two main problems with silence detection and presents techniques for increasing the reliability and recognition accuracy of the whole telephone dialogue system. The first problem is caused by the differing signal and background-noise levels of incoming calls. The solution could be to store the local speech/silence decisions in a buffer and to use an adaptive threshold to make a global decision about each frame. The second problem is the detection of the ...
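
One possible reading of the buffered-decision idea is sketched below. Everything specific here is an assumption rather than the paper's method: the per-frame energy input, the noise-floor tracker used as the adaptive threshold, the majority vote over the buffer, and the parameter values.

```python
from collections import deque

def detect_speech(energies, buffer_len=5, factor=3.0, alpha=0.9):
    """Per-frame speech/silence decisions from frame energies.

    Local decision: energy exceeds `factor` times a tracked noise floor.
    Global decision: majority vote over the last `buffer_len` local decisions.
    """
    noise_floor = energies[0] if energies else 0.0
    recent = deque(maxlen=buffer_len)
    decisions = []
    for energy in energies:
        local = energy > factor * noise_floor
        if not local:
            # Adapt the floor only on frames judged to be silence.
            noise_floor = alpha * noise_floor + (1 - alpha) * energy
        recent.append(local)
        decisions.append(2 * sum(recent) > len(recent))
    return decisions

if __name__ == "__main__":
    call = [1.0] * 10 + [50.0] * 10 + [1.0] * 10
    print(detect_speech(call))
```

Because the threshold scales with the measured noise floor, a call that is uniformly louder or quieter yields the same decisions. The vote over the buffer suppresses isolated misclassified frames at the cost of a few frames of lag at each transition.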

A Mobile Dual VoIP System for Enhancing Speech Quality and Intelligibility: Simulation and Test bed

Transactions on Networks and Communications, 2015

As is well known, in a 3G/4G network scenario the quality of voice traffic over IP (VoIP) is greatly reduced by strong current limitations on delay, jitter, packet loss rate and guaranteed bandwidth. The present work highlights the improvement in intelligibility obtained by duplicating VoIP packets over two wireless data accesses provided by different operators. In particular, the paper presents the architecture and the prototype of a dual-stream approach to mobile VoIP applications (Dual VoIP) over HSPA access networks. Test results, obtained via simulations and a real-time implementation of the Dual VoIP algorithm, demonstrate an average packet loss reduction of up to 90% and an average improvement in speech quality of up to 1 PESQ point. Furthermore, the paper highlights the significant reduction of audio signal clipping at all levels: phoneme, word, sentence and conversation. Enhancing the speech quality and intelligibility of the audio signal is very important in common best-effort applications using VoIP, as well as in particular conditions such as environmental wiretapping for forensic uses and/or private tactical communications in network-centric contexts, where real-time listening and intelligibility of the speech signal play a key role. The in-depth evaluation presented here aims to understand the behavior of the proposed architecture under different application scenarios and, at the same time, to draw useful conclusions on improving the Dual VoIP prototype.
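
The benefit of sending duplicate packets over two accesses can be illustrated with a small receiver-side merge. This is a sketch under assumptions, not the Dual VoIP prototype: the function names, the representation of each stream as a mapping from sequence number to payload, and the example loss patterns are all hypothetical.

```python
def merge_streams(stream_a, stream_b):
    """Combine two received copies of the same packet sequence.

    Each stream maps sequence number -> payload for packets that arrived.
    A packet is available after merging if either access delivered it.
    """
    merged = dict(stream_b)
    merged.update(stream_a)
    return merged

def loss_rate(received, n_sent):
    """Fraction of the n_sent packets that never arrived."""
    return 1 - len(received) / n_sent

if __name__ == "__main__":
    sent = 5
    access_a = {0: "p0", 1: "p1", 3: "p3"}   # packets 2 and 4 lost
    access_b = {1: "p1", 2: "p2", 3: "p3"}   # packets 0 and 4 lost
    merged = merge_streams(access_a, access_b)
    print(sorted(merged), loss_rate(merged, sent))
```

A packet is lost after merging only when both accesses drop it, so the gain is largest when losses on the two networks are weakly correlated.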

Speech/Data discrimination in Communication systems

This paper proposes an algorithm that discriminates between speech and data in a multiplexed input signal. Commercial communication networks may use a single voiceband channel for transmission of both speech and data. Also, for optimum utilization of the channel, the pauses in the voice signal are utilized. At the receiver side, the speech and data should be extracted separately in order to send the information to the respective users. For this to happen with the least error, sufficient measures must be taken to identify the type of the signal. The speech/data discriminator is the solution to this problem. The algorithm may also be useful in the analysis of intercepted signals, where speech/data discrimination may be performed to determine whether the communication channel carries data or voice. After discrimination, voice is sent to the voice codec and data to the data decoder for extraction of intelligence. In this paper we proposed a simple and low com...

Telephony Speech Recognition System: Challenges

IJCA Proceedings on National Conference on Communication Technologies Its Impact on Next Generation Computing 2012, 2012

This paper describes the challenges of designing a telephony Automatic Speech Recognition (ASR) system. Telephonic speech data are collected automatically from all geographical regions of West Bengal to cover the major dialectal variations of spoken Bangla. All incoming calls are handled by an Asterisk server, i.e., a computer telephony interface (CTI). The system asks some queries, and the users' spoken responses are stored and transcribed manually for ASR system training. In a real-time scenario, telephonic speech contains channel drops, silence or no-speech events, truncated speech signals, noisy signals, etc., along with the desired speech events. This paper describes these challenges for a telephony ASR system. It also briefly describes some techniques that handle such unwanted signals in telephonic speech to a certain extent and provide a signal close to the desired speech for the ASR system.