Multilingual Speech-to-Speech Translation System in Bluetooth Environment

Multilingual Speech-to-Speech Translation System in Mobile Offline Environment

Speech-to-speech translation denotes the conversion of speech signals in an original source language into distinct speech signals bearing the same meaning or intent in the target language. Achieving this requires the coordinated cooperation of separate Human Language Technology components. In particular, the most significant elements of a speech translation system are automatic speech recognition, machine translation, and text-to-speech synthesis. With this understanding, the present paper explores the design and architectural building blocks of the "Translator" speech-to-speech translation system, and examines how these components interact to deliver speech-to-speech translation that is reliable, scalable, and potentially distributed.
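The cascaded architecture this abstract describes can be sketched as a simple composition of three stages. The stage functions below are illustrative stand-ins with toy lookup tables, not the actual interfaces of the "Translator" system:

```python
# Illustrative sketch of a cascaded S2S pipeline (ASR -> MT -> TTS).
# All three stage functions are placeholder stand-ins for real components.

def recognize(audio: bytes) -> str:
    """ASR stand-in: map source-language audio to source-language text."""
    return {b"hola-audio": "hola mundo"}.get(audio, "")

def translate(text: str) -> str:
    """MT stand-in: map source-language text to target-language text."""
    return {"hola mundo": "hello world"}.get(text, "")

def synthesize(text: str) -> bytes:
    """TTS stand-in: map target-language text to audio."""
    return text.encode("utf-8")  # placeholder for a real waveform

def speech_to_speech(audio: bytes) -> bytes:
    """Chain the three components, as in the cascaded design above."""
    return synthesize(translate(recognize(audio)))

print(speech_to_speech(b"hola-audio"))  # b'hello world'
```

Keeping the stages behind separate function boundaries is what makes the reliability, scalability, and distribution properties mentioned above feasible: each stage can be replaced or moved to a separate server without changing the others.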

An Application for Performing Real-Time Speech Translation in a Mobile Environment

This paper presents a method of applying a speaker-independent, bidirectional speech-to-speech translation system for spontaneous dialogs in a real-time calling system. The technique recognizes spoken input, analyzes and translates it, and finally utters the translation. Speech translation falls largely under natural language processing, the branch of artificial intelligence concerned with analyzing, understanding, and generating the languages humans use naturally, so that people can interface with computers in both written and spoken contexts using natural human languages rather than computer languages. Speech translation involves techniques for translating spoken sentences from one language to another. A major part of it is speech recognition, which converts spoken speech to text and identifies the context and linguistic structure of the input. In the current scenario, the machine does not identify whether a given word is in the past or present tense. Using the proposed algorithm, we check whether a word is past or present by searching for substrings such as "ed", "had", "done", etc. This paper gives an idea of working with APIs to translate input speech into the required output speech, thereby increasing the efficiency of speech translation on cellular devices, and describes a mobile application that monitors all audio files present on a mobile device and translates them into the required language.
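The substring-based tense check described above can be sketched as follows; the function name and marker list are illustrative assumptions, since the paper does not give its exact implementation, and a real system would need proper morphological analysis:

```python
# Hypothetical sketch of the substring-based tense heuristic: a sentence is
# labeled "past" if any word is a past marker or carries the "ed" suffix.

PAST_WORDS = {"had", "done"}

def classify_tense(sentence: str) -> str:
    """Naively label a sentence 'past' or 'present' by word markers."""
    for word in sentence.lower().split():
        if word in PAST_WORDS or word.endswith("ed"):
            return "past"
    return "present"

print(classify_tense("She walked home"))  # past
print(classify_tense("She walks home"))   # present
```

As a heuristic it is fast but crude: irregular verbs ("went") are missed and non-verbs ending in "ed" ("red") are misclassified, which is why the abstract frames it as a first step rather than full linguistic analysis.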

Development of the “VoiceTra” Multi-Lingual Speech Translation System

IEICE Transactions on Information and Systems, 2017

This study introduces large-scale field experiments with VoiceTra, the world's first speech-to-speech multilingual translation application for smartphones. Approximately 10 million input utterances have been collected since the experiments commenced, and the usage of the collected data is analyzed and discussed. The study makes several important contributions. First, it explains the system configuration, the communication protocol between clients and servers, and the details of the multilingual automatic speech recognition, multilingual machine translation, and multilingual speech synthesis subsystems. Second, it demonstrates the effects of mid-term system updates that use the collected data to improve an acoustic model, a language model, and a dictionary. Third, it analyzes system usage.

An Optimized Approach to Voice Translation on Mobile Phones

Current voice translation tools and services use natural language understanding and natural language processing to convert words. However, these parsing methods concentrate on capturing and translating keywords, largely neglecting the considerable processing time involved. In this paper, we suggest techniques that can optimize processing time and thereby increase the throughput of voice translation services. Techniques such as template matching, indexing frequently used words with probability search, and a session-based cache can considerably improve processing times. Moreover, these factors become all the more important when real-time translation must be achieved on mobile phones.
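The session-based cache idea can be sketched with a memoized wrapper around the translation call. `fetch_translation` below is a hypothetical stand-in for the remote translation service, not an API from the paper:

```python
# Sketch of a session-based translation cache: repeated phrases within a
# session skip the slow remote round trip entirely.
from functools import lru_cache

CALLS = 0  # counts how often the "remote service" is actually hit

def fetch_translation(text: str, target: str) -> str:
    """Stand-in for a slow remote translation request."""
    global CALLS
    CALLS += 1
    return f"<{target}>{text}"  # placeholder translation

@lru_cache(maxsize=256)
def cached_translate(text: str, target: str) -> str:
    """Cache keyed on (text, target); hits avoid the remote call."""
    return fetch_translation(text, target)

cached_translate("hello", "es")
cached_translate("hello", "es")  # served from the cache
print(CALLS)  # 1
```

Since conversational speech reuses a small set of phrases heavily, even a small per-session cache like this removes many round trips, which is exactly where the claimed throughput gain would come from.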

BBN TransTalk: Robust multilingual two-way speech-to-speech translation for mobile platforms

Computer Speech & Language, 2013

In this paper we present a speech-to-speech (S2S) translation system called the BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. The BBN TransTalk has been configured for several languages including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Arabic. We describe the key components of our system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), dialog manager, and the user interface (UI). In addition, we present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with lack of pronunciation and linguistic resources and effective modeling of ambiguity in pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity as well as modeling context. We also present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall.

Voice to Voice Language Translation System

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/voice-to-voice-language-translation-system

In this paper, at the nascent stage of developing a personalized interpreter, we propose a prototype that uses speech processing hardware and online translators to provide the user with real-time translation. The speech processing hardware works on the principle of 'compare and forward': a database already stored in the unit is compared with the input speech, and the result is forwarded for further processing. The need arises from the inability of dictionaries and human translators to meet our needs for better communication. In this situation, the proposed prototype will serve the purpose reasonably well and minimize communication inefficiencies.
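The 'compare and forward' principle can be sketched as a lookup against a pre-stored phrase database; the database contents and function name below are invented for illustration, not taken from the paper:

```python
# Hypothetical 'compare and forward' sketch: recognized input is normalized,
# compared against a pre-stored phrase database, and the match is forwarded.
from typing import Optional

PHRASE_DB = {
    "good morning": "buenos dias",
    "thank you": "gracias",
}

def compare_and_forward(recognized: str) -> Optional[str]:
    """Return the stored translation for a recognized phrase, if any."""
    key = recognized.strip().lower()
    return PHRASE_DB.get(key)  # forwarded for further processing, or None

print(compare_and_forward("Thank you"))  # gracias
```

A `None` result would fall through to the online translators the prototype also relies on, so the on-device database handles only the common, latency-sensitive phrases.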

Voice & Text Translator

Zenodo (CERN European Organization for Nuclear Research), 2022

The Voice and Text Translator is about translating one form of speech into another. It is a multi-tasking project that can perform various tasks at the same time. The definition of voice recognition should be considered carefully, as the term is often conflated with identifying a person from his or her voice, i.e., speaker recognition. The main aim of this app is to provide a mechanism for speech-to-speech translation; it further provides mechanisms for speech-to-text, text-to-speech, and text-to-text translation in various languages. Software that deals with speech recognition has to adapt to the unpredictable and highly variable nature of human speech. Every algorithm involved in the speech recognition process is tested and trained on different speaking styles, languages, accents, phrasings, and speaking patterns. Moreover, such software also has to separate the actually spoken speech (audio) from the unwanted background noise that often accompanies these signals. The project also contains advanced features like voice recognition, understanding, and conversion, which are difficult for a machine to perform, though AI and ML now dominate the technology landscape. At its core, the project is built in the Python programming language, known for its rich and vast libraries and its use in almost all kinds of projects. This translator therefore relies on valuable Python modules such as Google-Trans, gTTs, etc. A broad array of research in the fields of computer science and linguistics feeds into the speech recognition process. To make life easier, and to take part in trending technologies that include hands-free use of devices, almost all modern devices are adapting and shifting toward integrating speech recognition functions.

Prototype Of Speech Translation System For Audio Effective Communication

2006

This document presents the development of a prototype translation system as a thesis project. It consists basically of capturing a flow of voice from the sender and integrating advanced technologies of voice recognition, instantaneous translation, and communication over the Internet protocols RTP/RTCP (Real-time Transport Protocol and its control protocol) to send information in real time to the receiver. The prototype does not transmit images; it addresses only the audio stage. Finally, besides tackling a problem of personal communication, the project aims to contribute to the development of activities related to speech recognition, motivating new investigations and advances in the area.
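The RTP transport the prototype uses frames each audio packet with a fixed 12-byte header. A minimal sketch of packing that header per RFC 3550, independent of the prototype's actual code (field values are illustrative):

```python
# Minimal sketch: pack the fixed 12-byte RTP header (RFC 3550, no CSRC list).
import struct

RTP_VERSION = 2

def build_rtp_header(seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: int = 0) -> bytes:
    """Return the 12-byte RTP fixed header for one audio packet."""
    vpxcc = RTP_VERSION << 6                 # version=2, padding/ext/CC = 0
    mpt = (marker << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", vpxcc, mpt,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

hdr = build_rtp_header(seq=1, timestamp=8000, ssrc=0x13579BDF)
print(len(hdr), hdr[0] >> 6)  # 12 2
```

Each voice frame would be sent as this header followed by the encoded audio payload, with the sequence number and timestamp advancing per packet so the receiver can reorder and pace playback, while RTCP carries the accompanying quality reports.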

Developing Client-Server Speech Translation Platform

7th International Conference on Mobile Data Management (MDM'06), 2006

This paper describes a client-server speech translation platform designed for use on mobile terminals. Because terminals and servers are connected via a 3G public mobile phone network, speech translation services are available in various places with a thin client. The platform realizes hands-free communication and the robustness required for real use of speech translation in noisy environments. A microphone array and a new noise suppression technique improve speech recognition performance, and a corpus-based approach enables wide coverage, robustness, and portability to new languages and domains. An experimental evaluation of the communicability of speakers of different languages shows task completion rates with the speech translation system of 85% and 75% for Japanese-English and Japanese-Chinese, respectively. The system can also convey approximately one item of information per two utterances (one turn) on average for both Japanese-English and Japanese-Chinese in a task-oriented dialogue.