The DIRHA simulated corpus
Related papers
The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015
This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently produced under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter- and intra-room reverberation effects. Dev and test sets were derived, which represent valuable material for different studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results obtained using different techniques, including Deep Neural Networks (DNNs), in line with the international state of the art.
A French Corpus for Distant-Microphone Speech Processing in Real Homes
2016
We introduce a new corpus for distant-microphone speech processing in domestic environments. This corpus includes reverberated, noisy speech signals spoken by native French speakers in a lounge and recorded by an 8-microphone device at various angles and distances and in various noise conditions. Room impulse responses and noise-only signals recorded in various real rooms and homes, together with baseline speaker localization and enhancement software, are also provided. This corpus stands apart from other corpora in the field by the number of rooms and homes considered and by the fact that it is publicly available at no cost. We describe the corpus specifications and annotations and the data recorded so far, and we report baseline results.
Distant speech recognition in real-world environments is still a challenging problem, and a particularly interesting topic is the investigation of multi-channel processing with distributed microphones in home environments. This paper presents an initiative to address the challenges of such a scenario: an experimental recognition framework comprising a multi-room, multi-channel corpus and the accompanying evaluation tools is made publicly available. The overall goal is to provide a common platform for comparing state-of-the-art algorithms, sharing ideas across research communities, and integrating several components into a realistic distant-talking recognition chain, e.g., voice activity detection, speech/feature enhancement, channel selection and fusion, and model compensation. The recordings include spoken commands (derived from the well-known GRID corpus) mixed with other acoustic events occurring in different rooms of a real apartment. The work provides a detailed description of the data, tasks, and baseline results, discussing the potential and limits of the approach and highlighting the impact of individual modules on recognition performance.
Realistic multi-microphone data simulation for distant speech recognition
The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility, and low computational cost of a data simulation process may ultimately allow researchers to train, tune, and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data in the targeted environment. In the last decade, several simulated corpora have been released to the research community, including the datasets distributed in the context of projects and international challenges such as CHiME and REVERB. These efforts were extremely useful for deriving baselines and common evaluation frameworks for comparison purposes. At the same time, in many cases they highlighted the need for better coherence between real and simulated conditions. In this paper, we examine this issue and describe our approach to the generation of realistic corpora in a domestic context. Experimental validation, conducted in a multi-microphone scenario, shows that a comparable performance trend can be observed with both real and simulated data across different recognition frameworks, acoustic models, and multi-microphone processing techniques.
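The simulation approach this abstract refers to is commonly implemented by convolving clean speech with a measured room impulse response and mixing in background noise at a target SNR. The sketch below shows that generic recipe; the function name, parameters, and SNR convention are illustrative assumptions, not details taken from the paper's tooling:

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_distant_speech(clean, rir, noise, snr_db):
    """Contaminate clean speech with reverberation and noise.

    A generic simulation recipe (not the paper's exact pipeline):
    convolve with a room impulse response, then add noise scaled
    so the speech-to-noise power ratio equals snr_db.
    """
    # Reverberant speech: convolution with the RIR, trimmed to input length.
    reverberant = fftconvolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    # Scale the noise to hit the requested SNR (in dB).
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise
```

Because the whole pipeline is just two cheap signal operations, a single clean recording can be reused across many rooms, positions, and noise conditions, which is the flexibility argument the abstract makes.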
Distant Speech Recognition in a Smart Home
2011
While the smart home domain has become a major field of application of ICT to improve support and wellness of people in loss of autonomy, speech technology in smart homes has, comparatively to other ICTs, received limited attention. This paper presents the SWEET-HOME project, whose aim is to make it possible for frail persons to control their domestic environment through voice interfaces. Several state-of-the-art and novel ASR techniques were evaluated on realistic data acquired in a multi-room smart home. This distant-speech French corpus was recorded with 21 speakers playing scenarios including activities of daily living in a smart home equipped with several microphones. Techniques acting at the decoding stage and using a priori knowledge, such as DDA, give better results (WER=8.8%, Domotic F-measure=96.8%) than the baseline (WER=18.3%, Domotic F-measure=89.2%) and other approaches.
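The WER figures quoted in this abstract follow the standard definition: the word-level edit distance between reference and hypothesis, divided by the reference length. A minimal self-contained sketch of that metric (the function name is ours, not from the paper):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / #reference words,
    computed via dynamic-programming edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match/substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, recognizing "turn off the light" against the reference "turn on the light" is one substitution in four words, i.e. a WER of 25%.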
Distant speech recognition for home automation: Preliminary experimental results in a smart home
2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2011
This paper presents a study that is part of the Sweet-Home project, which aims at developing a new home automation system based on voice command. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). Regarding the first task, different combinations of ASR systems, language models, and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on distance evaluation between the current ASR hypotheses and the predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of ASR errors.
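Distance-based sentence spotting of the kind this abstract describes can be sketched as matching the ASR hypothesis against each predefined command and accepting the closest one above a similarity threshold. The command set, threshold, and use of `difflib` below are our illustrative assumptions, not the paper's actual algorithm or vocabulary:

```python
import difflib

# Hypothetical domotic command patterns, for illustration only.
DOMOTIC_PATTERNS = [
    "turn on the light",
    "turn off the light",
    "close the blinds",
]

def spot_command(asr_hypothesis, patterns=DOMOTIC_PATTERNS, threshold=0.7):
    """Return the predefined command closest to the ASR hypothesis,
    or None if no pattern is similar enough. Word-level similarity
    via difflib stands in for the paper's distance evaluation."""
    hyp_words = asr_hypothesis.split()
    best, best_score = None, 0.0
    for pattern in patterns:
        score = difflib.SequenceMatcher(None, hyp_words, pattern.split()).ratio()
        if score > best_score:
            best, best_score = pattern, score
    return best if best_score >= threshold else None
```

The point of the threshold is robustness to ASR errors: a slightly corrupted hypothesis such as "please turn on light" still maps to "turn on the light", while unrelated speech falls below the threshold and is rejected.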
2019 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech)
Smart homes aim at enhancing the quality of life of people at home through home automation systems and Ambient Intelligence. Most of these smart homes provide enhanced interaction by relying on context-aware systems trained on data. Whereas voice-based interaction is the current emerging trend, most available corpora cover either home automation sensors only or audio only, which limits the development of context-aware voice-based systems. This paper presents the VocADom@A4H corpus, a dataset composed of users' interactions recorded in a fully equipped smart home. About 12 hours of multichannel distant-speech signal, synchronized with the logs of an openHAB home automation system, were collected from 11 participants who performed activities of daily living in the presence of real-life noises, such as other persons speaking, use of a vacuum cleaner, TV, etc. This corpus can serve as valuable material for studies in pervasive intelligence, such as human tracking, human activity recognition, context-aware interaction, and robust distant-speech processing in the home. Experiments performed on multichannel speech and home automation sensor data for robust voice activity detection and multi-resident localization show the potential of the corpus to support the development of context-aware smart home systems.
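The voice activity detection task mentioned above is often baselined with a simple frame-energy detector before moving to the robust multichannel methods the paper evaluates. The sketch below is that deliberately naive single-channel baseline; all names, frame sizes, and thresholds are our assumptions, not the corpus's evaluation setup:

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Flag frames whose log energy exceeds a threshold relative to the
    loudest frame. A naive single-channel VAD baseline: it fails under
    the real-life noises (TV, vacuum cleaner) present in the corpus,
    which is what motivates multichannel approaches."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Per-frame mean power, with a floor to avoid log(0) on silence.
    energies = np.array([np.mean(f ** 2) + 1e-12 for f in frames])
    log_e = 10 * np.log10(energies / energies.max())
    return log_e > threshold_db          # True = speech-like frame
```

With 16 kHz audio, the defaults correspond to 25 ms frames with a 10 ms hop, a common analysis grid in speech processing.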
In this paper the FAU IISAH corpus and its recording conditions are described: a new speech database consisting of human-machine and human-human interaction recordings. Besides close-talking microphones for the best possible audio quality of the recorded speech, far-distance microphones were used to capture the interaction and communication. The recordings took place during a Wizard-of-Oz experiment in the intelligent, senior-adapted house (ISA-House), a living room with a speech-controlled home assistance system for elderly people, based on a dialogue system able to process spontaneous speech. During the studies in the ISA-House, more than eight hours of interaction data were recorded, including 3 hours and 27 minutes of spontaneous speech. The data were annotated with respect to human-human (off-talk) and human-machine (on-talk) interaction.