Analysis of binaural cue matching using ambisonics to binaural decoding techniques
Related papers
Perceptual assessment of binaural decoding of first-order ambisonics
The first-order Ambisonics microphone (e.g. Soundfield®) is a compact and efficient setup for spatial audio recording, with the benefit of full 3D spatialization. Another advantage is that the signals delivered by this microphone (i.e. B-Format) can be rendered over headphones by applying appropriate processing, while ensuring that the 3D spatial information is preserved. With the growing use of personal devices, it should be considered that most audio content is listened to over headphones. First-order Ambisonics recording therefore provides an attractive solution for picking up 3D audio content compatible with headphone reproduction. "Binaural decoding" refers to the processing that adapts B-Format for headphone rendering (i.e. "binaural format"). One solution is based on binaural synthesis of virtual loudspeakers. One promising way to improve the decoding is active processing, which takes information from a pre-analysis of the sound scene, particularly in terms of spatial information. This paper compares various binaural decoders. Starting from a listening test which assesses existing solutions and shows that the perceived quality may vary strongly from one decoder to another, the processing is analyzed step by step. Performance is measured by a set of objective criteria derived from localization cues.
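As an illustration of the virtual-loudspeaker decoding described above, here is a minimal sketch in Python/NumPy. It assumes ACN channel ordering with SN3D normalization and hypothetical HRIR arrays; the paper itself does not fix these conventions, and a basic mode-matching (pseudoinverse) decoder stands in for the decoders actually compared.

```python
import numpy as np
from scipy.signal import fftconvolve

def foa_to_binaural(bformat, speaker_dirs, hrirs_l, hrirs_r):
    """Decode first-order Ambisonics to binaural via virtual loudspeakers.

    bformat      : (4, n) FOA signals in ACN order (W, Y, Z, X), SN3D.
    speaker_dirs : (n_spk, 2) array of (azimuth, elevation) in radians.
    hrirs_l/r    : (n_spk, m) left/right HRIRs, one pair per virtual speaker.
    """
    az, el = speaker_dirs[:, 0], speaker_dirs[:, 1]
    # Real first-order spherical harmonics sampled at the speaker directions.
    Y = np.stack([np.ones_like(az),
                  np.sin(az) * np.cos(el),
                  np.sin(el),
                  np.cos(az) * np.cos(el)], axis=1)   # (n_spk, 4)
    spk = np.linalg.pinv(Y.T) @ bformat               # mode-matching decoder
    left  = sum(fftconvolve(spk[i], hrirs_l[i]) for i in range(len(spk)))
    right = sum(fftconvolve(spk[i], hrirs_r[i]) for i in range(len(spk)))
    return np.stack([left, right])
```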
2019
This paper analyzes the limitations of binaural sound reproduced over headphones based on Ambisonics for Virtual Reality (VR) audio. VR audio can be provided with binaural sound that compensates for the head rotation of a listener. Ambisonics is widely used for recording and reproducing ambient sound fields around a listener in VR audio, and first-order Ambisonics (FOA) is still being used for VR audio because of its simplicity. However, the maximum frequency at this order is too low to perfectly reproduce the ear signals, and thus the binaural reproduction has inherent limitations in terms of spectrum and sound localization. This paper investigates these limitations by comparing the signals arriving at the ear positions in the reference field and the reproduced field. An incident wave is defined as the reference field and reproduced over virtual loudspeakers. Frequency responses, interaural level differences, and interaural phase differences are compared. The results show, above t...
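The interaural comparisons the paper performs can be sketched generically as follows (Python/NumPy; the function name and FFT length are illustrative, not taken from the paper):

```python
import numpy as np

def interaural_cues(left, right, fs, nfft=4096):
    """Per-frequency ILD (dB) and wrapped IPD (rad) from a pair of ear signals."""
    L = np.fft.rfft(left, nfft)
    R = np.fft.rfft(right, nfft)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    ild = 20.0 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    ipd = np.angle(L * np.conj(R))
    return freqs, ild, ipd

# The reference-vs-reproduced comparison then reduces to differencing the cues:
#   f, ild_ref, ipd_ref = interaural_cues(l_ref, r_ref, fs)
#   f, ild_rep, ipd_rep = interaural_cues(l_rep, r_rep, fs)
#   ild_error = ild_rep - ild_ref
```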
2011
The aim of this project is to expand on the techniques and knowledge used in binaural audio. This includes the main characteristics: Interaural Time Difference (ITD), Interaural Level Difference (ILD) and Head Related Transfer Function (HRTF). Recordings were made in the University's anechoic chamber with a dummy head and binaural microphones to test the effect of turning the head in front of a speaker. The recordings included a range of pure tones at different frequencies, white noise and sine sweeps. Programs were written in MATLAB to determine ITDs and ILDs as well as HRTFs, based on Fourier analysis and on cross-correlation and autocorrelation of the sounds recorded at the microphones and the sounds played. The outcome of the project was a set of binaural cues and data used to generate transfer functions that can be applied to dry mono sounds to perform virtual localization on them.
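The cross-correlation approach to ITD estimation mentioned in this abstract can be sketched in a few lines; the project used MATLAB, but an equivalent Python/NumPy version (with an assumed ±1 ms plausibility window) looks like this:

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=1e-3):
    """ITD in seconds: the cross-correlation lag, limited to plausible head delays."""
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-len(right) + 1, len(left))     # lag axis of the 'full' output
    mask = np.abs(lags) <= int(max_itd * fs)         # ~±1 ms for a human head
    return lags[mask][np.argmax(corr[mask])] / fs    # positive: left ear lags

def estimate_ild(left, right):
    """Broadband ILD in dB from the RMS ratio of the two ear signals."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)) + 1e-20)
    return 20.0 * np.log10(rms(left) / rms(right))
```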
Ambisonic Based Binaural Sound Reproduction System
2005
A computationally efficient 3D real-time rendering engine for binaural sound reproduction via headphones is presented. Binaural sound reproduction requires filtering the virtual sound source signals with head related transfer functions (HRTFs). To improve human localization capabilities, head tracking as well as room simulation have to be incorporated. This raises the problem of high-quality, time-varying interpolation between different HRTFs. To overcome this problem a virtual Ambisonic approach is used that results in a bank of time-invariant HRTF filters.
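The core trick, that a fixed virtual-speaker layout lets the decoder and HRIRs be pre-combined into one time-invariant filter per Ambisonic channel, can be sketched as follows (Python/NumPy, assuming ACN/SN3D conventions, which the abstract does not specify; head rotation would then be applied as a cheap 4x4 matrix on the B-format before render(), never touching the filters):

```python
import numpy as np
from scipy.signal import fftconvolve

def precompute_foa_filters(speaker_dirs, hrirs_l, hrirs_r):
    """Fold a fixed virtual-speaker decoder and its HRIRs into 4 fixed filters."""
    az, el = speaker_dirs[:, 0], speaker_dirs[:, 1]
    Y = np.stack([np.ones_like(az),
                  np.sin(az) * np.cos(el),
                  np.sin(el),
                  np.cos(az) * np.cos(el)], axis=1)   # (n_spk, 4)
    D = np.linalg.pinv(Y.T)                           # (n_spk, 4) decoder matrix
    # Each Ambisonic channel k gets the decoder-weighted sum of speaker HRIRs,
    # so the convolution filters stay constant while the head turns.
    return D.T @ hrirs_l, D.T @ hrirs_r               # each (4, hrir_len)

def render(bformat, filt_l, filt_r):
    left  = sum(fftconvolve(bformat[k], filt_l[k]) for k in range(4))
    right = sum(fftconvolve(bformat[k], filt_r[k]) for k in range(4))
    return np.stack([left, right])
```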
3D binaural sound reproduction using a virtual ambisonic approach
IEEE International Symposium on Virtual Environments, Human-Computer Interfaces and Measurement Systems (VECIMS '03), 2003
Convincing binaural sound reproduction via headphones requires filtering the virtual sound source signals with head related transfer functions (HRTFs). Furthermore, humans are able to improve their localization capabilities by small unconscious head movements. Therefore it is important to incorporate head tracking. This yields the problem of high-quality, time-varying interpolation between different HRTFs. A further improvement of human localization accuracy can be obtained by considering room simulation, yielding a huge number of virtual sound sources. To increase the computational efficiency of the proposed system a virtual Ambisonic approach is used, which results in a bank of time-invariant HRTF filters independent of the number of sources to encode.
Individualized HRTF for Playing VR Videos with Ambisonics Spatial Audio on HMDs
2018
Current audio/video head-mounted rendering systems for virtual and augmented reality rely on a binaural approach combined with Ambisonics technology. These head-tracking systems employ generic HRTFs, commonly measured with a dummy head in an anechoic room. In this paper, we describe a new solution designed to play 360° video files with spatial audio, developed for desktop and portable platforms and built from existing open source software. The HRTF sets can be loaded from a standard audio file chosen from an existing database or from an ad-hoc measurement. The capability to switch between multiple HRTF sets while playing files has been added.
Towards Generating Ambisonics Using Audio-visual Cue for Virtual Reality
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
Ambisonics, i.e., full-sphere surround sound, is quintessential with 360° visual content to provide a realistic virtual reality (VR) experience. While 360° visual content capture has gained a tremendous boost recently, the estimation of corresponding spatial sound is still challenging due to the required sound-field microphones or information about the sound-source locations. In this paper, we introduce the novel problem of generating Ambisonics for 360° videos using audiovisual cues. With this aim, firstly, a novel 360° audiovisual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for the automatic Ambisonics estimation problem. Benefiting from deep-learning-based audiovisual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses these locations to encode the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360° input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360° audiovisual analysis for future investigations.
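The final encoding step, turning an estimated source direction into B-format, is standard first-order Ambisonics panning. A minimal sketch (Python/NumPy, ACN order with SN3D normalization; the traditional FuMa B-format instead scales W by 1/√2):

```python
import numpy as np

def encode_foa(mono, az, el):
    """Encode a mono signal at (azimuth, elevation) in radians into FOA B-format."""
    gains = np.array([1.0,                        # W
                      np.sin(az) * np.cos(el),    # Y
                      np.sin(el),                 # Z
                      np.cos(az) * np.cos(el)])   # X
    return gains[:, None] * np.asarray(mono)[None, :]   # (4, n_samples)
```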
Evaluation of Binaural Renderers: Localization
Binaural renderers can be used to reproduce spatial audio over headphones. A number of different renderers have recently become commercially available for use in creating immersive audio content. High-quality spatial audio can significantly enhance experiences in a number of different media applications, such as virtual, mixed and augmented reality, computer games, and music and movies. A large multi-phase experiment evaluating six commercial binaural renderers was performed. This paper presents the methodology, evaluation criteria, and main findings of the horizontal-plane source localization experiment carried out with these renderers. Significant differences between the renderers' regional localization accuracy were found. Consistent with previous research, subjects tended to localize better at the front and back of the head than at the sides. Differences between renderer performance at the side regions heavily contributed to their overall regional localization accuracy.
The Journal of the Acoustical Society of America
In this article, the application of spatial covariance matching is investigated for the task of producing spatially enhanced binaural signals using head-worn microphone arrays. A two-step processing paradigm is followed, whereby an initial estimate of the binaural signals is first produced using one of three suggested binaural rendering approaches. The proposed spatial covariance matching enhancement is then applied to these estimated binaural signals with the intention of producing refined binaural signals that more closely exhibit the correct spatial cues as dictated by the employed sound-field model and associated spatial parameters. It is demonstrated, through objective and subjective evaluations, that the proposed enhancements in the majority of cases produce binaural signals that more closely resemble the spatial characteristics of simulated reference signals when the enhancement is applied to and compared against the three suggested starting binaural rendering approaches. Fur...
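The covariance-matching step can be illustrated with a deliberately simplified sketch: given an initial binaural estimate x and a target spatial covariance C_target (dictated by the sound-field model), find a mixing matrix M so that y = M x has the target covariance. Published formulations work per time-frequency tile and add an optimal unitary factor to keep y close to x; this minimal version omits that factor:

```python
import numpy as np

def covariance_match(x, C_target, eps=1e-9):
    """Mix a 2-channel signal x so its covariance matches C_target.

    Minimal form M = C_target^(1/2) @ C_x^(-1/2); real systems apply this
    per frequency band and regularize M to limit signal distortion.
    """
    C_x = (x @ x.conj().T) / x.shape[1]           # sample covariance of input

    def psd_power(C, p):
        w, V = np.linalg.eigh(C)                  # Hermitian eigendecomposition
        return (V * np.maximum(w, eps) ** p) @ V.conj().T

    M = psd_power(C_target, 0.5) @ psd_power(C_x, -0.5)
    return M @ x
```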
A 3D Ambisonic Based Binaural Sound Reproduction System
2003
A computationally efficient 3D real-time rendering engine for binaural sound reproduction via headphones is presented. Binaural sound reproduction requires filtering the virtual sound source signals with head related transfer functions (HRTFs). To improve human localization capabilities, head tracking as well as room simulation have to be incorporated. This raises the problem of high-quality, time-varying interpolation between different HRTFs. To overcome this problem a virtual Ambisonic approach is used that results in a bank of time-invariant HRTF filters.