User-Driven Quality Enhancement for Audio Signal Processing
Related papers
Interactive quality enhancement in acoustic echo cancellation
2013 36th International Conference on Telecommunications and Signal Processing (TSP)
Classical adaptive algorithms for acoustic echo cancellation (AEC) are often based on error-driven optimization strategies, such as the mean-square error minimization. However, these approaches do not always satisfy the quality requirements demanded by users that avail of such audio signal processing systems. In order to meet subjective specifications, in this paper we put forward the idea of a user-driven approach to echo cancellation through the inclusion of an interactive evolutionary algorithm (IEA) in the optimization stage. As a consequence, performance of an AEC system can be adapted to any user preferences in a principled and systematic way, thus reflecting the desired subjective quality. Experiments in the context of AEC prove the effectiveness of the proposed methodology in enhancing the processed signal quality and show significant statistical advantages of the proposed framework with respect to classical approaches.
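The user-driven idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a plain NLMS echo canceller whose step size is the only evolved parameter, and the names `nlms_echo_canceller`, `interactive_evolution`, and the `rate` callback are hypothetical. In an interactive setting `rate` would return a listener's score for a candidate; any function that maps a parameter to a number can stand in for testing.

```python
import random


def nlms_echo_canceller(far_end, mic, step_size, taps=8, eps=1e-6):
    """Run a basic NLMS filter and return the residual (echo-cancelled) signal."""
    weights = [0.0] * taps
    buffer = [0.0] * taps
    residual = []
    for x, d in zip(far_end, mic):
        buffer = [x] + buffer[:-1]                      # newest far-end sample first
        estimate = sum(w * b for w, b in zip(weights, buffer))
        error = d - estimate                            # what the user would hear
        norm = sum(b * b for b in buffer) + eps
        weights = [w + step_size * error * b / norm
                   for w, b in zip(weights, buffer)]
        residual.append(error)
    return residual


def interactive_evolution(rate, population, generations=5, sigma=0.1, rng=None):
    """Evolve step sizes using ratings from `rate` (assumed: higher is better).

    The better-rated half survives each generation (elitism) and the rest of
    the population is refilled with Gaussian mutations of the survivors.
    """
    rng = rng or random.Random(0)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=rate, reverse=True)
        parents = ranked[:max(1, size // 2)]
        children = []
        while len(parents) + len(children) < size:
            parent = rng.choice(parents)
            child = parent + rng.gauss(0.0, sigma)
            children.append(min(1.5, max(0.01, child)))  # keep NLMS stable
        population = parents + children
    return max(population, key=rate)
```

Because the best-rated candidate always survives, the rating of the returned step size is never worse than that of the best initial candidate. A real system would also have to limit how many candidates a listener is asked to rate, since human ratings are slow and noisy.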
A Survey on Adaptive 360° Video Streaming: Solutions, Challenges and Opportunities
IEEE Communications Surveys & Tutorials
Omnidirectional or 360° video is increasingly being used, mostly due to the latest advancements in immersive Virtual Reality (VR) technology. However, its wide adoption is hindered by higher bandwidth and lower latency requirements than those associated with traditional video content delivery. Diverse researchers propose and design solutions that help support an immersive visual experience of 360° video, primarily when delivered over a dynamic network environment. This paper presents the state-of-the-art on adaptive 360° video delivery solutions considering end-to-end video streaming in general and then specifically of 360° video delivery. Current and emerging solutions for adaptive 360° video streaming, including viewport-independent, viewport-dependent, and tile-based schemes are presented. Next, solutions for network-assisted unicast and multicast streaming of 360° video content are discussed. Different research challenges for both on-demand and live 360° video streaming are also analyzed. Several proposed standards and technologies and top international research projects are then presented. We demonstrate the ongoing standardization efforts for 360° media services that ensure interoperability and immersive media deployment on a massive scale. Finally, the paper concludes with a discussion about future research opportunities enabled by 360° video.
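As a rough illustration of the tile-based, viewport-dependent schemes mentioned in the abstract above, the sketch below assigns a quality level to each tile. It is a simplified assumption, not a scheme taken from the survey: tiles are described only by the yaw of their centre, the function name `select_tile_qualities` and its parameters are hypothetical, and a greedy rule upgrades tiles inside the viewport (nearest to the viewing direction first) while a bandwidth budget allows.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_tile_qualities(tile_yaws, viewport_yaw, fov, bitrates, budget):
    """Return one quality index per tile.

    bitrates: per-tile bitrates in ascending order (index 0 is the lowest).
    Every tile starts at the lowest level so the full sphere stays available;
    tiles whose centre lies within fov/2 of the viewport are then upgraded
    round-robin, nearest first, as long as the total stays within budget.
    If even the base layer exceeds the budget, all tiles stay at level 0.
    """
    levels = [0] * len(tile_yaws)
    spent = bitrates[0] * len(tile_yaws)
    in_view = sorted(
        (i for i, yaw in enumerate(tile_yaws)
         if angular_distance(yaw, viewport_yaw) <= fov / 2),
        key=lambda i: angular_distance(tile_yaws[i], viewport_yaw),
    )
    upgraded = True
    while upgraded:
        upgraded = False
        for i in in_view:
            nxt = levels[i] + 1
            if nxt < len(bitrates):
                cost = bitrates[nxt] - bitrates[levels[i]]
                if spent + cost <= budget:
                    levels[i] = nxt
                    spent += cost
                    upgraded = True
    return levels
```

A real client would also use pitch, predicted head motion, and buffer state; the sketch only shows the basic trade-off between viewport quality and total bitrate.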
Development of the MPEG-H TV Audio System for ATSC 3.0
IEEE Transactions on Broadcasting, 2017
A new TV audio system based on the MPEG-H 3D audio standard has been designed, tested, and implemented for ATSC 3.0 broadcasting. The system offers immersive sound to increase the realism and immersion of programming, and offers audio objects that enable interactivity or personalization by viewers. Immersive sound may be broadcast using loudspeaker channel-based signals or scene-based components in combination with static or dynamic audio objects. Interactivity can be enabled through broadcaster-authored preset mixes or through user control of object gains and positions. Improved loudness and dynamic range control allows tailoring the sound for best reproduction on a variety of consumer devices and listening environments. The system includes features to allow operation in HD-SDI broadcast plants, storage and editing of complex audio programs on existing video editor software or digital audio workstations, frame-accurate switching of programs, and new technologies to adapt current mixing consoles for live broadcast production of immersive and interactive sound. Field tests at live broadcast events were conducted during system design and a live demonstration test bed was constructed to prove the viability of the system design. The system also includes receiver-side components to enable interactivity, binaural rendering for headphone or tablet computer listening, a "3D soundbar" for immersive playback without overhead speakers, and transport over HDMI 1.4 connections in consumer equipment. The system has been selected as a proposed
Quality evaluation of long duration audiovisual content
2012
With the deepest gratitude I wish to thank my two thesis advisors, Prof. U. Peter Svensson and Dr. Ulrich Reiter. Their support, encouragement and insightful discussions have been invaluable during my work. Without their patience, feedback and wisdom, this work would not have been possible. I would also like to acknowledge and express my gratitude to all the former colleagues who worked at Q2S for creating a very enjoyable working environment. I would like to thank my parents for allowing me to realize my own potential and for being very supportive during my entire education process. Finally, I'd like to thank my wonderful wife Anna and my two sons Bartek and Filip for all their love, understanding and patience.
Multimedia Tools and Applications, 2016
Recent studies encourage the development of sensorially-enriched media to enhance the user experience by stimulating senses other than sight and hearing. Sensory effects such as odor, wind, vibration and light effects, as well as enhanced audio quality, have been found to favour media enjoyment and to have a positive influence on the sense of Presence and on the perceived quality, relevance and reality of a multimedia experience. In particular, sports is among the genres that could benefit the most from these solutions. Several works have also demonstrated the technical feasibility of implementing and deploying end-to-end solutions integrating sensory effects into a legacy system. Thus, multi-sensorial media emerges as a means to deliver a new form of immersive experiences to the mass market in a non-disruptive manner. However, many questions remain concerning issues such as which sensory effects can better complement a given audiovisual content or the best way to integrate and combine them to enhance the user experience of a target audience segment. The work presented in this paper aims to gain insight into the impact of binaural audio and sensory (light and olfactory) effects on the sports media experience, both at the overall level (average effect) and as a function of users' characteristics (heterogeneous effects). To this aim, we conducted an experimental study exploring the influence of these immersive elements on the quality and Presence dimensions of the media experience. Along the quality dimension, we look for possible variations in the quality scores assigned to the overall media experience and to the media components: content, image, audio and sensory effects. The potential impact on Presence is analyzed in terms of Spatial Presence and Engagement. The users' characteristics considered encompass specific personal affective, cognitive and behavioral attributes.
At the overall level we found that participants preferred binaural audio over standard stereo audio and that the presence of sensory effects significantly increased the level of Spatial Presence. Several heterogeneous effects were also revealed as a result of our experimental manipulations. Whereas binaural audio was found to have a generalized impact on the majority of the quality and Presence measures considered, the impact of sensory effects concentrates mainly on the Presence dimension. Personal characteristics explained most of the variation in the dependent variables, with individuals' preferences in relation to the content, knowledge of the involved technologies, tendency to emotional involvement and conscientiousness being among the user variables with the most generalized influence. In particular, the former two features seem to present a conflict in the allocation of attentional resources towards the media content versus the technical features of the system, respectively. Additionally, football fans' experience seems to be modulated by emotional processes, whereas for non-fans cognitive processes, and in particular those related to quality judgment, prevail.
A systematic threat analysis and defense strategies for the metaverse and extended reality systems
Computers & Security, 2023
With the rapid development and evolution of immersive technologies, there are growing concerns about security and privacy threats to the metaverse and extended reality (XR) systems. Immersive reality solutions are a combination of multiple vulnerable technologies, allowing attackers to easily undermine security. Furthermore, the deployment of appropriate security controls and defensive mechanisms for resource-constrained proprietary XR products has been limited. In this paper, we provide a comprehensive overview of extended reality systems and the metaverse with emphasis on technology weaknesses, cyber security challenges and users' safety concerns. Five major taxonomies have been presented in this research with the aim of identifying privacy inference vectors and potential cyber threats, and of determining the impact on human health and the extent to which cyberstalking and digital currency scam activities proliferate when using XR. This research also proposes strategies for primary lines of defense and provides recommendations on the adoption of safety measures.
Ambidio: Sound Stage Width Extension for Internal Laptop Loudspeakers
2014
This paper introduces a sound stage width extension method for internal loudspeakers. Ambidio is a real-time application that enhances a stereo sound file playing on a laptop in order to provide a more immersive experience over built-in laptop loudspeakers. The method, based on Ambiophonics principles, is relatively robust to a listener's head position, and requires no measured or synthesized HRTFs. The key novelty of the approach is the pre/post-processing algorithm that dynamically tracks the image spread and modifies it to fit the hardware setting in real time. Two detailed evaluations are provided to assess the robustness of the proposed method. Experimental results show that the average perceived stage width of Ambidio is 176° using internal speakers, while keeping a relatively flat frequency response and a higher user preference rating.
On the suitability of evolutionary computing to developing tools for intelligent music production
2017
Intelligent music production tools aim to assist the user by automating music production tasks. Many previous systems sought to create the best possible mix based on technical parameters but rarely has subjectivity been directly incorporated. This paper proposes that a new generation of tools can be designed based on evolutionary computation methods, which are particularly suited to dealing with the non-linearities and complex solution spaces introduced by perceptual evaluation. These techniques are well-suited to studio applications, in contrast to many previous systems which prioritized the live environment. Furthermore, there is potential to address accessibility issues in existing systems which rely greatly on visual feedback. A survey of previous literature is provided before the current state-of-the-art is described and a number of suggestions for future directions in the field are made.
3D-TV R&D Activities in Europe
IEEE Transactions on Broadcasting, 2011
3D-TV is a topic that has been studied for many years in Europe. Through the research frameworks of the European Commission in particular, a number of long-term issues have been addressed to overcome limitations of the traditional two-view stereoscopy. This article gives a brief overview of the goals and achievements of some completed European projects starting in the 1990s. It then reviews the topics related to 3D-TV in recent European research. Finally an overview with a selection of recent projects is presented.
Index Terms: Digital video broadcasting, multimedia systems, stereo vision, TV broadcasting.
I. INTRODUCTION
Europe has a long history in three-dimensional television (3D-TV), starting from the first demonstration of stereoscopic television by Baird in 1928. Although the normal, two-dimensional television has been broadcast for decades, again pioneered in Europe, 3D-TV did not become a regular service in the 20th century. One of the main reasons was immature technology, which then led to a number of research activities to overcome these inhibiting factors. The contributions from Europe in this global effort are the subject of this paper. The prime concept of 3D-TV is to add a stereoscopic sensation to the television experience. The technological way to implement this is to provide different views to each eye of the viewer. This can be achieved in many ways. The simplest concept is still based on principles of stereo photography developed in the 19th century, in which two views of the scene with two horizontally offset cameras are captured and then presented individually to each eye of the viewer. This two-view stereo forms the basis for current implementation of 3D in cinema and recently emerging TV services. Although two-view stereo is likely to be the representation of choice of the industry for some time to come, it has a number of limitations. These arise mainly out of the fact that the parameters of a two-camera capture rig have to be fixed during the capture and cannot be changed either in post-production or at the end-device. This limitation led to a number of European research initiatives in the 1990s that looked into topics including