Axel Plinge | TU Dortmund
Papers by Axel Plinge
A multitude of multi-microphone speech enhancement methods is available. In this paper, we focus our attention on the well-known minimum variance distortionless response (MVDR) beamformer, due to its ability to preserve a distortionless response towards the desired speaker while minimizing the output noise power. We explore two alternatives for constructing the steering vectors towards the desired speech source. One uses only the direct path of the speech propagation in the form of delay-only filters, while the other uses the entire room impulse response (RIR). All beamforming methods require some control information to be able to accomplish the task of enhancing a desired speech signal. In this paper, an acoustic event detection method using biologically-inspired features is employed. It can interpret the auditory scene by detecting the presence of different auditory objects. This is employed to control the estimation procedures used by the beamformer. The resulting system provides a blind method of speech enhancement that can improve intelligibility independently of any additional information. Experiments with real recordings show the practical applicability of the method. A significant gain in fwSNRseg is achieved. Compared to using the direct path only, the use of the entire RIR proves beneficial. Index Terms: microphone array, auditory scene analysis, blind beamformer for speech enhancement
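As a rough illustration of the delay-only variant mentioned in the abstract above, the following sketch builds a direct-path steering vector for a far-field source and computes the textbook MVDR weights w = R⁻¹d / (dᴴR⁻¹d) for a single frequency bin. The function names, the far-field assumption, the speed of sound and the example array geometry are all assumptions made for this example; they are not taken from the paper, which also considers RIR-based steering vectors and an event-detection-driven estimation of the statistics.

```python
import numpy as np


def delay_steering_vector(mic_positions, source_direction, freq_hz, c=343.0):
    """Delay-only (direct-path) steering vector for a far-field source.

    mic_positions: (M, D) array of microphone coordinates in metres.
    source_direction: length-D vector pointing from the array to the source.
    c: assumed speed of sound in m/s (illustrative default).
    """
    mics = np.asarray(mic_positions, dtype=float)
    direction = np.asarray(source_direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    # Relative arrival delay per microphone: mics nearer the source hear it earlier.
    delays = -(mics @ direction) / c
    return np.exp(-2j * np.pi * freq_hz * delays)


def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights for one frequency bin: R^-1 d / (d^H R^-1 d)."""
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)


# Hypothetical 4-microphone linear array with 5 cm spacing, source at broadside.
mics = np.array([[0.00, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
d = delay_steering_vector(mics, source_direction=[0.0, 1.0], freq_hz=1000.0)

# Synthetic Hermitian noise covariance, regularised so it is safely invertible.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
noise_cov = a @ a.conj().T / 8 + 0.01 * np.eye(4)

w = mvdr_weights(noise_cov, d)
response = np.vdot(w, d)  # w^H d, the gain towards the steered direction
```

By construction the response towards the steered direction is one (the distortionless constraint), while the output noise power wᴴRw is no larger than that of any other beamformer satisfying the same constraint, such as plain delay-and-sum.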
Tracking speakers is one of the key tasks in smart environments. A neurobiologically inspired realtime system using multiple distributed nodes with small circular microphone arrays is designed to accomplish this task. Each node localizes speakers with a dedicated cochlear and midbrain model. Sparse angular localizations and their spectra are transmitted to an integration node where they are associated using their spectra to resolve the ambiguity of multiple simultaneous detections for multiple concurrent speakers. The speakers' Euclidean coordinates are computed by triangulation and tracking is realized by integrating over time using spatial association. The system is designed to be robust against drift, jitter and transmission errors, so that it can be easily realized with wireless connections. Practical applicability is proven with recordings of persons in a laboratory setup in a highly reverberant conference room where concurrent speakers are tracked with good accuracy.
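The triangulation step mentioned in the abstract above can be pictured, in its simplest two-node form, as intersecting two bearing rays in the plane. The sketch below is only that generic geometric construction; the function name, the 2-D setting and the example node positions are assumptions for illustration, and the spectral association and temporal tracking described in the abstract are not modelled.

```python
import numpy as np


def triangulate(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect two bearing rays in the plane.

    p1, p2: (x, y) positions of two array nodes (hypothetical layout).
    theta1, theta2: bearing angles in radians, measured from the x axis.
    Returns the intersection point, or None if the rays are (nearly)
    parallel or would only meet behind one of the nodes.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    u1 = np.array([np.cos(theta1), np.sin(theta1)])
    u2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t1 * u1 = p2 + t2 * u2 for the ray parameters t1 and t2.
    system = np.column_stack((u1, -u2))
    if abs(np.linalg.det(system)) < eps:
        return None
    t1, t2 = np.linalg.solve(system, p2 - p1)
    if t1 < 0 or t2 < 0:
        return None
    return p1 + t1 * u1


# Two nodes 4 m apart, both hearing the same speaker.
position = triangulate((0.0, 0.0), np.deg2rad(45.0), (4.0, 0.0), np.deg2rad(135.0))
```

With more than two nodes, or with noisy bearings, a least-squares intersection would be the more natural choice; the two-ray case is shown only because it makes the geometry explicit.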
2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014
2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), 2014
Psychological Research, 2012
The aim of this study was to investigate the spatial orienting of visual attention in depth under purely stereoscopic viewing conditions. Random-dot stereograms were used to present disparity-defined target stimuli that were either validly or invalidly cued in depth. In separate tasks, participants responded either to the relative depth of the target (protruding vs. receding) or to its shape (square vs. diamond). Stimulus onset asynchronies (SOAs) between an uninformative exogenous cue and target were varied from 250 to 600 ms. For both tasks, mean response times (RTs) were shorter for validly than invalidly cued target depths and this RT advantage was essentially restricted to the shortest SOA of 250 ms. These results indicate that attention can be reflexively allocated to locations in stereo depth under conditions of low perceptual load, and independent of whether depth is relevant to the task or not.
For several years, we have been working on means to improve speech reception for severely sensory hearing-impaired persons. The work done includes algorithms for non-linear speech processing as well as phoneme spotting and transposition. The overall goal is to implement some of these algorithms on low-power DSPs in a wearable device. Here we will present the current state of our research, and some examples of processed speech will be demonstrated. Keywords: hearing impairment, digital processing, transposition, ...
Assistive technology: added value to the quality of life, AAATE'01, Oct 1, 2001
Modern communication technology, if not adapted to the needs of the severely hearing impaired person, leads to exclusion from everyday communication. Only if it is well adapted can it offer a higher degree of freedom and integration. A telecommunication adapter was developed that can be used for two purposes: to provide access to mobile phone technology and, with an extension, to provide high-quality access to PSTN phones. At the same time, connections to TV, radio and other external sources are made possible.