Jörn Anemüller | Carl von Ossietzky University of Oldenburg
Papers by Jörn Anemüller
Independent component analysis (ICA) has proven useful for modeling brain and electroencephalographic (EEG) data. Here, we present a new, generalized method to better capture the dynamics of brain signals than previous ICA algorithms. We regard EEG sources as eliciting spatio-temporal activity patterns, corresponding to, e.g., trajectories of activation propagating across cortex. This leads to a model of convolutive signal superposition, in contrast with the commonly used instantaneous mixing model. In the frequency domain, convolutive mixing is equivalent to multiplicative mixing of complex signal sources within distinct spectral bands. We decompose the recorded spectral-domain signals into independent components by a complex infomax ICA algorithm. First results from a visual attention EEG experiment exhibit: (1) sources of spatio-temporal dynamics in the data, (2) links to subject behavior, (3) sources with a limited spectral extent, and (4) a higher degree of independence compare...
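A minimal sketch of the frequency-domain decomposition described above: the multichannel recording is transformed with an STFT, and a complex infomax-style ICA is run independently in each spectral band. The tanh-of-magnitude nonlinearity, learning rate, and STFT parameters below are illustrative assumptions, not the exact choices of the paper.

```python
# Sketch: frequency-domain (convolutive) ICA on multichannel EEG.
# Assumptions: the polar tanh nonlinearity and the fixed learning rate are
# illustrative; the paper's complex infomax algorithm may differ in detail.
import numpy as np
from scipy.signal import stft

def complex_infomax_per_band(X, n_iter=200, lr=1e-2):
    """X: complex array (channels, frames) for one frequency band.
    Returns an unmixing matrix W such that U = W @ X has (more) independent rows."""
    n_ch = X.shape[0]
    W = np.eye(n_ch, dtype=complex)
    for _ in range(n_iter):
        U = W @ X
        # Nonlinearity acting on magnitudes while preserving phase (assumed form).
        G = np.tanh(np.abs(U)) * np.exp(1j * np.angle(U))
        # Natural-gradient infomax-style update.
        dW = (np.eye(n_ch) - (G @ U.conj().T) / X.shape[1]) @ W
        W = W + lr * dW
    return W

# Usage: eeg is (channels, samples); ICA is run separately in every band.
eeg = np.random.randn(32, 10000)                 # placeholder for real EEG data
f, t, Z = stft(eeg, fs=250, nperseg=256)         # Z: (channels, freqs, frames)
W_per_band = [complex_infomax_per_band(Z[:, k, :]) for k in range(Z.shape[1])]
```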
2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), 2014
Sound source localization algorithms commonly include assessment of inter-sensor (generalized) correlation functions to obtain direction-of-arrival estimates. Here, we present a classification-based method for source localization that uses discriminative support vector machine learning of correlation patterns that are indicative of source presence or absence. Subsequent probabilistic modeling generates a map of sound source presence probability in given directions. Being data-driven, the method adapts during training to characteristics of the sensor setup, such as convolution effects in non-free-field situations, and to target-signal-specific acoustic properties. Experimental evaluation was conducted with algorithm training in anechoic single-talker scenarios and test data from several reverberant multi-talker situations, together with diffuse and real-recorded background noise, respectively. Results demonstrate that the method successfully generalizes from training to test conditions. Improvement over the best of five investigated state-of-the-art angular-spectrum-based reference methods was on average about 45% in terms of relative F-measure-related error reduction.
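As a rough illustration of the classification-based idea, the sketch below computes a GCC-PHAT correlation pattern from a two-channel frame and feeds such patterns to a probabilistic SVM; sklearn's SVC with probability outputs stands in for the paper's discriminative and probabilistic stages, and the FFT size, lag range, and random placeholder data are assumptions.

```python
# Sketch: classification-based localization from GCC-PHAT correlation patterns.
import numpy as np
from sklearn.svm import SVC

def gcc_phat(x1, x2, n_fft=1024, max_lag=32):
    """Generalized cross-correlation with phase transform for one frame pair."""
    X1, X2 = np.fft.rfft(x1, n_fft), np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12                      # PHAT weighting
    cc = np.fft.irfft(cross, n_fft)
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])  # centre around lag 0
    return cc                                           # feature vector, length 2*max_lag+1

# Example feature for one frame pair (random placeholders for real audio):
frame_l, frame_r = np.random.randn(1024), np.random.randn(1024)
feat = gcc_phat(frame_l, frame_r)

# Training uses many such features labelled per candidate direction
# (placeholder random labels here; real labels come from annotated recordings).
X_train = np.random.randn(500, feat.size)
y_train = np.random.randint(0, 2, 500)
clf = SVC(probability=True).fit(X_train, y_train)
p_source = clf.predict_proba(feat[None, :])[:, 1]       # source-presence probability
```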
This paper presents techniques aiming at improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. System improvements range from speech enhancement and robust feature extraction to model adaptation and word-based integration of multiple classifiers. The selective temporal cepstrum smoothing (TCS) technique is applied to enhance the reverberant speech signal at moderate noise levels, based on a statistical model of room impulse responses (RIRs) and minimum statistics (MS), considering estimates of the late reverberation and noise power spectral densities (PSDs). Robust feature extraction is performed by amplitude modulation filtering of the cepstrogram to extract its temporal modulation information. As an alternative classifier, the acoustic models have been adapted using different RIRs and a RIR selection scheme based on a multi-layer perceptron (MLP) system that uses spectro-temporal features as input. In the final stage, a system combination approach achieved by recognizer output voting error reduction (ROVER) is employed to obtain a jointly optimal recognized transcription. The proposed system has been evaluated in two different processing modes, i.e., utterance-based batch processing and full batch processing, which results in an overall average absolute improvement of 11% under varying reverberant conditions compared to the baseline system.
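The feature extraction step, amplitude modulation filtering of the cepstrogram, can be sketched as a band-pass filter applied along the time axis of each cepstral coefficient. The modulation band edges and the Butterworth filter below are illustrative assumptions, not the exact filter used in the challenge system.

```python
# Sketch: temporal amplitude-modulation filtering of a cepstrogram.
# Assumptions: 10 ms frame shift (100 Hz frame rate) and a roughly 1-16 Hz
# modulation pass band; both are illustrative choices.
import numpy as np
from scipy.signal import butter, filtfilt

def modulation_filter_cepstrogram(cepstra, frame_rate=100.0, band=(1.0, 16.0)):
    """cepstra: (n_frames, n_ceps). Band-pass each cepstral trajectory over time."""
    b, a = butter(2, [band[0] / (frame_rate / 2), band[1] / (frame_rate / 2)],
                  btype="bandpass")
    return filtfilt(b, a, cepstra, axis=0)

# Usage with hypothetical 13-dimensional cepstral features (random placeholder).
cepstra = np.random.randn(300, 13)
features = modulation_filter_cepstrogram(cepstra)
```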
Journal of Physics A: Mathematical and General, 2001
dirac.uni-oldenburg.de
... The first fully scale-invariant spatio-temporal feature detector that is fast enough for video ... new audio-visual sensor allowing to sense images and sound in a coherent observer-centered ... The machine learning approach was used to extract relevant features of short activities and ...
FORTSCHRITTE DER …, 2000
... noise-reduction algorithms can considerably increase the recognition performance of the PEMO/LRNN system in the presence of additive noise ... Methods: Dummy-head recordings of speech and noise from anechoic and from reverberant environments were used ...
arXiv: Audio and Speech Processing, Aug 23, 2021
We investigate the viability of a variational U-Net architecture for denoising of single-channel audio data. Deep network speech enhancement systems commonly aim to estimate filter masks, or opt to work on the waveform signal, potentially neglecting relationships across higher-dimensional spectro-temporal features. We study the adoption of a probabilistic bottleneck into the classic U-Net architecture for direct spectral reconstruction. Evaluation of several ablation network variants is carried out using signal-to-distortion ratio and perceptual measures, on audio data that includes known and unknown noise types as well as reverberation. Our experiments show that the residual (skip) connections in the proposed system are a prerequisite for successful spectral reconstruction, i.e., without filter mask estimation. Results show, on average, an advantage of the proposed variational U-Net architecture over its classic, non-variational version in signal enhancement performance under reverberant conditions of 0.31 and 6.98 in PESQ and STOI scores, respectively. Anecdotal evidence points to improved suppression of impulsive noise sources with the variational U-Net compared to the recurrent mask estimation network baseline.
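A toy sketch of the core idea, a variational (reparameterised) bottleneck inside a U-Net-style encoder/decoder with a skip connection feeding the decoder. Layer sizes, the single skip, and the loss weighting are illustrative assumptions; the paper's network is considerably deeper and trained on spectrograms.

```python
# Sketch: variational bottleneck plus skip connection for direct spectral
# reconstruction (PyTorch). All dimensions and weights are illustrative.
import torch
import torch.nn as nn

class TinyVariationalUNet(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Conv2d(1, ch, 3, stride=2, padding=1)
        self.to_mu = nn.Conv2d(ch, ch, 1)
        self.to_logvar = nn.Conv2d(ch, ch, 1)
        self.dec = nn.ConvTranspose2d(2 * ch, 1, 4, stride=2, padding=1)

    def forward(self, spec):                       # spec: (B, 1, F, T) magnitude spectrogram
        h = torch.relu(self.enc(spec))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        out = self.dec(torch.cat([z, h], dim=1))   # skip: concatenate encoder features
        return out, mu, logvar                     # mu/logvar feed a KL term in the loss

# Usage: reconstruction loss plus KL regulariser (weight assumed).
net = TinyVariationalUNet()
noisy = torch.randn(2, 1, 64, 64)                  # placeholder noisy spectrograms
clean = torch.randn(2, 1, 64, 64)                  # placeholder clean targets
est, mu, logvar = net(noisy)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(est, clean) + 1e-3 * kl
```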