Jörn Anemüller | Carl von Ossietzky University of Oldenburg

Papers by Jörn Anemüller

2003 Special Issue Complex independent component analysis of frequency-domain electroencephalographic data

Independent component analysis (ICA) has proven useful for modeling brain and electroencephalographic (EEG) data. Here, we present a new, generalized method to better capture the dynamics of brain signals than previous ICA algorithms. We regard EEG sources as eliciting spatio-temporal activity patterns, corresponding to, e.g., trajectories of activation propagating across cortex. This leads to a model of convolutive signal superposition, in contrast with the commonly used instantaneous mixing model. In the frequency domain, convolutive mixing is equivalent to multiplicative mixing of complex signal sources within distinct spectral bands. We decompose the recorded spectral-domain signals into independent components by a complex infomax ICA algorithm. First results from a visual attention EEG experiment exhibit: (1) sources of spatio-temporal dynamics in the data, (2) links to subject behavior, (3) sources with a limited spectral extent, and (4) a higher degree of independence compare...
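The equivalence between convolutive mixing in time and multiplicative mixing per frequency bin, as stated in the abstract above, can be checked numerically. The following is a minimal sketch and not the authors' implementation: it assumes circular convolution over a single long frame (so the DFT identity holds exactly), and the names `circular_convolutive_mix`, `s`, `a`, and all sizes are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_sens, n_taps, n_samp = 2, 3, 8, 256   # hypothetical sizes

s = rng.laplace(size=(n_src, n_samp))          # hypothetical source signals
a = rng.normal(size=(n_sens, n_src, n_taps))   # hypothetical mixing filters


def circular_convolutive_mix(a, s):
    """x_i[t] = sum_j sum_tau a[i, j, tau] * s_j[(t - tau) mod N]."""
    n_sens, n_src, n_taps = a.shape
    x = np.zeros((n_sens, s.shape[1]))
    for i in range(n_sens):
        for j in range(n_src):
            for tau in range(n_taps):
                # np.roll(s[j], tau)[t] equals s[j][(t - tau) mod N]
                x[i] += a[i, j, tau] * np.roll(s[j], tau)
    return x


# Time domain: each sensor records a sum of filtered sources.
x = circular_convolutive_mix(a, s)

# Frequency domain: one complex mixing matrix A[:, :, f] per bin f.
S = np.fft.fft(s, axis=1)                      # shape (n_src, n_samp)
A = np.fft.fft(a, n=n_samp, axis=2)            # shape (n_sens, n_src, n_samp)
X_freq = np.einsum("ijf,jf->if", A, S)         # X[:, f] = A[:, :, f] @ S[:, f]

# Both routes should agree up to floating-point error.
max_err = np.max(np.abs(np.fft.fft(x, axis=1) - X_freq))
print(f"max deviation: {max_err:.2e}")
```

In practice, short-time spectra make this relation approximate rather than exact; the sketch only illustrates why a separate complex-valued unmixing problem arises in each spectral band.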

Meyer et al BMC Neurosci 2009

A discriminative learning approach to probabilistic acoustic source localization

2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), 2014

Sound source localization algorithms commonly include assessment of inter-sensor (generalized) correlation functions to obtain direction-of-arrival estimates. Here, we present a classification-based method for source localization that uses discriminative support vector machine learning of correlation patterns that are indicative of source presence or absence. Subsequent probabilistic modeling generates a map of sound source presence probability in given directions. Being data-driven, the method adapts during training to characteristics of the sensor setup, such as convolution effects in non-free-field situations, and to target-signal-specific acoustic properties. Experimental evaluation was conducted with algorithm training in anechoic single-talker scenarios and test data from several reverberant multi-talker situations, together with diffuse and real-recorded background noise, respectively. Results demonstrate that the method successfully generalizes from training to test conditions. Improvement over the best of five investigated state-of-the-art angular spectrum-based reference methods was on average about 45% in terms of relative F-measure-related error reduction.
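For orientation, the conventional starting point mentioned in the abstract above, an inter-sensor generalized correlation function, might be sketched as follows. This shows only a standard GCC-PHAT delay estimate, not the paper's SVM-based classifier or its probabilistic map. The function name `gcc_phat`, the sampling rate, the microphone spacing, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """Generalized cross-correlation with phase transform (PHAT) weighting.

    Returns lags in samples and the correlation values. A peak at a
    positive lag means x1 is delayed relative to x2. Requires max_lag >= 1.
    """
    n = len(x1) + len(x2)                      # zero-pad against wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12             # keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the output covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


# Synthetic two-microphone example (all values hypothetical).
rng = np.random.default_rng(1)
true_delay = 5                                 # samples
x2 = rng.standard_normal(1024)
x1 = np.concatenate((np.zeros(true_delay), x2[:-true_delay]))

lags, cc = gcc_phat(x1, x2, max_lag=20)
est_delay = int(lags[np.argmax(cc)])

# Far-field conversion from delay to angle: sin(theta) = c * tau / d.
fs, c, d = 16000.0, 343.0, 0.2                 # Hz, m/s, m (assumed)
sin_theta = np.clip(c * (est_delay / fs) / d, -1.0, 1.0)
theta_deg = float(np.degrees(np.arcsin(sin_theta)))
print(est_delay, theta_deg)
```

A data-driven method such as the one described would take correlation patterns like `cc` as input features rather than simply picking the maximum.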

Akustische Quellentrennung im Frequenzbereich

Convolutive Blind Source Separation of Speech Signals

Acoustic event detection using signal enhancement and spectro-temporal feature extraction

Robust ASR in reverberant environments using temporal cepstrum smoothing for speech enhancement and an amplitude modulation filterbank for feature extraction

This paper presents techniques aiming at improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. System improvements range from speech enhancement through robust feature extraction to model adaptation and word-based integration of multiple classifiers. The selective temporal cepstrum smoothing (TCS) technique is applied to enhance the reverberant speech signal at moderate noise levels, based on a statistical model of room impulse responses (RIRs) and minimum statistics (MS), considering estimates of late reverberations and the noise power spectrum densities (PSDs). Robust feature extraction is performed by amplitude modulation filtering of the cepstrogram to extract its temporal modulation information. As an alternative classifier, the acoustic models have been adapted using different RIRs and a RIR selection scheme based on a multi-layer perceptron (MLP) system that uses spectro-temporal features as the input. In the final stage, a system combination approach achieved by recognizer output voting error reduction (ROVER) is employed to obtain a jointly optimal recognized transcription. The proposed system has been evaluated in two different processing modes, i.e., utterance-based batch processing and full batch processing, which results in an overall average absolute improvement of 11% under varying reverberant conditions compared to the baseline system.

Coupled dynamics of fast spins and slow exchange interactions in the XY spin glass

Journal of Physics A: Mathematical and General, 2001

Final Activity Report

dirac.uni-oldenburg.de

... The first fully scale-invariant spatio-temporal feature detector that is fast enough for video ... new audio-visual sensor allowing to sense images and sound in a coherent observer-centered ... The machine learning approach was used to extract relevant features of short activities and ...

Blinde Quellentrennung als Vorverarbeitung zur robusten Spracherkennung

FORTSCHRITTE DER …, 2000

... noise-reduction algorithms considerably increase the recognition performance of the PEMO/LRNN system under additive noise ... Methods: Dummy-head recordings of speech and noise from anechoic and reverberant environments were used ...

Blinde Separation von Sprachsignalen basierend auf dem Kriterium maximaler Disjunktheit

Blinde akustische Quellentrennung im Frequenzbereich

Audio event detection for in-home care

Noninvasive Imaging of Independent Cortical Flow Patterns

Biomedical Applications-Reliable Measurement of Cortical Flow Patterns Using Complex Independent Component Analysis of Electroencephalographic Signals

Biomedical Applications-Unraveling Spatio-temporal Dynamics in fMRI Recordings Using Complex ICA

Monitoring and representing complex signals

Biologically motivated audio-visual cue integration for object categorization

Supporting Information to “Discriminative Learning of Receptive Fields from Responses to non-Gaussian Stimulus Ensembles”

Towards speech enhancement using a variational U-Net architecture

arXiv: Audio and Speech Processing, Aug 23, 2021

We investigate the viability of a variational U-Net architecture for denoising of single-channel audio data. Deep network speech enhancement systems commonly aim to estimate filter masks, or opt to work on the waveform signal, potentially neglecting relationships across higher dimensional spectrotemporal features. We study the adoption of a probabilistic bottleneck into the classic U-Net architecture for direct spectral reconstruction. Evaluation of several ablation network variants is carried out using signal-to-distortion ratio and perceptual measures, on audio data that includes known and unknown noise types as well as reverberation. Our experiments show that the residual (skip) connections in the proposed system are a prerequisite for successful spectral reconstruction, i.e., without filter mask estimation. Results show, on average, an advantage of the proposed variational U-Net architecture over its classic, non-variational version in signal enhancement performance under reverberant conditions of 0.31 and 6.98 in PESQ and STOI scores, respectively. Anecdotal evidence points to improved suppression of impulsive noise sources with the variational U-Net compared to the recurrent mask estimation network baseline.
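The abstract above does not specify how the probabilistic bottleneck is built. A common construction in variational autoencoders, shown here purely as an assumed illustration, samples the latent code through the reparameterization trick and penalizes its divergence from a standard normal prior. The function names, the latent size, and the use of NumPy in place of a deep learning framework are all hypothetical.

```python
import numpy as np


def sample_bottleneck(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the latent axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


rng = np.random.default_rng(2)
batch, latent_dim = 4, 16                      # hypothetical sizes

# In a network, mu and log_var would be produced by the encoder path.
mu = rng.normal(scale=0.5, size=(batch, latent_dim))
log_var = rng.normal(scale=0.1, size=(batch, latent_dim))

z = sample_bottleneck(mu, log_var, rng)        # passed on to the decoder path
kl = kl_to_standard_normal(mu, log_var)        # regularization term per example
print(z.shape, kl)
```

In such a setup the KL term is typically added, with some weight, to the reconstruction loss; whether the paper uses this exact formulation cannot be determined from the abstract.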
