A. Adami | Universidade de Caxias do Sul
Papers by A. Adami
Anais do XXVI Simpósio Brasileiro de Telecomunicações
Speech processing is a data-driven technology that relies on public corpora and associated resources. In contrast to languages such as English, there are few resources for Brazilian Portuguese (BP). This work describes efforts toward narrowing this gap and presents systems for speech recognition in BP using two public corpora: Spoltech and OGI-22. The following resources are made available: ATK and HTK scripts, a pronunciation dictionary, and language and acoustic models. The work discusses the baseline results obtained with these resources.
Proceedings Third International Conference on Computational Intelligence and Multimedia Applications. ICCIMA'99 (Cat. No.PR00300)
This paper presents an implementation of a security system for elevators using speaker identification. The system uses an artificial neural network model, the multi-layer perceptron, as a classifier. In this work, several features, such as pitch, formants, perceptual linear prediction coefficients, mel-cepstral coefficients and cepstral coefficients, are used to obtain the best results in the classification process.
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)., 2000
Most current state-of-the-art automatic speaker recognition systems extract speaker-dependent features by looking at short-term spectral information. This approach ignores long-term information that can convey supra-segmental information, such as prosodics and speaking style. We propose two approaches that use the fundamental frequency and energy trajectories to capture long-term information. The first approach uses bigram models to model the dynamics of the fundamental frequency and energy trajectories for each speaker. The second approach uses the fundamental frequency trajectories of a pre-defined set of words as the speaker templates and then, using dynamic time warping, computes the distance between the templates and the words from the test message. The results presented in this work are on Switchboard I using the NIST Extended Data evaluation design. We show that these approaches can achieve an equal error rate of 3.7%, which is a 77% relative improvement over a system based on short-term pitch and energy features alone.
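The template-matching step in the second approach can be illustrated with a minimal dynamic time warping sketch. This is not the authors' implementation: the function name `dtw_distance`, the absolute-difference local cost, and the toy pitch values are all assumptions made for illustration, and the abstract does not state which local cost or path constraints were used.

```python
import math


def dtw_distance(template, test):
    """Classic DTW distance between two 1-D sequences (e.g. F0 contours).

    Assumption: local cost is the absolute difference between samples.
    """
    n, m = len(template), len(test)
    # cost[i][j] = minimal accumulated cost of aligning template[:i] with test[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - test[j - 1])
            # Allow a match, an insertion, or a deletion at each step.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical pitch contours: the second is a time-stretched copy of the first.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

A time-stretched copy of a template aligns at zero cost, which is why DTW suits word-level trajectories of varying duration.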
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)., 2003
DoD (3) IBM (4) ICSI (5) OGI (6) CMU (7) Charles Univ. (8) York Univ. (9) Princeton Univ. (10) Cornell Univ. • This work is sponsored by the Department of Defense under Air Force Contract F19628-00-C-0002, and the CLSP/JHU workshop was supported by NSF and DoD funding. Opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government. + The authors gratefully acknowledge the CLSP group at JHU for organizing and hosting WS2002.
Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, 2005
Modern brain-computer interface (BCI) applications use information obtained from the user's electroencephalogram (EEG) to estimate mental states. Selecting an optimal subset of the EEG channels instead of using all of them is especially important for ambulatory EEG, where the user is mobile, because it reduces data communication and computational load requirements. In addition, elimination of irrelevant sensors improves the robustness of the classification system by reducing dimensionality. In this paper, we propose a filter approach for EEG channel selection using mutual information (MI) maximization. This method ranks the EEG channels such that the MI between the selected sensors and class labels is maximized. This selection criterion is known to reduce classification error. We employ a computationally efficient approach for MI estimation and EEG channel ranking. This approach is illustrated on EEG data recorded from three subjects performing two mental tasks. Experiment result...
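As a rough illustration of MI-based channel ranking, the sketch below scores each channel independently with a plug-in (histogram) MI estimate on discretized features. This is a simplification and an assumption: the abstract does not describe its MI estimator, and ranking channels one at a time ignores redundancy between sensors. The channel names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in MI estimate, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_channels(features, labels):
    """Rank channels by MI with the class labels, highest first.

    features: dict mapping channel name -> discretized feature per trial.
    """
    scores = {ch: mutual_information(v, labels) for ch, v in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: four trials, two mental tasks, three channels.
labels = [0, 0, 1, 1]
features = {
    "O1": [0, 1, 0, 1],  # independent of the label
    "Pz": [0, 0, 0, 1],  # partially informative
    "C3": [0, 0, 1, 1],  # perfectly informative
}
print(rank_channels(features, labels))
```

Keeping only the top-ranked channels is the filter step; the classifier is then trained on that reduced set.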
Using EEG signals to estimate cognitive state has drawn increasing attention in recent years, especially in the context of brain-computer interface (BCI) design. However, this goal is extremely difficult because, in addition to the complex relationships between the cognitive state and EEG signals that yield the non-stationarity of the features extracted from EEG signals, there are artefacts introduced by eye blinks and head and body motion. In this paper, we present a classification system that can estimate the subject's cognitive state from the measured EEG signals. In the proposed system, a mutual-information-based method is employed to reduce the dimensionality of the features as well as to increase the robustness of the system. A committee of three classifiers was implemented, and the majority voting results of the committee are taken to be the final decisions. The results of a preliminary test with data from freely moving subjects performing various tasks, as opposed to the strictly controlled experimental setups of BCI, provide strong support for this approach.
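The committee decision rule can be sketched as a simple majority vote over per-trial predictions. The helper names and the example outputs below are hypothetical, and the abstract does not say which three classifiers form the committee.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by the most committee members."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


def committee_decisions(per_classifier_outputs):
    """Combine predictions trial by trial.

    per_classifier_outputs: one list of predicted labels per classifier,
    all of the same length (one entry per trial).
    """
    return [majority_vote(votes) for votes in zip(*per_classifier_outputs)]


# Hypothetical outputs of three classifiers on three trials.
outputs = [
    [0, 1, 1],  # classifier A
    [0, 0, 1],  # classifier B
    [1, 0, 1],  # classifier C
]
print(committee_decisions(outputs))
```

With three voters and two classes a strict majority always exists, so no tie-breaking rule is needed in that setting.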
2014 International Telecommunications Symposium (ITS), 2014
The MIT Lincoln Laboratory submission for the 2004 NIST Speaker Recognition Evaluation (SRE) was built upon seven core systems using speaker information from short-term acoustics, pitch and duration prosodic behavior, and phoneme and word usage. These different levels of information were modeled and classified using Gaussian Mixture Models, Support Vector Machines and N-gram language models and were combined using a single-layer perceptron fuser. The 2004 SRE used a new multilingual, multi-channel speech corpus that provided a challenging speaker detection task for the above systems. In this paper we describe the core systems used and provide an overview of their performance on the 2004 SRE detection tasks.
Recent work has proposed the use of a discrete representation of the dynamics of the fundamental frequency and short-term energy temporal trajectories to characterize speaker and/or language information. Since the short-term energy trajectory is affected by several factors, like speaker, phone, and channel information, we propose the use of the temporal trajectories from frequency bands instead of the short-term energy
Feature selection and dimensionality reduction are important steps in pattern recognition. In this paper, we propose a scheme for feature selection using linear independent component analysis and a mutual information maximization method. The method is theoretically motivated by the fact that the classification error rate is related to the mutual information between the feature vectors and the class labels. The feasibility of the principle is illustrated on a synthetic dataset and its performance is demonstrated using EEG signal classification. Experimental results show that this method works well for feature selection.
Augmented cognition is an emerging concept that aims to enhance user performance and cognitive capabilities on the basis of adaptive assistance. An integral part of such systems is the automatic assessment of the instantaneous cognitive state of the user. This paper describes an automatic cognitive state estimation methodology based on the use of EEG measurements with ambulatory users. The required robustness in this context is achieved through the use of a mutual-information-based dimensionality reduction approach in conjunction with a committee of classifiers and a median-filter outlier rejection element. We present classification results associated with cognitive tasks performed in mobile and stationary modalities.
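One plausible reading of the median-filter outlier rejection element is a sliding median over the sequence of classifier decisions, which suppresses isolated label flips. The sketch below assumes that reading; the window length, edge handling, and the choice of `statistics.median_low` (so the output is always an observed label) are illustrative assumptions, not details from the abstract.

```python
from statistics import median_low


def median_filter(decisions, window=3):
    """Smooth a sequence of numeric labels with a sliding median.

    Assumption: windows are truncated at the sequence edges.
    """
    half = window // 2
    smoothed = []
    for i in range(len(decisions)):
        lo = max(0, i - half)
        hi = min(len(decisions), i + half + 1)
        # median_low returns an actual data point, never an average.
        smoothed.append(median_low(decisions[lo:hi]))
    return smoothed


# Hypothetical decision stream with one isolated outlier at index 2.
print(median_filter([0, 0, 1, 0, 0, 1, 1, 1]))
```

The isolated flip is removed while the sustained change of state near the end is preserved.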
1st Transdisciplinary Conference on Distributed Diagnosis and Home Healthcare, Conference Proceedings, 2006
The continuous assessment of mobility and speed of processing is an important component underlying physical and cognitive functions. We propose a novel approach to measure mobility, e.g. speed of walking and possibly speed of processing, by unobtrusive monitoring of elders' response times to specific events. The particular application investigated is response times to a telephone ring. A key idea put forth in this paper is that if the elders' location distribution is stable over time, response times can be used to assess the "instantaneous" speed of walking. The feasibility of this approach is illustrated using data collected in a study performed by Intel in the homes of several subjects.
The current paradigm of clinic-focused healthcare is challenged by growing numbers of aging baby boomers and the concomitant cost of managing chronic health conditions. We have begun investigating an alternative to clinic-based health assessments, in which pervasive technologies are used to enable continuous monitoring and assessment of patients in a variety of settings outside of hospitals. There are many outstanding
Lecture Notes in Computer Science, 2010
Despite the availability of several speech corpora that can be used to build automatic speech recognition systems, there are only a few corpora for the Brazilian Portuguese (BP) language. This lack of corpora does not allow extensive and deep research on ...