cristian segura | INPAHU - Academia.edu

cristian segura

Papers by cristian segura

Research paper thumbnail of Audiovisual event detection towards scene understanding

Computer Vision and Pattern Recognition, 2009

Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. A multimodal data fusion at score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows assessing the performance of the presented algorithms. This dataset is made publicly available for research purposes.
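The score-level fusion step described above can be sketched as follows. This is a minimal illustration of the weighted-mean variant only (the fuzzy-integral alternative is not shown); the function name, class count, weights, and score values are hypothetical, not taken from the paper.

```python
import numpy as np

def weighted_mean_fusion(audio_scores, video_scores, w_audio=0.5, w_video=0.5):
    """Score-level fusion of per-class detection scores from two modalities.

    audio_scores, video_scores: per-class confidence scores in [0, 1].
    The weights are illustrative; in practice they would be tuned on
    development data.
    """
    audio = np.asarray(audio_scores, dtype=float)
    video = np.asarray(video_scores, dtype=float)
    return w_audio * audio + w_video * video

# Hypothetical scores for three acoustic-event classes from each modality.
fused = weighted_mean_fusion([0.8, 0.1, 0.3], [0.6, 0.2, 0.4],
                             w_audio=0.7, w_video=0.3)
predicted_class = int(np.argmax(fused))  # class with the highest fused score
```

The event label is then taken as the class with the highest fused score; with the weights above, the audio modality dominates the decision.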

Research paper thumbnail of Multimodal Head Orientation Towards Attention Tracking in Smartrooms

International Conference on Acoustics, Speech, and Signal Processing, 2007

This paper presents a multimodal approach to head pose estimation and 3D gaze orientation of individuals in a SmartRoom environment equipped with multiple cameras and microphones. We first introduce the two monomodal approaches as references. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Two multimodal information fusion schemes, working at the data and decision levels, are analyzed in terms of accuracy and robustness of the estimation. Experimental results conducted over the CLEAR evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches demonstrates the effectiveness of the proposed approach.
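A decision-level fusion of the two monomodal orientation estimates could look like the sketch below: each modality outputs an angle, and the angles are combined with a confidence-weighted circular mean. This is an assumed stand-in for the paper's actual fusion rules, and the weights are hypothetical.

```python
import numpy as np

def fuse_orientation(angles_deg, weights):
    """Confidence-weighted circular mean of head-orientation estimates.

    Averaging unit vectors instead of raw angles handles the wrap-around
    at 0/360 degrees correctly.
    """
    angles = np.radians(np.asarray(angles_deg, dtype=float))
    w = np.asarray(weights, dtype=float)
    x = np.sum(w * np.cos(angles))
    y = np.sum(w * np.sin(angles))
    return float(np.degrees(np.arctan2(y, x))) % 360.0

# Video estimates 350 deg, audio estimates 10 deg, equal confidence:
# a naive arithmetic mean would give 180 deg; the circular mean gives ~0 deg.
fused = fuse_orientation([350.0, 10.0], [0.5, 0.5])
```

The wrap-around case in the example is exactly where naive averaging of angles fails, which is why orientation fusion is usually done on unit vectors.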

Research paper thumbnail of Multimodal real-time focus of attention estimation in SmartRooms

Computer Vision and Pattern Recognition, 2008

This paper presents an overview of our work on real-time multimodal tracking of the focus of attention of multiple persons in a SmartRoom scenario. Redundancy among cameras is exploited to generate a discrete 3D reconstruction of the space. This information is fed to a novel low-complexity Monte Carlo based tracking scheme. Estimated locations of people in the room are used to automatically determine their head positions. The head orientation of every person is computed using video and audio separately, and a multimodal estimate is then produced by combining data at the feature level with a decentralized Kalman filter. Finally, participants' focus of attention is estimated by means of two geometric descriptors: the attention cone and the attention map. Experiments conducted over annotated databases yield quantitative results proving the effectiveness of the presented approach.
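The core of the Kalman-based feature-level fusion above can be illustrated by its static special case: inverse-variance weighting of two noisy estimates of the same quantity. The sketch below is a simplification of the paper's decentralized Kalman filter (no dynamics, single scalar state), and the measurement values and variances are hypothetical.

```python
def fuse_measurements(z_video, var_video, z_audio, var_audio):
    """Inverse-variance fusion of two noisy estimates of the same quantity
    (e.g. a head-pan angle): the static analogue of a Kalman measurement
    update. The more reliable (lower-variance) modality gets more weight."""
    w_v = 1.0 / var_video
    w_a = 1.0 / var_audio
    fused = (w_v * z_video + w_a * z_audio) / (w_v + w_a)
    fused_var = 1.0 / (w_v + w_a)  # fused estimate beats either input alone
    return fused, fused_var

# Video estimate 30 deg (variance 4), audio estimate 40 deg (variance 16):
angle, var = fuse_measurements(30.0, 4.0, 40.0, 16.0)
```

Note that the fused variance is smaller than either input variance, which is the basic reason multimodal fusion improves on the monomodal estimates.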
