Benoît Maison - Academia.edu

Papers by Benoît Maison

UVLC2-a new universal VLC scheme for video coding

Proceedings of 1994 IEEE International Symposium on Information Theory, 1994

Toward island-of-reliability-driven very-large-vocabulary on-line handwriting recognition using character confidence scoring

2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001

We explore a novel approach for handwriting recognition tasks whose intrinsic vocabularies are too large to be applied directly as constraints during recognition. Our approach makes use of vocabulary constraints, and addresses the issue that some parts of words may be written more recognizably than others. An initial pass is made with an HMM recognizer, without vocabulary constraints, generating a lattice of character-hypothesis arcs representing likely segmentations of the handwriting signal. Arc confidence scores are computed using a posteriori probabilities. The most confidently recognized characters are used to filter the overall vocabulary, generating a word subset manageable for constraining a second recognition pass. With a vocabulary of 273000 words, we can limit to 50000 words in the second pass and eliminate 39.3% of the word errors made by a one-pass recognizer without vocabulary constraints, and 18.3% of errors made using a fixed 30000-word set.

Tests préliminaires pour le codage d'images désentrelacées [Preliminary tests for the coding of deinterlaced images]

UVLC2: A new universal VLC for video coding

Method and device for identifying and extracting images of multiple users, and for recognizing user gestures

A point-based tele-immersion system: from acquisition to stereoscopic display

Stereoscopic Displays and Virtual Reality Systems XIV, 2007

We present a point based reconstruction and transmission pipeline for a collaborative tele-immersion system. Two or more users in different locations collaborate with each other in a shared, simulated environment as if they were in the same physical room. Each user perceives point-based models of distant users along with collaborative data like molecule models. Disparity maps, computed by a commercial stereo solution, are filtered and transformed into clouds of 3D points. The clouds are compressed and transmitted over the network to distant users. At the other side the clouds are decompressed and incorporated into the 3D scene. The viewpoint used to display the 3D scene is dependent on the position of the head of the user. Collaborative data is manipulated through natural hand gestures. We analyse the performance of the system in terms of computation time, latency and photo realistic quality of the reconstructed models.

Automatic baseform generation from acoustic data

Eighth European Conference on Speech …, 2003

... version of Yvon's “overlapping chunks” [4]. Our objective is to allow any phoneme sequence that is ... Those languages are: Spanish, Italian, French, German, Mandarin, Hindi, Czech and Russian. ... 3.1. Multiple Alignments We represent a multiple alignment as an array Φ of N rows ...

Audio-visual speaker recognition for video broadcast news: some fusion techniques

Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions either due to channel or noise. In this paper, we explore various techniques to fuse video based speaker identification with audio-based speaker ...

Using Place Name Data to Train Language Identification Models

Eighth European Conference on Speech …, 2003

Stanley F. Chen, Benoît Maison, IBM TJ Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, {stanchen,bmaison}@us.ibm.com ...

Pronunciation Modeling for Names of Foreign Origin

… Speech Recognition and …, 2004

Benoît Maison, Stanley F. Chen and Paul S. Cohen, IBM TJ Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA, {bmaison,stanchen}@us.ibm.com, pausyl@aol.com ...

Joint processing of audio and visual information for multimedia indexing and human-computer interaction

Information fusion in the context of combining multiple streams of data (e.g., audio streams and video streams corresponding to the same perceptual process) is considered in a somewhat generalized setting. Specifically, we consider the problem of combining visual cues with audio signals for the purpose of improved automatic machine recognition of descriptors, e.g., speech recognition/transcription, speaker change detection, speaker identification and speaker event detection. These happen to be important descriptors for multimedia content (video) for efficient search and retrieval. A general framework for considering all of these fusion problems in a unified setting is considered.

Automatic generation and selection of multiple pronunciations for dynamic vocabularies

Natural error handling in speech recognition

Robust confidence annotation and rejection for continuous speech recognition

Acoustics, Speech, and Signal …, 2001

We are looking for confidence scoring techniques that perform well on a broad variety of tasks. Our main focus is on word-level error rejection, but most results apply to other scenarios as well. A variation of the normalized cross entropy that is adapted to that ...

IBM’s 10x Real-time Broadcast News Transcription System Used in the 1999 Hub4 Evaluation

Proc. DARPA Speech Transcription Workshop, 2000

We describe the system used by IBM in the 1999 HUB4 Evaluation under the 10 times real-time constraint. We detail the system architecture and show that the performance of this system is over 20 percent more accurate at the same speed than the system used in the 1998 Evaluation. Furthermore, we have closed the gap between our unlimited resource system and our 10 times real time system from 45 percent to 14 percent.

Auditory-Visual Speech Processing (AVSP'99)

An adaptive transform approach for image compression

1996 IEEE Digital Signal Processing Workshop Proceedings, 1996

A new transform/subband coding algorithm is proposed. Its original characteristic is that the transform operator is continuously updated during the encoding process on the basis of previously decoded samples. Unlike other content based algorithms, no data overhead is implied and the method can be easily incorporated into most image compression schemes. Simulation results demonstrate that a substantial benefit can be

MMSE design of polyphase components for generalized interpolators

Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994

Coding of deinterlaced image sequences

Proceedings of 1st International Conference on Image Processing, 1994

Spatial scalability-interlace-to-interlace up/down conversion
