Kiyohiro Shikano - Academia.edu
Papers by Kiyohiro Shikano
Interspeech, 1997
Interspeech 2007
International Journal of Pattern Recognition and Artificial Intelligence
This paper describes a phonetic typewriter and a dictation machine that utilize the underlying statistical structure of phoneme or character sequences. The approach of using syllable or character trigrams is applied to language source modeling. The language source models are obtained by calculating trigram probabilities from a large text database. These models are combined with the HMM-LR continuous speech recognition system [3, 6]. The phonetic typewriter is tested using 274 phrases uttered by one male speaker. The syllable source model achieves a 94.9% phoneme recognition rate with a test-set phoneme perplexity of 3.9. Without the syllable source model, the phoneme recognition rate is only 73.2%. A trigram model based on characters is also evaluated. This character source model reduces the syllable perplexity significantly, to 7.7, compared with 10.5 for the syllable source model. The character source model achieves a 78.5% character transcription rate for the 274 phrase utterances...
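As an illustration of the trigram source modeling and test-set perplexity mentioned in the abstract above, the following is a minimal sketch rather than the authors' system. It assumes a character-level trigram model with add-one smoothing; the smoothing choice, the `^` padding symbol, and the function names `train_trigram` and `perplexity` are all hypothetical, and nothing here involves the HMM-LR recognizer.

```python
import math
from collections import Counter

def train_trigram(text):
    """Count character trigrams and their two-character contexts."""
    padded = "^^" + text  # "^" is a hypothetical start-of-text pad symbol
    tri = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    ctx = Counter(padded[i:i + 2] for i in range(len(padded) - 2))
    vocab = set(padded)
    return tri, ctx, vocab

def perplexity(text, tri, ctx, vocab):
    """Test-set perplexity under add-one smoothed trigram probabilities.

    Assumes a non-empty text. Characters never seen in training still get
    the smoothed floor probability, so this is only a rough estimate.
    """
    padded = "^^" + text
    v = len(vocab)
    log_sum = 0.0
    n = 0
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        # P(c | ab) = (count(abc) + 1) / (count(ab) + |V|)  (assumed smoothing)
        p = (tri[gram] + 1) / (ctx[gram[:2]] + v)
        log_sum += math.log2(p)
        n += 1
    # Perplexity is 2 raised to the average negative log2 probability.
    return 2 ** (-log_sum / n)

tri, ctx, vocab = train_trigram("abababababab")
print(perplexity("ababab", tri, ctx, vocab))  # text resembling the training data
print(perplexity("bbaabb", tri, ctx, vocab))  # text unlike the training data
```

A lower perplexity means the model finds the test sequence more predictable, which is the sense in which the abstract compares 7.7 with 10.5.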
2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628), 2002
6th International Conference on Signal Processing, 2002., 2000
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2005, Oct 1, 2005
Systems and Computers in Japan, May 1, 2003
ABSTRACT In this paper, a method is investigated for using an array of microphones to capture a high-quality recording of voice under reverberation. In the past, multiple beamforming has been proposed by Flanagan's group as a method for efficiently suppressing reverberation. However, because this multiple beamforming method computes the arrival positions of the sound and its reflections based on prior knowledge of the shape of the room and the position of the target sound source, it is not suited to practical applications. In this paper, a method of multiple beamforming is proposed in which the source positions of the sound and its reflections are localized without using prior knowledge of the shape of the room or the position of the target sound source. The effectiveness of the proposed method was evaluated in a simulation experiment using the image method and in an experiment in an actual hallway environment. First, an experiment was run to localize the positions of the sources of the sound and its reflections under a condition in which both the shape of the room and the source position of the sound were unknown. As a result, it was found that the localization error for the source position of the sound was small. Furthermore, although the localization errors for the source positions of the reflections were large, the localized reflection paths were found to be approximately equal to the true reflection paths. Next, the output signal from the proposed method was evaluated based on SNR. The result demonstrated that multiple beamforming is more effective than single beamforming. © 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(5): 69–80, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.1204
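For readers unfamiliar with beamforming, the sketch below shows a single delay-and-sum beam steered at one known position. It is an assumption-laden illustration, not the method of the paper: it uses integer sample delays, a fixed speed of sound, and hypothetical function names (`steering_delays`, `delay_and_sum`), and it omits the localization step entirely. Multiple beamforming, as described in the abstract, would presumably form one such beam for the direct sound and one for each localized reflection before combining them.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_delays(mic_positions, source_position, sample_rate):
    """Integer sample delays of each microphone relative to the nearest one."""
    dists = [math.dist(m, source_position) for m in mic_positions]
    nearest = min(dists)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in dists]

def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]

# Toy example: the same impulse reaches the second microphone 2 samples later.
channels = [[0, 1, 0, 0, 0], [0, 0, 0, 1, 0]]
print(delay_and_sum(channels, [0, 2]))
```

Signals arriving from the steered position add coherently after alignment, while sound from other positions is averaged down, which is why steering extra beams at reflection paths can recover additional energy from the target.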
ABSTRACT: This paper aims to examine the suitability of marginal-statistics-based contrast functions, e.g. negentropy, for the separation of convolutive speech mixtures picked up by a linear microphone array. For this study we choose our frequency-domain fixed-point ICA algorithm, based on ...
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Mar 1, 2003
ICASSP-88, 1988 International Conference on Acoustics, Speech, and Signal Processing, Apr 15, 2007
ABSTRACT This paper describes a fixed-point independent component analysis (ICA) algorithm in combination with the null beamforming technique to sieve out speech signals from their convoluted mixture observed using a linear microphone array. The fixed-point algorithm shows fast convergence to the solution; however, it is highly sensitive to the initial value from which iteration starts. A good initial value leads to faster convergence and yields better results. We propose the use of a null beamformer-based initial value for iteration and explore its effects on separation performance under different acoustic conditions by examining the noise reduction rate (NRR) and convergence speed. The result of the simulation confirms the efficacy and accuracy of the proposed algorithm.
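To make the role of the initial value concrete, here is a minimal sketch of a one-unit fixed-point ICA iteration. It is heavily simplified relative to the setting in the abstract: it assumes a real-valued, instantaneous, already-whitened two-channel mixture and a tanh nonlinearity, and the function name `fixed_point_ica_unit` and its `w_init` parameter are hypothetical. In the paper the initial value would come from a null beamformer; here it is simply passed in by the caller.

```python
import math
import random

def fixed_point_ica_unit(z, w_init, max_iter=200, tol=1e-10):
    """One-unit fixed-point ICA on whitened 2-D samples z = [(z1, z2), ...]."""
    norm = math.hypot(*w_init)
    w = (w_init[0] / norm, w_init[1] / norm)
    n = len(z)
    for _ in range(max_iter):
        a0 = a1 = b = 0.0
        for z1, z2 in z:
            t = math.tanh(w[0] * z1 + w[1] * z2)  # g(u) = tanh(u)
            a0 += z1 * t
            a1 += z2 * t
            b += 1.0 - t * t                      # g'(u) = 1 - tanh(u)^2
        # Fixed-point update: w <- E[z g(w.z)] - E[g'(w.z)] w, then renormalize.
        new = (a0 / n - b / n * w[0], a1 / n - b / n * w[1])
        norm = math.hypot(*new)
        new = (new[0] / norm, new[1] / norm)
        # w and -w describe the same component, so compare up to sign.
        converged = 1.0 - abs(new[0] * w[0] + new[1] * w[1]) < tol
        w = new
        if converged:
            break
    return w

# Toy data: two unit-variance uniform sources mixed by a rotation (stays white).
random.seed(0)
theta = 0.5
c, s = math.cos(theta), math.sin(theta)
r3 = math.sqrt(3.0)
src = [(random.uniform(-r3, r3), random.uniform(-r3, r3)) for _ in range(5000)]
z = [(c * a - s * b, s * a + c * b) for a, b in src]
print(fixed_point_ica_unit(z, (1.0, 0.0)))
```

Which source the iteration locks onto, and how many iterations it needs, depends on `w_init`, which is the sensitivity the abstract addresses by seeding the iteration from a null beamformer.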
Interspeech, 2007
Proceedings of APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference, Oct 4, 2009
Interspeech, 2003