Maori Kobayashi - Academia.edu

Papers by Maori Kobayashi

Methods for improving word intelligibility of bone-conducted speech by using bone-conduction headphones

Study on Relations between Emotion Perception and Acoustic Features using Speech Morphing Techniques

In order to investigate which acoustic features are important to emotional impressions and how those features relate to emotion perception, we interpolate voices from pairs of typical emotions with a morphing method, collect emotion scores on the Arousal-Valence space in a listening test, and analyze how acoustic features relate to the evaluations. The results show that Arousal perception can be stably described by the fundamental frequency (F0) alone. In contrast, although this research found that F0 and formants can fit Valence scores, how acoustic features correspond to Valence perception varies with different morphing references. Furthermore, the results show that modification rules for different formant components are necessary for a voice conversion system with better Valence control.
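As a rough illustration of interpolating between two emotional voices along the F0 dimension, the following Python sketch morphs two aligned F0 contours in the log domain. The morphing rate, the toy contours, and the log-domain interpolation are assumptions chosen for illustration, not the exact morphing procedure used in the paper.

```python
import numpy as np

def morph_f0(f0_a, f0_b, rate):
    """Interpolate two F0 contours (Hz) in the log domain.

    f0_a, f0_b : aligned F0 contours of equal length, e.g. from a
                 neutral and an angry utterance (unvoiced frames = 0).
    rate       : morphing rate in [0, 1]; 0 returns f0_a, 1 returns f0_b.
    """
    f0_a = np.asarray(f0_a, dtype=float)
    f0_b = np.asarray(f0_b, dtype=float)
    voiced = (f0_a > 0) & (f0_b > 0)          # interpolate only voiced frames
    out = np.zeros_like(f0_a)
    out[voiced] = np.exp((1.0 - rate) * np.log(f0_a[voiced])
                         + rate * np.log(f0_b[voiced]))
    return out

# Example: a contour halfway between the two reference emotions.
f0_neutral = np.array([120.0, 122.0, 0.0, 125.0])
f0_angry   = np.array([180.0, 200.0, 0.0, 210.0])
print(morph_f0(f0_neutral, f0_angry, 0.5))
```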

Method for improving the word intelligibility of presented speech using bone-conduction headphones

The Effect of Auditory-Tactile Interaction on the Accuracy of Distance Discrimination of Sound Images (Special Issue: VR Psychology 6)

Transactions of the Virtual Reality Society of Japan, 2016

The cross-modal effect of perceptual organization of sounds on the visual target detection

A Proposal of Guidelines for Content Selection in a Three-Dimensional Sound Field Reproduction System

Transactions of the Virtual Reality Society of Japan, 2014

Voice Modification for Announcement of Evacuation Guidance in Noisy Reverberant Environments

Study on method for protecting speech privacy by actively controlling speech transmission index in simulated room

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017

Protecting speech privacy in a specific room is an important challenge in room acoustics. However, protecting people's conversation from being overheard by an unintended listener, that is, making it unintelligible to that listener, is difficult. This paper proposes a method for protecting speech privacy by actively controlling the speech transmission index (STI) in a simulated room containing an unintended listener. In this method, the STI in the simulated room can be controlled by manipulating the parameters of the simulated room impulse response (RIR). We can control the STI by convolving speech with the simulated RIR because the presentation of speech and additive delayed-manipulated speech can be regarded as the convolution of speech with late reverberation in the simulated room. Three experiments (word intelligibility, listening difficulty, and annoyance tests) were conducted to compare the proposed method with two conventional methods (noise masking and reverberation). The results showed that speech privacy can be protected by controlling the STI derived by manipulating the simulated RIR. The results also showed that the proposed method can protect the privacy of conversations as effectively as those other methods can while using lower noise levels and shorter reverberation times.
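The core idea of degrading intelligibility by convolving speech with a simulated RIR can be sketched in Python as follows. The exponential-decay RIR model, the rt60 parameter, and the signal names are illustrative assumptions, not the paper's actual RIR parameterization or STI computation.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulated_rir(fs, rt60, length_s=1.0, direct_delay_s=0.005, seed=0):
    """Build a toy room impulse response: a direct path followed by an
    exponentially decaying noise tail whose decay rate is set by rt60."""
    rng = np.random.default_rng(seed)
    n = int(length_s * fs)
    t = np.arange(n) / fs
    tail = rng.standard_normal(n) * np.exp(-6.91 * t / rt60)  # -60 dB at t = rt60
    rir = np.zeros(n)
    rir[0] = 1.0                                   # direct sound
    delay = int(direct_delay_s * fs)
    rir[delay:] += 0.5 * tail[:n - delay]          # delayed reverberant part
    return rir / np.max(np.abs(rir))

def add_reverberation(speech, fs, rt60):
    """Convolve speech with the simulated RIR; a longer rt60 lowers the
    intelligibility (and an STI-style metric) of the result."""
    wet = fftconvolve(speech, simulated_rir(fs, rt60), mode="full")
    return wet / np.max(np.abs(wet))

# Example: degrade one second of a 200 Hz test signal at 16 kHz.
fs = 16000
speech = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
processed = add_reverberation(speech, fs, rt60=1.2)
```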

The effect of auditory stimuli on the visual target detection task

Following instructions, risk perception of evacuation calling voice and placement of emotional VAD space

Study on Perception of Speaker Age by Semantic Differential Method

Humans can perceive the ages of speakers from uttered voices by their own judgements. The perceived ages are called perceptual ages (PAs). Many earlier studies focused on statistical correlations between aging voices and acoustic features without taking into account the fact that human perception is vague rather than precise [1]. This paper focuses on the psychological factors to study human perception of aging voices. An experiment was carried out to evaluate the aging voices with candidate semantic primitives, and the results of the listening test were analyzed by the Semantic Differential Method and regression analysis to investigate the impressions that humans use to estimate the PAs of speakers. Results show that, for both male and female voices, the Metal Factor (Deep-Flimsy, Full-Delicate, Rich-Thin, Heavy-Light), which shows a linear relation with both male and female PAs, is the most important factor that helps listeners judge the PAs of uttered voices. In addition, the remaining factors show both linear and non-linear relationships with male aging voices, but only non-linear relations with female aging voices.
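The analysis step of relating semantic-factor scores to perceptual ages can be sketched with a plain linear regression. The toy factor scores, the perceptual-age values, and the use of scikit-learn are assumptions made for illustration, not the paper's data or exact regression procedure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: one row per voice stimulus, columns are factor scores
# (e.g. the "Metal" factor and two other semantic factors); y is the
# mean perceptual age reported by listeners. Values are illustrative.
factor_scores = np.array([
    [-1.2,  0.3,  0.1],
    [-0.4,  0.1, -0.2],
    [ 0.5, -0.3,  0.4],
    [ 1.3, -0.5,  0.2],
])
perceptual_age = np.array([24.0, 35.0, 51.0, 63.0])

model = LinearRegression().fit(factor_scores, perceptual_age)
print("coefficients:", model.coef_)        # contribution of each factor
print("R^2:", model.score(factor_scores, perceptual_age))
```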

Research and Development of Intelligent Speech Presentation System for Evacuation Guidance that can Reliably Convey Information Necessary for Disasters by Speech

Presence of Three-Dimensional Sound Field Facilitates Listeners’ Mood, Felt Emotion, and Respiration Rate When Listening to Music

Frontiers in Psychology, 2021

Many studies have investigated the effects of music listening from the viewpoint of music features such as tempo or key by measuring psychological or psychophysiological responses. In addition, technologies for three-dimensional sound field (3D-SF) reproduction and binaural recording have been developed to induce a realistic sensation of sound. However, it is still unclear whether music listened to in the presence of a 3D-SF is more impressive than in its absence. We hypothesized that the presence of a 3D-SF when listening to music facilitates listeners’ moods, emotions for music, and physiological activities such as respiration rate. Here, we examined this hypothesis by evaluating differences between a reproduction condition with headphones (HD condition) and one with a 3D-SF reproduction system (3D-SF condition). We used a 3D-SF reproduction system based on the boundary surface control principle (BoSC system) to reproduce a sound field of music in the 3D-SF condition. Music in...

Acoustic features in speech for emergency perception

The Journal of the Acoustical Society of America, 2018

Previous studies have reported that acoustic features such as the speech rate, fundamental frequency (F0), amplitude, and voice gender are related to emergency perception in speech. However, the most critical factor influencing emergency perception in speech remains unknown. In this study, we compared the influences of three acoustic features (speech rate, F0, and spectral sequence (amplitude)) to determine the acoustic feature that has the most influence on emergency perception in speech. Prior to conducting our experiments, we selected five speech phrases with different levels of perceived emergency from among various speech phrases spoken by TV casters during real emergencies. We then created synthesized voices by replacing the three acoustic features separately among the selected five voices. In experiment 1, we presented these synthesized voices to 10 participants and asked them to evaluate the level of perceived emergency of each voice by the magnitude estimation method. The results from experiment 1 showed that F0 was the most influential on emergency perception. In experiment 2, we examined the influences of the three acoustic features on auditory impressions related to perceived emergency by the SD method. The results suggested that the emotional effects of some words such as “tense” and/or “rush” were influenced by the fundamental frequency.

The Effect of Irrelevant Sounds on the Auditory Continuity Illusion

Proceedings of Fechner Day, 2007

The effect of irrelevant sounds on the auditory continuity illusion was examined. Listeners judged whether a tone (inducee) that was repeatedly alternated with a band-pass noise (inducer) was continuous or discontinuous. A sequence of irrelevant sounds, that is, tone pips at a frequency remote from the inducee, increased the limit of illusory continuity in terms of maximum inducee level when the irrelevant sounds were synchronized with the onsets of the inducers. The effect of the irrelevant sounds depends on the timing relationship between the irrelevant sounds and the inducers. These results suggest that illusory continuity is not fully determined by local, pre-attentive processing in the auditory system.

The Effects of Spatialized Sounds on the Sense of Presence in Auditory Virtual Environments: A Psychological and Physiological Study

Presence: Teleoperators and Virtual Environments, 2015

Although many studies have indicated that spatialized sounds increase the subjective sense of presence in virtual environments, few studies have examined the effects of sounds objectively. In this study, we examined whether three-dimensionally reproduced sounds increase the sense of presence in auditory virtual environments by using physiological and psychological measures. We presented the sounds of people approaching the listener through a three-dimensional reproduction system using 96 loudspeakers. There were two spatial sound conditions, spatialized and non-spatialized, which differed in the spatial accuracy of the reproduction. The experimental results showed that presence ratings for spatialized sounds were greater than for non-spatialized sounds. Further, the results of the physiological measures showed that the sympathetic nervous system was activated to a greater extent by the spatialized sounds than by the non-spatialized sounds, and the responses to the three-dimensiona...

Consideration of Effective Acoustic Rendering of Spatialized Ambient Sound

Interdisciplinary Information Sciences, 2012

Three-dimensional sound auralization systems have been actively developed in the last few decades. Such systems are called virtual auditory displays (VADs). In conventional VADs based on head-related transfer functions (HRTFs), only the sound source position is rendered, disregarding other acoustical phenomena. However, because various sounds surround us in daily life, we usually hear not only a targeted direct sound but also ambient sounds in an actual sound space. A lack of ambient sound often engenders an unnatural perception of the virtual auditory space presented by a VAD based on HRTFs. Therefore, ambient sounds should be included in the VAD system's auralization. We investigated an effective rendering method for ambient sound using ordinary colored noise. Furthermore, using subjective evaluations, we discuss the relation between the realism of a sound space with ambient sounds and a listener's head movement.

Crossmodal Contingent Aftereffect

i-Perception, 2011

Sounds containing no motion or positional cues can induce illusory visual motion perception for static visual stimuli. Two identical visual stimuli placed side by side were presented in alternation, producing apparent motion perception, and each stimulus was accompanied by a tone burst of a specific and unique frequency. After prolonged exposure to the apparent motion, the tones acquired driving effects for motion perception; a visual stimulus blinking at a fixed location was perceived as lateral motion. The effect lasted for at least a few days and was observed only at the retinal position that had previously been exposed to apparent motion with the tones. Furthermore, the effect was specific to the ear and sound frequency presented in the exposure period. These results indicate that a strong association between visual motion and a sound sequence is easily formed within a short period and that very early stages of sensory processing might be the responsive loci for the current phenomenon.

Sound field reproduction and sharing system based on the boundary surface control principle

Acoustical Science and Technology, 2015

In this paper, we introduce a newly developed sound-field-reproducing and -sharing system. The system consists of an 80-channel fullerene-shaped microphone array and a 96-channel loudspeaker array mounted in an enclosure called a sound cask, so named because of its shape. The cask has two functions. First, it functions as a precise sound field reproduction system. The sound signals acquired from a microphone array in any sound field can be reproduced in the sound cask after passing through filters that modify the amplitude and phase on the basis of the boundary surface control principle. The large number of loudspeakers results in precise orientation and depth of sound images. Second, it functions as a platform for a sound-field-sharing system. Several remotely located casks can appear to exist in the same sound field for subjects inside a cask. In addition, the cask is large enough for one to play a musical instrument inside it. The musical sounds or voices produced by subjects can be shared by subjects in a distant cask after convolution with the impulse responses of the original sound field. The concept of the system is explained in detail.

Indiscriminable sounds determine the direction of visual motion

Scientific reports, 2012

In cross-modal interactions, top-down controls such as attention and explicit identification of cross-modal inputs were assumed to play crucial roles in the optimization. Here we show the establishment of cross-modal associations without such top-down controls. The onsets of two circles producing apparent motion perception were accompanied by indiscriminable sounds consisting of six identical and one unique sound frequencies. After adaptation to the visual apparent motion with the sounds, the sounds acquired a driving effect for illusory visual apparent motion perception. Moreover, pure tones at each unique frequency of the sounds acquired the same effect after the adaptation, indicating that the difference between the indiscriminable sounds was implicitly coded. We further confirmed that the aftereffect did not transfer between eyes. These results suggest that the brain establishes new neural representations between sound frequency and visual motion without clear identification of...
