Maori Kobayashi - Academia.edu
Papers by Maori Kobayashi
In order to investigate what acoustic features are important to emotional impressions and how those features relate to emotion perception, we interpolate voices from pairs of typical emotions with a morphing method, collect emotion scores on Arousal-Valence space by a listening test, and analyze how acoustic features relate to the evaluations. The results show that Arousal perception can be stably described by merely using fundamental frequency (F0). In contrast, although this research found that F0 and formants can fit Valence scores, how acoustic features correspond to Valence perception varies with different morphing references. Furthermore, the results show that modification rules for different formant components are necessary for a voice conversion system with better Valence control.
Transactions of the Virtual Reality Society of Japan, 2016
Transactions of the Virtual Reality Society of Japan, 2014
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017
Protecting speech privacy in a specific room is an important challenge in room acoustics. However, protecting people's conversation from being overheard by an unintended listener, that is, making it unintelligible, is difficult. This paper proposes a method for protecting speech privacy by actively controlling the speech transmission index (STI) in a simulated room containing an unintended listener. In this method, the STI in the simulated room can be controlled by manipulating the parameters of the simulated room impulse response (RIR). We can control the STI by convolving speech with the simulated RIR because the presentation of speech and additive delayed-manipulated speech can be regarded as the convolution of speech with late reverberation in the simulated room. Three experiments (word intelligibility, listening difficulty, and annoyance tests) were conducted to compare the proposed method with two conventional methods (noise masking and reverberation). The results showed that speech privacy can be protected by controlling the STI derived by manipulating the simulated RIR. The results also showed that the proposed method can protect the privacy of conversations as effectively as those other methods while using lower noise levels and shorter reverberation.
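The equivalence this abstract relies on, that playing speech together with a delayed, scaled copy of itself is the same as convolving the speech with an impulse response, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy sample values, the single reflection, and the helper names `convolve` and `add_delayed_copy` are hypothetical and are not taken from the paper, which manipulates a full simulated RIR and evaluates the resulting STI.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_delayed_copy(signal, delay, gain):
    """Original signal plus one delayed, scaled copy of itself."""
    out = list(signal) + [0.0] * delay
    for i, s in enumerate(signal):
        out[i + delay] += gain * s
    return out


# Hypothetical toy values, not taken from the paper.
speech = [1.0, 0.5, -0.25, 0.0, 0.75]
delay, gain = 3, 0.5

# Toy impulse response: a direct path followed by one delayed reflection.
rir = [1.0] + [0.0] * (delay - 1) + [gain]

via_convolution = convolve(speech, rir)
via_delay_and_add = add_delayed_copy(speech, delay, gain)

# Both routes produce the same output sequence.
print(via_convolution == via_delay_and_add)
```

A realistic RIR would contain many such delayed taps with decaying gains; the sketch keeps a single tap only so that the equivalence is easy to check by hand.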
Humans can perceive the ages of speakers from uttered voices by their own judgements. The perceived ages are called perceptual ages (PAs). Many earlier studies focused on statistical correlations between aging voices and acoustic features without taking into account the fact that human perception is vague rather than precise [1]. This paper focuses on the psychological factors to study human perceptions of aging voices. An experiment was carried out to evaluate the aging voices by candidates of semantic primitives, and the results of the listening test were analyzed by the Semantic Differential Method and Regression Analysis to investigate the impressions that humans use to estimate the PAs of speakers. Results show that, with regard to both male and female voices, the Metal Factor (Deep-Flimsy, Full-Delicate, Rich-Thin, Heavy-Light), which shows a linear relation with both male and female PAs, is the most important factor that helps listeners judge the PAs of uttered voices. In addition, the rest of the factors show both linear and non-linear relationships with male aging voices, but only non-linear relations with female aging voices.
Frontiers in Psychology, 2021
Many studies have investigated the effects of music listening from the viewpoint of music features such as tempo or key by measuring psychological or psychophysiological responses. In addition, technologies for three-dimensional sound field (3D-SF) reproduction and binaural recording have been developed to induce a realistic sensation of sound. However, it is still unclear whether music listened to in the presence of 3D-SF is more impressive than in the absence of it. We hypothesized that the presence of a 3D-SF when listening to music facilitates listeners’ moods, emotions for music, and physiological activities such as respiration rate. Here, we examined this hypothesis by evaluating differences between a reproduction condition with headphones (HD condition) and one with a 3D-SF reproduction system (3D-SF condition). We used a 3D-SF reproduction system based on the boundary surface control principle (BoSC system) to reproduce a sound field of music in the 3D-SF condition. Music in...
The Journal of the Acoustical Society of America, 2018
Previous studies have reported that acoustic features such as the speech rate, fundamental frequency (F0), amplitude, and voice gender are related to emergency perception in speech. However, the most critical factor influencing emergency perception in speech remains unknown. In this study, we compared the influences of three acoustic features (speech rate, F0, and spectral sequence (amplitude)) to determine the acoustic feature that has the most influence on emergency perception in speech. Prior to conducting our experiments, we selected five speech phrases with different levels of perceived emergency from among various speech phrases spoken by TV casters during real emergencies. We then created synthesized voices by replacing the three acoustic features separately among the selected five voices. In experiment 1, we presented these synthesized voices to 10 participants and asked them to evaluate the level of perceived emergency of each voice by the magnitude estimation method. The results from experiment 1 showed that F0 was most influential on emergency perception. In experiment 2, we examined the influences of the three acoustic features on auditory impressions related to the perceived emergency by the SD method. The results suggested that emotional effects of some words such as “tense” and/or “rush” were influenced by the fundamental frequency.
Proceedings of Fechner Day, 2007
The effect of irrelevant sounds on the auditory continuity illusion was examined. Listeners judged whether a tone (inducee) that was repeatedly alternated with a band-pass noise (inducer) was continuous or discontinuous. A sequence of irrelevant sounds, that is, tone pips at a remote frequency from the inducee, increased the limit of illusory continuity in terms of maximum inducee level when the irrelevant sounds were synchronized with the onsets of the inducers. The effect of the irrelevant sounds depends on the timing relationship between the irrelevant sounds and the inducers. These results suggest that illusory continuity is not fully determined by local, pre-attentive processing in the auditory system.
Presence: Teleoperators and Virtual Environments, 2015
Although many studies have indicated that spatialized sounds increase the subjective sense of presence in virtual environments, few studies have examined the effects of sounds objectively. In this study, we examined whether three-dimensional reproduced sounds increase the sense of presence in auditory virtual environments by using physiological and psychological measures. We presented the sounds of people approaching the listener through a three-dimensional reproduction system using 96 loudspeakers. There were two spatial sound conditions, spatialized and non-spatialized, which had different spatial accuracy of the reproduction. The experimental results showed that presence ratings for spatialized sounds were greater than for non-spatialized sounds. Further, the results of the physiological measures showed that the sympathetic nervous system was activated to a greater extent by the spatialized sounds compared with the non-spatialized sounds, and the responses to the three-dimensiona...
Interdisciplinary Information Sciences, 2012
Three-dimensional sound auralization systems have been developed actively in the last few decades. Such systems are called virtual auditory displays (VADs). In conventional VADs based on head-related transfer functions (HRTFs), only the sound source position is rendered, and other acoustical phenomena are disregarded. However, because various sounds surround us in our daily life, we usually hear not only a targeted direct sound but also ambient sounds in an actual sound space. A lack of ambient sound often engenders an unnatural perception of the virtual auditory space presented by a VAD based on HRTFs. Therefore, ambient sounds should be included in the VAD system auralization. We investigated an effective rendering method of ambient sound using ordinary colored noise. Furthermore, using subjective evaluations, we discuss the relation between the realism of sound space with ambient sounds and a listener's head movement.
i-Perception, 2011
Sounds containing no motion or positional cues could induce illusory visual motion perception for static visual stimuli. Two identical visual stimuli placed side by side were presented in alternation producing apparent motion perception and each stimulus was accompanied by a tone burst of a specific and unique frequency. After prolonged exposure to the apparent motion, the tones acquired driving effects for motion perception; a visual stimulus blinking at a fixed location was perceived as lateral motion. The effect lasted at least for a few days and was only observed at the retinal position that was previously exposed to apparent motion with the tones. Furthermore, the effect was specific to ear and sound frequency presented in the exposure period. These results indicate that strong association between visual motion and sound sequence is easily formed within a short period and that very early stages of sensory processing might be responsive loci for the current phenomenon.
Acoustical Science and Technology, 2015
In this paper, we introduce a newly developed sound-field-reproducing and -sharing system. The system consists of an 80-channel fullerene-shaped microphone array and a 96-channel loudspeaker array mounted in an enclosure called a sound cask, so named because of its shape. The cask has two functions. First, it functions as a precise sound field reproduction system. The sound signals acquired from a microphone array in any sound field can be reproduced in the sound cask after passing through filters that modify the amplitude and phase on the basis of the boundary surface control principle. The large number of loudspeakers results in the precise orientation and depth of sound images. Second, it functions as a platform for a sound-field-sharing system. Several casks located remotely can appear to exist in the same sound field for subjects inside a cask. In addition, the cask is large enough for one to be able to play a musical instrument inside it. The musical sound or voices produced by subjects can be shared by subjects in a distant cask after convolution with the impulse responses of the original sound field. The concept of the system is explained in detail.
Scientific Reports, 2012
On cross-modal interactions, top-down controls such as attention and explicit identification of cross-modal inputs were assumed to play crucial roles for the optimization. Here we show the establishment of cross-modal associations without such top-down controls. The onsets of two circles producing apparent motion perception were accompanied by indiscriminable sounds consisting of six identical and one unique sound frequencies. After adaptation to the visual apparent motion with the sounds, the sounds acquired a driving effect for illusory visual apparent motion perception. Moreover, the pure tones with each unique frequency of the sounds acquired the same effect after the adaptation, indicating that the difference in the indiscriminable sounds was implicitly coded. We further confirmed that the aftereffect did not transfer between eyes. These results suggest that the brain establishes new neural representations between sound frequency and visual motion without clear identification of...