Jonathan Berger - Academia.edu
Papers by Jonathan Berger
CHI Conference on Human Factors in Computing Systems
We seek to detect characteristics of regional language accent in solo singing using two variants of convolutional neural networks to classify reported country and language from ten countries during karaoke-style vocal performance of the broadly popular hymn, Amazing Grace. The most successful model produces an overall accuracy of 15.64%, with false classification of singing segments as variants of English at 53.4%. The model also separates learned classes along a rhythmic-stress dimension, with English variants at its origin. These observations suggest that, based on the network's success in learning intonation features, a singer's speech pronunciation adapts to the language of the song being sung.
Journal of Applied Clinical Medical Physics, 2010
Frontiers in Psychology, 2017
We describe an approach and framework for integrating synthesis and processing of digital image and sound. The approach is based on creating an analogy between sound and image by introducing the notion of the soxel, a representation analogous to the pixel. We describe some simple mappings between the two domains that map pitch, time, spatial coordinates, and timbre to bitmap mode, gray-scale mode, RGB color mode, and layer mode for images. The framework described, SonART, is a powerful multimedia application for integrating image and sound processing with flexible network communication.
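The pixel/soxel analogy can be illustrated with a minimal sketch. The particular mapping below (row to pitch, column to onset time, gray value to amplitude) is a hypothetical example chosen for clarity, not SonART's actual scheme; the function name and parameters are likewise assumptions.

```python
# Hypothetical pixel-to-soxel mapping: row -> pitch, column -> onset
# time, gray value -> amplitude. Not SonART's actual scheme.

def grayscale_to_soxels(image, base_freq=220.0, semitones_per_row=1.0,
                        seconds_per_col=0.125):
    """Map a 2-D grayscale image (list of rows, values 0-255) to a list
    of (onset_time, frequency_hz, amplitude) 'soxel' tuples."""
    soxels = []
    for row_idx, row in enumerate(image):
        # Each row up the image raises the pitch by a fixed interval.
        freq = base_freq * 2 ** (row_idx * semitones_per_row / 12.0)
        for col_idx, value in enumerate(row):
            if value > 0:  # silent where the image is black
                soxels.append((col_idx * seconds_per_col, freq, value / 255.0))
    return soxels

image = [[0, 255], [128, 0]]          # a tiny 2x2 grayscale image
events = grayscale_to_soxels(image)   # two sounding events
```

A synthesizer could then render each tuple as a sinusoid at the given time, frequency, and amplitude; the inverse mapping (analysis of sound back to an image) follows the same grid.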
IEEE Transactions on Audio, Speech and Language Processing, 2007
Proceedings of the International …, 2004
Raster scanning is a technique for generating or recording a video image by means of a line-by-line sweep, tantamount to a data mapping scheme between one- and two-dimensional spaces. While this geometric structure has been widely used in many data transmission and storage systems, as well as in most video displaying and capturing devices, its application to audio-related research or art is rare. In this paper, a data mapping mechanism of raster scanning is proposed as a framework for both image sonification and sound visualization. This mechanism is simple, and produces compelling results when used for sonifying image texture and visualizing sound timbre. In addition to its potential as a cross-modal representation, its complementary and analogous property can be applied sequentially to create a chain of sonifications and visualizations using digital filters, thus suggesting a useful creative method of audio processing. Special attention is paid to the rastrogram, a raster visualizati…
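The core of the raster-scanning idea is just the line-by-line fold between a 1-D sample stream and a 2-D image, applied in either direction (sonification or visualization). A minimal sketch, with the row width as a free parameter:

```python
# Raster mapping between 1-D (audio samples) and 2-D (image rows).
# The row width is a free parameter of the sweep.

def raster_to_image(samples, width):
    """Fold a 1-D sequence into rows of `width` (truncating any remainder)."""
    rows = len(samples) // width
    return [list(samples[r * width:(r + 1) * width]) for r in range(rows)]

def image_to_raster(image):
    """Unfold a 2-D image back into a 1-D sequence, row by row."""
    return [value for row in image for value in row]

signal = list(range(8))
image = raster_to_image(signal, width=4)   # two rows of four samples
assert image_to_raster(image) == signal    # the fold is invertible
```

Because the two directions are exact inverses (up to truncation), they can be chained, which is what makes the alternating sonification/visualization sequence described above possible.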
This paper describes a method for integrating audio and visual displays to explore the activity of neurons in the brain. The motivation is twofold: to help understand how populations of neurons respond during cognitive tasks, and in turn to explore how signals from the brain might be used to create musical sounds. Experimental data were drawn from electrophysiological recordings of individual neurons in awake, behaving monkeys, and an interface was designed to allow the user to step through a visual task as seen by the monkey along with concurrent sonification and visualization of activity from a population of recorded neurons. Data from two experimental paradigms illustrating different functional properties of neurons in the prefrontal cortex during attention and decision-making tasks are presented. The current system provides an accessible way to learn about how neural activity underlies cognitive functions and serves as a preliminary framework to explore both analytical and aesthetic …
… Stilson, Heinrich Taube, Scott Van Duyne, Tony …, 1997
… Jonathan Berger, Associate Professor of Music; Julius Smith, Associate Professor of Music and Electrical Engineering; John Chowning, Professor of … Techniques" - Gary Scavone; "Transient Modeling Synthesis: a flexible analysis/synthesis tool for transient signals" - Tony S. Verma …
In the absence of a well-suited measure for quantifying binaural data variations, this study presents the use of a global perceptual distance metric which can describe both HRTF and listener similarities. The metric is derived from subjective evaluations of binaural renderings of a sound moving along predefined trajectories in the horizontal and median planes. Its characteristics and advantages in describing data distributions based on perceptually relevant attributes are discussed. In addition, the use of 24 HRTFs from two different databases of origin allows for an evaluation of the perceptual impact of some database-dependent characteristics on spatialization. The effectiveness of the experimental design, as well as the correlation between the HRTF evaluations of the two plane trajectories, are also discussed.
It is well known that people often are uncomfortable hearing their recorded singing and speaking voice. This unfamiliarity with the recorded voice, compared to normal hearing, is due to a different transmission mechanism: listening to one's recorded voice involves only a single air-conduction pathway, whereas the voice we hear when we sing and speak is largely due to a bone-conduction pathway. Despite this well-known phenomenon, one's own hearing has received less attention among researchers, since it is a very complex process involving multiple paths from the vocal cords to hearing sensation. Furthermore, we are studying the perception of living humans, which makes mechanical studies difficult for ethical reasons. In this study, we aim to measure one's own hearing through a perceptual experiment using a graphical equalizer. We assume that if a subject matches self-hearing and the hearing of the recorded voice by altering slider levels on the equalizer, we can d…
We describe a perceptual space for timbre, define an objective metric that takes into account perceptual orthogonality, and measure the quality of timbre interpolation. We discuss two timbre representations and, using these two representations, measure perceived relationships between pairs of sounds over an equivalent range of timbre variety. We determine that a timbre space based on Mel-frequency cepstral coefficients (MFCCs) is a good model for a perceptual timbre space.
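Comparing timbres in an MFCC-based space reduces to measuring distances between coefficient vectors. A minimal sketch: MFCC extraction itself is omitted (it requires an audio/DSP library such as librosa), so the vectors below are made-up stand-ins for per-frame MFCC means, and Euclidean distance is one simple, assumed choice of dissimilarity measure, not necessarily the metric the paper defines.

```python
# Distance between two sounds in an MFCC-based timbre space.
# Vectors are hypothetical precomputed per-frame MFCC means.
import math

def timbre_distance(mfcc_a, mfcc_b):
    """Euclidean distance between two equal-length MFCC vectors."""
    if len(mfcc_a) != len(mfcc_b):
        raise ValueError("MFCC vectors must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mfcc_a, mfcc_b)))

clarinet = [12.0, -3.5, 4.1]   # made-up 3-coefficient summaries
trumpet  = [15.0, -1.5, 4.1]
d = timbre_distance(clarinet, trumpet)
```

In a perceptually well-behaved space, smaller distances of this kind should predict that listeners judge the two sounds as more similar in timbre.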
SonART is a flexible, multi-purpose multimedia environment that allows for networked collaborative interaction, with applications for art, science, and industry. In this paper we describe the integration of image and audio that SonART enables. An arbitrary number of layered canvases, each with independent control of opacity, RGB values, etc., can transmit or receive data using Open Sound Control. Data from images can be used for synthesis or audio signal processing, and vice versa. SonART provides an open-ended framework for integrating powerful image and audio processing methods with a flexible network communications protocol. Applications include multimedia art, collaborative and interactive art and design, and scientific and diagnostic exploration of data.
The Covid-19 pandemic severely limited collaboration among musicians in rehearsal and ensemble performance, and demanded radical shifts in collaborative practices. Understanding the nature of these changes in music creators' patterns of collaboration, as well as how musicians shifted priorities and adapted their use of the available technologies, can offer invaluable insights into the resilience and importance of different aspects of musical collaboration. In addition, assessing changes in the collaboration networks among music creators can improve the current understanding of genre and style formation and evolution. We used an internet survey distributed to music creators, including performers, composers, producers, and engineers, all active before and during the pandemic, to assess their perceptions of how their music, collaborative practice, and use of technology were impacted by shelter-in-place orders associated with Covid-19, as well as how they adapted over the cours…
This paper describes Tap-It, an iOS application for sensorimotor synchronization (SMS) experiments. Tap-It plays an audio file while simultaneously collecting time-locked tapped responses to the audio. The main features of Tap-It compared to desktop-based SMS apparatuses are mobility, high-precision timing, a touchscreen interface, and online distribution. Tap-It records both the time stamp of each tap from the touchscreen and the sound of the tapping, recorded from the microphone of the device. We provide an overview of the use of the application, from setting up an experiment to collecting and analyzing the output data. We analyze the latencies of both types of output data and assess the errors of each. We also discuss implications of the application for mobile devices. The application is available free of charge through the Apple App Store, and the source code is also readily available.
Inter-subject correlations (ISCs) of cortical responses have been shown to index audience engagement with narrative works. In parallel lines of research, continuous self-reports and physiological measures have linked listener engagement and arousal with specific musical events. Here we combine EEG ISCs, physiological responses, and continuous self-reports to assess listener engagement and arousal in response to a full-length musical excerpt. The temporal resolution of these measures permits connections to be drawn among them, and also to corresponding musical events in the stimulus. Simultaneous 128-channel EEG, ECG, and respiratory inductive plethysmography were recorded from 13 musicians who heard the first movement of Elgar's E-minor Cello Concerto in original and reversed conditions. Continuous self-reports of engagement with each excerpt were collected in a separate listening session. ISCs of EEG responses were computed in the subspace of their maximally correlated component. Temporally res…
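The basic ISC computation can be sketched simply: correlate each pair of subjects' responses and average the pairwise coefficients. The sketch below assumes each subject is reduced to a single component time series (the paper's correlated-component analysis step is omitted), and the three short series are made-up stand-ins.

```python
# Inter-subject correlation: mean pairwise Pearson correlation across
# subjects' (hypothetical) single-component time series.
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def isc(subjects):
    """Mean Pearson correlation over all subject pairs."""
    pairs = list(combinations(subjects, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

subjects = [[1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [0.9, 2.1, 2.8]]
score = isc(subjects)  # close to 1 when responses track each other
```

Computed in sliding windows rather than over the whole recording, the same quantity yields the time-resolved engagement index that can be aligned with self-reports, physiology, and musical events.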