George Tzanetakis | University of Victoria

Papers by George Tzanetakis

Research paper thumbnail of Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies

ACM Multimedia, Nov 30, 2011

It is our great pleasure to welcome you to the 1st International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies (MIRUM). MIRUM was proposed in order to gather experts from the Music and Multimedia Information Retrieval communities, as well as other neighboring fields, and aims to provide a high-profile platform for presenting current work on Music Information Retrieval, with a strong focus on user-centered and multimodal approaches. Music content is multifaceted and exists in many different representations, including audio recordings, symbolic scores, folksonomy descriptions, and accompanying video material. No single representation is capable of accounting for the whole music experience, which is strongly guided by affective and subjective context- and user-dependent factors. The existence of complementary representations and information sources in multiple modalities makes music multimedia content by definition. Furthermore, the subjective and affective aspects of music pose challenges that are faced and experienced in the broader Multimedia community. Thus, we believe it is appropriate to discuss these topics in a Multimedia context. The MIRUM 2011 Call for Papers attracted 22 international technical submissions. The program committee accepted 9 papers that cover a wide variety of topics, ranging from beat tracking techniques to affective analysis of music videos. In addition, the full-day program includes a keynote speech by Dr. Roeland Ordelman (Netherlands Institute for Sound and Vision & University of Twente, The Netherlands) on exploitation possibilities of audiovisual data in the networked information society, as well as a panel on bridging opportunities for the music and multimedia domains, featuring multiple experts from the music and multimedia communities. We hope that these proceedings will serve as a valuable reference for researchers in the fields of Music and Multimedia Information Retrieval, as well as neighboring fields.

Research paper thumbnail of Multimedia Technologies for Enriched Music Performance, Production, and Consumption

Research paper thumbnail of HEAR: Holistic Evaluation of Audio Representations

arXiv (Cornell University), Mar 6, 2022

What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models, and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.
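
To make the "common API" concrete, here is a minimal sketch of what a conforming embedding module could look like, assuming the published HEAR 2021 interface (load_model, get_timestamp_embeddings, get_scene_embeddings, plus sample_rate and embedding-size attributes on the model); the mel-magnitude "model" itself is a placeholder for illustration, not any submitted system.

```python
# Sketch of a HEAR-style embedding module (assumed API; not a submitted model).
import torch

class TrivialSpectrogramModel(torch.nn.Module):
    sample_rate = 16000              # required: expected input sample rate
    scene_embedding_size = 128       # required: per-clip embedding dims
    timestamp_embedding_size = 128   # required: per-frame embedding dims

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (n_sounds, n_samples) -> (n_sounds, n_frames, 128)
        spec = torch.stft(audio, n_fft=512, hop_length=160,
                          return_complex=True).abs()
        return spec[:, :128, :].transpose(1, 2)

def load_model(model_file_path: str = "") -> torch.nn.Module:
    return TrivialSpectrogramModel()

def get_timestamp_embeddings(audio, model):
    emb = model(audio)                                  # (n, frames, d)
    hop_ms = 160 / model.sample_rate * 1000.0
    ts = torch.arange(emb.shape[1]) * hop_ms            # frame times in ms
    return emb, ts.expand(audio.shape[0], emb.shape[1])

def get_scene_embeddings(audio, model):
    emb, _ = get_timestamp_embeddings(audio, model)
    return emb.mean(dim=1)                              # pool over time
```

Because every submission exposes these same entry points, the benchmark harness can swap models in and out of all nineteen downstream tasks without per-model glue code.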

Research paper thumbnail of Modeling Grouping Cues for Auditory Scene Analysis Using a Spectral Clustering Formulation

IGI Global eBooks, Aug 4, 2010

Research paper thumbnail of Personalizing self-organizing music spaces with anchors: design and evaluation

Multimedia Tools and Applications, Feb 16, 2017

We propose and evaluate a system for content-based visualization and exploration of music collections. The system is based on a modification of Kohonen’s Self-Organizing Map algorithm and allows users to choose the locations of clusters containing acoustically similar tracks on the music space. A user study conducted to evaluate the system shows that users perceived personalizing the music space as difficult. Conversely, the user study and objective metrics derived from users’ interactions with the interface demonstrate that the proposed system helped individuals create playlists faster and, under some circumstances, more effectively. We believe that personalized browsing interfaces are an important area of research in Multimedia Information Retrieval, and both the system and user study contribute to the growing work in this field.
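
The anchoring idea can be illustrated with a short sketch: during SOM training, user-chosen "anchor" tracks always update the map around their assigned grid cell rather than their best-matching unit, pulling acoustically similar tracks toward that location. This is a simplification of the concept, not the paper's exact algorithm; all names and parameters below are illustrative.

```python
# Toy "anchored" SOM: anchors maps a track index to a fixed (row, col) cell.
import numpy as np

def train_anchored_som(feats, grid=(10, 10), anchors=None,
                       epochs=30, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    W = rng.normal(size=(rows * cols, feats.shape[1]))   # codebook vectors
    anchors = anchors or {}
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)                  # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5      # shrinking neighborhood
        for i in rng.permutation(len(feats)):
            x = feats[i]
            if i in anchors:          # pinned track updates its chosen cell
                bmu_pos = np.array(anchors[i], dtype=float)
            else:                     # free track updates its best-matching unit
                bmu_pos = coords[np.argmin(((W - x) ** 2).sum(axis=1))]
            d2 = ((coords - bmu_pos) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))           # Gaussian neighborhood
            W += lr * h[:, None] * (x - W)
    return W.reshape(rows, cols, -1)
```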

Research paper thumbnail of Semi-Automatic Mono to Stereo Up-Mixing Using Sound Source Formation

Journal of The Audio Engineering Society, May 1, 2007

The papers at this Convention have been selected on the basis of a submitted abstract and extended précis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street,

Research paper thumbnail of Pitch Histograms in Audio and Symbolic Music Information Retrieval

Journal of New Music Research, Jun 1, 2003

In order to represent musical content, pitch and timing information is utilized in the majority of existing work in Symbolic Music Information Retrieval (MIR). Symbolic representations such as MIDI allow the easy calculation and manipulation of such information. In contrast, most of the existing work in Audio MIR uses timbral and beat information, which can be calculated using automatic computer audition techniques. In this paper, Pitch Histograms are defined and proposed as a way to represent the pitch content of music signals in both symbolic and audio form. This representation is evaluated in the context of automatic musical genre classification. A multiple-pitch detection algorithm for polyphonic signals is used to calculate Pitch Histograms for audio signals. In order to evaluate the extent and significance of errors resulting from the automatic multiple-pitch detection, automatic musical genre classification results from symbolic and audio data are compared. The comparison indicates that Pitch Histograms provide valuable information for musical genre classification. The results obtained for both symbolic and audio cases indicate that although pitch errors degrade classification performance in the audio case, Pitch Histograms can be effectively used for classification in both cases.
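
For the symbolic case the computation is straightforward; a minimal sketch of a folded pitch histogram (MIDI note numbers mapped to the 12 pitch classes, optionally weighted by note duration) is below. The paper also uses unfolded histograms and, for audio, derives the notes from multiple-pitch detection rather than from a score.

```python
# Folded pitch histogram from symbolic note data (illustrative sketch).
import numpy as np

def folded_pitch_histogram(midi_notes, durations=None):
    notes = np.asarray(midi_notes)
    weights = np.asarray(durations, dtype=float) if durations is not None else None
    hist = np.bincount(notes % 12, weights=weights, minlength=12)
    total = hist.sum()
    return hist / total if total > 0 else hist   # normalize to a distribution

# A C-major fragment concentrates mass on pitch classes 0 (C), 4 (E), 7 (G):
print(folded_pitch_histogram([60, 64, 67, 72, 60, 67]))
```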

Research paper thumbnail of Manifold Learning Methods for Visualization and Browsing of Drum Machine Samples

Journal of The Audio Engineering Society, Feb 24, 2021

Research paper thumbnail of Assistive music browsing using self-organizing maps

Music listening is an important activity for many people. Advances in technology have made it possible to carry music collections with thousands of songs in portable music players. Navigating these large music collections is challenging, especially for users with vision and/or motion disabilities. In this paper we describe our current efforts to build effective music browsing interfaces for people with disabilities. The foundation of our approach is the automatic extraction of features for describing musical content and the use of self-organizing maps to create two-dimensional representations of music collections. The ultimate goal is effective browsing without using any meta-data. We also describe different control interfaces to the system: a regular desktop application, an iPhone implementation, an eye tracker, and a smart room interface based on Wii-mote tracking.

Research paper thumbnail of One Billion Audio Sounds from GPU-enabled Modular Synthesis

arXiv (Cornell University), Apr 26, 2021

We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open-source modular synthesizer that generates the synth1B1 samples on-the-fly at 16200x faster than real-time (714MHz) on a single GPU. We additionally release two new audio datasets: FM synth timbre and subtractive synth pitch. Using these datasets, we demonstrate new rank-based evaluation criteria for existing audio representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.
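
Because the corpus is generated on the fly, "downloading" synth1B1 amounts to rendering batches deterministically by index. A hedged usage sketch is below; the call and return signature are assumptions from my recollection of the torchsynth documentation and should be verified against the installed release.

```python
# Hedged sketch of generating a synth1B1 batch with torchsynth
# (API names assumed from the project docs; verify against your release).
import torch
from torchsynth.synth import Voice

voice = Voice()                        # default Voice: 4-second sounds
if torch.cuda.is_available():
    voice = voice.to("cuda")           # GPU rendering is where the speedup is

# Each integer batch index deterministically reproduces the same slice of
# synth1B1, along with the normalized parameters and a train/test flag.
audio, params, is_train = voice(0)
print(audio.shape)                     # (batch_size, n_samples)
```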

Research paper thumbnail of Musical genre classification of audio signals

IEEE Transactions on Speech and Audio Processing, Jul 1, 2002

Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics are typically related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently, musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content, and pitch content are proposed. The performance and relative importance of the proposed features are investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole-file and real-time frame-based classification schemes are described. Using the proposed feature sets, a classification accuracy of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.
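
The general shape of such a pipeline, summarizing short-time spectral descriptors over a texture window with means and variances, then feeding a statistical classifier, can be sketched as follows. This is an illustration in the spirit of the timbral-texture feature set, using librosa and scikit-learn, not a one-to-one reproduction of the paper's features or classifiers.

```python
# Illustrative genre-classification pipeline (not the paper's exact features).
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def timbral_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=5)          # (5, frames)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # (1, frames)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)     # (1, frames)
    zcr = librosa.feature.zero_crossing_rate(y)                # (1, frames)
    frames = np.vstack([mfcc, centroid, rolloff, zcr])
    # Summarize the texture window with per-feature mean and variance.
    return np.concatenate([frames.mean(axis=1), frames.var(axis=1)])

# X = np.stack([timbral_features(p) for p in audio_paths])  # hypothetical paths
# clf = KNeighborsClassifier(n_neighbors=5).fit(X, genre_labels)
```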

Research paper thumbnail of SpiegeLib: An automatic synthesizer programming library

Journal of The Audio Engineering Society, May 28, 2020

Research paper thumbnail of MarsyasX

The design and implementation of multimedia signal processing systems is challenging, especially when efficiency and real-time performance are desired. In many modern applications, software systems must be able to handle multiple flows of various types of multimedia data such as audio and video. Researchers frequently have to rely on a combination of different software tools for each modality to assemble proof-of-concept systems that are inefficient, brittle, and hard to maintain. Marsyas is a software framework originally developed to address these issues in the domain of audio processing. In this paper we describe MarsyasX, a new open-source cross-modal analysis framework that aims at a broader scope of applications. It follows a dataflow architecture in which complex networks of processing objects can be assembled to form systems that handle multiple and different types of multimedia flows with expressiveness and efficiency.
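
To give a flavor of the dataflow style, here is a toy sketch of processing objects composed into a network, each passing its output to the next on every tick. It is purely illustrative of the architectural idea and does not use the actual Marsyas/MarsyasX API (which is C++).

```python
# Toy dataflow composition, illustrating the Marsyas-style architecture only.
class Series:
    """Runs data through a fixed chain of processing stages per tick."""
    def __init__(self, *stages):
        self.stages = stages

    def tick(self, data):
        for stage in self.stages:
            data = stage(data)
        return data

net = Series(
    lambda x: [s * 0.5 for s in x],    # gain stage
    lambda x: [abs(s) for s in x],     # rectifier stage
    lambda x: sum(x) / len(x),         # crude frame-energy proxy
)
print(net.tick([0.2, -0.4, 0.6]))
```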

Research paper thumbnail of Intonation: A Dataset of Quality Vocal Performances Refined by Spectral Clustering on Pitch Congruence

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Research paper thumbnail of Deep Autotuner: A Pitch Correcting Network for Singing Performances

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to the closest pitch among the twelve equal-tempered scale degrees. The proposed system treats pitch as a continuous value rather than relying on a set of discretized notes found in musical scores, thus allowing for improvisation and harmonization in the singing performance. We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation. Our model is trained on both incorrect intonation, for which it learns a correction, and intentional pitch variation, which it learns to preserve. The proposed deep neural network, with gated recurrent units on top of convolutional layers, shows promising performance on the real-world, score-free singing pitch correction task of autotuning.
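
A hedged PyTorch sketch of the described architecture, convolutional layers over paired vocal/accompaniment spectrograms with a GRU on top, emitting one continuous shift per note segment, is shown below. Layer counts, kernel sizes, and input dimensions are placeholders, not the paper's configuration.

```python
# Sketch of a CNN+GRU note-wise pitch-shift predictor (sizes are placeholders).
import torch
import torch.nn as nn

class PitchShiftPredictor(nn.Module):
    def __init__(self, n_bins=256, hidden=128):
        super().__init__()
        # Input: (batch, 2, n_bins, frames) — vocal and backing spectrograms.
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 1)),                 # pool frequency, keep time
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 1)),
        )
        self.gru = nn.GRU(32 * (n_bins // 16), hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)          # continuous pitch shift

    def forward(self, x):
        z = self.conv(x)                          # (b, 32, n_bins/16, frames)
        z = z.flatten(1, 2).transpose(1, 2)       # (b, frames, features)
        out, _ = self.gru(z)
        return self.head(out[:, -1])              # one shift per note segment

shift = PitchShiftPredictor()(torch.randn(8, 2, 256, 50))
print(shift.shape)                                # torch.Size([8, 1])
```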

Research paper thumbnail of Voice Coil Actuators For Percussion Robotics

Percussion robots have successfully used a variety of actuator technologies to activate a wide array of striking mechanisms. Popular types of actuators include solenoids and DC motors. However, industrial-strength voice coil actuators provide a compelling alternative, offering a desirable combination of features and capabilities that spans those of traditional devices. Characteristics such as high acceleration and accurate positioning enable the rendering of highly accurate and expressive percussion performances.

Research paper thumbnail of Snare Drum Motion Capture Dataset

Comparative studies require a baseline reference and a documented process to capture new subject data. This paper, combined with its principal reference [1], presents a definitive dataset of snare drum performances, along with a procedure for data acquisition and a methodology for quantitative analysis. The dataset contains video, audio, and discrete two-dimensional motion data for forty standardized percussive rudiments.

Research paper thumbnail of A multimodal tangible interface for the sonification of phenological data at multiple time-scales

The study of periodic biological processes, such as when plants flower and birds arrive in the spring, is known as phenology. In recent years this field has gained interest from the scientific community because of the applicability of its data to the study of climate change and other ecological processes. In this paper we propose the use of tangible interfaces for interactive sonification, with a specific example of a multimodal tangible interface consisting of a physical paper map with fiducial-marker tracking combined with a novel drawing interface. The designed interface enables one or more users to specify point queries with the map interface and time queries with the drawing interface. This allows the user to explore both time and space while receiving immediate sonic feedback on their actions. This system can be used to study and explore the effects of climate change, both as a tool to be used by scientists and as a way to educate and involve members of the general public.

Research paper thumbnail of Computer-supported analysis of religious chant

Research paper thumbnail of Browsing Music and sound using gestures in a Self-Organized 3D Space

As digital music and sound collections increase in size, there has been a lot of work in developing novel interfaces for browsing them. Many of these interfaces rely on automatic content analysis techniques to create representations that reflect similarities between the music pieces or sounds in the collection. Representations in 3D have the potential to convey more information but can be difficult to navigate using the traditional ways of providing input to a computer, such as a keyboard and mouse. Utilizing sensors capable of sensing motion in three dimensions, we propose a new system for browsing music in augmented reality. Our system places audio files in a virtual cube. The placement of the files into the cube is realized through the use of audio feature extraction and self-organizing maps (SOMs). The system is controlled using gestures, and sound spatialization is utilized to provide auditory cues about the topography of the music or sound collection.
