William Sethares | University of Wisconsin-Madison
Papers by William Sethares
Heritage
This paper introduces the Watermark Imaging System (WImSy) which can be used to photograph, document, and study sheets of paper. The WImSy provides surface images, raking light images, and transmitted light images of the paper, all in perfect alignment. We develop algorithms that exploit this alignment by combining several images together in a process that mimics both the “surface image removal” technique and the method of “high dynamic range” photographs. An improved optimization criterion and an automatic parameter selection procedure streamline the process and make it practical for art historians and conservators to extract the relevant information to study watermarks. The effectiveness of the method is demonstrated in several experiments on images taken with the WImSy at the Metropolitan Museum of Art in New York and at the Getty Museum in Los Angeles, and the results are compared with manually optimized images.
arXiv (Cornell University), Nov 28, 2022
The mridangam is a double-headed percussion instrument that plays a key role in Carnatic music concerts. This paper presents a novel automatic transcription algorithm to classify the strokes played on the mridangam. Onset detection is first performed to segment the audio signal into individual strokes, and feature vectors consisting of the DFT magnitude spectrum of each segment are generated. A multi-layer feedforward neural network is trained using the feature vectors as inputs and the manual transcriptions as targets. Since the mridangam is a tonal instrument tuned to a given tonic, tonic invariance is an important feature of the classifier. Tonic invariance is achieved by augmenting the dataset with pitch-shifted copies of the audio. This algorithm consistently yields over 83% accuracy on a held-out test dataset.
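The feature-extraction step in this abstract can be sketched in a few lines. This is only an illustration, not the authors' code: the fixed segment length, the naive DFT, and the toy two-tone signal are all assumptions made here, and the onset positions are given by hand rather than detected.

```python
import cmath
import math

def dft_magnitude(segment):
    """Magnitude spectrum (bins 0..N//2) of a real-valued segment, via a naive DFT."""
    n = len(segment)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        mags.append(abs(coeff))
    return mags

def stroke_features(signal, onsets, length):
    """Cut a fixed-length segment at each onset index; return one feature vector per stroke."""
    return [dft_magnitude(signal[o:o + length])
            for o in onsets if o + length <= len(signal)]

def tone(bin_index, n):
    """Hypothetical stand-in for a stroke: a pure tone centred on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]

N = 64
signal = tone(4, N) + tone(9, N)          # two toy "strokes" back to back
features = stroke_features(signal, onsets=[0, N], length=N)
# Index of the strongest bin in each feature vector
peaks = [max(range(len(f)), key=f.__getitem__) for f in features]
```

In a setup like the one described, vectors such as `features` would be the network inputs and the manual transcriptions the targets; a practical version would use an FFT routine and a window function.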
ArXiv, 2018
This paper presents a new search algorithm called Target Image Search based on Local Features (TISLF), which compares target images and video source images using local features. TISLF can be used to locate frames where target images occur in a video and, by computing and comparing the matching probability matrix, to estimate the time of appearance, the duration, and the time of disappearance of the target image in the video stream. The algorithm is applicable to a variety of tasks, such as tracking the appearance and duration of advertisements in the broadcast of a sports event, searching for and labelling paintings in documentaries, and searching for landmarks of different cities in videos. The algorithm is compared to a deep learning method and shows competitive performance in experiments.
We present XronoMorph, an application for the algorithmic generation of rhythms in the context of creative composition and performance, and of musical analysis and education. XronoMorph makes use of visual and geometrical conceptualizations of rhythms, and allows the user to smoothly morph between rhythms. Sonification of the user-generated geometrical constructs is possible using a built-in sampler, VST and AU plugins, or standalone synthesizers via MIDI. The algorithms are based on two underlying mathematical principles: perfect balance and well-formedness, both of which can be derived from coefficients of the discrete Fourier transform of the rhythm. The mathematical background, musical implications, and their implementation in the software are discussed.
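As a rough sketch of how balance relates to the discrete Fourier transform: place each onset of a periodic rhythm on a unit circle and take the first (k = 1) DFT coefficient, which is the sum of those unit vectors. The rhythm is perfectly balanced when that coefficient vanishes. The function names, the normalized score, and the example rhythms below are illustrative assumptions, not XronoMorph's implementation.

```python
import cmath
import math

def balance(onsets, period):
    """Return 1 - |k=1 DFT coefficient| / (number of onsets); 1.0 means perfectly balanced."""
    first_coeff = sum(cmath.exp(-2j * math.pi * t / period) for t in onsets)
    return 1.0 - abs(first_coeff) / len(onsets)

def is_perfectly_balanced(onsets, period, tol=1e-9):
    """True when the onsets' centre of gravity sits at the centre of the circle."""
    return balance(onsets, period) >= 1.0 - tol

# Hypothetical rhythms on a 12-step cycle
triangle = [0, 4, 8]                 # regular triangle
triangle_plus_digon = [0, 1, 4, 7, 8]  # triangle plus the opposite pair (1, 7)
lopsided = [0, 3, 8]                 # onsets weighted toward one side
```

Sums of regular polygons, such as `triangle_plus_digon`, stay perfectly balanced because each polygon contributes zero to the first coefficient.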
The musical realm is a promising area in which to expect to find nontrivial topological structures. This paper describes several kinds of metrics on musical data, and explores the implications of these metrics in two ways: via techniques of classical topology, where the metric space of all possible musical data can be described explicitly, and via modern data-driven ideas of persistent homology, which calculates the Betti-number bar codes of individual musical works. Both analyses are able to recover three well-known topological structures in music: the circularity of octave-reduced musical scales, the circle of fifths, and the rhythmic repetition of timelines. Applications to a variety of musical works (for example, folk music in the form of standard MIDI files) are presented, and the bar codes show many interesting features. Examples show that individual pieces may span the complete space (in which case the classical and the data-driven analyses agree), or they may span only part of the space.
Journal of Quantitative Description: Digital Media, 2021
Live-tweeting has emerged as a popular hybrid media activity during broadcasted media events. Through second screens, users are able to engage with one another and react in real time to the broadcasted content. These reactions are dynamic: they ebb and flow throughout the media event as users respond to and converse about different memorable moments. Using the first 2016 U.S. presidential debate between Hillary Clinton and Donald Trump as a case, this paper employs a temporal method for identifying resonant moments on social media during televised events by combining time series analysis, qualitative (human-in-the-loop) evaluation, and a novel natural language processing tool to identify discursive shifts before and after resonant moments. This analysis finds key differences in social media discourse about the two candidates. Notably, Trump received substantially more coverage than Clinton throughout the debate. However, a more in-depth analysis of these candidates’ resonant moments...
Handmade laid paper has the important quality that every sheet of paper formed on the same papermaking mold retains a nearly identical imprint of the mold’s wire structure. These “moldmates” are identified by analyzing the recorded wire features, which are visible using transmitted light. When visual analysis is not sufficient to distinguish moldmates, three features of the mold’s wire mesh can be quantitatively analyzed using image processing techniques: watermark shape and placement, chain line intervals, and laid line density, for which a new method of analysis is introduced here. Using signal processing procedures, the frequency of the laid lines across a sheet of paper was found to fluctuate in a pattern unique to that mold. These quantitative methods were tested on a sample set of blank sheets from a 1536 edition of De re militari by Vegetius; computational analysis using any one of the three features was able to distinguish between four molds used in the group of papers. Thes...
This paper learns multi-modal embeddings from text, audio, and video views/modes of data in order to improve upon downstream sentiment classification. The experimental framework also allows investigation of the relative contributions of the individual views in the final multi-modal embedding. Individual features derived from the three views are combined into a multi-modal embedding using Deep Canonical Correlation Analysis (DCCA) in two ways: (i) One-Step DCCA and (ii) Two-Step DCCA. This paper learns text embeddings using BERT, the current state of the art in text encoders. We posit that this highly optimized algorithm dominates over the contribution of other views, though each view does contribute to the final result. Classification tasks are carried out on two benchmark data sets and on a new Debate Emotion data set, and together these demonstrate that One-Step DCCA outperforms the current state of the art in learning multi-modal embeddings.
We present a class of novel metrics for measuring the distance between any two periodic scales whatever their precise tuning or cardinality. The metrics have some important applications: (1) finding effective lower-dimensional temperaments of higher-dimensional tunings (such as just intonation); (2) finding simple scales that effectively approximate complex scales (e.g., using equally-tuned and well-formed scales to approximate Fokker periodicity blocks and pairwise well-formed scales); (3) finding ways to map the notes of any arbitrary scale to a button lattice controller (a generalized keyboard) so as to maximise geometrical consistency and playability; (4) comparing the distance of various scales (such as equal tunings with different numbers of notes per octave) for analytical, maybe even compositional, purposes; (5) a generalized method to determine the similarity of different “pitch class sets” that is not dependent on the use of low-cardinality equal tunings such as 12-tone, thu...
Standard word embedding algorithms learn vector representations from large corpora of text documents in an unsupervised fashion. However, the quality of word embeddings learned from these algorithms is affected by the size of training data sets. Thus, applications of these algorithms in domains with only moderate amounts of available data are limited. In this paper we introduce an algorithm that learns word embeddings jointly with a classifier. Our algorithm is called SWESA (Supervised Word Embeddings for Sentiment Analysis). SWESA leverages document label information to learn vector representations of words from a modest corpus of text documents by solving an optimization problem that minimizes a cost function with respect to both word embeddings and the weight vector used for classification. Experiments on several real-world data sets show that SWESA has superior performance on domains with limited data, when compared to previously suggested approaches to word embeddings and sentim...
ArXiv, 2019
The memory consumption of most Convolutional Neural Network (CNN) architectures grows rapidly with increasing depth of the network, which is a major constraint for efficient network training and inference on modern GPUs with limited memory. Several studies show that the feature maps (as generated after the convolutional layers) are the main bottleneck in this memory problem. Often, these feature maps mimic natural photographs in the sense that their energy is concentrated in the spectral domain. Although embedding CNN architectures in the spectral domain is widely exploited to accelerate the training process, we demonstrate that it is also possible to use the spectral domain to reduce the memory footprint by proposing a Spectral Domain Convolutional Neural Network (SpecNet) that performs both the convolution and the activation operations in the spectral domain. SpecNet exploits a configurable threshold to force small values in the feature maps to zero, allowing the feature maps to b...
Uniform strings have a harmonic sound; nonuniform strings have an inharmonic sound. This paper experiments with musical instruments based on nonuniform/inharmonic strings. Given a precise description of the string, its spectrum can be calculated using standard techniques. Dissonance curves are used to motivate specific choices of spectrum. A particular inharmonic string consisting of three segments (two equal unwound segments surrounding a thicker wound portion) is used in the construction of the hyperpiano. A second experiment designs a string with overtones that lie on steps of the 10-tone equal tempered scale. The strings are sampled, and digital (software) versions of the instruments are made available along with a call for composers interested in writing for these new instruments.
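To make the phrase "overtones that lie on steps of the 10-tone equal tempered scale" concrete, the sketch below measures how far a partial lies from the nearest step of an n-tone equal temperament, where step k has frequency ratio 2^(k/n). The function names and the particular step choices (0, 10, 16, 20) are hypothetical illustrations, not the string design from the paper.

```python
import math

def tet_ratio(step, divisions=10):
    """Frequency ratio of a given step in an equal division of the 2:1 octave."""
    return 2.0 ** (step / divisions)

def nearest_step(ratio, divisions=10):
    """Return (step, error_in_cents) for the scale step closest to a frequency ratio."""
    exact = divisions * math.log2(ratio)
    step = round(exact)
    cents_error = 1200.0 * (exact - step) / divisions
    return step, cents_error

# Harmonic partials 1..6 of a uniform string, snapped to 10-tone equal temperament
harmonic_fit = [nearest_step(n) for n in range(1, 7)]

# Hypothetical inharmonic partials placed exactly on scale steps
designed_partials = [tet_ratio(s) for s in (0, 10, 16, 20)]
designed_fit = [nearest_step(r) for r in designed_partials]
```

Only the octaves (partials 1, 2, and 4) of the harmonic series land on 10-tone steps; the third partial, for instance, falls roughly 18 cents below step 16, which is why a nonuniform string is needed to move the overtones onto the scale.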
Aerospace Structures and Materials | Aerospace Engineerin
The sieve-like molds used in early paper manufacturing leave characteristic chain line patterns that can help determine if two pieces of paper were made using the same mold [1]. Such moldmates can help in establishing chronology, suggest paper preferences, and indicate periods of intense activity of an artist, and the study of Rembrandt’s prints has occupied a prominent place within this scholarship [2]. There are two basic steps to the automation of the analysis. First, the chain lines must be accurately marked, extracting points and lines from the x-ray images. Second, the markings from many x-rays must be compared to find matches. This paper presents improved algorithms for both of these tasks. Section 2 follows [5] in replacing the classic Radon transform approach of [3] (which requires that the chain lines be straight) with a local method that outputs marked grids of points. The method segments the images, corrects for tilt, filters, conducts a (local) Radon transform and then con...
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Imposing the style of one image onto another is called style transfer. For example, the style of a Van Gogh painting might be imposed on a photograph to yield an interesting hybrid. This paper applies the adaptive normalization used for image style transfer to language semantics, i.e., the style is the way the words are said (tone of voice and facial expressions) and these are style-transferred onto the text. The goal is to learn richer representations for multi-modal utterances using style-transferred multi-modal features. The proposed Style-Transfer Transformer (STT) grafts a stepped styled adaptive layer-normalization onto a transformer network, the output from which is used in sentiment analysis and emotion recognition problems. In addition to achieving performance on par with the state of the art (but using less than a third of the model parameters), we examine the relative contributions of each mode when used in the downstream applications.
Journal of Historians of Netherlandish Art, 2021
Identifying, comparing, and matching watermarks in pre-machine-made papers has occupied scholars of prints and drawings for some time. One popular but arduous approach is to overlay, either manually or digitally, an image of the watermark in question with its presumed match from a known source. For example, a newly discovered watermark in a Rembrandt print might be compared to a similar one reproduced in Erik Hinterding’s Rembrandt as an Etcher (2006). Such an overlay can confirm the pair as identical, i.e., as moldmates, or reveal their differences. But creating an accurate overlay for two images with different scales, orientations, or resolutions using standard image-manipulation tools can be time-consuming and, ultimately, unsuccessful. Part One of this article describes advances in the emerging field of computational art history, specifically the development of digital image processing software, that can be used to semi-automatically create a reliable animated overlay of two wat...
Sensors, 2021
Pyroelectric Infrared (PIR) sensors are low-cost, low-power, and highly reliable sensors that have been widely used in smart environments. Indoor localization systems can be categorized as wearable and non-wearable systems, where the latter are also known as device-free localization systems. Since the binary PIR sensor detects only the presence of human motion in its field of view (FOV) without any other information about the actual location, utilizing the information of overlapping FOV of multiple sensors can be useful for localization. In this study, a PIR detector and sensing signal processing algorithms were designed based on the characteristics of the PIR sensor. We applied the designed PIR detector as a sensor node to create a non-wearable cooperative indoor human localization system. To improve the system performance, signal processing algorithms and refinement schemes (i.e., the Kalman filter, a Transferable Belief Model, and a TBM-based hybrid approach (TBM + Kalman filte...
Journal of Mathematics and Music, 2020
A primary esthetic in the performance practice of Balinese gamelan is the ombak (Indonesian for wave), which is manifest in musical form, performance, and tuning. The ombak arises in a paired tuning system in which corresponding unisons of two instruments (or instrumental groups) are tuned to slightly different frequencies, one higher and one lower, to produce beats. Pitch classes are not necessarily tuned to octaves in an exact 2:1 frequency ratio; instead, octaves are often stretched or compressed. This paper discusses the relationship between the ombak rate and octave tempering, and demonstrates that the beating rate, combined with the octave tuning strategy chosen, can be modeled using a tempering parameter that determines the amount of stretching or compression. This model is then used to analyze tuning data of nine complete gamelan.
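The link between beat rate and octave tempering can be seen with simple arithmetic. Two tones at nearby frequencies beat at the difference of those frequencies, so if both instruments of a pair used exact 2:1 octaves, the beat rate would double with every octave. The sketch below uses made-up frequencies and assumes, purely for illustration, that the beat rate is to stay the same in the upper octave; it does not reproduce the paper's tempering-parameter model.

```python
import math

def beat_rate(f_low, f_high):
    """Beats per second heard when two nearby frequencies sound together."""
    return abs(f_high - f_low)

def octave_deviation_cents(f_lower, f_upper):
    """Cents by which an interval exceeds (+) or falls short of (-) a pure 2:1 octave."""
    return 1200.0 * math.log2(f_upper / f_lower) - 1200.0

# Hypothetical paired unison: lower and higher instrument, 7 beats per second
low, high = 220.0, 227.0

# Exact 2:1 octaves on both instruments double the beat rate
pure_octave_beats = beat_rate(2 * low, 2 * high)

# Keeping 7 beats per second one octave up forces the higher instrument's octave to be compressed
high_upper = 2 * low + beat_rate(low, high)
compression = octave_deviation_cents(high, high_upper)
```

With these numbers the higher instrument's octave comes out about 27 cents narrower than 2:1, which shows why a chosen beating rate and an octave tuning strategy cannot be set independently.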
Heritage
This paper introduces the Watermark Imaging System (WImSy) which can be used to photograph, docum... more This paper introduces the Watermark Imaging System (WImSy) which can be used to photograph, document, and study sheets of paper. The WImSy provides surface images, raking light images, and transmitted light images of the paper, all in perfect alignment. We develop algorithms that exploit this alignment by combining several images together in a process that mimics both the “surface image removal” technique and the method of “high dynamic range” photographs. An improved optimization criterion and an automatic parameter selection procedure streamline the process and make it practical for art historians and conservators to extract the relevant information to study watermarks. The effectiveness of the method is demonstrated in several experiments on images taken with the WImSy at the Metropolitan Museum of Art in New York and at the Getty Museum in Los Angeles, and the results are compared with manually optimized images.
arXiv (Cornell University), Nov 28, 2022
The mridangam is a double-headed percussion instrument that plays a key role in Carnatic music co... more The mridangam is a double-headed percussion instrument that plays a key role in Carnatic music concerts. This paper presents a novel automatic transcription algorithm to classify the strokes played on the mridangam. Onset detection is first performed to segment the audio signal into individual strokes, and a feature vectors consisting of the DFT magnitude spectrum of the segmented signal are generated. A multi-layer feedforward neural network is trained using the feature vectors as inputs and the manual transcriptions as targets. Since the mridangam is a tonal instrument tuned to a given tonic, tonic invariance is an important feature of the classifier. Tonic invariance is achieved by augmenting the dataset with pitch-shifted copies of the audio. This algorithm consistently yields over 83% accuracy on a held-out test dataset.
ArXiv, 2018
This paper presents a new search algorithm called Target Image Search based on Local Features (TI... more This paper presents a new search algorithm called Target Image Search based on Local Features (TISLF) which compares target images and video source images using local features. TISLF can be used to locate frames where target images occur in a video, and by computing and comparing the matching probability matrix, estimates the time of appearance, the duration, and the time of disappearance of the target image from the video stream. The algorithm is applicable to a variety of applications such as tracking the appearance and duration of advertisements in the broadcast of a sports event, searching and labelling painting in documentaries, and searching landmarks of different cities in videos. The algorithm is compared to a deep learning method and shows competitive performance in experiments.
Version: Accepted Manuscript Link(s) to article on publisher’s website:
A primary esthetic in the performance practice of Balinese gamelan is the <i>ombak</i>... more A primary esthetic in the performance practice of Balinese gamelan is the <i>ombak</i> (Indonesian for <i>wave</i>), which is manifest in musical form, performance, and tuning. The <i>ombak</i> arises in a paired tuning system in which corresponding unisons of two instruments (or instrumental groups) are tuned to slightly different frequencies, one higher and one lower, to produce beats. Pitch classes are not necessarily tuned to octaves in an exact 2:1 frequency ratio; instead, octaves are often stretched or compressed. This paper discusses the relationship between the <i>ombak</i> rate and octave tempering, and demonstrates that the beating rate, combined with the octave tuning strategy chosen, can be modeled using a <i>tempering parameter</i> that determines the amount of stretching or compression. This model is then used to analyze tuning data of nine complete gamelan.
We present an application XronoMorph for the algorithmic generation of rhythms in the context of ... more We present an application XronoMorph for the algorithmic generation of rhythms in the context of creative composition and performance, and of musical analysis and education. XronoMorph makes use of visual and geometrical conceptualizations of rhythms, and allows the user to smoothly morph between rhythms. Sonification of the user generated geometrical constructs is possible using a built-in sampler, VST and AU plugins, or standalone synthesizers via MIDI. The algorithms are based on two underlying mathematical principles: perfect balance and well-formedness, both of which can be derived from coefficients of the discrete Fourier transform of the rhythm. The mathematical background, musical implications, and their implementation in the software are discussed.
The musical realm is a promising area in which to expect to find nontrivial topological structure... more The musical realm is a promising area in which to expect to find nontrivial topological structures. This paper describes several kinds of metrics on musical data, and explores the implications of these metrics in two ways: via techniques of classical topology where the metric space of all-possible musical data can be described explicitly, and via modern data-driven ideas of persistent homology which calculates the Betti-number bar-codes of individual musical works. Both analyses are able to recover three well known topological structures in music: the circularity of octave-reduced musical scales, the circle of fifths, and the rhythmic repetition of timelines. Applications to a variety of musical works (for example, folk music in the form of standard MIDI files) are presented, and the bar codes show many interesting features. Examples show that individual pieces may span the complete space (in which case the classical and the datadriven analyses agree), or they may span only part of the space.
Journal of Quantitative Description: Digital Media, 2021
Live-tweeting has emerged as a popular hybrid media activity during broadcasted media events. Thr... more Live-tweeting has emerged as a popular hybrid media activity during broadcasted media events. Through second screens, users are able to engage with one another and react in real time to the broadcasted content. These reactions are dynamic: they ebb and flow throughout the media event as users respond to and converse about different memorable moments. Using the first 2016 U.S. presidential debate between Hillary Clinton and Donald Trump as a case, this paper employs a temporal method for identifying resonant moments on social media during televised events by combining time series analysis, qualitative (human-in-the-loop) evaluation, and a novel natural language processing tool to identify discursive shifts before and after resonant moments. This analysis finds key differences in social media discourse about the two candidates. Notably, Trump received substantially more coverage than Clinton throughout the debate. However, a more in-depth analysis of these candidates’ resonant moments...
Handmade laid paper has the important quality that every sheet of paper formed on the same paperm... more Handmade laid paper has the important quality that every sheet of paper formed on the same papermaking mold retains a nearly identical imprint of the mold’s wire structure. These “moldmates” are identified by analyzing the recorded wire features, which are visible using transmitted light. When visual analysis is not sufficient to distinguish moldmates, three features of the mold’s wire mesh can be quantitatively analyzed using image processing techniques: watermark shape and placement, chain line intervals, and laid line density, for which a new method of analysis is introduced here. Using signal processing procedures, the frequency of the laid lines across a sheet of paper was found to fluctuate in a pattern unique to that mold. These quantitative methods were tested on a sample set of blank sheets from a 1536 edition of De re militari by Vegetius; computational analysis using any one of the three features was able to distinguish between four molds used in the group of papers. Thes...
This paper learns multi-modal embeddings from text, audio, and video views/modes of data in order... more This paper learns multi-modal embeddings from text, audio, and video views/modes of data in order to improve upon downstream sentiment classification. The experimental framework also allows investigation of the relative contributions of the individual views in the final multi-modal embedding. Individual features derived from the three views are combined into a multi-modal embedding using Deep Canonical Correlation Analysis (DCCA) in two ways i) One-Step DCCA and ii) TwoStep DCCA. This paper learns text embeddings using BERT, the current state-of-the-art in text encoders. We posit that this highly optimized algorithm dominates over the contribution of other views, though each view does contribute to the final result. Classification tasks are carried out on two benchmark data sets and on a new Debate Emotion data set, and together these demonstrate that the one-Step DCCA outperforms the current state-of-the-art in learning multi-modal embeddings.
We present a class of novel metrics for measuring the distance between any two periodic scales wh... more We present a class of novel metrics for measuring the distance between any two periodic scales whatever their precise tuning or cardinality. The metrics have some important applications: (1) finding effective lower-dimensional temperaments of higherdimensional tunings (such as just intonation); (2) finding simple scales that effectively approximate complex scales (e.g., using equally-tuned and well-formed scales to approximate Fokker periodicity blocks and pairwise well-formed scales); (3) finding ways to map the notes of any arbitrary scale to a button lattice controller (a generalized keyboard) so as to maximise geometrical consistency and playability; (4) comparing the distance of various scales (such as equal tunings with different numbers of notes per octave) for analytical, maybe even compositional, purposes; (5) a generalized method to determine the similarity of different “pitch class sets” that is not dependent on the use of a low cardinality equal tunings such 12-tone, thu...
Standard word embedding algorithms learn vector representations from large corpora of text docume... more Standard word embedding algorithms learn vector representations from large corpora of text documents in an unsupervised fashion. However, the quality of word embeddings learned from these algorithms is affected by the size of training data sets. Thus, applications of these algorithms in domains with only moderate amounts of available data is limited. In this paper we introduce an algorithm that learns word embeddings jointly with a classifier. Our algorithm is called SWESA (Supervised Word Embeddings for Sentiment Analysis). SWESA leverages document label information to learn vector representations of words from a modest corpus of text documents by solving an optimization problem that minimizes a cost function with respect to both word embeddings and the weight vector used for classification. Experiments on several real world data sets show that SWESA has superior performance on domains with limited data, when compared to previously suggested approaches to word embeddings and sentim...
ArXiv, 2019
The memory consumption of most Convolutional Neural Network (CNN) architectures grows rapidly wit... more The memory consumption of most Convolutional Neural Network (CNN) architectures grows rapidly with increasing depth of the network, which is a major constraint for efficient network training and inference on modern GPUs with limited memory. Several studies show that the feature maps (as generated after the convolutional layers) are the main bottleneck in this memory problem. Often, these feature maps mimic natural photographs in the sense that their energy is concentrated in the spectral domain. Although embedding CNN architectures in the spectral domain is widely exploited to accelerate the training process, we demonstrate that it is also possible to use the spectral domain to reduce the memory footprint by proposing a Spectral Domain Convolutional Neural Network (SpecNet) that performs both the convolution and the activation operations in the spectral domain. SpecNet exploits a configurable threshold to force small values in the feature maps to zero, allowing the feature maps to b...
Uniform strings have a harmonic sound; nonuniform strings have an inharmonic sound. This paper ex... more Uniform strings have a harmonic sound; nonuniform strings have an inharmonic sound. This paper experiments with musical instruments based on nonuniform/inharmonic strings. Given a precise description of the string, its spectrum can be calculated using standard techniques. Dissonance curves are used to motivate specific choices of spectrum. A particular inharmonic string consisting of three segments (two equal unwound segments surrounding a thicker wound portion) is used in the construction of the hyperpiano. A second experiment designs a string with overtones that lie on steps of the 10-tone equal tempered scale. The strings are sampled, and digital (software) versions of the instruments are made available along with a call for composers interested in writing for these new instruments.
The sieve-like molds used in early paper manufacturing leave characteristic chain line patterns that can help determine if two pieces of paper were made using the same mold [1]. Such moldmates can help establish chronology, suggest paper preferences, and indicate periods of intense activity of an artist; the study of Rembrandt’s prints has occupied a prominent place within this scholarship [2]. There are two basic steps to the automation of the analysis. First, the chain lines must be accurately marked, extracting points and lines from the x-ray images. Second, the markings from many x-rays must be compared to find matches. This paper presents improved algorithms for both of these tasks. Section 2 follows [5] in replacing the classic Radon transform approach of [3] (which requires that the chain lines be straight) with a local method that outputs marked grids of points. The method segments the images, corrects for tilt, filters, conducts a (local) Radon transform and then con...
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Imposing the style of one image onto another is called style transfer. For example, the style of a Van Gogh painting might be imposed on a photograph to yield an interesting hybrid. This paper applies the adaptive normalization used for image style transfer to language semantics: the style is the way the words are said (tone of voice and facial expressions), and this style is transferred onto the text. The goal is to learn richer representations for multi-modal utterances using style-transferred multi-modal features. The proposed Style-Transfer Transformer (STT) grafts a stepped styled adaptive layer-normalization onto a transformer network, the output from which is used in sentiment analysis and emotion recognition problems. In addition to achieving performance on par with the state of the art (but using less than a third of the model parameters), we examine the relative contributions of each mode when used in the downstream applications.
Journal of Historians of Netherlandish Art, 2021
Identifying, comparing, and matching watermarks in pre-machine-made papers has occupied scholars of prints and drawings for some time. One popular but arduous approach is to overlay, either manually or digitally, an image of the watermark in question with its presumed match from a known source. For example, a newly discovered watermark in a Rembrandt print might be compared to a similar one reproduced in Erik Hinterding’s Rembrandt as an Etcher (2006). Such an overlay can confirm the pair as identical, i.e., as moldmates, or reveal their differences. But creating an accurate overlay for two images with different scales, orientations, or resolutions using standard image-manipulation tools can be time-consuming and, ultimately, unsuccessful. Part One of this article describes advances in the emerging field of computational art history, specifically the development of digital image processing software, that can be used to semi-automatically create a reliable animated overlay of two wat...
Sensors, 2021
Pyroelectric Infrared (PIR) sensors are low-cost, low-power, and highly reliable sensors that have been widely used in smart environments. Indoor localization systems can be categorized as wearable and non-wearable systems, where the latter are also known as device-free localization systems. Since the binary PIR sensor detects only the presence of human motion in its field of view (FOV) without any other information about the actual location, utilizing the information of overlapping FOVs of multiple sensors can be useful for localization. In this study, a PIR detector and sensing signal processing algorithms were designed based on the characteristics of the PIR sensor. We applied the designed PIR detector as a sensor node to create a non-wearable cooperative indoor human localization system. To improve the system performance, signal processing algorithms and refinement schemes (i.e., the Kalman filter, a Transferable Belief Model, and a TBM-based hybrid approach (TBM + Kalman filte...
Journal of Mathematics and Music, 2020
A primary esthetic in the performance practice of Balinese gamelan is the ombak (Indonesian for wave), which is manifest in musical form, performance, and tuning. The ombak arises in a paired tuning system in which corresponding unisons of two instruments (or instrumental groups) are tuned to slightly different frequencies, one higher and one lower, to produce beats. Pitch classes are not necessarily tuned to octaves in an exact 2:1 frequency ratio; instead, octaves are often stretched or compressed. This paper discusses the relationship between the ombak rate and octave tempering, and demonstrates that the beating rate, combined with the octave tuning strategy chosen, can be modeled using a tempering parameter that determines the amount of stretching or compression. This model is then used to analyze tuning data of nine complete gamelan.
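The link between beating rate and octave size can be seen with simple arithmetic. The sketch below is an illustration only and does not reproduce the paper's tempering parameter: it assumes the lower instrument of a pair is tuned to exact 2:1 octaves, takes the beat rate as the frequency difference between paired tones, and uses made-up frequencies. The helper names are hypothetical.

```python
import math

def beat_rate(f_low, f_high):
    """Beats per second between two nearly equal frequencies (Hz)."""
    return abs(f_high - f_low)

def octave_cents(f_bottom, f_top):
    """Size of the interval from f_bottom to f_top in cents (1200 = exact 2:1)."""
    return 1200 * math.log2(f_top / f_bottom)

def paired_octave(f_low, beats_bottom, beats_top, low_octave_ratio=2.0):
    """Return (low, high) instrument frequencies for one note and its octave.

    The higher instrument is tuned above the lower one by the requested
    beat rate in each register.
    """
    low = (f_low, f_low * low_octave_ratio)
    high = (low[0] + beats_bottom, low[1] + beats_top)
    return low, high

# Case 1: the same beat rate (7 Hz, an assumed value) in both registers.
low, high = paired_octave(220.0, 7.0, 7.0)
constant_beat_octave = octave_cents(high[0], high[1])

# Case 2: the beat rate doubles along with the pitch.
low2, high2 = paired_octave(220.0, 7.0, 14.0)
doubling_beat_octave = octave_cents(high2[0], high2[1])

print(f"constant beats: upper octave = {constant_beat_octave:.1f} cents")
print(f"doubling beats: upper octave = {doubling_beat_octave:.1f} cents")
```

Under these assumptions, holding the beat rate constant across registers forces the higher instrument's octave to be narrower than 2:1, while letting the beat rate double preserves an exact octave. This is one concrete sense in which the ombak rate and octave tempering cannot be chosen independently.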
Models of the perceived distance between pairs of pitch collections (such as chords, scales, keys, and the virtual and spectral pitches heard in response to complex tones or chords) are a core component of broader models of the perception of tonality as a whole. Numerous different distance measures have been proposed, including voice-leading, psychoacoustic, and pitch and interval class distances; but, so far, there has been no attempt to bind these different measures into a single mathematical or conceptual framework, nor to incorporate the uncertain or probabilistic nature of pitch perception (whereby tones with similar frequencies may, or may not, be heard as having the same pitch).
To achieve this aim, we embed pitch collections in novel multi-way expectation arrays, and show how metrics between such arrays can model the perceived dissimilarity of the pitch collections they embed. By modeling the uncertainties of human pitch perception, expectation arrays indicate the expected number of tones, ordered pairs of tones, ordered triples of tones and so forth, that are heard as having any given pitch, dyad of pitches, triad of pitches, and so forth. The pitches can be either absolute or relative (in which case the arrays are invariant with respect to transposition).
We provide a number of examples that show how the metrics accord well with musical intuition, and suggest some ways in which this work may be developed.
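The simplest (one-way) case of such an array can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: perceptual uncertainty is modeled with a Gaussian of assumed width `sigma` cents on a circular pitch-class axis with 1-cent bins, and cosine distance stands in for the metric. The function names and the example chords are hypothetical.

```python
import math

def expectation_vector(pitches_cents, sigma=10.0, bins=1200):
    """Expected number of tones heard at each 1-cent pitch-class bin."""
    vec = [0.0] * bins
    for p in pitches_cents:
        weights = []
        for b in range(bins):
            d = min((b - p) % bins, (p - b) % bins)  # circular distance in cents
            weights.append(math.exp(-0.5 * (d / sigma) ** 2))
        total = sum(weights)
        for b, w in enumerate(weights):
            vec[b] += w / total  # each tone contributes probability mass 1
    return vec

def cosine_distance(u, v):
    """1 minus the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Pitch classes in cents above C (assumed example chords).
c_major_equal = expectation_vector([0, 400, 700])    # 12-tone equal temperament
c_major_just = expectation_vector([0, 386, 702])     # close to just intonation
f_sharp_major = expectation_vector([600, 1000, 100])  # a tritone away

near = cosine_distance(c_major_equal, c_major_just)
far = cosine_distance(c_major_equal, f_sharp_major)
print(f"equal vs. just C major: {near:.3f}")
print(f"C major vs. F-sharp major: {far:.3f}")
```

Because each tone's mass is spread over neighboring pitches, a slightly retuned chord stays close to the original while a chord with no nearby pitches is maximally distant. A hard "same pitch or not" comparison would treat both cases as equally different.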
This paper describes a design procedure for a musical instrument based on inharmonic (nonuniform) strings. Fabricating nonuniform strings from commercially available strings constrains the possible string diameters, and hence the possible inharmonicities. Detailed simulations of the strings are combined with a measure of sensory dissonance (or roughness) to help narrow down the remaining possibilities. A particularly intriguing variation is a string that consists of three segments: two equal unwound segments surrounding a thicker wound portion. The corresponding musical scale, built on the 12th root of 4, is called the hyperoctave. A standard piano is modified to play in this tuning using these inharmonic strings; this instrument is called the hyperpiano.
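The arithmetic of a scale built on the 12th root of 4 can be checked with a short sketch. The base frequency of 110 Hz and the function names are assumptions chosen for illustration; only the step ratio comes from the abstract.

```python
import math

STEP_RATIO = 4 ** (1 / 12)  # twelve equal steps span a 4:1 frequency ratio

def hyperoctave_scale(base_hz, steps=12):
    """Frequencies of `steps` successive scale steps above `base_hz`."""
    return [base_hz * STEP_RATIO ** k for k in range(steps + 1)]

def cents(ratio):
    """Interval size in cents (1200 cents = one 2:1 octave)."""
    return 1200 * math.log2(ratio)

scale = hyperoctave_scale(110.0)
step_cents = cents(STEP_RATIO)

for k, f in enumerate(scale):
    print(f"step {k:2d}: {f:8.2f} Hz")
print(f"each step is {step_cents:.1f} cents")
```

Since the 12th root of 4 equals the 6th root of 2, each step is twice the size of a 12-tone equal-tempered semitone, and twelve steps cover the 4:1 ratio rather than the usual 2:1 octave.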