Modelling long- and short-term structure in symbolic music with attention and recurrence
Related papers
Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music
ArXiv, 2020
Automatic sequence generation has been a highly active research field in recent years. In particular, natural language processing and automatic music composition have gained importance due to recent advances in machine learning and neural networks with intrinsic memory mechanisms, such as Recurrent Neural Networks. This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition. The proposed approach considers music theory concepts such as transposition, and uses data transformations (embeddings) to introduce semantic meaning and improve the quality of the generated melodies. A set of quantitative metrics is presented to automatically evaluate the performance of the proposed architecture, measuring the tonality of the musical compositions.
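The transposition-based augmentation this abstract mentions is easy to picture. Below is a minimal sketch, assuming melodies are lists of MIDI pitch numbers; the function name and transposition range are illustrative, not taken from the paper.

```python
# Minimal sketch of transposition augmentation (illustrative, not the paper's code).
def transpose(melody, semitones):
    """Shift every pitch by a fixed interval, keeping the result in MIDI range."""
    shifted = [p + semitones for p in melody]
    if all(0 <= p <= 127 for p in shifted):
        return shifted
    return None  # transposition would leave the valid MIDI pitch range

melody = [60, 62, 64, 65, 67]  # C major fragment
augmented = [t for s in range(-5, 7) if (t := transpose(melody, s)) is not None]
```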
Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure
Proc. of the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual), 2021
The rise of deep learning technologies has quickly advanced many fields, including that of generative music systems. A number of systems exist that can generate good-sounding short snippets, yet these snippets often lack an overarching, longer-term structure. In this work, we propose CM-HRNN: a conditional melody generation model based on a hierarchical recurrent neural network. This model allows us to generate melodies with long-term structure based on given chord accompaniments. We also propose a novel, concise event-based representation that encodes musical lead sheets while retaining each note's relative position within the bar with respect to the musical meter. With this new data representation, the proposed architecture can model rhythmic and pitch structures simultaneously and effectively. Melodies generated by the proposed model were extensively evaluated in quantitative experiments as well as a user study, both to ensure the musical quality of the output and to check whether it contains repeating patterns. We also compared the system with the state-of-the-art AttentionRNN. This comparison shows that melodies generated by CM-HRNN contain more repeated patterns (i.e., a higher compression ratio) and lower tonal tension (i.e., they are more tonally concise). Results from our listening test indicate that CM-HRNN outperforms AttentionRNN in terms of long-term structure and overall rating.
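To make the bar-relative, event-based idea concrete, here is a hypothetical sketch of such an encoding, assuming sixteenth-note quantisation in 4/4; the actual CM-HRNN representation may differ in its fields and resolution.

```python
# Hypothetical event encoding that keeps each note's position within the bar.
from dataclasses import dataclass

@dataclass
class NoteEvent:
    pitch: int         # MIDI pitch (e.g., 60 = middle C)
    duration: int      # length in sixteenth-note steps
    bar_position: int  # onset within the bar, 0..15 in 4/4

def encode(notes, steps_per_bar=16):
    """Turn (pitch, onset_step, duration) triples into bar-relative events."""
    return [NoteEvent(p, d, onset % steps_per_bar) for p, onset, d in notes]

events = encode([(60, 0, 4), (64, 4, 4), (67, 8, 8)])
```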
Rethinking Recurrent Latent Variable Model for Music Composition
2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), 2018
We present a model for capturing musical features and creating novel sequences of music, called the Convolutional-Variational Recurrent Neural Network. To generate sequential data, the model uses an encoder-decoder architecture with latent probabilistic connections to capture the hidden structure of music. Using the sequence-to-sequence model, our generative model can exploit samples from a prior distribution and generate a longer sequence of music. We compare the performance of our proposed model with other types of Neural Networks using the criteria of Information Rate that is implemented by Variable Markov Oracle, a method that allows statistical characterization of musical information dynamics and detection of motifs in a song. Our results suggest that the proposed model has a better statistical resemblance to the musical structure of the training data, which improves the creation of new sequences of music in the style of the originals.
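The generation step the abstract describes, sampling a latent code from a prior and decoding it into a longer sequence, can be sketched as follows; the architecture, sizes, and greedy decoding here are assumptions rather than the paper's exact model.

```python
# Hedged sketch of prior sampling with a recurrent decoder (PyTorch; illustrative).
import torch
import torch.nn as nn

class RecurrentDecoder(nn.Module):
    def __init__(self, latent=32, hidden=128, n_tokens=130):
        super().__init__()
        self.n_tokens = n_tokens
        self.init = nn.Linear(latent, hidden)       # latent code sets the initial state
        self.rnn = nn.GRU(n_tokens, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def generate(self, z, steps=64):
        h = torch.tanh(self.init(z)).unsqueeze(0)    # (1, batch, hidden)
        x = torch.zeros(z.size(0), 1, self.n_tokens) # all-zero start token
        tokens = []
        for _ in range(steps):
            o, h = self.rnn(x, h)
            idx = self.out(o).argmax(-1)             # greedy decoding
            tokens.append(idx)
            x = nn.functional.one_hot(idx, self.n_tokens).float()
        return torch.cat(tokens, dim=1)

z = torch.randn(2, 32)                     # samples from the standard-normal prior
sequence = RecurrentDecoder().generate(z)  # (2, 64) token indices
```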
GRUV: Algorithmic Music Generation using Recurrent Neural Networks
We compare the performance of two different types of recurrent neural networks (RNNs) for the task of algorithmic music generation, with audio waveforms as input. In particular, we focus on RNNs that have a sophisticated gating mechanism, namely the Long Short-Term Memory (LSTM) network and the recently introduced Gated Recurrent Unit (GRU). Our results indicate that the generated outputs of the LSTM network were significantly more musically plausible than those of the GRU.
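The core of the comparison is swapping the gated cell while holding the rest of the architecture fixed, roughly as in this sketch; the chunk and hidden sizes are illustrative, not GRUV's.

```python
# Sketch of an LSTM-vs-GRU comparison on fixed-length waveform chunks (PyTorch).
import torch
import torch.nn as nn

class WaveRNN(nn.Module):
    def __init__(self, cell="lstm", chunk=256, hidden=512):
        super().__init__()
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        self.rnn = rnn_cls(chunk, hidden, batch_first=True)
        self.out = nn.Linear(hidden, chunk)  # predict the next waveform chunk

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)

for cell in ("lstm", "gru"):
    y = WaveRNN(cell)(torch.randn(8, 20, 256))  # 8 sequences of 20 chunks each
```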
Automatic Music Generation using Long Short-Term Memory Neural Networks
Recent developments in neural networks and sequential models have produced state-of-the-art results in signal processing and sequential data generation. To us, music is a pleasing sound that everyone listens to frequently, but a computer represents it as sequential data, and any sequential data generation model can therefore be used to generate music. Our work focuses on generating music using LSTMs. Unlike general RNNs, LSTMs have memory and are good at retaining previous data. Good music must not change tones and themes abruptly; it must be consistent, and for that the model must remember what was generated previously, so LSTMs are a natural choice for this context-based data generation. We used Keras [2], an open-source software library that provides a Python interface for artificial neural networks, to create and train the model. The most impressive results were produced by a multi-layered char-RNN with LSTM cells. The data is represented in the ABC file format for easier access and better understanding. We preprocess the data to make it more robust and understandable for the neural network and decode it back for human interpretation; the preprocessing algorithms and data representation are thoroughly discussed. The model used in this paper learns sequences of polyphonic musical notes with a stacked, multi-layered char-RNN built from LSTM cells. Because of the memory cells in the LSTM units, the model can recall past details of a musical sequence and its structure for better learning. The whole architecture, together with the data flow and training and testing scores, is explained.
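The ABC preprocessing amounts to mapping characters to integer indices and back. A minimal round-trip sketch, with a toy corpus standing in for the paper's dataset:

```python
# Char-level encode/decode for ABC-notation text (toy corpus, illustrative).
abc = "X:1\nT:Example\nK:C\nCDEF GABc|"
chars = sorted(set(abc))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

def to_indices(text):
    return [char_to_idx[c] for c in text]

def to_text(indices):
    return "".join(idx_to_char[i] for i in indices)

encoded = to_indices(abc)
assert to_text(encoded) == abc  # round-trip: decode back for human interpretation
```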
Music Generation Using Recurrent Neural Networks
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
Over the past years, music has continually evolved through its tempos, beats, and melody. Traditionally, music is produced by a group of musicians with different instruments playing together to create a final synchronised product, and harmonies and beats were long considered something to be created manually. With the advent of digital technologies and software, however, it has become possible for machines to generate music automatically at a rapid pace. The purpose of this research is to propose a method for creating musical notes using Recurrent Neural Networks (RNN), specifically Long Short-Term Memory (LSTM) networks. To implement this algorithm, a model is created and the data is represented in the form of Musical Instrument Digital Interface (MIDI) files for easy access and interpretation. The process of preparing the data for input into the model is also discussed, as well as techniques for receiving, processing, and storing MIDI files for use as input. To enhance its learning capabilities, the model should be able to remember previous details of a musical sequence and its structure. This paper discusses the use of a layered architecture in the LSTM model and how its connections interweave to create a neural network.
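A hedged sketch of the MIDI-to-sequence step, using the pretty_midi library; the paper does not name its tooling, so the library choice and drum filtering are assumptions.

```python
# Read a MIDI file into (pitch, start, duration) triples with pretty_midi.
import pretty_midi

def midi_to_note_sequence(path):
    """Return the file's notes as (pitch, start, duration), sorted by onset."""
    pm = pretty_midi.PrettyMIDI(path)
    notes = [
        (n.pitch, n.start, n.end - n.start)
        for inst in pm.instruments
        if not inst.is_drum            # keep pitched instruments only
        for n in inst.notes
    ]
    return sorted(notes, key=lambda n: n[1])

# sequence = midi_to_note_sequence("example.mid")  # encode, then feed to the LSTM
```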
A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling
Proceedings of the ... AAAI Conference on Artificial Intelligence, 2023
Following the success of the transformer architecture in the natural language domain, transformer-like architectures have recently been widely applied to the domain of symbolic music. Symbolic music and text, however, are two different modalities. Symbolic music contains multiple attributes, both absolute (e.g., pitch) and relative (e.g., pitch interval), and these relative attributes shape human perception of musical motifs. These important relative attributes are nevertheless mostly ignored in existing symbolic music modeling methods, the main reason being the lack of a musically meaningful embedding space in which both the absolute and relative embeddings of symbolic music tokens can be efficiently represented. In this paper, we propose the Fundamental Music Embedding (FME) for symbolic music, based on a bias-adjusted sinusoidal encoding within which both the absolute and the relative attributes can be embedded and fundamental musical properties (e.g., translational invariance) are explicitly preserved. Taking advantage of the proposed FME, we further propose a novel attention mechanism based on the relative index, pitch, and onset embeddings (RIPO attention), such that musical domain knowledge can be fully utilized for symbolic music modeling. Experimental results show that our proposed RIPO transformer, which utilizes FME and RIPO attention, outperforms state-of-the-art transformers (i.e., the music transformer and the linear transformer) in a melody completion task. Moreover, when using the RIPO transformer in a downstream music generation task, we notice that the notorious degeneration phenomenon no longer exists, and the music generated by the RIPO transformer outperforms that generated by state-of-the-art transformer models in both subjective and objective evaluations. The code of the proposed method is available online: github.com/guozixunnicolas/FundamentalMusicEmbedding
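The flavour of FME can be conveyed with a plain sinusoidal attribute embedding; note that the published FME adds bias adjustment and learnable components, so this sketch only illustrates the translational-invariance property, not the exact method.

```python
# Simplified sinusoidal embedding of a scalar musical attribute (illustrative).
import numpy as np

def sinusoidal_embedding(value, dim=8, base=10000.0):
    """Encode a scalar attribute (e.g., MIDI pitch) with interleaved sin/cos."""
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

e60, e62, e64 = (sinusoidal_embedding(p) for p in (60, 62, 64))
# A fixed interval corresponds to a fixed linear map (a rotation) on the embedding:
# the map taking e60 -> e62 is the same one taking e62 -> e64, which is what
# lets relative attributes like pitch interval be represented in the same space.
```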
Implementation of Music Generation using Recurrent Neural Network
2021
With the development of deep learning, neural networks are increasingly used in various art fields such as music, literature, and painting, sometimes with results comparable to humans'. This paper proposes a music generation model based on a bidirectional recurrent neural network, which can effectively explore the complex relationships between notes and obtain conditional probabilities along both the time and pitch dimensions. Existing systems usually ignore information in the negative time direction, which is non-trivial for the music prediction task, so we propose a bidirectional LSTM model to generate note sequences. Experiments on classical piano datasets demonstrate that we achieve high performance in music generation tasks compared to the existing unidirectional biaxial LSTM method.
Keywords: music generation; bidirectional recurrent neural network; deep learning
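A minimal sketch of the bidirectional idea, assuming notes arrive as 88-dimensional piano-roll vectors; the layer sizes are illustrative.

```python
# Bidirectional LSTM over note sequences (PyTorch; sizes illustrative).
import torch
import torch.nn as nn

class BiLSTMNoteModel(nn.Module):
    def __init__(self, n_pitches=88, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_pitches, hidden, batch_first=True, bidirectional=True)
        # forward and backward states are concatenated, hence 2 * hidden
        self.out = nn.Linear(2 * hidden, n_pitches)

    def forward(self, x):
        h, _ = self.rnn(x)   # uses both past and "negative time" context
        return self.out(h)   # per-step logits over pitches

logits = BiLSTMNoteModel()(torch.randn(4, 32, 88))  # batch of 4, 32 timesteps
```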
MellisAI - An AI Generated Music Composer Using RNN-LSTMs
International Journal of Machine Learning and Computing, 2020
The art of composing music can be automated with deep learning along with the knowledge of a few implicit heuristics. In this paper, we aim to build a model that composes a Carnatic-oriented contemporary tune based on the features of a given training song. It implements automated musical composition as a pipeline in which various key features of the resulting tune are constructed separately, step by step, and finally combined into a complete piece. LSTM models were used to create four modules, namely the Motif, Tune, Endnote, and Gamaka modules, whose training accuracies were 86%, 98%, 60%, and 72%, respectively. Our work focuses primarily on a user-friendly Carnatic music composer that accepts an initial user phrase to compose a sequence of notes and motifs for the required duration.
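A purely illustrative sketch of the staged pipeline described above; the module interfaces and their ordering are assumptions, not MellisAI's actual API.

```python
# Hypothetical staged composition pipeline: each module refines the previous output.
def compose(seed_phrase, duration,
            tune_model, motif_model, endnote_model, gamaka_model):
    """Build a tune stage by stage from a user's seed phrase."""
    tune = tune_model(seed_phrase, duration)  # base melodic line
    tune = motif_model(tune)                  # weave in recurring motifs
    tune = gamaka_model(tune)                 # add Carnatic ornamentation
    return endnote_model(tune)                # resolve the closing note
```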
Music Generation using Recurrent Neural Network
2021
The rise in computational resources and recent advances in recurrent neural network architectures have made music generation practical for large-scale data. The most common recurrent network used for modeling long-term dependencies is the long short-term memory (LSTM) network.