State of the art of real-time singing voice synthesis

Challenges and Perspectives on Real-time Singing Voice Synthesis

RITA, 2020

This paper describes the state of the art of real-time singing voice synthesis and presents its concept, applications, and technical aspects. A technological mapping and a literature review are carried out to indicate the latest developments in this area. We then make a brief comparative analysis of the selected works. Finally, we discuss challenges and future research problems.

Singing Voice Synthesis: History, Current Work, and Future Directions

Computer Music Journal, 1996

This article briefly reviews the history of singing voice synthesis and highlights some currently active projects in this area. It surveys and discusses the benefits and trade-offs of using different techniques and models, and it also covers performance control, some attractions of composing with vocal models, and exciting directions for future research.

A singing voice synthesis system based on sinusoidal modeling

… Speech, and Signal …, 1997

Although sinusoidal models have been demonstrated to be capable of high-quality musical instrument synthesis [1], speech modification [2], and speech synthesis [3], little exploration of the application of these models to the synthesis of singing voice has been undertaken. In this paper, we propose a system framework similar to that employed in concatenation-based text-to-speech synthesizers, and describe its extension to the synthesis of singing voice. The power and flexibility of the sinusoidal model used in the waveform synthesis portion of the system [1] enables high-quality, computationally efficient synthesis and the incorporation of musical qualities such as vibrato and spectral tilt variation. Modeling of segmental phonetic characteristics is achieved by employing a "unit selection" procedure that selects sinusoidally-modeled segments from an inventory of singing voice data collected from a human vocalist. The system, called Lyricos, is capable of synthesizing very natural-sounding singing that maintains the characteristics and perceived identity of the analyzed vocalist.
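
As background for the sinusoidal-modeling approach described above, here is a minimal sketch of additive (sinusoidal-model) synthesis of a sustained sung tone with vibrato and spectral tilt. It only illustrates the general building block; the function name, parameters, and default values are assumptions made for this sketch, not Lyricos internals.

```python
import numpy as np

def synth_sinusoidal(f0, amps, duration, sr=44100,
                     vib_rate=5.5, vib_depth=0.01, tilt_db_per_oct=-6.0):
    """Additive (sinusoidal-model) synthesis of a sustained sung tone.

    f0              : fundamental frequency in Hz
    amps            : per-harmonic linear amplitudes (index 0 = fundamental)
    vib_rate        : vibrato rate in Hz
    vib_depth       : vibrato depth as a fraction of f0 (0.01 = +/-1%)
    tilt_db_per_oct : extra spectral tilt applied on top of `amps`
    """
    t = np.arange(int(duration * sr)) / sr
    # Frequency-modulate the fundamental to obtain vibrato.
    f0_t = f0 * (1.0 + vib_depth * np.sin(2 * np.pi * vib_rate * t))
    phase = 2 * np.pi * np.cumsum(f0_t) / sr           # running phase of harmonic 1
    out = np.zeros_like(t)
    for k, a in enumerate(amps, start=1):
        tilt = 10 ** (tilt_db_per_oct * np.log2(k) / 20)  # tilt in dB per octave
        out += a * tilt * np.sin(k * phase)
    return out / np.max(np.abs(out))                   # normalize to [-1, 1]

# Example: a 1-second A3 (220 Hz) vowel-like tone with 10 harmonics.
tone = synth_sinusoidal(220.0, amps=[1.0 / k for k in range(1, 11)], duration=1.0)
```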

Realtime and accurate musical control of expression in singing synthesis

Journal on Multimodal User Interfaces, 2007

In this paper, we describe a full computer-based musical instrument allowing realtime synthesis of expressive singing voice. The expression results from the continuous action of an interpreter through a gestural control interface. In this context, expressive features of the voice are discussed. New real-time implementations of a spectral model of glottal flow (CALM) are described. These interactive modules are then used to identify and quantify voice quality dimensions. Experiments are conducted in order to develop a first framework for voice quality control. The representation of the vocal tract and the control of several vocal tract movements are explained, and a solution is proposed and integrated. Finally, some typical controllers are connected to the system and expressivity is evaluated.
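
As context for the glottal-source and vocal-tract discussion above, the sketch below shows the generic source-filter idea: a crude glottal-like pulse train passed through a cascade of formant resonators. It is not an implementation of CALM; the function names, formant values, and the open-quotient parameter are illustrative assumptions only.

```python
import numpy as np
from scipy.signal import lfilter

def formant_resonator(freq, bw, sr):
    """Second-order IIR resonator (one formant) returned as (b, a) coefficients."""
    r = np.exp(-np.pi * bw / sr)
    theta = 2 * np.pi * freq / sr
    a = [1.0, -2 * r * np.cos(theta), r * r]
    b = [1.0 - r]                          # rough gain normalization
    return b, a

def source_filter_vowel(f0=150.0, formants=((700, 80), (1200, 90), (2600, 120)),
                        duration=0.5, sr=16000, open_quotient=0.6):
    """Tiny source-filter sketch: glottal-like pulse train -> formant filter cascade."""
    n = int(duration * sr)
    period = int(sr / f0)
    # Source: one raised-cosine "open phase" per period (a crude glottal pulse).
    pulse_len = max(2, int(open_quotient * period))
    pulse = 0.5 * (1 - np.cos(2 * np.pi * np.arange(pulse_len) / pulse_len))
    source = np.zeros(n)
    for start in range(0, n - pulse_len, period):
        source[start:start + pulse_len] += pulse
    source = np.diff(source, prepend=0.0)  # differentiate: glottal flow derivative
    # Filter: cascade of formant resonators standing in for the vocal tract.
    out = source
    for freq, bw in formants:
        b, a = formant_resonator(freq, bw, sr)
        out = lfilter(b, a, out)
    return out / np.max(np.abs(out))

vowel = source_filter_vowel()              # ~0.5 s of a vowel-like tone at 150 Hz
```

In such a setup, voice quality dimensions would correspond to changes in source parameters (here only the open quotient is exposed), which is the kind of mapping the paper investigates with its real-time modules.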

Trends on the synthesis of the singing voice: technical problems and perspectives

In this paper we discuss current research on the synthesis of the singing voice, its technical problems and aesthetic issues, and the prospect of creating a vocal synthesizer accessible to composers, musicians, and musicologists as a creative tool for composing, performing, and interpreting forgotten voice techniques of various cultures and bygone times.

Performance-driven Control for Sample-based Singing Voice Synthesis

In this paper we address the expressive control of singing voice synthesis. Singing Voice Synthesizers (SVS) traditionally require two types of inputs: a musical score and lyrics. The musical expression is then typically either generated automatically by applying a model of a certain type of expression to a high-level musical score, or achieved by manually editing low-level synthesizer parameters. We propose an alternative method, where the expression control is derived from a singing performance. In a first step, an analysis module extracts expressive information from the input voice signal, which is then adapted and mapped to the internal synthesizer controls. The presented implementation works in an off-line manner, processing user input voice signals and lyrics using a phonetic segmentation module. The main contribution of this approach is to offer a direct way of controlling the expression of SVS. The next step is to run the system in real time; the last section of this paper addresses a possible strategy for real-time operation.
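
To make the analysis-and-mapping idea concrete, the sketch below extracts frame-wise pitch (simple autocorrelation) and RMS energy from an input voice signal and maps them to toy control streams. It only illustrates the general pipeline; the function names, thresholds, and mapping are assumptions for this sketch, not the paper's actual modules.

```python
import numpy as np

def analyze_performance(x, sr, frame=1024, hop=256, fmin=80.0, fmax=800.0):
    """Frame-wise pitch (autocorrelation) and RMS energy from an input voice signal.

    Returns two arrays that a mapping layer could translate into synthesizer
    controls (e.g. target pitch and loudness envelopes).
    """
    f0s, energies = [], []
    lo, hi = int(sr / fmax), int(sr / fmin)            # lag search range
    for start in range(0, len(x) - frame, hop):
        w = x[start:start + frame] * np.hanning(frame)
        energies.append(float(np.sqrt(np.mean(w ** 2))))        # RMS energy
        ac = np.correlate(w, w, mode='full')[frame - 1:]        # autocorrelation
        lag = lo + int(np.argmax(ac[lo:hi]))                    # strongest lag in range
        f0s.append(sr / lag if ac[lag] > 0.3 * ac[0] else 0.0)  # 0.0 = unvoiced frame
    return np.array(f0s), np.array(energies)

def map_to_controls(f0s, energies):
    """Toy mapping: pitch passes through, energy is rescaled to a 0-127 'dynamics' value."""
    dyn = 127 * energies / (energies.max() + 1e-12)
    return list(zip(f0s, dyn))
```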

Designing and Controlling a Source-filter Model for Naturalistic and Expressive Singing Voice Synthesis

2007

In this paper, we describe a voice synthesis model developed for musical purposes. Based on a source-filter model, this synthesizer has been specifically designed to allow the synthesis of natural-sounding singing voices by including pitch and amplitude variations and by careful tuning of consonant-to-vowel transitions. Particular attention is given to the reproduction of plosive consonants. The model covers all singing voice registers, from bass to soprano, and allows the control of several tone quality parameters such as vibrato depth and frequency, voice roughness, and articulation speed. Its database is structured to synthesize whole consonant-vowel syllables. As a result, it is relatively easy to construct musically expressive phrases with just a few manipulations and control commands. The model uses Csound as the audio engine and can produce several voices at a small CPU cost.
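
A small sketch of how a syllable-level control layer of this kind could look: syllable events carrying pitch, duration, vibrato, and articulation parameters, laid out on a timeline for an audio engine (Csound, in the paper's case) to render. The data structure and field names are illustrative assumptions, not the paper's actual control format.

```python
from dataclasses import dataclass

@dataclass
class SyllableNote:
    """One consonant-vowel syllable event in a sung phrase (illustrative fields)."""
    syllable: str               # e.g. "la", "mi"
    pitch_hz: float             # target pitch of the vowel
    dur_s: float                # total syllable duration
    vib_depth: float = 0.01     # vibrato depth, fraction of pitch
    vib_rate_hz: float = 5.5    # vibrato rate
    articulation_s: float = 0.04  # consonant-to-vowel transition time

def phrase_to_events(phrase, start=0.0):
    """Turn a list of SyllableNote objects into (onset, note) pairs that an
    audio engine could render one after another."""
    events, t = [], start
    for note in phrase:
        events.append((t, note))
        t += note.dur_s
    return events

phrase = [SyllableNote("la", 220.0, 0.5), SyllableNote("mi", 247.0, 0.5, vib_depth=0.02)]
for onset, note in phrase_to_events(phrase):
    print(f"{onset:.2f}s  {note.syllable}  {note.pitch_hz} Hz")
```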

Real-time interfaces for speech and singing

2000

This paper outlines the current limitations of such systems and then describes the methods used to give the user real-time control of the vocal synthesis device.

MaxMBROLA: A Max/MSP MBROLA-Based Tool for Real-Time Voice Synthesis

Proc. European Signal Processing Conference, 2005

In this paper, we present the first step of a project that aims to perform both speech and singing synthesis controlled in real time. Our aim is to develop a flexible application allowing performers to produce complex and versatile singing (as well as speech) articulations. Thus, we have adapted an existing speech synthesizer, the MBROLA software, to real-time singing constraints. The work presented in this paper concerns the integration of the MBROLA speech synthesizer into the Max/MSP real-time environment ...
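
As a rough illustration of the kind of phoneme/pitch stream MBROLA consumes, the sketch below builds .pho-style lines (phoneme, duration in milliseconds, then percentage/pitch anchor pairs) for one sung note. The helper function and the even split of the note duration across phonemes are simplifying assumptions for this sketch; streaming such commands in real time from within Max/MSP is what the MaxMBROLA external provides.

```python
def note_to_pho(phonemes, pitch_hz, note_dur_ms):
    """Build MBROLA .pho lines for one sung note: each line is
    'phoneme duration_ms (position% pitch_Hz)...'. Splitting the note duration
    evenly across phonemes is an illustrative simplification."""
    dur = note_dur_ms // len(phonemes)
    lines = []
    for ph in phonemes:
        # Hold the note's pitch flat across the phoneme (anchors at 0% and 100%).
        lines.append(f"{ph} {dur} 0 {pitch_hz:.0f} 100 {pitch_hz:.0f}")
    return "\n".join(lines)

# "la" sung on A3 (220 Hz) for half a second, using SAMPA-style phoneme symbols.
print(note_to_pho(["l", "a"], 220.0, 500))
```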