Yunbin Deng - Profile on Academia.edu

Papers by Yunbin Deng

The MUTE silent speech recognition system

Conference of the International Speech Communication Association, 2013

sEMG-based silent speech recognition has become a desirable communication modality because it has the potential to provide natural, covert, hands-free communication in acoustically challenging environments. To enable this capability, we have developed a portable, self-contained, Android-based Mouthed-speech Understanding and Transcription Engine (MUTE) system. To demonstrate the MUTE system's ability to recognize continuous speech drawn from a 210-word vocabulary, we propose a map-task-based demonstration in which a MUTE user guides a “listener” around a schematized map. The listener draws a map based on the received instructions; comparing the original and drawn maps then illustrates the MUTE system's recognition performance.

Proceedings of SPIE, Aug 18, 2005

Real-time wavefront control for adaptive laser communication and imaging systems requires fast measurement of image quality. Statistical analysis of the speckle field provides effective image quality criteria for adaptive correction of phase-distorted images. We propose an analog, continuous-time VLSI (very-large-scale integration) spectrum analysis chip to provide such real-time image quality measurement. The chip takes as analog input the signal sensed by a photodetector located in the speckle field and continuously computes its spectral distribution. Experiments and analysis on a distorted laser beam were conducted with the analog spectrum analysis chip. A target-in-the-loop system is under development to demonstrate real-time adaptive imaging.

Long range standoff speaker identification using laser Doppler vibrometer

Existing studies on speaker identification are mostly performed on telephone and microphone speech data collected with subjects close to the sensor. This study reports, for the first time, long-range standoff automatic speaker identification experiments using a laser Doppler vibrometer (LDV) sensor. The LDV sensor modality has the potential to extend the speech acquisition standoff distance far beyond that of microphone arrays, enabling new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR). Five LDV speech corpora, each consisting of 630 speakers, were collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall, using a Polytec LDV model OFV-505. The distance from the LDV sensor to the vibration targets is 50 feet. State-of-the-art i-vector speaker identification experiments on these LDV speech data show great promise for this long-range acoustic sensing modality.

Standoff heart rate estimation from video – a review

Acoustic and sonar analog signal processing applications require operational transconductance amplifiers (OTAs) that can be configured over a wide frequency range in multiple bands while achieving low power consumption and low harmonic distortion. A fully differential, linear OTA is presented with digitally programmable transconductance spanning three decades of dynamic range. Measurements from a prototype fabricated in a … µm CMOS process demonstrate a 0.4 nA/V to 0.8 µA/V transconductance range, a 40 dB common-mode rejection ratio (CMRR), and -48 dB third-order harmonic distortion, at 12 µW power dissipation.

Journal of Neural Engineering, Jun 25, 2018

Objective. Speech is among the most natural forms of human communication, thereby offering an attractive modality for human-machine interaction through automatic speech recognition (ASR). However, the limitations of ASR, including degradation in the presence of ambient noise, limited privacy, and poor accessibility for those with significant speech disorders, have motivated the need for alternative non-acoustic modalities of subvocal or silent speech recognition (SSR). Approach. We have developed a new system of face- and neck-worn sensors and signal processing algorithms capable of recognizing silently mouthed words and phrases entirely from the surface electromyographic (sEMG) signals recorded from the face and neck muscles involved in speech production. The algorithms were developed by progressively evolving speech recognition models: first recognizing isolated words by extracting speech-related features from sEMG signals, then recognizing sequences of words from patterns of sEMG signals using grammar models, and finally recognizing a vocabulary of previously untrained words using phoneme-based models. The final recognition algorithms were integrated with specially designed multi-point, miniaturized sensors that can be arranged in flexible geometries to record high-fidelity sEMG signals from the small articulator muscles of the face and neck. Main results. We tested the system of sensors and algorithms in a series of subvocal speech experiments involving more than 1200 phrases generated from a 2200-word vocabulary and achieved an 8.9% word error rate (91.1% recognition rate), far surpassing previous attempts in the field. Significance. These results demonstrate the viability of our system as an alternative modality of communication for a multitude of applications including: persons with speech impairments
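
Word error rate, the metric reported above, is conventionally computed from a minimum-edit (Levenshtein) alignment between the reference and recognized word sequences: WER = (substitutions + deletions + insertions) / number of reference words. The minimal Python sketch below implements that standard definition; the function name `word_error_rate` and the example phrases are hypothetical and not taken from the authors' toolchain.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word ("the") and one substitution ("well" -> "wall") over 6 words.
    wer = word_error_rate("move north past the old well", "move north past old wall")
    print(f"WER = {wer:.3f}")
```

Because insertions are counted, WER can exceed 100%. The recognition rate quoted in the abstract is the complement of WER (100% - 8.9% = 91.1%).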

Sensor orientation invariant mobile gait biometrics

Accelerometers and gyroscopes embedded in mobile devices have shown great potential for non-obtrusive gait biometrics by directly capturing a user's characteristic locomotion. Despite success in gait analysis with these sensors under controlled experimental settings, their performance in realistic scenarios is unsatisfactory because the data depend on sensor placement, and in practice the placement of mobile devices is unconstrained. In this paper, we propose a novel gait representation for accelerometer and gyroscope data that is both sensor-orientation-invariant and highly discriminative, enabling high-performance gait biometrics for real-world applications. We also adopt the i-vector paradigm, a state-of-the-art machine learning technique widely used for speaker recognition, to extract gait identities using the proposed representation. Performance studies using both the naturalistic McGill University gait dataset and the Osaka University gait dataset containing 744 subjects show the clear superiority of this approach over existing methods.
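
The abstract does not spell out the authors' orientation-invariant representation. As a much simpler illustration of why orientation invariance is achievable at all, the sketch below (an assumption, not the paper's method) shows that the Euclidean norm of each 3-axis accelerometer sample is unchanged when the device is rotated, so features computed from norms do not depend on how the phone sits in a pocket. The helper names and the toy trace are hypothetical.

```python
import math

Vector = tuple[float, float, float]


def magnitude_series(samples: list[Vector]) -> list[float]:
    """Per-sample Euclidean norm: unchanged by any fixed rotation of the sensor frame."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def rotate_z(v: Vector, theta: float) -> Vector:
    """Rotate a vector about the z axis by theta radians (simulates a re-oriented device)."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def rotate_x(v: Vector, theta: float) -> Vector:
    """Rotate a vector about the x axis by theta radians."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (x, c * y - s * z, s * y + c * z)


if __name__ == "__main__":
    # Toy accelerometer trace in m/s^2 (made-up values for illustration only).
    trace = [(0.1, 9.7, 0.3), (1.2, 10.4, -0.8), (-0.5, 8.9, 1.1), (0.9, 11.2, 0.2)]
    rotated = [rotate_x(rotate_z(v, 0.7), 1.3) for v in trace]
    same = all(
        math.isclose(a, b)
        for a, b in zip(magnitude_series(trace), magnitude_series(rotated))
    )
    print("norms unchanged after rotation:", same)
```

The trade-off is that a raw norm discards directional information, so on its own it is likely far less discriminative than the representation the abstract describes.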

IEEE/ACM transactions on audio, speech, and language processing, Dec 1, 2017

Each year thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to communicate verbally. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from the neck and face and used for automatic speech recognition, providing speech-to-text or synthesized speech as an alternative means of communication. This is true even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, 8 individuals at least 6 months after total laryngectomy were recorded using 8 sEMG sensors on their face (4) and neck (4) while reading phrases constructed from a 2,500-word vocabulary. A unique set of phrases was used to train phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used to test word recognition of the models based on phoneme identification from running speech. Word error rates averaged 10.3% for the full 8-sensor set (9.5% for the top 4 participants) and 13.6% when the sensor set was reduced to 4 locations per individual (n=7). This study provides a compelling proof of concept for sEMG-based alaryngeal speech recognition, with strong potential for further improvement in recognition performance.

Chinese spoken language understanding across domain

A robust parsing model for spontaneous Chinese based on semantic constituent spotting and concept assembling model (SCAM) had been successfully developed in our “LOADSTAR” dialog system [1], a travel information access system. Because the SCAM is rule-based, a statistical model for spoken language understanding was adopted for domain portability. The statistical model was developed in the domain of hotel reservation and then ported to the domain of travel information access within four weeks.

Gate to Computer Science and Research, Jan 18, 2015

User authentication based on typing patterns offers many advantages in cyber security, including data acquisition without extra hardware, continuous monitoring as keys are typed, and non-intrusive operation with no interruption to a user's daily work. In this chapter, we adopt three popular voice biometrics algorithms to perform keystroke-dynamics-based user authentication: 1) Gaussian Mixture Model with Universal Background Model (GMM-UBM), 2) the identity vector (i-vector) approach to user modelling, and 3) a deep machine learning approach. Unlike most existing keystroke biometrics approaches, which use only genuine users' data at training time, the proposed methods leverage data from a large pool of background users to enhance the models' discriminative capability. These algorithms make no assumption about the underlying probability distribution of the data and are amenable to real-time implementation. Although these techniques were originally developed for speech analysis, our experiments with them on the publicly available CMU keystroke dynamics dataset have shown significant reduction in the equal error
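
The equal error rate (EER) referred to above is the operating point at which the false acceptance rate (FAR) on impostor attempts equals the false rejection rate (FRR) on genuine attempts. The sketch below estimates EER from two lists of match scores by sweeping the decision threshold over every observed score. It assumes higher scores mean "more likely genuine"; the function name and the toy scores are hypothetical, not values from the CMU dataset.

```python
def equal_error_rate(genuine: list[float], impostor: list[float]) -> float:
    """Approximate EER by sweeping the threshold over every observed score.

    A trial is accepted when score >= threshold. Returns the mean of FAR and FRR
    at the threshold where the two rates are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
    impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
    print(f"EER = {equal_error_rate(genuine_scores, impostor_scores):.2f}")
```

In the chapter's setting, the scores would come from the GMM-UBM, i-vector, or deep learning models. Interpolating between thresholds, as a ROC-based estimate would, is omitted here for brevity.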

IEEE Transactions on Audio, Speech, and Language Processing, Aug 1, 2007

A robust speech feature extraction procedure based on kernel regression nonlinear predictive coding is presented. Features maximally insensitive to additive noise are obtained by growth transformation of regression functions spanning a Reproducing Kernel Hilbert Space (RKHS). Experiments on TI-DIGIT demonstrate consistent robustness of the new features to noise of varying statistics, yielding significant improvements in digit recognition accuracy over identical models trained using Mel-scale cepstral features and evaluated at noise levels between 0 and 30 dB SNR.

arXiv (Cornell University), Mar 20, 2019

Recent breakthroughs in deep learning and artificial intelligence technologies have enabled numerous mobile applications. While traditional computation paradigms rely on mobile sensing and cloud computing, deep learning implemented on mobile devices provides several advantages: low communication bandwidth, small cloud computing resource cost, quick response time, and improved data privacy. Research and development of deep learning on mobile and embedded devices has recently attracted much attention. This paper provides a timely review of this fast-paced field to give researchers, engineers, practitioners, and graduate students a quick grasp of recent advancements in deep learning on mobile devices. We discuss hardware architectures for mobile deep learning, including Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), and recent mobile Graphics Processing Units (GPUs). We present Size, Weight, Area and Power (SWAP) considerations and their relation to algorithm optimizations, such as quantization, pruning, compression, and approximations that simplify computation while retaining accuracy. We cover existing systems and give a state-of-the-industry review of TensorFlow, MXNet, Mobile AI Compute Engine (MACE), and the Paddle-mobile deep learning platform. We discuss resources for mobile deep learning practitioners, including tools, libraries, models, and performance benchmarks. We present applications of various mobile sensing modalities across industries, ranging from robotics, healthcare, multimedia, and biometrics to autonomous driving and defense. We address the key deep learning challenges to overcome, including low-quality data and small training/adaptation data sets.
In addition, the review provides numerous citations and links to existing code bases implementing various technologies. These resources lower the user's barrier to entry into the field of mobile deep learning.
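
Quantization, one of the algorithm optimizations listed above, maps 32-bit floating-point weights to low-bit integers to cut memory and power. The sketch below shows symmetric per-tensor 8-bit quantization in plain Python. It is one common scheme chosen for illustration, not a description of how TensorFlow, MXNet, MACE, or Paddle-mobile implement quantization, and the function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; use scale 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [scale * v for v in q]


if __name__ == "__main__":
    w = [0.82, -1.27, 0.05, 0.0, 0.4]
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(w, w_hat))
    print("int8 codes:", q)
    print(f"scale = {scale:.4f}, max reconstruction error = {max_err:.4f}")
```

Storing one byte per weight instead of four gives roughly a 4x memory reduction (plus one float scale per tensor), and rounding bounds the per-weight error by half the scale.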

Gate to Computer Science and Research, Jan 18, 2015

The recent popularity of mobile devices has raised concerns about mobile technology security: not only are sensitive and private data stored on mobile devices, but the devices also allow remote access to other high-value assets. This drives research into new mobile security methods. Fortunately, new mobile devices are equipped with an advanced sensor suite, enabling multi-modal biometric authentication that includes voice, face, gait, signature, and keystroke authentication, among others. Compared with other modalities, keystroke authentication offers some very attractive features: 1) it is non-intrusive, since either password or free-text keystroke authentication can be applied without affecting users' daily use of the device; 2) it can work in continuous authentication mode for free typing; 3) it can leverage a unique set of advanced built-in sensors, including the accelerometer and gyroscope, to capture richer typing information than raw timing patterns. We present a deep learning approach

Signal processing advances for the MUTE sEMG-based silent speech recognition system

Military speech communication often needs to be conducted in very high-noise environments. In addition, there are scenarios, such as special-ops missions, in which covert voice communication is beneficial. To enable both capabilities, we have developed the MUTE (Mouthed-speech Understanding and Transcription Engine) system, which bypasses the limitations of traditional acoustic speech communication by measuring and interpreting the activity of the facial and neck muscles involved in silent speech production. This article details our recent progress on automatic surface electromyography (sEMG) speech activity detection, feature parameterization, multi-task sEMG corpus development, context-dependent sub-word sEMG modeling, discriminative phoneme model training, and flexible-vocabulary continuous sEMG silent speech recognition. Our current system achieved recognition accuracy at developable levels for a pre-defined special-ops task. We further propose research directions in adaptive sEMG feature parameterization and data-driven decision question generation for context-dependent sEMG phoneme modeling.

Gate to Computer Science and Research, Jan 18, 2015

In this review paper we present a comprehensive survey of research efforts on keystroke dynamics biometrics over the past couple of decades. We review the literature in light of various feature extraction, feature matching, and classification methods for keystroke dynamics. We also discuss recent trends in keystroke dynamics research, including its use in mobile environments, as a soft biometric, and its fusion with other biometric modalities. We further address the evaluation of keystroke biometric systems, including traditional and new performance metrics, and list publicly available keystroke datasets for performance benchmarking to promote synergy in the research community.

International Conference on Pattern Recognition, Dec 1, 2008

Robust voice activity detection (VAD) is a crucial step, and a challenging problem, in developing real-time, high-performance speech recognition systems for noisy environments. In this paper, we present a novel and efficient VAD algorithm for robust, real-time speech activity detection. The key idea of the algorithm is to consider speech energy and edge information simultaneously when processing speech signals. A new finite-state automaton is also developed to correctly detect voice activity in noisy environments. Extensive comparative experiments show that the proposed VAD algorithm can greatly speed up speech recognition while significantly reducing the word error rate (WER). Compared with the state of the art, the proposed algorithm yields an average improvement on noisy data of 46.5% in processing speed and 15.3% in WER.
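
The abstract's algorithm combines speech energy with edge information and a finite-state automaton, but does not give their details. The sketch below illustrates only the general idea of smoothing per-frame energy decisions with a small two-state automaton (the onset/hangover scheme common in energy-based VADs). The thresholds, frame counts, and function names are hypothetical, and the edge information central to the paper is omitted.

```python
import math
from enum import Enum


class State(Enum):
    SILENCE = 0
    SPEECH = 1


def frame_energies_db(samples: list[float], frame_len: int) -> list[float]:
    """Log energy (dB) of non-overlapping frames; a small floor avoids log(0)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        energies.append(10 * math.log10(power + 1e-12))
    return energies


def detect_speech(energies_db: list[float], threshold_db: float = -30.0,
                  onset_frames: int = 3, hangover_frames: int = 5) -> list[bool]:
    """Label each frame with the automaton's state after processing that frame.

    SILENCE -> SPEECH after `onset_frames` consecutive loud frames (rejects short clicks);
    SPEECH -> SILENCE after `hangover_frames` consecutive quiet frames (bridges short pauses).
    """
    state, run = State.SILENCE, 0
    labels = []
    for e in energies_db:
        loud = e > threshold_db
        if state is State.SILENCE:
            run = run + 1 if loud else 0
            if run >= onset_frames:
                state, run = State.SPEECH, 0
        else:
            run = run + 1 if not loud else 0
            if run >= hangover_frames:
                state, run = State.SILENCE, 0
        labels.append(state is State.SPEECH)
    return labels


if __name__ == "__main__":
    rate, frame_len = 8000, 160  # 20 ms frames at 8 kHz (illustrative values)
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(1600)]
    signal = [0.0] * 800 + tone + [0.0] * 1600  # silence, 200 ms "speech", silence
    labels = detect_speech(frame_energies_db(signal, frame_len))
    print("".join("S" if is_speech else "." for is_speech in labels))
```

The onset counter delays detection by a few frames, and the hangover keeps the speech label through brief dips. A practical system would also adapt `threshold_db` to the noise floor.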

An auditory perception model for noise-robust speech feature extraction is presented. The model assumes continuous-time filtering and rectification, amenable to real-time, low-power analog VLSI implementation. A 3 mm × 3 mm chip in 0.5 µm CMOS technology implements the general form of the model with digitally programmable filter parameters. Experiments on the TI-DIGIT database demonstrate consistent robustness of the new features to noise of various statistics, yielding significant improvements in digit recognition accuracy over identically trained models using Mel-frequency cepstral coefficient (MFCC) features.

Towards a practical silent speech recognition system

Our recent efforts towards developing a practical surface electromyography (sEMG) based silent speech recognition interface have resulted in significant advances in the hardware, software, and algorithmic components of the system. In this paper, we report our algorithmic progress, specifically sEMG feature extraction parameter optimization, advances in sEMG acoustic modeling, and sEMG sensor set reduction. The key findings are: 1) the gold-standard parameters for acoustic speech feature extraction are far from optimal for sEMG parameterization; 2) advances in state-of-the-art speech modelling can be leveraged to significantly improve continuous sEMG silent speech recognition accuracy; and 3) the number of sEMG sensors can be halved with little impact on final recognition accuracy, and the optimal sensor subset can be selected efficiently using basic monophone HMM modeling.

In this paper we investigate user authentication using keystroke biometrics. We propose a new distance metric that is effective in dealing with the challenges intrinsic to keystroke dynamics data, i.e., scale variations, feature interactions and redundancies, and outliers. Our keystroke biometrics algorithms based on this new distance metric are evaluated on the CMU keystroke dynamics benchmark dataset and are shown to be superior to algorithms using traditional distance metrics. Of late, keystroke dynamics has become an active research area due to the increasing importance of cyber security and computer or network access control. Most of
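
The abstract does not define the new distance metric, so it is not reproduced here. For a concrete reference point on the "traditional distance metrics" it is compared against, the sketch below shows the scaled Manhattan distance, a widely reported baseline detector on the CMU keystroke benchmark. Each feature's deviation from the user's enrollment mean is divided by that feature's mean absolute deviation. The function names and the timing values are hypothetical.

```python
def train_scaled_manhattan(train: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-feature mean and mean absolute deviation from a user's enrollment vectors."""
    n, dims = len(train), len(train[0])
    mean = [sum(v[i] for v in train) / n for i in range(dims)]
    mad = [sum(abs(v[i] - mean[i]) for v in train) / n for i in range(dims)]
    return mean, mad


def scaled_manhattan(x: list[float], mean: list[float], mad: list[float],
                     eps: float = 1e-6) -> float:
    """Anomaly score: larger means less like the enrolled user (eps avoids division by zero)."""
    return sum(abs(xi - mi) / (di + eps) for xi, mi, di in zip(x, mean, mad))


if __name__ == "__main__":
    # Made-up hold/latency times in seconds for one user's enrollment typings.
    enrollment = [[0.10, 0.25], [0.12, 0.21], [0.08, 0.23]]
    mean, mad = train_scaled_manhattan(enrollment)
    print("genuine probe score:", round(scaled_manhattan([0.11, 0.22], mean, mad), 2))
    print("impostor probe score:", round(scaled_manhattan([0.20, 0.40], mean, mad), 2))
```

Per-feature scaling addresses scale variation, but the metric treats features independently, so it cannot account for the feature interactions and redundancies the abstract lists.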

Fire and Gun Detection Based on Semantic Embeddings
