Hakan Erdogan - Profile on Academia.edu

Papers by Hakan Erdogan

2015 23rd European Signal Processing Conference (EUSIPCO), 2015

This paper proposes a novel approach for denoising single-channel noisy speech signals. A speech dictionary and multiple noise dictionaries are trained using nonnegative matrix factorization (NMF). After observing the mixed signal, first the type of noise in the mixed signal is identified. The magnitude spectrogram of the noisy signal is decomposed using NMF with the concatenated trained dictionaries of noise and speech. Our results indicate that recognizing the noise type from the mixed signal and using the corresponding specific noise dictionary provides better results than using a general noise dictionary in the NMF approach. We also compare our algorithm with other state-of-the-art denoising methods and show that it has better performance than the competitors in most cases.
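The decomposition step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Euclidean multiplicative update rule and a Wiener-like soft mask, neither of which is stated in the abstract, and the toy dictionaries and all function names are hypothetical. Noise-type identification is not shown; the sketch starts from the point where a noise dictionary has already been chosen.

```python
import random

EPS = 1e-12  # guards against division by zero


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, approx):
    """Squared Frobenius distance between two equally sized matrices."""
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def fit_activations(v, w, iterations=200, seed=0):
    """Multiplicative updates for H in V ~= W H, with the dictionary W held fixed."""
    rng = random.Random(seed)
    wt = transpose(w)
    wtv, wtw = matmul(wt, v), matmul(wt, w)
    h = [[rng.uniform(0.1, 1.0) for _ in v[0]] for _ in wt]
    for _ in range(iterations):
        wtwh = matmul(wtw, h)
        h = [[h_kt * num / (den + EPS) for h_kt, num, den in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, wtv, wtwh)]
    return h


def denoise(v, w_speech, w_noise, iterations=200):
    """Estimate the speech magnitudes with a Wiener-like soft mask (an assumption)."""
    w = [rs + rn for rs, rn in zip(w_speech, w_noise)]  # concatenate atoms column-wise
    h = fit_activations(v, w, iterations)
    n_speech = len(w_speech[0])
    speech_model = matmul(w_speech, h[:n_speech])
    full_model = matmul(w, h)
    return [[x * s / (f + EPS) for x, s, f in zip(rv, rs, rf)]
            for rv, rs, rf in zip(v, speech_model, full_model)]


# Hypothetical toy data: 4 frequency bins, 3 frames, 2 speech atoms, 1 noise atom.
W_SPEECH = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.3]]
W_NOISE = [[0.2], [0.2], [0.2], [1.0]]
H_TRUE = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0], [0.3, 0.3, 0.3]]
W_ALL = [rs + rn for rs, rn in zip(W_SPEECH, W_NOISE)]
V = matmul(W_ALL, H_TRUE)
SPEECH_ESTIMATE = denoise(V, W_SPEECH, W_NOISE)
```

Holding the dictionaries fixed and updating only the activations is what makes the trained speech and noise atoms usable at test time; swapping `W_NOISE` for a different trained dictionary changes nothing else in the sketch.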

Paraboloidal Surrogates for PET Transmission Scans

Presented at 1998 IEEE Nuc. Sci. Symp. and Med. Im. Conf

ABSTRACT

Fast Monotonic Algorithms for Transmission Tomography

We present a framework for designing fast and monotonic algorithms for transmission tomography penalized-likelihood image reconstruction. The new algorithms are based on paraboloidal surrogate functions for the log likelihood. Due to the form of the log-likelihood function it is possible to find low curvature surrogate functions that guarantee monotonicity. Unlike previous methods, the proposed surrogate functions lead to monotonic

Abstract: In this paper, we describe the structure of the Turkish speech recognition system we have developed. We report how the system was built and the experiments we subsequently ran on it. For system training, we used the SUVoice database together with the METU 1.0 database. We performed limited-vocabulary and large-vocabulary recognition experiments. We showed the performance of modern hidden Markov model based speech recognition systems for Turkish under various test conditions. While the word error rate was around 1% in simple tests, we obtained higher error rates in large-vocabulary tests. This work will form a basis for more advanced studies on Turkish speech recognition and will contribute to the body of knowledge on this subject.

An Ordered Subsets Algorithm for Transmission Tomography

ABSTRACT

In this paper, we present methods to improve speech recognition performance of the IBM DARPA Communicator system. Our efforts for acoustic modeling include training a domain specific yet broad acoustic model, speaker clustering and speaker adaptation using feature space transforms. For language modeling, we achieved improvements by using compound words, carefully designed LM classes and adjusting the within class probabilities, using NLU state information to enhance the language model and building a language model with embedded grammar objects. Our efforts produced a relative error rate reduction of 34.6% on the test set that consists of 1173 utterances that IBM received during the NIST evaluation of the DARPA Communicator systems in June 2000. We also tested our decoding on the data from some other sites to further demonstrate the robustness of the system improvements.

Linear Discriminant Analysis (LDA) aims to transform an original feature space to a lower dimensional space with as little loss in discrimination as possible. We introduce a novel LDA matrix computation that incorporates confusability information between classes into the transform. Our goal is to improve discrimination in LDA. In conventional LDA, a between class covariance matrix that is based on the scatter of class means around the global mean is used. By rewriting the between class covariance expression in a more revealing way, we show that each class pair is considered equally confusable in conventional LDA. We introduce a weighting factor for each pairwise scatter that makes it possible to integrate the confusability information into the between class covariance matrix. There are many possible choices for the weighting factors. We consider a few of them that depend on Euclidean and Kullback-Leibler distances between classes when a single Gaussian approximation is used for each class. The method combined with speaker cluster based transformation decreases the error rate by about 10% relative on a large vocabulary speech recognition task using IBM's speech recognition engine.
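The rewriting mentioned above rests on a standard identity: with class priors that sum to one, the scatter of class means around the global mean equals a sum of pairwise scatters in which every pair carries the same unit weight. The sketch below checks that identity on toy numbers and shows where a per-pair weight would enter. The inverse squared Euclidean distance weight is only an illustrative assumption, not necessarily one of the weightings used in the paper, and all function names are hypothetical.

```python
def outer(u, v):
    return [[a * b for b in v] for a in u]


def add_scaled(m, n, scale):
    """Return m + scale * n for two equally sized matrices."""
    return [[x + scale * y for x, y in zip(rm, rn)] for rm, rn in zip(m, n)]


def zeros(d):
    return [[0.0] * d for _ in range(d)]


def conventional_between(means, priors):
    """Scatter of the class means around the prior-weighted global mean."""
    d = len(means[0])
    global_mean = [sum(p * m[i] for p, m in zip(priors, means)) for i in range(d)]
    s = zeros(d)
    for p, m in zip(priors, means):
        diff = [a - b for a, b in zip(m, global_mean)]
        s = add_scaled(s, outer(diff, diff), p)
    return s


def pairwise_between(means, priors, weight=lambda i, j: 1.0):
    """Sum over class pairs i < j of weight * p_i * p_j * (m_i - m_j)(m_i - m_j)^T."""
    s = zeros(len(means[0]))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = [a - b for a, b in zip(means[i], means[j])]
            s = add_scaled(s, outer(diff, diff), weight(i, j) * priors[i] * priors[j])
    return s


def inverse_square_distance_weight(means):
    """Illustrative weight (an assumption): closer class pairs count more."""
    def weight(i, j):
        return 1.0 / sum((a - b) ** 2 for a, b in zip(means[i], means[j]))
    return weight


# Hypothetical toy classes in two dimensions.
MEANS = [[0.0, 0.0], [1.0, 0.0], [4.0, 3.0]]
PRIORS = [0.5, 0.3, 0.2]
S_CONVENTIONAL = conventional_between(MEANS, PRIORS)
S_PAIRWISE = pairwise_between(MEANS, PRIORS)
S_WEIGHTED = pairwise_between(MEANS, PRIORS, inverse_square_distance_weight(MEANS))
```

With unit weights, `S_PAIRWISE` matches `S_CONVENTIONAL`, which is the sense in which conventional LDA treats all class pairs as equally confusable. In the weighted version, the distant third class no longer dominates the matrix.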

In this paper, we present a method for incremental on-line adaptation based on feature space Maximum Likelihood Linear Regression (FMLLR) for telephony speech recognition applications. We explain how to incorporate a feature space MLLR transform into a stack decoder and perform on-line adaptation. The issues discussed are as follows: collecting adaptation data on-line and in real time; mapping adaptation data from the previous feature space to the present feature space; and smoothing adaptation statistics with initial statistics based on the original acoustic model to achieve stability. Testing results on various systems demonstrate that on-line incremental FMLLR adaptation could be an effective and stable method when the adaptation statistics are mapped and smoothed.

In this paper, we present biometric person recognition experiments in a real-world car environment using speech, face, and driving signals. We have performed experiments on a subset of the in-car corpus collected at Nagoya University, Japan. We have used Mel-frequency cepstral coefficients (MFCC) for speaker recognition. For face recognition, we have reduced the feature dimension of each face image through principal component analysis (PCA). As for modeling the driving behavior, we have employed features based on the pressure readings of acceleration and brake pedals and their time-derivatives. For each modality, we use a Gaussian mixture model (GMM) to model each person's biometric data for classification. GMM is the most appropriate tool for audio and driving signals. For face, even though a nearest-neighbor classifier is the preferred choice, we have experimented with a single mixture GMM as well. We use background models for each modality and also normalize each modality score using an appropriate sigmoid function. At the end, all modality scores are combined using a weighted sum rule. The weights are optimized using held-out data. Depending on the ultimate application, we consider three different recognition scenarios: verification, closed-set identification, and open-set identification. We show that each modality has a positive effect on improving the recognition performance.
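The score normalization and weighted sum rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the raw scores are taken to be per-modality scores measured against a background model, and the sigmoid centers, scales, fusion weights and decision threshold are hypothetical placeholders for values that the paper tunes on held-out data.

```python
import math


def sigmoid_normalize(score, center, scale):
    """Map a raw modality score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(score - center) / scale))


def fuse(raw_scores, norm_params, weights):
    """Weighted sum of sigmoid-normalized scores across modalities."""
    fused = 0.0
    for modality, raw in raw_scores.items():
        center, scale = norm_params[modality]
        fused += weights[modality] * sigmoid_normalize(raw, center, scale)
    return fused


# Hypothetical normalization parameters (center, scale) and fusion weights.
NORM_PARAMS = {"speech": (0.0, 2.0), "face": (1.0, 0.5), "driving": (-0.5, 1.0)}
WEIGHTS = {"speech": 0.5, "face": 0.3, "driving": 0.2}


def verify(raw_scores, threshold=0.5):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse(raw_scores, NORM_PARAMS, WEIGHTS) >= threshold
```

For example, `verify({"speech": 4.0, "face": 2.0, "driving": 1.0})` accepts the claim because every normalized score lies above 0.5. Normalizing before summing keeps a modality with a wide raw score range from dominating the fusion.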

Optimal forward-backward pursuit for the sparse signal recovery problem

2013 21st Signal Processing and Communications Applications Conference (SIU), 2013

Forward-backward pursuit (FBP) is an iterative two stage thresholding method (TST) for sparse signal recovery. Due to the selection of more indices during the forward step than the ones pruned by the backward step, FBP iteratively enlarges the support estimate. With this structure, FBP does not necessitate the sparsity level to be known a priori in contrast to other TST algorithms such as subspace pursuit (SP) or compressive sampling matching pursuit. In this work, we address optimal selection of forward and backward step sizes for FBP. We analyse the empirical recovery performance of FBP with different step sizes via phase transitions. Moreover, we compare phase transitions of FBP with those of basis pursuit, SP and orthogonal matching pursuit.

A combined approach to regularized linear combiner learning

... Here, δ(y, z) = 1 if y = z, and zero otherwise. A_y and b_y correspond to the y-th submatrix and subvector obtained when the matrix A and the vector b are partitioned into matrices and vectors of N rows: w_k = A_k w + b_k. ... [3] David H. Wolpert, Stacked generalization, Neural Netw., vol. ...

Because of the inadequate performance of speech recognition systems, an accurate confidence scoring mechanism should be employed to understand the user requests correctly. To determine a confidence score for a hypothesis, certain confidence features are combined. In this work the performance of filler-model based confidence features has been investigated. Five types of filler model networks were defined: triphone-network, phone-network, phone-class

Comments On "Multipath Matching Pursuit" by Kwon, Wang and Shim

Straightforward combination of tree search with matching pursuits, which was suggested in 2001 by Cotter and Rao, and then later developed by some other authors, has been revisited recently as multipath matching pursuit (MMP). In this comment, we would like to point out some major issues regarding this publication. First, the idea behind MMP is not novel, and the related literature has not been properly referenced. MMP has not been compared to closely related algorithms such as A* orthogonal matching pursuit (A*OMP). The theoretical analyses do ignore the pruning strategies applied by the authors in practice. All these issues have the potential to mislead the reader and lead to misinterpretation of the results. With this short paper, we intend to clarify the relation of MMP to existing literature in the area and compare its performance with A*OMP.

A comparison of termination criteria for A*OMP

Heuristic search has recently been utilized for the compressed sensing signal recovery problem by the A* Orthogonal Matching Pursuit (A*OMP) algorithm. A*OMP employs A* search on a tree with an OMP-based evaluation of the branches, where the search is terminated when the desired path length is achieved. The algorithm employs effective pruning techniques and cost models which make the tree search practical. Here, we propose two important extensions of A*OMP: First, we introduce a novel dynamic cost model that reduces the search time. Second, we modify the termination criterion by stopping the search when the ℓ2 norm of the residue is small enough. Following the restricted isometry property, this termination criterion is more appropriate for our purposes. We demonstrate the improvements in terms of both reconstruction accuracy and computation times via a wide range of simulations.
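The residue-based termination criterion can be illustrated on plain, single-path OMP, which is much simpler than the A* tree search of A*OMP and is swapped in here only for brevity. In this sketch the loop stops when the ℓ2 norm of the residue falls below a tolerance, so the sparsity level does not have to be supplied. The toy dictionary, the tolerance value and all function names are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(g, b):
    """Solve the small square system g x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(g, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def omp_residue_stop(columns, y, tol=1e-8, max_atoms=None):
    """Greedy OMP that stops once the l2 norm of the residue is at most tol."""
    if max_atoms is None:
        max_atoms = min(len(y), len(columns))
    support, coeffs = [], []
    residue = y[:]
    while math.sqrt(dot(residue, residue)) > tol and len(support) < max_atoms:
        candidates = [k for k in range(len(columns)) if k not in support]
        best = max(candidates, key=lambda k: abs(dot(columns[k], residue)))
        support.append(best)
        chosen = [columns[k] for k in support]
        # Least-squares fit of y on the chosen atoms via the normal equations.
        gram = [[dot(a, b) for b in chosen] for a in chosen]
        coeffs = solve(gram, [dot(a, y) for a in chosen])
        residue = [yi - sum(c * a[i] for c, a in zip(coeffs, chosen))
                   for i, yi in enumerate(y)]
    return dict(zip(support, coeffs)), math.sqrt(dot(residue, residue))


# Hypothetical toy dictionary with unit-norm atoms (stored column by column).
R = 1.0 / math.sqrt(2.0)
ATOMS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [R, R, 0.0]]
Y = [2.0, 0.0, 1.0]  # equals 2 * ATOMS[0] + 1 * ATOMS[2]
SOLUTION, RESIDUE_NORM = omp_residue_stop(ATOMS, Y)
```

On this toy input the loop selects atoms 0 and 2 and then stops because the residue vanishes, without being told that the signal is 2-sparse. A sparsity-based rule would instead need that number as an input.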

2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001

Automatic name dialing is a practical and interesting application of speech recognition on telephony systems. The IBM name recognition system is a large vocabulary, speaker independent system currently in use for reaching IBM employees in the United States. In this paper, we present some innovative algorithms that improve name recognition accuracy. Unlike transcription tasks, such as the Switchboard task, recognition of names poses a variety of different problems. Several of these problems arise from the fact that foreign names are hard to pronounce for speakers who are not familiar with the names and that there are no standardized methods for pronouncing proper names. Noise robustness is another very important factor as these calls are typically made in noisy environments, such as from a car, cafeteria, airport, etc. and over different kinds of cellular and land-line telephone channels. We have performed a systematic analysis of the speech recognition errors and tackled the issues separately with techniques ranging from weighted speaker clustering, massive adaptation, rapid and unsupervised adaptation methods to pronunciation modeling methods. We find that the decoding accuracy can be improved significantly (28% relative) in this manner.

Proceedings of the second international conference on Human Language Technology Research, 2002

This paper presents a statistical speech-to-speech machine translation (MT) system for limited domain applications using a cascaded approach. This architecture allows for the creation of multilingual applications. In this paper, the system architecture and its components, including the speech recognition, parsing, information extraction, translation, natural language generation (NLG) and text-to-speech (TTS) components are described. We have implemented the described system for translating speech between Mandarin and English in an air travel application domain. We are currently porting the system to the military domain. Encouraging experimental results have been observed and are presented.

EURASIP Journal on Advances in Signal Processing, 2015

In this paper, we propose a new biometric verification and template protection system which we call the THRIVE system. The system includes novel enrollment and authentication protocols based on a threshold homomorphic cryptosystem where the private key is shared between a user and the verifier. In the THRIVE system, only encrypted binary biometric templates are stored in the database and verification is performed via homomorphically randomized templates; thus, original templates are never revealed during the authentication stage. The THRIVE system is designed for the malicious model where the cheating party may arbitrarily deviate from the protocol specification. Since a threshold homomorphic encryption scheme is used, a malicious database owner cannot perform decryption on encrypted templates of the users in the database. Therefore, security of the THRIVE system is enhanced using a two-factor authentication scheme involving the user's private key and the biometric data. We prove security and privacy preservation capability of the proposed system in the simulation-based model with no assumption. The proposed system is suitable for applications where the user does not want to reveal her biometrics to the verifier in plain form but she needs to prove her physical presence by using biometrics. The system can be used with any biometric modality and biometric feature extraction scheme whose output templates can be binarized. The overall connection time for the proposed THRIVE system is estimated to be 336 ms on average for 256-bit biohash vectors on a desktop PC running with quad-core 3.2 GHz CPUs at 10 Mbit/s up/down link connection speed. Consequently, the proposed system can be efficiently used in real life applications.

Linear Discriminant Analysis (LDA) followed by a diagonalizing maximum likelihood linear transform (MLLT) applied to spliced static MFCC features yields important performance gains as compared to MFCC+dynamic features in most speech recognition tasks. It is reasonable to regularize LDA transform computation for stability. In this paper, we regularize LDA and heteroscedastic LDA transforms using two methods: (1) Statistical priors for

Signal Processing, 2016

Best-first search has been recently utilized for the compressed sensing (CS) signal recovery problem by A* orthogonal matching pursuit (A*OMP). In this work we mainly concentrate on the theoretical analysis of A*OMP. First of all, we develop a restricted isometry property (RIP)-based condition for exact recovery of sparse signals via A*OMP. In addition, we present a theoretical foundation for the improved recovery performance with the residue-based termination instead of the sparsity-based one. We support our findings with extensive experiments using the adaptive-multiplicative (AMul) cost model, which effectively compensates for the path length differences in the search tree. The presented results, involving phase transitions as well as recovery rates and average error for noisy and noise-free sparse signals with different nonzero element distributions, not only reveal the superior recovery accuracy of A*OMP, but also demonstrate the improvements promised by the residue-based termination criterion. In addition, comparison of run times indicates the speedup by the AMul cost model. We also demonstrate a hybrid of OMP and A*OMP to accelerate the search further. Finally, we run A*OMP on a sparse image to illustrate its recovery performance for more realistic coefficient distributions.
