Pradip K. Das - Academia.edu

Papers by Pradip K. Das

Computer Science & Information Technology ( CS & IT ), 2013

A considerable amount of work has been done on the acoustic correlates of nasalized and non-nasalized vowels in the frequency domain. Nasalized vowels are characterized by the presence of extra pole-zero pairs near the first formant region and across the spectrum. Researchers have proposed several other automatically extractable acoustic features. The temporal domain, however, has not been explored much. In this study we look for quantifiable differences and similarities between the nasal and non-nasal vowel /a/ in the temporal domain at the pitch-synchronous level. The results show significant differences between nasalized and non-nasalized vowel /a/.

Studies in Computational Intelligence, 2022

The series "Studies in Computational Intelligence" (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and with high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution, which enable both wide and rapid dissemination of research output.

Image Copy-Move Forgery Detection

2020 IEEE REGION 10 CONFERENCE (TENCON)

Facial landmark points that are precisely extracted from face images improve the performance of many applications in computer vision and graphics. Face swapping is one such application. With the availability of sophisticated image editing tools and deep learning models, even non-professionals can easily create swapped face images, or face swap attacks, in images or videos. Face swapping transfers a face from a source to a destination image while preserving photorealism. It has potential applications in computer games, privacy protection, etc. However, it could also be used for fraudulent purposes. In this paper, we propose an approach to create face swap attacks and to distinguish them from the original images. An augmented set of 81 facial landmark points is extracted for creating the face swap attacks. Weighted Local Magnitude Patterns (WLMP) feature descriptors and Support Vector Machines (SVM) are used to detect the swapped face images. The performance of the proposed approach is demonstrated with different types of SVM classifiers on a real-world dataset. Experimental results show that the proposed system performs face swapping and detection effectively, with an accuracy of 95%.

Automatic speech recognition by machine is one of the most efficient methods for man-machine communication. Because the speech waveform is nonlinear and variant, speech recognition requires a lot of intelligence and fault tolerance in the pattern recognition algorithms. Accurate vowel recognition forms the backbone of most successful speech recognition systems. A collection of techniques exists to extract the relevant features from the steady-state regions of the vowels in both the time and frequency domains. This paper introduces fuzzy techniques that allow the classification of imprecise vowel data. By incorporating the acoustic attribute, the system acquires the capacity to correctly classify imprecise speech data input. Experimental results show that the fuzzy system’s performance is vastly improved over a standard Mel frequency cepstral coefficient (MFCC) feature analysis for vowel recognition. Speech recognition is a particularly difficult classification problem, due to...

Recent trends indicate the use of very heavy computation for solving the problem of speaker recognition. However, there are cases where the gains are not commensurate with the additional computation involved. We have studied the effect of the size of the UBM and of the total variability matrix, T, in i-vector modeling on recognition performance. Results indicate that beyond a T size of 50, the performance improvement is very small. For the UBM size, 128 is observed to be the optimal mixture count. The experiments were performed using the ALIZE toolkit and the TED-LIUM database.

The speech recognition process involves human-machine communication via the human voice. This complex process involves understanding and differentiating the basic characteristics of speech. A lot of work has been done in this regard, but we are yet to have a system with a hundred percent recognition rate, which limits the use of such systems to non-critical applications. The task is to recognize the human voice sent to the machine via a communication medium. The paper discusses some of the work done in the area of speech recognition. All speech recognition algorithms consider specific characteristics of the speech signal, which has resulted in better recognition rates in recent years. However, people are still looking into different aspects of the speech signal to improve their rates. We focus on an aspect that has not been studied much yet. The paper reports experiments conducted to verify whether the position of a word in a sentence influences the recognition rate, and if so, how and what can be d...

The speech recognition problem deals with recognizing spoken words or utterances to interpret the voice message. This domain has been investigated by many researchers for more than five decades, and numerous techniques and frameworks are available to address the problem. Hidden Markov Modeling (HMM), a popular modeling technique, has been used in different tools to build speech-based systems. Despite its wide use and popularity, its shortcomings have introduced new challenges in designing feature modeling techniques. One solution is to use trajectory models. They are efficient at capturing the intra-segmental temporal dynamics, which helps in understanding the continuous nature of the speech signal. Even though trajectories have been found to be an effective solution, the complexity of trajectory modeling is yet to be improved. In this paper, two trajectory parameter extraction methods are proposed. The methods are shown to be effective for speech classific...

Speech-based communication is one of the most preferred modes of communication for humans. The human voice contains several important pieces of information and clues that help in interpreting the voice message. A listener can accurately guess the gender of a speaker from the voice alone. Knowledge of the speaker’s gender can be a great aid in designing accurate speech recognition systems. A GMM-based classifier is a popular choice for gender detection. In this paper, we propose a Tensor-based approach for detecting the gender of a speaker and discuss its implementation details for low-resource languages. Experiments were conducted using the TIMIT and SHRUTI datasets. An average gender detection accuracy of 91% is recorded. An analysis of the results obtained with the proposed method is presented in this paper.

Language detection is the first step in speech recognition systems. It helps these systems make better use of the grammar and semantics of a language. For these reasons, active research is being carried out in language identification. Every language has specific sound patterns, rhythm, tone, nasal features, etc. We propose a Tensor-based approach that uses MFCCs to determine the characteristic features of a language, which can then be used to identify a spoken language. Tensor-based algorithms perform quite well in higher dimensions and scale quite well compared to the classic maximum likelihood estimation (MLE) used in latent variable modeling. Also, this approach does not suffer from slow convergence and requires fewer data points for learning. We have conducted language identification experiments on native Indian English and Hindi for some chosen speakers, and an accuracy of around 70% is observed.

TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON)

Characterization of vowels in spoken English plays a significant role in designing speech processing systems. In this work, spoken English vowels are analysed to find features that can help in their characterization. The analysis led to the proposal of a novel feature that uses tree structures to represent vowels. In this approach, the vowels are represented as trees, with their structural properties as elements of the trees. These properties are extracted by understanding the geometrical shapes of acoustic events. To show that the features can distinguish between vowels, distances among the new features are calculated and compared. The distances are computed with a tree matching algorithm. The performance of the proposed features is compared against the standard MFCC features. Speech data from Indian native speakers was used in the analysis. The analysis procedures and the results obtained are presented.

The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages

Speaker recognition is the collective name given to the problems of identifying a person or a set of persons from their voice. Variation in speaking styles across different languages can make speaker recognition a difficult task. The main aim of this paper was to develop and compare different efficient text-independent Bengali speaker recognition systems that can give good accuracy (greater than 90%) with no more than 10 minutes of speech data available for each speaker, and that can produce results without long delays. The experiments were carried out using the SHRUTI Bengali speech database and validated using the TED-EX database. We have also analyzed different features of a Bengali speaker using the GMM-UBM framework, Joint Factor Analysis, i-vectors, CNN and RNN. Elaborate comparisons and classifications are carried out based on training durations and the languages spoken by the speakers.

IET Image Processing

In this study, the problem of detecting whether an image has been tampered with is investigated. In particular, attention is paid to the case in which a portion of an image is copied and then pasted onto another region, either to create a duplication or to hide some important portion of the image. The proposed copy-move forgery detection system is based on scale-invariant feature transform (SIFT) feature extraction and a density-based clustering algorithm. The extracted SIFT features are matched using the generalised two nearest neighbours (2NN) procedure. Thereafter, the density-based clustering algorithm is utilised to improve the detection results. The proposed system is tested using the MICC-F220, MICC-F2000 and MICC-F8multi datasets. Owing to the generalised 2NN matching procedure, the proposed system is able to detect multiple forgeries present in an image. Experimental results show that the performance of the system is quite satisfactory in terms of computational time as well as detection accuracy.
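
The matching step in the abstract above builds on the standard 2NN ratio test, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ratio_test_matches`, the threshold of 0.6, and the toy two-dimensional descriptors are all assumptions (real SIFT descriptors have 128 dimensions). The generalised 2NN procedure and the density-based clustering stage are not shown.

```python
import math

def ratio_test_matches(descriptors, threshold=0.6):
    """Return pairs (i, j) where j is i's nearest neighbour and passes the 2NN ratio test.

    `threshold` is an assumed value for illustration only.
    """
    matches = []
    for i, query in enumerate(descriptors):
        # Distances from descriptor i to every other descriptor, sorted ascending.
        ranked = sorted(
            (math.dist(query, other), j)
            for j, other in enumerate(descriptors)
            if j != i
        )
        if len(ranked) < 2:
            continue
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        # Accept the nearest neighbour only if it is much closer than the second nearest.
        if d2 > 0 and d1 / d2 < threshold:
            matches.append((i, j1))
    return matches

# Toy descriptors: the first two are near-duplicates, the others are unrelated.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 0.0)]
print(ratio_test_matches(toy))
```

In a copy-move setting the descriptors all come from the same image, so each keypoint is compared against every other keypoint rather than against a second image.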

International Journal of Speech Technology, 2016

An important task in speaker verification is to generate speaker-specific models and match an input speaker’s utterance against these models. This paper compares the performance of a text-dependent speaker verification system using Mel Frequency Cepstral Coefficient features and different Vector Quantization (VQ) based speaker modelling techniques for generating the speaker-specific models. Speaker-specific information is mainly represented by spectral features, and using these features we have developed the model, which serves as an important entity for determining the claimed identity of the speaker. In the modelling part, we used Linde, Buzo, Gray (LBG) VQ, a proposed adaptive LBG VQ, and Fuzzy C-Means (FCM) VQ to generate the speaker-specific model. Experiments performed on a microphonic database show that accuracy depends significantly on the size of the codebook in all VQ techniques and, for FCM VQ, also on the value of the learning parameter of the objective function. The results show how the accuracy of the speaker verification system depends on the representation of the codebook, the size of the codebook in the VQ modelling techniques, and the learning parameter in FCM VQ.
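
The codebook idea behind LBG VQ can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: the function names, the additive split offset `eps`, the fixed iteration count, and the one-dimensional toy vectors (standing in for MFCC frames) are all hypothetical, and the adaptive LBG and FCM variants are not shown.

```python
import math

def nearest(codebook, vec):
    """Index of the codeword closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], vec))

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def lbg_codebook(vectors, size, eps=0.01, iterations=20):
    """Grow a codebook by repeated splitting; `size` is assumed to be a power of two."""
    codebook = [mean(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly shifted copies.
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        # Refine: assign each vector to its nearest codeword, then recompute centroids.
        for _ in range(iterations):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(codebook, v)].append(v)
            codebook = [mean(cell) if cell else cw for cell, cw in zip(cells, codebook)]
    return codebook

def average_distortion(codebook, vectors):
    """Mean distance from each vector to its nearest codeword; lower means a closer match."""
    return sum(math.dist(codebook[nearest(codebook, v)], v) for v in vectors) / len(vectors)

training = [[0.0], [1.0], [9.0], [10.0]]
book = lbg_codebook(training, size=2)
print(sorted(book), average_distortion(book, [[0.0], [10.0]]))
```

A verification decision would then compare the average distortion of a test utterance against a threshold for the claimed speaker's codebook; how that threshold is chosen is outside this sketch.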

Advances in Intelligent Systems and Computing, 2016

Most recognition systems depend heavily on the features used to represent speech information. Over the years, there has been a continuous effort to generate features that represent speech as well as possible. This has led to the use of larger feature sets in speech and speaker recognition systems. However, as the feature set grows, not all features are necessarily equally important for speech representation. This paper investigates the relevance of individual features in one of the popular feature sets, MFCCs. The objective of the study is to identify the features that are more important from the perspective of speech information representation. Experiments were conducted for the task of speaker recognition. Results indicate that it is possible to reduce the feature set size by more than 60% without significant loss in accuracy.

Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826), 2004

Advances in computer hardware combined with innovative artificial intelligence (AI) techniques can be a powerful methodology for performing intelligent cognitive tasks. We have investigated speech recognition techniques using hidden Markov models and successfully classified speakers based on their utterances. This paper proposes a discrete-probability HMM-based approach to classify speakers based on their utterances. The results show high accuracy in identifying the speakers. This leads us to conclude that uncertain reasoning and learning are vital components of AI that could lead to the development of automated intelligent solutions to various complex and interesting problems.

2015 International Conference on Industrial Instrumentation and Control (ICIC), 2015

The accuracy of speech recognition systems depends, to a large extent, on the feature sets used to represent the recorded speech data. Deriving better feature sets for more accurate recognition with ASR (Automatic Speech Recognition) systems has been a continuous process. Many feature sets and combinations of them have been tried to achieve better accuracy, but a feature set providing completely accurate results has not yet been formulated. These large feature sets consume a significant amount of memory, along with computing and power resources, and they do not always improve the recognition rate. The paper investigates the relevance of individual features within the feature sets used in speech recognition systems. The goal is to identify the features that do not contribute significantly to recognition or that may even cause a fall in recognition accuracy. The results of the experiments show that about a 60% reduction of the feature set is feasible with a marginal loss of recognition accuracy using our method. The results of the analysis will further be used to formulate better feature sets, smaller than the traditional ones, with improved accuracy for ASR systems.

International Journal of Speech Technology, 2015

Advances in Intelligent Systems and Computing, 2014

Data aggregation has been used for quite some time as a prominent technique for enhancing the lifetime of wireless sensor networks (WSN). Data aggregation reduces the total number of transmissions in a WSN. Since transmission energy is the most prominent component of energy consumption in a WSN, data aggregation reduces the energy expenditure of the network and thereby enhances network lifetime. The nature of aggregation, however, may vary from one application to another. In addition, the way source nodes are selected for transmission affects the energy depletion and lifetime of the nodes. In this paper, we have studied the effect of certain non-electrical factors such as source selection, deployment pattern, packet size, and data forwarding technique on the performance of aggregation in a multi-sink WSN with varying degrees of aggregation.
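
The claim that aggregation reduces the number of transmissions can be illustrated with a small count. This is a toy sketch under assumed conditions that are not taken from the paper: a single chain of nodes leading to one sink, one packet generated per node per round, and perfect aggregation in which each node merges everything it receives into a single outgoing packet. The function names are hypothetical.

```python
def transmissions_without_aggregation(n_nodes):
    """Each packet is relayed hop by hop; a node k hops from the sink costs k transmissions."""
    return sum(range(1, n_nodes + 1))

def transmissions_with_full_aggregation(n_nodes):
    """Each node merges incoming data with its own and transmits exactly once."""
    return n_nodes

for n in (5, 10, 20):
    print(n, transmissions_without_aggregation(n), transmissions_with_full_aggregation(n))
```

Under these assumptions the count drops from n(n+1)/2 to n per round, so the saving grows with the length of the chain. Partial aggregation and multi-sink layouts, as studied in the paper, fall somewhere between these two extremes.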

2012 International Conference on Computing, Communication and Applications, 2012

Automatic facial expression recognition is a challenging area with applications in human-computer interaction, human-robot interaction and online multimedia communication, to name a few. Among the available Local Binary Pattern (LBP) variant operators, no single operator can take care of all properties such as scale, robustness and discriminative ability. It also does not have control on the length of the