Shyamal Kumar Das Mandal - Academia.edu
Papers by Shyamal Kumar Das Mandal
Annual Meeting of the Association for Computational Linguistics, 2012
2019 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET)
We have designed and developed a non-invasive blood glucose level monitoring system. The electromagnetic spectrum of both the visible and infrared regions has been used to study the absorption, refraction, and reflection of bio-molecules. In this article, we present the system model and a data acquisition platform based on an Android user interface. The system model includes a proposed multi-sensor signal processing subsystem, an embedded platform, and a Bluetooth wireless transmission subsystem. The proposed system has been investigated on normal and diabetic patients. The stability of the device is tested using a one-way ANOVA test. The result of the hypothesis test favours acceptance of the datasets provided by the device.
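As a rough illustration of the one-way ANOVA test mentioned in the abstract above, the following sketch computes the F statistic for several groups of repeated readings. The function name `one_way_anova_f` and the sample readings are hypothetical and are not taken from the paper; the abstract does not say how the groups were formed or which software was used.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of readings."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of readings
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more readings than groups")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the readings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical repeated readings from three measurement sessions.
sessions = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = one_way_anova_f(sessions)
```

The F statistic would then be compared against the critical value of the F distribution for (k - 1, N - k) degrees of freedom at the chosen significance level.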
2019 IEEE Tenth International Conference on Technology for Education (T4E), 2019
Outcome-Based Education (OBE) is a learner-centric teaching and learning methodology in which course delivery and assessment are planned to achieve stated outcomes. The paper describes a methodology for developing an outcome-based curriculum based on a framework designed by the Indian Institute of Technology Kharagpur, and a methodology for measuring the effectiveness of the outcome-based curriculum based on learner feedback. The effectiveness was measured in five dimensions: (i) Learner Engagement, (ii) Active and Collaborative Learning, (iii) Continuous Updation and promotion of self-learning, (iv) Promotion of Higher-Order Thinking, and (v) Learner Satisfaction. The study indicates that the methodology was useful in promoting self-learning, engagement, and higher-order thinking.
ArXiv, 2014
The paper presents the capability of an HMM-based TTS system to produce Bengali speech. In this synthesis method, trajectories of speech parameters are generated from the trained Hidden Markov Models. A final speech waveform is synthesized from those speech parameters. In our experiments, spectral properties were represented by Mel Cepstrum Coefficients. Both training and synthesis issues are investigated in this paper using an annotated Bengali speech database. Experimental evaluation shows that the developed text-to-speech system is capable of producing adequately natural speech in terms of intelligibility and intonation for Bengali. Index Terms: Bengali Speech Synthesis, Text-To-Speech (TTS), Hidden-Markov-Model (HMM), Bengali HTS.
A pronunciation dictionary is one of the important components of speech technology development for a particular language, because it represents the interface between speech analysis at the acoustic level and speech interpretation. The W3C Voice Browser Activity has published the Pronunciation Lexicon Specification (PLS) Version 1.0 [1] for the generation of pronunciation lexicons in different languages. This paper proposes some modifications to the published PLS specification with respect to Indian languages, with Bengali as a typical case study.
Several Bengali TTS systems are already available on resource-rich platforms such as a personal computer. However, porting these systems to a resource-limited device such as a mobile phone is not an easy task. Practical aspects, including application size and processing time, have to be considered. This paper describes the implementation of a Bengali speech synthesizer on a mobile device. For speech generation we used the Epoch Synchronous Non Overlap Add (ESNOLA) based concatenative speech synthesis technique, which uses partnemes as the smallest signal units for concatenation.
Restoration of cave paintings is the process of improving the visual quality of degraded images. Source-constrained exemplar-based inpainting has been used in this work to restore images of old, degraded cave paintings. A modification to traditional exemplar-based inpainting, named PAtch Modified exemplar-based InpainTing (PAMIT), has been proposed. Traditional exemplar-based techniques use a fixed patch size, which needs to be adjusted for different images. The proposed technique automates this adjustment. Results obtained by the proposed technique have been compared with various other inpainting techniques applied under the same source-constrained framework. The images restored by the proposed technique have been found to be visually better than those obtained by other exemplar-based techniques. In this regard, the BRISQUE score has been used as an objective measure to demonstrate the effectiveness of the proposed technique.
International Journal of Speech Technology
The performance of a speaker recognition system is highly dependent on the amount of speech used in enrollment and test. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in the presence of duration variability. This article also reports a comparison of the performance of the GMM-SVM classifier with its precursor technique, the Gaussian mixture model-universal background model (GMM-UBM) classifier, in the presence of duration variability. The goal of this research work is not to propose a new algorithm for improving speaker recognition performance in the presence of duration variability. Rather, the main focus of this work is on utterance partitioning (UP), a commonly used strategy to compensate for the duration variability issue. We have analysed in detail the impact of training utterance partitioning on speaker recognition performance under the GMM-SVM framework. We further investigate why utterance partitioning is important for boosting speaker recognition performance. We have also shown in which cases utterance partitioning could be useful and in which it is not. Our study has revealed that utterance partitioning does not reduce the data imbalance problem of the GMM-SVM classifier, as claimed in an earlier study. Apart from these, we also discuss issues related to the impact of parameters such as the number of Gaussians, the supervector length, and the amount of splitting required for obtaining better performance in short and long duration test conditions from a speech duration perspective. We have performed the experiments with telephone speech from the POLYCOST corpus, consisting of 130 speakers.
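The idea of utterance partitioning described above can be sketched as splitting one long enrollment utterance into several shorter sub-utterances, each of which could then yield its own supervector. This is a minimal sketch assuming a simple contiguous split of the frame sequence; the abstract does not specify the partitioning scheme, and the function name `partition_utterance` is hypothetical.

```python
def partition_utterance(frames, n_parts):
    """Split a sequence of feature frames into n_parts contiguous sub-utterances.

    Sub-utterance lengths differ by at most one frame. This contiguous split is
    an assumption made for illustration, not the scheme used in the paper.
    """
    if n_parts < 1 or n_parts > len(frames):
        raise ValueError("n_parts must be between 1 and the number of frames")

    base, extra = divmod(len(frames), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts receive one additional frame.
        size = base + (1 if i < extra else 0)
        parts.append(frames[start:start + size])
        start += size
    return parts


# Ten placeholder frames (integers stand in for feature vectors).
sub_utterances = partition_utterance(list(range(10)), 3)
```

Each element of `sub_utterances` keeps the original frame order, so concatenating the parts reproduces the full utterance.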
2017 International Conference on Next Generation Computing Technologies, 2017
This paper proposes a phoneme recognition and classification model for Bengali continuous speech. A Deep Neural Network based model has been developed for the recognition and classification task, where a Stacked Denoising Autoencoder is used to generatively pre-train the deep network. Autoencoders are stacked to form the deep-structured network. Mel-frequency cepstral coefficients are used as the input data vector. Each hidden layer uses 200 hidden units, and the deep network has three hidden layers. The phoneme posterior probability is derived in the output layer. The proposed model has been trained and tested using unconstrained Bengali continuous speech data collected from different sources (TV, radio, and normal conversation in a laboratory). In the recognition phase, the Phoneme Error Rate of the deep-structured model is 24.62% for training and 26.37% for testing, while in the classification task the model achieves 86.7% average phoneme classification accuracy in training and 82.53% in testing.
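For readers unfamiliar with the metric, Phoneme Error Rate is conventionally computed as the edit distance (substitutions, deletions, and insertions) between the recognized and reference phoneme sequences, divided by the reference length. The sketch below follows that conventional definition; the abstract does not state the exact scoring procedure used, and the function name and phoneme labels are hypothetical.

```python
def phoneme_error_rate(reference, hypothesis):
    """Return edit distance between two phoneme sequences / reference length."""
    if not reference:
        raise ValueError("reference sequence must not be empty")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j phonemes of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_ph == hyp_ph else 1)
            deletion = prev[j] + 1       # reference phoneme missing in hypothesis
            insertion = curr[j - 1] + 1  # extra phoneme in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Placeholder phoneme labels: one phoneme of the reference is deleted.
per = phoneme_error_rate(["k", "a", "l", "o"], ["k", "a", "o"])
```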
2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), 2017
In this experiment, a phoneme classification model has been developed using a Deep Neural Network based framework. The experiment is conducted in two phases. In the first phase, the phoneme classification task is performed. The deep-structured model provided a good overall classification accuracy of 87.8%. Precision and recall values are reported for all the phonemes. A confusion matrix of all the Bengali phonemes is derived. Using the confusion matrix, the phonemes are classified into nine groups. These nine groups provided a better overall classification accuracy of 98.7%, and a new confusion matrix is derived for these nine groups. A lower confusion rate is observed this time. In the second phase of the experiment, the nine groups are reclassified into 15 groups using manner-of-articulation based knowledge, and the deep-structured model is retrained. The system provided 98.9% overall classification accuracy this time. This result is almost equal to the overall accuracy w...
IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, Nov 2, 2016
In this paper we present a novel formulation of exemplar-based image inpainting as a metric labeling problem, and solve it through a simulated annealing algorithm. Due to their greedy nature, exemplar-based methods sometimes produce inpainted images which are visually inconsistent. These methods are also highly dependent upon the initialization. To solve these problems, we generate five images with different initializations. A suitable mixture of these five images produces a good inpainted image. The cost function of the proposed metric labeling problem consists of three components, namely neighbor cost, total variation cost, and structure cost. A linear combination of these components is used to maintain better visual consistency in the inpainted region, with a smooth transition from the bordering regions of the source image. We use a quality measure to this end. Our experiments on a wide variety of images demonstrate that the proposed technique produces better inpainting images as compar...
2017 4th International Conference on Signal Processing and Integrated Networks (SPIN), 2017
In this paper, place and manner of articulation based phonological features are detected and classified. A Deep Neural Network based model has been used for the detection and classification tasks. The deep-structured model is pre-trained by a stacked denoising autoencoder. The system obtained 89.17% overall accuracy in the detection task. In the classification task, 50.2% classification accuracy is observed for the place of articulation based features. The manner of articulation is divided into 15 groups based on a combination of manner-based knowledge, and the classification task achieves 98.9% classification accuracy. Index Terms: Phonological features, place of articulation, manner of articulation, stacked denoising autoencoder, detection and classification, deep neural network.
English lexical stress is acoustically related to a combination of fundamental frequency (F0), duration, intensity, and vowel quality. The current study compares the use of these correlates by 10 L1 English and 20 L1 Bengali speakers to find out which correlates are most difficult for Bengali speakers to acquire. Results showed that English and Bengali speakers used the acoustic correlates of vowel duration, intensity, and F0 in a similar manner, but Bengali speakers produced significantly less English-like stress patterns. English speakers reduced vowel duration significantly more in unstressed vowels than Bengali speakers did, and the degree of intensity and F0 increase in stressed vowels was higher for English speakers than for Bengali speakers. Moreover, Bengali speakers produced English-like vowel quality in certain unstressed syllables, but in other cases there were significant differences in vowel quality across groups. This study supports the idea of interference from L1 to L2 (non-native) phonology.
Methodologies and Intelligent Systems for Technology Enhanced Learning, 10th International Conference
Interspeech 2014
In a Text-to-Speech synthesis system, the F0 contour plays an important role in conveying prosodic information, but the process of synthesizing the F0 contour from the underlying linguistic information using a deep architecture has not been investigated for the Bengali language. This paper describes a method for synthesizing F0 contours of Bengali read-out speech from the textual features of input text using a Deep Boltzmann Machine (DBM) and Twin Gaussian Process (TGP) hybrid model. The DBM captures the high-level linguistic structure of the input text and improves the prediction accuracy when plugged into the TGP model. Unlike Gaussian Process (GP) models, which only focus on the prediction of a single output (F0), TGP can generalize across multiple outputs (F0, delta F0, delta-delta F0) by encoding relations between both inputs and outputs with GP priors. The performance of the proposed method is evaluated and compared with other available methods using objective and perceptual listening tests, and the results are found to be satisfactory.
The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages
In detection-based, bottom-up speech recognition procedures, segmental features such as phonological feature based speech attributes act as one of the key components of the recognition model. In this study, place and manner of articulation based phonological features have been detected and integrated with the supra-segmental parameters of speech to develop an Automatic Speech Recognition (ASR) system for various under-resourced languages. For detection, a bank of phonological feature detectors has been designed. The Deep Neural Network (DNN) based attribute detector performed well in detecting the phonological features. This paper also reports a comparison of the DNN-based attribute detector with one using a multi-layer perceptron (MLP). For continuous spoken speech, the Bengali CDAC speech corpus has been used. The DNN-based attribute detector achieved an average frame-level accuracy of 88.26%, whereas the MLP-based detector achieved 86.18%.
Contemporary Educational Technology
Cognitive learning complexity identification of assessment questions is an essential task in the domain of education, as it helps both the teacher and the learner to discover the thinking process required to answer a given question. Bloom's Taxonomy cognitive levels are considered a benchmark standard for the classification of cognitive thinking (learning complexity) in an educational environment. However, it was observed that some of the action verbs of Bloom's Taxonomy overlap across multiple levels of the hierarchy, causing ambiguity about the real sense of cognition required. The paper describes two methodologies to automatically identify the cognitive learning complexity of given questions. The first methodology uses labelled Latent Dirichlet Allocation (LDA) as a machine learning approach. The second methodology uses the BERT framework for multi-class text classification as a deep learning approach. The experiments were performed on an ensemble of 3000+ educational questions, which were based on previously published datasets along with the TREC question corpus and AI2 Biology How/Why question corpus datasets. The labelled LDA reached an accuracy of 83%, while the BERT-based approach reached 89% accuracy. An analysis of both results is shown, evaluating the significant factors responsible for determining cognitive knowledge.
Procedia Computer Science
Language Resources and Evaluation
Procedia Computer Science
Phonological features are the most basic unit of a speech knowledge hierarchy. This paper reports on the detection and classification of phonological features from Bengali continuous speech. The phonological features are based on place and manner of articulation. All the experiments are performed with a deep neural network based framework. Two different models are designed for the detection and classification tasks. The deep-structured models are pre-trained by a stacked autoencoder. The C-DAC speech corpus is used for continuous spoken Bengali speech data. A frame-wise cepstral representation is provided to the input layer of the deep-structured model. Speech data from multiple speakers has been used to ensure speaker independence. In the detection task, the system achieved 86.19% average overall accuracy. In the classification task, accuracy for place of articulation remains low at 50.2%, while in manner-based classification the system delivered an improved performance with 98.9% accuracy.
Annual Meeting of the Association for Computational Linguistics, 2012
2019 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET)
We have designed and developed a non-invasive blood glucose level monitoring system. The electrom... more We have designed and developed a non-invasive blood glucose level monitoring system. The electromagnetic spectrum of both visible and infrared region has been used for the study of absorption, refraction, and reflection of bio-molecules. In this article, we have shown the system model and data acquisition platform based on Android user interface. The system model includes a proposed multi-sensor signal processing sub-system, an embedded platform, and a Bluetooth wireless transmission subsystem. The proposed system has been investigated on normal and diabetic patients. The stability of the device is tested using a one-way ANOVA test. The result of the hypothesis test has shown in favour of acceptance of the datasets provided by the device.
2019 IEEE Tenth International Conference on Technology for Education (T4E), 2019
: Outcome-Based Education (OBE) is a learner-centric teaching and learning methodology in which t... more : Outcome-Based Education (OBE) is a learner-centric teaching and learning methodology in which the course delivery, assessment is planned to achieve stated outcomes. The paper describes a methodology for developing an outcome-based curriculum based on a framework design by the Indian Institute of Technology Kharagpur and methodology for measuring the effectiveness of the outcome-based curriculum based on learner feedback. The effectiveness was measured in five dimensions - (i) Learner Engagement, (ii) Active and Collaborative Learning, (iii) Continuous Updation and promotion of self-learning, (iv) Promotion of Higher-Order Thinking, (vi) Learner Satisfaction. The study indicates that the methodology was useful to promote self-learning, Engagement and promote higher-order thinking.
ArXiv, 2014
The paper presents the capability of an HMM-based TTS system to produce Bengali speech. In this s... more The paper presents the capability of an HMM-based TTS system to produce Bengali speech. In this synthesis method, trajectories of speech parameters are generated from the trained Hidden Markov Models. A final speech waveform is synthesized from those speech parameters. In our experiments, spectral properties were represented by Mel Cepstrum Coefficients. Both the training and synthesis issues are investigated in this paper using annotated Bengali speech database. Experimental evaluation depicts that the developed text-to-speech system is capable of producing adequately natural speech in terms of intelligibility and intonation for Bengali. Index Terms—Bengali Speech Synthesis, Text-ToSpeech (TTS), Hidden-Markov-Model (HMM), Bengali HTS.
Pronunciation dictionary is one of the important components for the speech technology development... more Pronunciation dictionary is one of the important components for the speech technology development for a particular language. This is because it represents the interface between speech analysis on the acoustic level and speech interpretation. The W3C Voice Browser Activity has published a Pronunciation Lexicon Specification (PLS) Version 1.0 [1] for generation of PLS in different languages. This paper proposes some modification of the published PLS specification with respect to Indian languages with Bengali as a typical case study.
Different Bengali TTS systems are already available on a resourceful platform such as a personal ... more Different Bengali TTS systems are already available on a resourceful platform such as a personal computer. However, porting these systems to a resource limited device such as a mobile phone is not an easy task. Practical aspects including application size and processing time have to be concerned. This paper describes the implementation of a Bengali speech synthesizer on a mobile device. For speech generation we used Epoch Synchronous Non Overlap Add (ESNOLA) based concatenative speech synthesis technique which uses the partnemes as the smallest signal units for concatenations.
Restoration of cave paintings is the process of improving visual quality of degraded images. Sour... more Restoration of cave paintings is the process of improving visual quality of degraded images. Source-constrained exemplar-based inpainting has been used in this work to restore the images of old degraded cave paintings. A modification to the traditional exemplar-based inpaintings, named PAtch Modified exemplar-based InpainTing (PAMIT), has been proposed. Traditional exemplar-based techniques use fixed patch size, which needs to be adjusted for different images. The proposed technique automates this process of adjustment. Results obtained by the proposed technique have been compared with various other inpainting techniques applied under the same source-constrained framework. The restored images by the proposed technique have been found to be visually better than those obtained by other exemplar-based techniques. In this regard, an objective measure of the BRISQUE score has been used to demonstrate the effectiveness of the proposed technique.
International Journal of Speech Technology
The performance of speaker recognition system is highly dependent on the amount of speech used in... more The performance of speaker recognition system is highly dependent on the amount of speech used in enrollment and test. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in presence of duration variability. This article also reports a comparison of the performance of GMM-SVM classifier with its precursor technique Gaussian mixture model-universal background model (GMM-UBM) classifier in presence of duration variability. The goal of this research work is not to propose a new algorithm for improving speaker recognition performance in presence of duration variability. However, the 2 main focus of this work is on utterance partitioning (UP), a commonly used strategy to compensate the duration variability issue. We have analysed in detailed the impact of training utterance partitioning in speaker recognition performance under GMM-SVM framework. We further investigate the reason why the utterance partitioning is important for boosting speaker recognition performance. We have also shown in which case the utterance partitioning could be useful and where not. Our study has revealed that utterance partitioning does not reduce the data imbalance problem of the GMM-SVM classifier as claimed in earlier study. Apart from these, we also discuss issues related to the impact of parameters such as number of Gaussians, supervector length, amount of splitting required for obtaining better performance in short and long duration test conditions from speech duration perspective. We have performed the experiments with telephone speech from POLYCOST corpus consisting of 130 speakers.
2017 International Conference on Next Generation Computing Technologies, 2017
This paper proposed a phoneme recognition and classification model for Bengali continuous speech.... more This paper proposed a phoneme recognition and classification model for Bengali continuous speech. A Deep Neural Network based model has been developed for the recognition and classification task where the Stacked Denoising Autoencoder is used to generatively pre-train the deep network. Autoencoders are stacked to form the deep-structured network. Mel-frequency cepstral coefficients are used as input data vector. In hidden layer, 200 numbers of hidden units have been utilized. The number of hidden layers of the deep network is kept as three. The phoneme posterior probability has been derived in the output layer. This proposed model has been trained and tested using unconstrained Bengali continuous speech data collected from the different sources (TV, Radio, and normal conversation in a laboratory). In recognition phase, the Phoneme Error Rate is reported for the deep-structured model as 24.62% and 26.37% respectively for the training and testing while in the classification task this model achieves 86.7% average phoneme classification accuracy in training and 82.53% in the testing phase.
2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), 2017
In this experiment, a phoneme classification model has been developed using a Deep Neural Network... more In this experiment, a phoneme classification model has been developed using a Deep Neural Network based framework. The experiment is conducted in two phases. In the first phase, phoneme classification task has been performed. The deep-structured model provided good overall classification accu racy of 87.8%. All the phonemes are classified with preci sion and recall values. A confusion matrix of all the Ben gali phonemes is derived. Using the confusion matrix, the phonemes are classified into nine groups. These nine groups provided better overall classification accuracy of 98.7%, and a new confusion matrix is derived for this nine groups. A lower confusion rate is observed this time. In the second phase of the experiment, the nine groups are reclassified into 15 groups using the manner of articulation based knowledge and the deep-structured model is retrained. The system provided 98.9% of overall classification accuracy this time. This result is almost equal to the overall accuracy w...
IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, Nov 2, 2016
In this paper we present a novel formulation of exemplar based image inpainting as a metric label... more In this paper we present a novel formulation of exemplar based image inpainting as a metric labeling problem, and solve it through simulated annealing algorithm. Due to their greedy nature exemplar based methods sometimes produce inpainted images which are visually inconsistent. These methods are highly dependent upon the initialization. To solve these problems, we generate five images with different initialization. A suitable mixture of these five images produces a good inpainted image. The cost function of the proposed metric labeling problem consists of three components, namely neighbor cost, total variation cost, and structure cost. A linear combination among these components is used to maintain better visual consistency in the inpainted region having smooth transition from the bordering regions of the source image. We use a quality measure to this end. Our experiments on a wide variety of images demonstrate that the proposed technique produces better inpainting images as compar...
2017 4th International Conference on Signal Processing and Integrated Networks (SPIN), 2017
In this paper the place and manner of articulation based phonological features are detected and c... more In this paper the place and manner of articulation based phonological features are detected and classified. Deep Neural Network based model has been used for detection and classification task. The deep structured model is pre-trained by stacked denoising autoencoder. The system obtained 89.17% overall accuracy in detection task. In case of classification task, 50.2% of classification accuracy is observed for classifying the place of articulation based features. The manner of articulation is divided into 15 groups based on some manner based knowledge combination and classification task is performed to achieve 98.9% of classification accuracy. Index Terms-Phonological features, place of articulation, manner of articulation, stacked denoising autoencoder, detection and classification, deep neural network.
— English lexical stress is acoustically related to combination of fundamental frequency (F0), du... more — English lexical stress is acoustically related to combination of fundamental frequency (F0), duration, intensity and vowel quality. Current study compares the use of these correlates by 10 L1 English and 20 L1 Bengali speakers to find out which correlates are most difficult for Bengali speakers to acquire. Results showed that English and Bengali speakers used the acoustic correlates of vowel duration, intensity and F0 in similar manner, but Bengali speakers produced significantly less English like stress patterns. English speakers reduced vowel duration significantly more in the unstressed vowels compared to Bengali speakers and degree of intensity and F0 increase in stressed vowels by English speakers was higher than that by Bengali speakers. Moreover Bengali speakers produced English like vowel quality in certain unstressed syllables, but in other cases there were significant differences in vowel quality across groups. This study supports the idea of interference from L1 to L2 (nonnative) phonology.
Methodologies and Intelligent Systems for Technology Enhanced Learning, 10th International Conference
Interspeech 2014
In Text to Speech synthesis system F 0 contour plays an important role in conveying prosodic info... more In Text to Speech synthesis system F 0 contour plays an important role in conveying prosodic information but the process of synthesizing F 0 contour from the underlying linguistic information using deep architecture has not been investigated in case of Bengali languages. This paper describes a method for synthesizing F0 contours of Bengali readout speech from the textual features of input text using Deep Boltzmann Machine (DBM) and Twin Gaussian Process (TGP) hybrid model. DBM will capture the high-level linguistic structure of input text and improve the prediction accuracy when plugged into the TGP model. Unlike Gaussian Process (GP) models which only focus on the prediction of a single output (F 0), TGP can generalize across multiple outputs (F 0 , delta F 0 , delta-delta F 0) by encoding relations between both inputs and outputs with GP priors. The performance of the proposed method is evaluated and compared with other available methods using objective and perceptual listening tests and the results are found to be satisfactory.
The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages
In detection-based, bottom-up speech recognition procedures, segmental features such as phonological-feature-based speech attributes are a key component of the recognition model. In this study, phonological features based on place and manner of articulation are detected and integrated with the supra-segmental parameters of speech to develop an Automatic Speech Recognition (ASR) system for various under-resourced languages. For detection, a bank of phonological feature detectors has been designed. A Deep Neural Network (DNN) based attribute detector performed well in detecting the phonological features. This paper also reports a comparison between the DNN-based attribute detector and one based on a multilayer perceptron (MLP). For continuous speech, the Bengali C-DAC speech corpus has been used. The DNN-based attribute detector achieved an average frame-level accuracy of 88.26%, whereas the MLP-based detector achieved 86.18%.
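The average frame-level accuracy reported above can be read as the fraction of frames each detector labels correctly, averaged over the detector bank. The sketch below assumes that reading, which the abstract does not spell out; the function name `average_frame_accuracy` and the toy labels are hypothetical.

```python
import numpy as np

def average_frame_accuracy(predicted, reference):
    """Return per-detector frame accuracy and its mean over detectors.

    Both inputs have shape (frames, detectors) and hold binary labels
    (1 = feature present in the frame, 0 = absent).
    """
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    per_detector = (predicted == reference).mean(axis=0)
    return per_detector, float(per_detector.mean())

# Toy labels for 4 frames and 2 detectors, for illustration only
predicted = [[1, 0], [1, 1], [0, 0], [0, 1]]
reference = [[1, 0], [1, 0], [0, 0], [1, 1]]
per_detector, average = average_frame_accuracy(predicted, reference)
print(per_detector, average)
```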
Contemporary Educational Technology
Identifying the cognitive learning complexity of assessment questions is an essential task in education, as it helps both the teacher and the learner to discover the thinking process required to answer a given question. The cognitive levels of Bloom's Taxonomy are considered a benchmark standard for classifying cognitive thinking (learning complexity) in an educational environment. However, some of the action verbs of Bloom's Taxonomy appear at multiple levels of the hierarchy, causing ambiguity about the level of cognition actually required. The paper describes two methodologies for automatically identifying the cognitive learning complexity of given questions. The first methodology uses labelled Latent Dirichlet Allocation (LDA) as a machine learning approach. The second uses the BERT framework for multi-class text classification as a deep learning approach. The experiments were performed on a set of 3000+ educational questions drawn from previously published datasets along with the TREC question corpus and the AI2 Biology How/Why question corpus. The labelled LDA reached an accuracy of 83%, while the BERT-based approach reached 89%. An analysis of both results is presented, evaluating the significant factors responsible for determining cognitive knowledge.
Procedia Computer Science
Language Resources and Evaluation
Procedia Computer Science
Phonological features are the most basic unit of a speech knowledge hierarchy. This paper reports on the detection and classification of phonological features from Bengali continuous speech. The phonological features are based on place and manner of articulation. All experiments are performed with a deep neural network based framework. Two different models are designed for the detection and classification tasks. The deep-structured models are pre-trained using stacked autoencoders. The C-DAC speech corpus is used as the source of continuous spoken Bengali speech data. A frame-wise cepstral representation is provided to the input layer of the deep-structured model. Speech data from multiple speakers has been used to confirm speaker independence. In the detection task, the system achieved 86.19% average overall accuracy. In the classification task, accuracy for place of articulation remains low at 50.2%, while manner-based classification performed much better at 98.9% accuracy.