KOMAL NAAZ | BIT Mesra
Papers by KOMAL NAAZ
ACM Transactions on Asian and Low-Resource Language Information Processing, Jun 16, 2023
Literary compositions are very often analyzed using constituent units such as words, phrases, sentences, and paragraphs. Unlike conventional research that focuses on these units, our task is a statistical study of the most fundamental unit of any literary composition, the varna, or character, followed by automated classification using learning algorithms. This article is a case study on the Hindi adaptations of two significant literary pieces, Jana-Gaṇa-Mana and Vande-Mātaram, acknowledging that the two songs belong to different classes based on their bhava, i.e., the inherent emotion of the poem. The present task is the first of its kind to use the concept of komala (soft) and kaṭhora (hard) varna to establish the difference between the two. The two-proportion Z-test is successfully applied to statistical data on the candidate songs, thereby re-establishing the theoretical assertions through the study of real pieces of literature. Taking the statistical verification as a basis, a learning-based classification system is designed that achieves a best accuracy of 85%, which further complements the statistically re-established theory.
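As an illustration of the statistical step, the two-proportion Z-test can be sketched in a few lines, assuming each song is summarized by its count of kaṭhora varna out of all varna. The function name and the counts in the usage line are hypothetical and are not the paper's data; the test uses the pooled proportion and reports a two-sided p-value.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion Z-test using the pooled proportion.

    x1, x2: counts of the category of interest (e.g. kathora varna) in each text
    n1, n2: total counts (e.g. all varna) in each text
    Returns (z, p_value).
    """
    if min(n1, n2) <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only (not the paper's data).
z, p = two_proportion_z_test(30, 100, 50, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value would indicate that the share of kaṭhora varna differs between the two texts beyond what chance alone would plausibly produce.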
Digital Scholarship in the Humanities, Jul 21, 2023
Indian literary heritage is vast and of great importance; exploring it requires devoting oneself to the study of dialects (Khariboli, Haryanavi, Brajbhasha, Awadhi, Bhojpuri, Marwari, etc.), especially when old Hindi is under observation. Chanda are poetic compositions with well-defined structures. The dohā is a kind of chanda which, in our work, is explored using Kabir's compositions as a case study. Kabir represents the tradition of poets who relied on oral means for the propagation and consumption of poetry. The poems were passed on to later generations through simple acts of recitation and hearing, which inevitably led to mutations and multiple versions of the same compositions when they were documented later; these compositions therefore need restoration (at least metrically). Building on the state-of-the-art models (a metadata generator, Text2Mātrā, and RPaGen) and extending beyond them, this article is the first of its kind to present Kabir's dohā within the scope of restoration, metrical computation, and statistics. Starting with the restoration process, the proposed algorithms generate data that are then subjected to suitable statistical models. These models highlight trends in the dataset, which helps reveal abstract patterns in the textual data.
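The metrical side of restoration can be illustrated with the classical dohā rule: four charaṇa of 13, 11, 13, and 11 mātrā, with each even charaṇa ending in guru-laghu. The checker below is a hypothetical sketch, not the article's algorithm. It assumes each charaṇa has already been scanned into syllable weights (1 = laghu, 2 = guru) and simply flags the deviations a restoration step would need to address; the example weights are invented.

```python
# Classical doha layout: odd charana 13 matra, even charana 11 matra.
DOHA_TARGET = (13, 11, 13, 11)


def check_doha(charanas):
    """Return a list of metrical issues for a doha.

    charanas: four sequences of syllable weights (1 = laghu, 2 = guru).
    An empty list means the doha satisfies these simplified rules.
    """
    if len(charanas) != 4:
        raise ValueError("a doha has exactly four charanas")
    issues = []
    for idx, (weights, target) in enumerate(zip(charanas, DOHA_TARGET), start=1):
        total = sum(weights)
        if total != target:
            issues.append(f"charana {idx}: {total} matra, expected {target}")
        # Even charanas (2 and 4) should end in guru-laghu.
        if idx % 2 == 0 and list(weights[-2:]) != [2, 1]:
            issues.append(f"charana {idx}: does not end in guru-laghu")
    return issues


odd = [2, 2, 1, 1, 2, 1, 2, 2]   # 13 matra (hypothetical weights)
even = [2, 2, 1, 1, 2, 2, 1]     # 11 matra, ends guru-laghu
print(check_doha([odd, even, odd, even]))        # -> []
print(check_doha([odd, even, odd + [1], even]))  # flags charana 3
```

Returning a list of issues rather than a single boolean keeps the output useful for restoration, since it points at the specific charaṇa that deviates.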
Digital Scholarship in the Humanities, Mar 20, 2023
With advances in technology and the digitalization of resources, humanities problems have not remained untouched by computation. Automatic poetry classification is now a well-defined problem that can be solved using various approaches, of which mood-based classification is one of the most popular. We propose a learning approach to metre-based classification of Hindi metrical poetry. The state-of-the-art model for metre-based poetry classification is rule-based, whereas the proposed system uses learning models to perform classification. Feature extraction and classification are the two main components of text classification in natural language processing. Feature extraction transforms text into machine-readable numbers, which are then passed to classification models. Poems, in their most natural form, are unsuitable for learning-based algorithms. However, transforming the data into a suitable form and selecting a fixed number of features from it (feature extraction) made classification possible with a machine learning approach, which had not been explored before and can act as a benchmark for this area of research. The article deals with six popular and similar types of Hindi poems. The collected data are processed into an initial dataset that undergoes two levels of data transformation and feature engineering, resulting in the pre-processed dataset. The pre-processed dataset is then fed to selected machine learning models (Bernoulli Naïve Bayes, k-nearest neighbour, random forest, and support vector machine), producing a best classification accuracy of 99%; the results then undergo a post-processing step based on observed misclassifications.
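The fixed-length feature idea and one of the listed models (k-nearest neighbour) can be sketched in plain Python. This is an illustrative assumption, not the article's pipeline. Each poem is represented by its per-charaṇa mātrā counts, truncated or zero-padded to a fixed number of features. The toy training rows follow textbook metre definitions (dohā 13-11, sorṭhā 11-13, chaupāī 16 per charaṇa) rather than the paper's dataset, and the function names are hypothetical.

```python
import math
from collections import Counter


def to_fixed_features(line_matras, n_features=4, pad=0):
    """Truncate or pad a poem's per-charana matra counts to n_features values."""
    feats = list(line_matras[:n_features])
    feats.extend([pad] * (n_features - len(feats)))
    return feats


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows nearest to query (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in ranked[:k])
    # Counter keeps insertion order, so ties go to the nearer label.
    return votes.most_common(1)[0][0]


# Toy rows based on textbook metre definitions (not the paper's data).
train_x = [
    [13, 11, 13, 11], [13, 11, 13, 12],   # doha
    [11, 13, 11, 13], [11, 13, 12, 13],   # sortha
    [16, 16, 16, 16], [16, 16, 15, 16],   # chaupai
]
train_y = ["doha", "doha", "sortha", "sortha", "chaupai", "chaupai"]

query = to_fixed_features([13, 11, 13, 11, 13, 11])
print(knn_predict(train_x, train_y, query))
```

Padding with zeros keeps every poem at the same dimensionality, which distance-based models such as k-NN require, at the cost of treating missing charaṇa as zero-length.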
IEEE Access
Poetry writing is a qualitative subject, and so is its analysis. Mapping poetic elements onto a real-number scale is a necessity that has not yet been met. Although Hindi literary heritage is vast and celebrated, remarkably few computational works explore its underlying structures, and most of them detect a particular metre rather than taking a generalized approach. The state-of-the-art metadata generator provides no measures of the underlying structural elements of poetry. There is no automated system that generates the rhyming pattern hidden in a Hindi poem, nor one that detects and estimates the extent of figures of speech in a text in any language. In this article, three efficient tools, Text2Mātrā, RPaGen, and FoSCal, have been designed and developed to extract and evaluate elements of poetry. Text2Mātrā provides the numeral scansion of any Hindi input text, which can serve as a basis for extensive analytical and detection work. RPaGen detects the poem type of any input poem and outputs its rhyming pattern. FoSCal gives a quantitative representation of the figures of speech detected in any input text, using a scoring scheme formulated with a fuzzy approach and weighted analysis. These tools may find use in fields such as education, literary criticism, philology, and authorship attribution. Many computational studies of poetry analysis have been carried out across the world's languages. However, quantifying the extent of figures of speech in poetic compositions, in any language, is an entirely novel approach. Mapping the aesthetic properties of a subjective form (like poetry) onto a numeral scale is, to the best of our knowledge, the first of its kind for the Hindi language.
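Numeral scansion of the kind Text2Mātrā provides can be approximated with a short rule-based sketch. This is a simplified, hypothetical scanner, not the tool's implementation. It assigns 1 mātrā (laghu) to short vowels and the inherent a, 2 mātrā (guru) to long vowels, raises a syllable to guru before anusvāra or visarga, and raises a short syllable to guru when a half consonant (virama) follows within the same word. Chandrabindu adds nothing, and recitation-based exceptions recognized in Hindi prosody are ignored.

```python
# Devanagari code points used by this simplified scanner.
LAGHU_VOWELS = {"\u0905", "\u0907", "\u0909", "\u090B"}            # a i u r
GURU_VOWELS = {"\u0906", "\u0908", "\u090A", "\u090F", "\u0910",
               "\u0913", "\u0914"}                                   # aa ii uu e ai o au
LAGHU_SIGNS = {"\u093F", "\u0941", "\u0943"}                         # short vowel signs
GURU_SIGNS = {"\u093E", "\u0940", "\u0942", "\u0947", "\u0948",
              "\u094B", "\u094C"}                                    # long vowel signs
ANUSVARA_VISARGA = {"\u0902", "\u0903"}
NUKTA, VIRAMA = "\u093C", "\u094D"


def is_consonant(ch):
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def scan_word(word):
    """Return syllable weights (1 = laghu, 2 = guru) for one word."""
    weights = []
    i, n = 0, len(word)
    while i < n:
        ch = word[i]
        if is_consonant(ch):
            j = i + 1
            if j < n and word[j] == NUKTA:
                j += 1
            if j < n and word[j] == VIRAMA:
                # Half consonant: closes the previous syllable, making it guru.
                if weights and weights[-1] == 1:
                    weights[-1] = 2
                i = j + 1
                continue
            weight = 1  # inherent short 'a'
            if j < n and word[j] in GURU_SIGNS:
                weight, j = 2, j + 1
            elif j < n and word[j] in LAGHU_SIGNS:
                j += 1
            weights.append(weight)
            i = j
        elif ch in GURU_VOWELS:
            weights.append(2)
            i += 1
        elif ch in LAGHU_VOWELS:
            weights.append(1)
            i += 1
        elif ch in ANUSVARA_VISARGA:
            if weights:
                weights[-1] = 2
            i += 1
        else:
            i += 1  # chandrabindu, punctuation, digits, etc. add nothing
    return weights


def scan_line(line):
    """Scan a line word by word and return (weights, total matra)."""
    weights = [w for word in line.split() for w in scan_word(word)]
    return weights, sum(weights)


print(scan_line("कमल राम"))  # -> ([1, 1, 1, 2, 1], 6)
```

Scanning word by word means a conjunct at the start of one word never lengthens the last syllable of the previous word, which is one of the simplifications a full tool would need to revisit.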