Lincy Mathews - Academia.edu
Papers by Lincy Mathews
Proceedings of International Conference on Computational Intelligence and Data Engineering, 2022
An Automatic Text Summarizer is a computer program that creates an abstract of a text document. The abstract must nevertheless contain all the salient features of the original document. This paper covers the functional modules that make up an automatic text summarizer. It also highlights the trends and challenges in text summarization and surveys selected text summarization techniques.
When data classes are represented differently in one data segment versus another to be mined, the result is the imbalanced two-class data challenge. Many health-related datasets comprising categorical data face this class imbalance challenge. This paper aims to address the limitations of imbalanced two-class categorical data and presents a re-sampling solution known as 'Syn_Gen_Min' (SGM) to improve the class imbalance ratio. SGM involves finding the greedy neighbors for a given minority sample. To the best of the authors' knowledge, the accepted approach for a classifier is to find a numeric equivalent for categorical attributes, which results in a loss of information. The novelty of this contribution is that the categorical attributes are kept in their raw form. Five distinct categorical similarity measures are employed and tested against six real-world datasets from the healthcare sector. The application of these similarity methods leads to the generation of differe...
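As a rough illustration of searching for neighbors of a minority sample while keeping categorical attributes in raw form, the sketch below uses simple overlap similarity (the fraction of attributes with identical values). This is an assumption chosen for illustration: the abstract does not name its five similarity measures or define the greedy neighbor search, and the function names and sample records here are hypothetical.

```python
def overlap_similarity(a, b):
    """Fraction of attribute positions where two categorical records match."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def nearest_by_overlap(query, candidates, k):
    """Return the k candidates most similar to the query (ties keep input order)."""
    ranked = sorted(
        candidates,
        key=lambda record: overlap_similarity(query, record),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical minority-class records with raw categorical attributes.
query = ("male", "smoker", "high")
minority_pool = [
    ("female", "smoker", "low"),
    ("male", "smoker", "low"),
    ("female", "nonsmoker", "low"),
]
neighbors = nearest_by_overlap(query, minority_pool, k=2)
```

No numeric encoding of the attributes is needed; only equality between raw values is compared.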
Encyclopedia of Information Science and Technology, Fourth Edition
A very challenging issue in real-world data is that in many domains, such as medicine, finance, marketing, web, telecommunication and management, the distribution of data among classes is inherently imbalanced. A widely recognized problem is that traditional classifier algorithms assume a balanced distribution among the classes. Data imbalance is evident when the number of instances representing the class of concern is much smaller than that of the other classes. As a result, classifiers tend to be biased towards the well-represented class, which leads to a higher misclassification rate for the less-represented class. Hence, there is a need for efficient learners to classify imbalanced data. This chapter addresses the need, challenges, existing methods and evaluation metrics identified when learning from imbalanced data sets. Future research challenges and directions are highlighted.
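The bias towards the well-represented class, and the reason evaluation metrics matter, can be seen in a minimal sketch. It assumes a made-up label set with 95 majority and 5 minority instances and a trivial classifier that always predicts the majority class; none of these numbers come from the chapter.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Share of true `positive` instances that were predicted as `positive`."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Illustrative data: class 0 is the majority, class 1 the class of concern.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always predicts the majority class

baseline_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
```

Here accuracy looks high even though every minority instance is misclassified, which is why per-class measures such as recall are usually reported for imbalanced data.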
Indian Journal of Science and Technology, 2016
Cybernetics and Information Technologies, 2017
Mining imbalanced data is a challenging task due to its complex inherent characteristics. Conventional classifiers such as the nearest neighbor classifier are severely biased towards the majority class, as minority class data are under-represented and outnumbered. This paper focuses on building an improved Nearest Neighbor Classifier for two-class imbalanced data. Three oversampling techniques are presented that generate artificial instances for the minority class to balance the distribution among the classes. Experimental results showed that the proposed methods outperformed the conventional classifier.
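The general idea of oversampling can be sketched with plain random oversampling, which duplicates minority instances until the classes are balanced. This is a stand-in chosen for illustration and not one of the three techniques proposed in the paper, which the abstract does not describe; the function name and data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen instances of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Illustrative data: 8 majority instances (label 0) and 2 minority instances (label 1).
samples = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

The balanced set can then be handed to a nearest neighbor classifier in place of the original training data.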
The education system in rural and semi-rural areas of developing and underdeveloped countries faces many challenges. The limited accessibility of education and the challenges to it are attributed mainly to political, economic and social issues ...