Rupam Gupta - Academia.edu
Papers by Rupam Gupta
International Journal of Computer Technology and Applications, Mar 2014
This paper analyzes four affix-removal stemmers after empirically executing them on different text data. The stemmers are Porter, Lovins, Paice, and Krovetz. The individual steps of each algorithm and their functionality show how each stemmer identifies affixes to remove and recodes words to generate stem terms. The strength of each stemmer and its computation time are also calculated. The paper also examines how the stem terms generated by these four stemmers vary for the same input data set. All sets of stem terms shown in the paper were produced by our comparative stemming tool, implemented in Java, using a standard input data set.
LemmaQuest Lemmatizer: A Morphological Analyzer Handling Nominalization
IETE Journal of Research, 2022
Analyzing the Stemming Paradigm
This paper discusses affix-removal and statistical stemming algorithms in detail, with stemmer-generated output from some standard English text and a dictionary. It also presents comparative empirical studies of these stemmers with respect to computation time and the number of stem tokens generated from morphological variants of a single root word. The first part of the paper introduces stemming and lemmatization. The second part covers the algorithms of the affix-removal and statistical stemmers along with their empirical output. The last part describes the steps of the comparative tool. Finally, the conclusion wraps up the discussion of stemming. This paper can assist researchers working in the field of text mining.
Text mining is a discovery technique through which interesting information and hidden knowledge are automatically extracted from unstructured or semi-structured text. Understanding textual data and producing appropriate output for the user's requirements first requires text pre-processing. One important pre-processing step is stemming; lemmatization is an improvement on stemming. This paper proposes a lemmatization model that attempts to eliminate the shortcomings of currently available popular lemmatizers such as the Stanford LemmaProcessor, Spacy Lemmatizer, LemmaGen, and MorphAdorner. The model takes into account nominalised/derived words for which no available lemmatizer currently generates correct lemmas. In developing a lemmatizer, the foremost challenge lies in understanding the morphological structure of any input English word, and especially in comprehending the derivational word’s ...