Sheena Kurian K | KMEA Engineering College

Papers by Sheena Kurian K

Novel Hybrid Methods for Journal Article Summarization Combining Graph Method and Rough Set TFIDF Method with Pegasus Model

Data Processing for Big Data Applications using Hadoop Framework

International Journal of Advanced Research in Computer and Communication Engineering, Mar 30, 2015

Big data refers to the large and varied body of data that is being created day by day, and handling it has become a major challenge in recent years. Hadoop is an open-source platform that is used effectively for big data applications. Its two core components are MapReduce and the Hadoop Distributed File System (HDFS): HDFS is the storage mechanism and MapReduce is the programming model. Results are produced faster than with traditional database operations. Pig and Hive are two languages that help in programming the MapReduce framework in a short time.
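The map and reduce steps mentioned in the abstract can be illustrated with a small in-memory sketch. This is not Hadoop code and does not come from the paper; it only mimics the map, shuffle, and reduce phases in plain Python, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))
```

In a real Hadoop job the shuffle is performed by the framework across machines; here it is a single dictionary so the data flow stays visible.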

Statistical Machine Translation from English to Malayalam

In this paper we present an overview of a system that translates English into Malayalam by means of Statistical Machine Translation (SMT) models. The knowledge sources used to build the translation system include a monolingual corpus of Malayalam and a bilingual corpus of English/Malayalam. Various pre-processing mechanisms, such as suffix separation of words in the Malayalam sentence and stop word elimination from the Malayalam corpus, have proven effective in bringing about better training results. In the translation process, a set of syntactic tags is coupled with the words in the English sentence to signify their parts of speech. The order conversion rules, which are applied to the tagged English sentence, help bridge the disparity between the sentence structures of English and Malayalam. Post-editing techniques, such as formulating mending rules for Malayalam, enhance the quality of the statistical output of the SMT system. By imparting morphological knowledge into SMT and by forming handcrafted rules to achieve precise output, we were able to produce reasonably good Malayalam sentences even with a limited amount of bilingual training data.

A Framework of Statistical Machine Translator from English to Malayalam

In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this Statistical Machine Translator. The training strategy has been enhanced by PoS tagging, which helps to get rid of insignificant alignments. Moreover, incorporating units such as the suffix separator and the stop word eliminator has proven effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the statistical output of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU, and WER evaluation metrics.
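Of the evaluation metrics named above, word error rate (WER) is the simplest to sketch. The following is a minimal illustration of the standard definition (word-level edit distance divided by reference length), assuming whitespace tokenization; it is not the authors' evaluation script, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two sentences, divided by the
    number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; a score of 0 means the hypothesis matches the reference word for word.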

Techniques to Improve the word alignments in Statistical Machine Translation from English to Malayalam

In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical components such as a translation model, a language model, and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target languages before they are subjected to training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating parts of speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous while setting up the alignments. Moreover, reducing the unwanted alignments has brought about better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU, and WER evaluation metrics.

Freight audit using mapreduce framework for big-data application

International Journal of Latest Trends in Engineering and Technology, Mar 1, 2015

Freight audit is an emerging area of concern. Freight audit verification is vulnerable to process and human errors. Freight forwarders and freight carriers often make mistakes in billing for freight services. As a result, organizations that do not audit properly end up paying for charges they have not incurred. Freight audit is the process of examining, verifying, and adjusting the audit reports for better accuracy. The freight audit system is implemented using the MapReduce framework, which consists of two functions, a map function and a reduce function. The map function groups the input data based on the order and status details that are collected. The output of the map step is given to the reducer. Using a join function, we combine the mapped records to produce the result.
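The map, reduce, and join steps described in the abstract resemble what is usually called a reduce-side join. The sketch below shows that general pattern in plain Python rather than the paper's actual implementation; the record fields (`order_id`, `agreed`, `billed`) and all function names are assumptions introduced for illustration.

```python
from collections import defaultdict


def map_records(orders, statuses):
    """Map step: key every record by its order id and tag it with its source."""
    for order in orders:
        yield order["order_id"], ("order", order)
    for status in statuses:
        yield status["order_id"], ("status", status)


def reduce_join(pairs):
    """Reduce step: join order and status records that share an order id."""
    groups = defaultdict(lambda: {"order": [], "status": []})
    for key, (source, record) in pairs:
        groups[key][source].append(record)

    joined = []
    for key in sorted(groups):
        for order in groups[key]["order"]:
            for status in groups[key]["status"]:
                joined.append({**order, **status})
    return joined


def flag_overbilled(joined):
    """Audit check (hypothetical rule): billed amount exceeds agreed amount."""
    return [row["order_id"] for row in joined if row["billed"] > row["agreed"]]
```

For example, joining an order with an agreed amount of 100 against a status record billing 120 yields one combined row, which the audit check then flags.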

Wormhole Detection and Prevention in MANETs

In a MANET, mobile nodes are responsible for route establishment over wireless links. A MANET is an open, untrusted environment, so it encounters a number of security threats with little security arrangement. The wormhole attack is considered one of the most serious of these threats. In a wormhole attack, a tunnel is made between two selfish nodes that are geographically far from each other; the nodes hide their actual locations, make other nodes believe that they are true neighbours, and communicate through the wormhole tunnel. Research is ongoing on detecting and preventing wormhole attacks efficiently. Different techniques exist to detect and prevent wormhole attacks in MANETs, but some of them cause routing overhead and delays. This paper considers a model that combines the neighbor node and hop count methods for wormhole detection and prevention.

DOMS: Disease Outbreak Monitoring System

Disease Outbreak Monitoring System (DOMS) is a real-time epidemic outbreak monitoring system. It can track various disease outbreaks by collecting the health status of people. An Android application is used for collecting the data. A location-based representation of the outbreaks is also provided. Using the data collected from people in different locations, the system relates new reports to the dataset on the server, which contains previously observed values. If a location is affected by an outbreak of a disease, the proposed application will enquire further about the detailed health status of the people in that location. If the symptoms do not match any disease in the database, the disease is treated as new or little known, and it needs to be taken seriously. If the disease is already covered in the dataset, the required actions are taken according to the severity of the disease. If a person is planning a trip to a particular location, the application will provid...

Alignment Model and Training Technique in SMT from English to Malayalam

Communications in Computer and Information Science, 2010

This paper investigates certain methods of training adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, the word-to-word translation is determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of a parts of speech tagger has brought about better training results with improved accuracy.

Enhancing Security With Fingerprint Combination Using RSA Algorithm

Fingerprint recognition is an active research area, and it is used in many fields to improve security and privacy. In a fingerprint recognition system, recognition is done by fingerprint matching techniques, which are classified into two categories: fingerprint verification and fingerprint identification. This system uses fingerprint verification. We propose a novel system for protecting fingerprint privacy by combining two different fingerprints into a new identity. In the enrollment, two fingerprints are captured from two different fingers. We extract the minutiae positions from one fingerprint, the orientation from the other fingerprint, and the reference points from both fingerprints. Based on this extracted information, a combined minutiae template is generated and stored. The combined minutiae template is used to generate a key using the RSA algorithm. In the authentication, the system req...

Local K-Nearest Neighbors Model using Z-Order R-Tree for Big Data

K-nearest neighbors (KNN) classification and regression is widely used in data mining due to its simplicity and accuracy. When a prediction is required for an unseen data instance, the KNN algorithm searches the training dataset for the k most similar instances. The best value of k is application dependent, so a local value is set that maximizes accuracy for the problem at hand. Classifying an object into the majority class of its k neighbors is called K-nearest neighbors classification. In this paper the instance or object to be classified is called the problem object, or p-object for short. KNN search calculates the pairwise distance between the p-object and each data point using a distance metric to find the k neighbors. The global KNN approach uses the whole dataset to search for the k nearest neighbors of the p-object. For big data, a local KNN approach is used in which sample objects are randomly selected from the training data space. In order to improve the accuracy of finding the exact k-neighbo...
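The global KNN classification step described above (pairwise distances to the p-object followed by a majority vote) can be sketched as follows. This is a minimal brute-force illustration that assumes Euclidean distance and in-memory data; it does not implement the Z-order R-tree index named in the title, and the function name is hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, p_object, k):
    """Return the majority class among the k training points nearest to p_object.

    train is a list of (features, label) pairs, where features is a tuple of
    numbers with the same length as p_object.
    """
    # Global search: compute the distance from the p-object to every point.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], p_object))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

The sort makes each query cost O(n log n) in the size of the training set, which is the cost that sampling and spatial indexing approaches aim to reduce for big data.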

Extrinsic Plagiarism Detection in Text Combining Vector Space Model and Fuzzy Semantic Similarity Scheme

The proposed work combines the Vector Space Model with a fuzzy similarity measure to detect plagiarism in documents. For a given suspicious document, the aim is to identify the set of source documents from which it was copied. In the first step, all documents are pre-processed: tokenization, stop word removal, stemming, and so on. In the next step, a subset of documents that may be the sources of plagiarism is selected. The Vector Space Model (VSM) can be used for this candidate selection: the similarity between a suspicious document and a source document is computed as the cosine similarity between document vectors weighted by tf-idf scores. In the third step, a sentence-wise in-depth analysis using a fuzzy semantic approach finds the plagiarized parts of the suspicious document. This can detect similar, yet not necessarily identical, statements based on the degree of similarity between the words in the statements and the fuzzy set. Adjac...
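The candidate selection step (cosine similarity over tf-idf weighted vectors) can be sketched as below. This is an illustration only: it assumes whitespace tokenization with no stemming or stop word removal, uses one common smoothed idf variant, log(N/df) + 1, and all function names are hypothetical. The fuzzy semantic stage is not covered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * (math.log(n_docs / df[term]) + 1.0)
             for term, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sources(suspicious, sources):
    """Return (source_index, score) pairs, most similar source first."""
    vectors = tfidf_vectors([suspicious] + sources)
    scores = [(i, cosine(vectors[0], vec)) for i, vec in enumerate(vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Sources whose score exceeds a chosen threshold would then be passed on to the sentence-level analysis; the threshold is a tuning choice that the abstract does not specify.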

A framework for translating English text into Malayalam using statistical models

Procedia Technology 00 (2011) 000–000, 2nd International Conference on Communication, Computing & Security

A Classification of Sandhi Rules for Suffix Separation in Malayalam

Suffix separation plays a vital role in improving the quality of training in Statistical Machine Translation from English into Malayalam. The morphological richness and the agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form in the training process. The suffix separation process accomplishes this task by scrutinizing the Malayalam words and applying sandhi rules. In this paper, various handcrafted rules designed for the suffix separation process in English-Malayalam SMT are presented. These rules are classified based on the Malayalam syllable preceding the suffix in the inflected form of the word (check_letter). Suffixes beginning with vowel sounds, such as ആല, ഉെെ, ഇല etc., are mainly considered in this process. By examining the check_letter in a word, the suffix separation rules can be directly applied to extract the root words. The quick look up table provided in this paper can be used as a guideline in i...

IJARCCE - Computer and Communication Engineering

IJARCCE, 2014

In today's era, information security is an important concern: data must be kept secure from unauthenticated users. Attackers target information for economic gain, personal gain, and other illegal activities. Phishing is one such attack, in which an unauthenticated person tries to steal personal confidential information. To prevent these illegal activities, we propose "An Anti-Phishing Framework Using Visual Cryptography". In this framework, an image is generated and then decomposed into two shares. One share is kept with the user and the other with the server. When required, that is, at the time of login to a particular site, the two shares are combined to form the original image. The image formed by combining the two shares shows that the current site is not a phishing site and also confirms that the user is authenticated. In this way, data can be secured from unauthorized persons.
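The idea of splitting an image into two shares can be illustrated with the classic (2, 2) visual cryptography construction, in which each pixel becomes two subpixels. The abstract does not say which scheme the framework uses, so this sketch is an assumption, and the function names are hypothetical.

```python
import secrets

# Each original pixel is expanded into two subpixels (1 = black, 0 = white).
PATTERNS = ((0, 1), (1, 0))


def make_shares(image):
    """Split a binary image (rows of 0/1, 1 = black) into two shares."""
    share1, share2 = [], []
    for row in image:
        row1, row2 = [], []
        for pixel in row:
            pattern = secrets.choice(PATTERNS)
            complement = (1 - pattern[0], 1 - pattern[1])
            row1.extend(pattern)
            # Black pixel: complementary patterns, so stacking is fully black.
            # White pixel: identical patterns, so stacking is half black.
            row2.extend(complement if pixel else pattern)
        share1.append(row1)
        share2.append(row2)
    return share1, share2


def stack(share1, share2):
    """Overlay two shares: a subpixel is black if it is black in either share."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]
```

Each share on its own is a random pattern with exactly one black subpixel per original pixel, so it reveals nothing about the image; only the stacked result distinguishes black pixels (two black subpixels) from white ones (one black subpixel).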

Research paper thumbnail of Novel Hybrid Methods for Journal Article Summarization Combining Graph Method and Rough Set TFIDF Method with Pegasus Model

Research paper thumbnail of Data Processing for Big Data Applications using Hadoop Framework

nternational journal of advanced research in computer and communication engineering, Mar 30, 2015

The big data is the concept of largespectrum of data, which is being created day by day. In recen... more The big data is the concept of largespectrum of data, which is being created day by day. In recent years handling these datais the biggest challenge. Hadoop is an open source platform which is used effectively to handle the big data applications. The two core concepts of the hadoop are Mapreduce and Hadoop distributed file system (HDFS). HDFS is the storage mechanism and map reduce is the programming language. Results are produced faster than other traditional database operations. Pig and Hive are the two language which helps us to program the mapreduce framework within short period of time.

Research paper thumbnail of Statistical Machine Translation from English to Malayalam

In this paper we present an overview of a system that translates English into Malayalam by means ... more In this paper we present an overview of a system that translates English into Malayalam by means of Statistical Machine Translation (SMT) models. The knowledge source that is used to build the translation system includes a monolingual corpus of Malayalam and a bilingual corpus of English/ Malayalam. Various pre-processing mechanisms like suffix separation of words in the Malayalam sentence and stop word elimination from the Malayalam corpus has proven to be effective in bringing about better training results. In the translation process, a set of syntactic tags are coupled with the words in the English sentence to signify the parts of speech factor. The order conversion rules, which is applied to the tagged English sentence, aids in bridging the disparity that exist between the sentence structure of the English and Malayalam language. Post editing technique such as formulating mending rules for Malayalam enhances the quality of the statistical outcome of the SMT system. By this approach of imparting morphological knowledge into SMT and by forming hand crafted rules to achieve precise output, we were able to produce reasonably good Malayalam sentences even with a limited amount of bilingual training data.

Research paper thumbnail of A Framework of Statistical Machine Translator from English to Malayalam

In this paper we describe the methodology and the structural design of a system that translates E... more In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resource in building this Statistical Machine Translator. Training strategy adopted has been enhanced by PoS tagging which helps to get rid of the insignificant alignments. Moreover, incorporating units like suffix separator and the stop word eliminator has proven to be effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of statistical outcome of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.

Research paper thumbnail of Techniques to Improve the word alignments in Statistical Machine Translation from English to Malayalam

In Statistical Machine Translation from English to Malayalam, an unseen English sentence is trans... more In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical models like translation model, language model and a decoder. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set up among the sentence pairs of the source and target language before subjecting them for training. This paper is deals with the techniques which can be adopted for improving the alignment model of SMT. Incorporating the parts of speech information into the bilingual corpus has eliminated many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.

Research paper thumbnail of Freight audit using mapreduce framework for big-data application

International journal of latest trends in engineering and technology, Mar 1, 2015

Freight audit is an emerging area of concern. The freight audit verification can be vulnerable to... more Freight audit is an emerging area of concern. The freight audit verification can be vulnerable to process and human errors. The freight forwarders/ freight carriers often make mistake in billing the freight services. Due to this if organizations does not do the proper auditing they will have to overpay for the services they have not incurred. Freight audit is the process of examining, verifying and adjusting the audit reports for better accuracy. For the implementation of the freight audit system, Mapreduce framework is applied. It consists of two functions, map function and reduce function. The map function will group the input data based on the order and status details which is collected. The output of the mapped records are given to the reducer. Using the join function we combine the output of the mapped records and the result is produced.

Research paper thumbnail of Wormhole Detection and Prevention in MANETs

Mobile nodes are responsible for route establishment in MANET using wireless link. MANET is an op... more Mobile nodes are responsible for route establishment in MANET using wireless link. MANET is an open entrusted environment so it encounters a number of security threats with little security arrangement. Wormhole is considered to be a very serious security threat among others in MANET. In wormhole, a tunnel is made between two selfish nodes which are geographically very far away to each other, in order to hide their actual location and try to believe that they are true neighbours and makes conversation through the wormhole tunnel. Researchers are going on to detect and prevent Wormhole attack in efficient manner. There are different techniques to detect and prevent Wormhole attack in MANETs, but some of them cause routing overhead and delays. A model that encapsulate neighbor node and hop count method is considered in this paper for the Wormhole detection and prevention.

Research paper thumbnail of DOMS: Disease Outbreak Monitoring System

Disease Outbreak Monitoring System (DOMS) is a real-time epidemic outbreak monitoring system. It ... more Disease Outbreak Monitoring System (DOMS) is a real-time epidemic outbreak monitoring system. It can track various disease outbreaks by collecting health status of people. An android application is used for collecting the data. A location-based representation is also provided about the outbreaks. According to the data collected from all people from different locations, it can create a relation with the dataset in the server. The dataset contains previously observed values. If a location is affected by an outbreak of a disease, the proposed application will enquire more about the detailed health status of the people in that location. If the symptoms do not match with any disease in the database, then it will be a new or a not popular disease which need to be seriously considered. If the disease is already covered in the dataset then required actions are taken according to the severity of the disease. If a person is planning a trip to a particular location, the application will provid...

Research paper thumbnail of Alignment Model and Training Technique in SMT from English to Malayalam

Communications in Computer and Information Science, 2010

This paper investigates certain methods of training adopted in the Statistical Machine Translator... more This paper investigates certain methods of training adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English Malayalam SMT, the word to word translation is determined by training the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of the parts of speech tagger has brought around better training results with improved accuracy.

Research paper thumbnail of Freight audit using mapreduce framework for big-data application

Freight audit is an emerging area of concern. The freight audit verification can be vulnerable to... more Freight audit is an emerging area of concern. The freight audit verification can be vulnerable to process and human errors. The freight forwarders/ freight carriers often make mistake in billing the freight services. Due to this if organizations does not do the proper auditing they will have to overpay for the services they have not incurred. Freight audit is the process of examining, verifying and adjusting the audit reports for better accuracy. For the implementation of the freight audit system, Mapreduce framework is applied. It consists of two functions, map function and reduce function. The map function will group the input data based on the order and status details which is collected. The output of the mapped records are given to the reducer. Using the join function we combine the output of the mapped records and the result is produced.

Research paper thumbnail of A Framework of Statistical Machine Translator from English to Malayalam

Research paper thumbnail of Statistical Machine Translation from English to Malayalam

In this paper we present an overview of a system that translates English into Malayalam by means ... more In this paper we present an overview of a system that translates English into Malayalam by means of Statistical Machine Translation (SMT) models. The knowledge source that is used to build the translation system includes a monolingual corpus of Malayalam and a bilingual corpus of English/ Malayalam. Various pre-processing mechanisms like suffix separation of words in the Malayalam sentence and stop word elimination from the Malayalam corpus has proven to be effective in bringing about better training results. In the translation process, a set of syntactic tags are coupled with the words in the English sentence to signify the parts of speech factor. The order conversion rules, which is applied to the tagged English sentence, aids in bridging the disparity that exist between the sentence structure of the English and Malayalam language. Post editing technique such as formulating mending rules for Malayalam enhances the quality of the statistical outcome of the SMT system. By this appro...

Research paper thumbnail of Techniques to Improve the word alignments in Statistical Machine Translation from English to Malayalam

In Statistical Machine Translation from English to Malayalam, an unseen English sentence is trans... more In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical models like translation model, language model and a decoder. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set up among the sentence pairs of the source and target language before subjecting them for training. This paper is deals with the techniques which can be adopted for improving the alignment model of SMT. Incorporating the parts of speech information into the bilingual corpus has eliminated many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results...

Research paper thumbnail of Enhancing Security With Fingerprint Combination Using RSA Algorithm

Fingerprint recognition is an active research area. It is used in many areas to improve security and privacy. In a fingerprint recognition system, recognition is done by fingerprint matching techniques, which fall into two categories: fingerprint verification and fingerprint identification. This system uses fingerprint verification. We propose a novel system for protecting fingerprint privacy by combining two different fingerprints into a new identity. In the enrollment, two fingerprints are captured from two different fingers. We extract the minutiae positions from one fingerprint, the orientation from the other fingerprint, and the reference points from both fingerprints. Based on this extracted information, a combined minutiae template is generated and stored. The combined minutiae template is used to generate a key using the RSA algorithm. In the authentication, the system req...

Research paper thumbnail of Local K-Nearest Neighbors Model using Z-Order R-Tree for Big Data

K-nearest neighbors (KNN) classification and regression are widely used in data mining due to their simplicity and accuracy. When a prediction is required for an unseen data instance, the KNN algorithm searches through the training dataset for the k most similar instances. Finding the value of k is application dependent, hence a local value is set which maximizes the accuracy for the problem. Classifying the object to the majority class of its k neighbors is called K-nearest neighbors classification. In this paper the instance or object to be classified is called the problem object, or p-object for short. KNN search calculates the pairwise distance between the p-object and each data instance using a distance metric to find the k neighbors. The global KNN approach uses the whole data for searching the k-nearest neighbors of the p-object. For big data, a local KNN approach is used where sample objects are randomly selected from the training data space. In order to improve the accuracy of finding the exact k-neighbo...
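As an illustration of the global KNN classification step described in this abstract, the following is a minimal brute-force sketch. It is not the paper's Z-order R-tree method; the function name `knn_classify`, the Euclidean distance metric and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train, p_object, k):
    """Classify p_object by majority vote among its k nearest training instances.

    train is a list of (feature_vector, label) pairs. Euclidean distance is
    an assumed metric; the abstract does not name one.
    """
    # Global KNN: compute the distance from the p-object to every instance.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], p_object))
    k_labels = [label for _, label in by_distance[:k]]
    # The majority class of the k neighbors is the predicted label.
    return Counter(k_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
prediction = knn_classify(train, (1.0, 0.9), k=3)
```

A local approach would run the same vote over a sampled or spatially indexed subset of `train` instead of the whole dataset.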

Research paper thumbnail of Extrinsic Plagiarism Detection in Text Combining Vector Space Model and Fuzzy Semantic Similarity Scheme

The proposed work combines the Vector Space Model with a fuzzy similarity measure to detect plagiarism cases in documents. For a given suspicious document, the aim is to identify the set of source documents from which the suspicious document is copied. In the first step, all the documents are processed to perform tokenization, stop word removal, stemming, etc. In the next step, a subset of documents that may possibly be the sources of plagiarism is selected. The Vector Space Model (VSM) can be used for this candidate selection. Similarity between a suspicious document and a source document can be computed using the cosine similarity measure between the document vectors weighted by tf-idf scoring. Thirdly, a sentence-wise in-depth analysis using a fuzzy semantic based approach is performed to find the plagiarized parts in the suspicious documents. This can detect similar, yet not necessarily the same, statements based on the similarity degree between words in the statements and the fuzzy set. Adjac...
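The candidate selection step described in this abstract (cosine similarity between tf-idf weighted document vectors) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names, whitespace tokenization and the smoothed idf formula are assumptions, and stop word removal and stemming are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumed variant): log((1 + N) / (1 + df)) + 1.
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents: index 0 is the suspicious one.
docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
similar = cosine(vecs[0], vecs[1])
unrelated = cosine(vecs[0], vecs[2])
```

Source documents whose similarity to the suspicious document exceeds a chosen threshold would be kept as candidates for the sentence-level analysis.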

Research paper thumbnail of A framework for translating English text into Malayalam using statistical models

Procedia Technology 00 (2011) 000–000, 2nd International Conference on Communication, Computing & Security

Research paper thumbnail of Alignment Model and Training Technique in SMT from English

Abstract. This paper investigates certain methods of training adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, the word-to-word translation is determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of a parts-of-speech tagger has brought about better training results with improved accuracy.

Research paper thumbnail of A Classification of Sandhi Rules for Suffix Separation in Malayalam

Suffix separation plays a vital role in improving the quality of training in Statistical Machine Translation from English into Malayalam. The morphological richness and the agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form in the training process. The suffix separation process accomplishes this task by scrutinizing the Malayalam words and applying sandhi rules. In this paper, various handcrafted rules designed for the suffix separation process in the English-Malayalam SMT are presented. These rules are classified based on the Malayalam syllable preceding the suffix in the inflected form of the word (check_letter). Suffixes beginning with vowel sounds, such as ആല, ഉെെ, ഇല etc., are mainly considered in this process. By examining the check_letter in a word, the suffix separation rules can be directly applied to extract the root words. The quick look up table provided in this paper can be used as a guideline in i...

Research paper thumbnail of IJARCCE - Computer and Communication Engineering

IJARCCE, 2014

In today's era, information security, in which data is kept secure from unauthenticated users, is an important concern. Attackers target the information of unsuspecting victims for economic gain, individual gain and other illegal activities. Phishing is one such attack, in which an unauthenticated person tries to steal personal confidential information. To prevent these activities we propose "An Anti-Phishing Framework Using Visual Cryptography". In this framework, an image is generated and then decomposed into two shares. One share is kept with the user and the other with the server. When required, that is, at the time of login to a particular site, the two shares are combined to form the original image. The image formed by combining the two shares indicates that the current site is not a phishing site and also that the user is an authenticated one. Data can thus be secured from unauthorized persons.
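The two-share idea in this abstract can be illustrated with a minimal sketch. It uses a simplified XOR-based (2, 2) secret sharing of a binary image, which is an assumption for the example; the abstract does not specify the paper's share construction, and classical visual cryptography stacks expanded pixel patterns rather than XOR-ing bits. The function names are hypothetical.

```python
import secrets


def make_shares(image):
    """Split a binary image (rows of 0/1 pixels) into two shares.

    share1 is uniformly random, so each share alone reveals nothing
    about the image; share2 is the image XOR share1.
    """
    share1 = [[secrets.randbelow(2) for _ in row] for row in image]
    share2 = [
        [pixel ^ mask for pixel, mask in zip(row, mask_row)]
        for row, mask_row in zip(image, share1)
    ]
    return share1, share2


def combine(share1, share2):
    """Recover the original image by XOR-ing the two shares pixel by pixel."""
    return [[a ^ b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]


# Hypothetical 2x3 binary image; one share would go to the user, one to the server.
image = [[0, 1, 1], [1, 0, 0]]
user_share, server_share = make_shares(image)
recovered = combine(user_share, server_share)
```

At login, a site that cannot supply the matching server share would fail to reproduce the expected image, which is the signal the framework relies on.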