Paraphrase Detection based on Vector Space Model: A Study of Utilization of Semantic Network for Improving Information

From Lexical to Semantic Features in Paraphrase Identification

2019

The task of paraphrase identification has been applied to diverse scenarios in Natural Language Processing, such as Machine Translation, summarization, or plagiarism detection. In this paper we present a comparative study on the performance of lexical, syntactic and semantic features in the task of paraphrase identification in the Microsoft Research Paraphrase Corpus. In our experiments, semantic features do not represent a gain in results, and syntactic features lead to the best results, but only if combined with lexical features.

2012 ACM Subject Classification: Computing methodologies → Natural language processing; Theory of computation → Support vector machines; Information systems → Near-duplicate and plagiarism detection

IRJET- RESEARCH ON PARAPHRASE IDENTIFICATION

IRJET, 2021

Paraphrase identification plays a critical role in natural language systems. In this research, we used an immersive representation to model the interaction between two sentences not just at the word level but also at the expression and phrase level, employing a convolutional neural network, a recurrent neural network, and a multi-head attention network to detect paraphrases using semantic characteristics. The most important factors are semantic equivalence and similarity. Paraphrasing methods find, create, or extract sentences that express nearly the same content, and paraphrase identification discerns differently worded sentences that have the same meaning. Textual statements that use different surface forms to communicate the same context are known as paraphrases. Paraphrase identification is important because it supports a variety of NLP tasks, including text summarization, document clustering, question answering, natural language inference, information retrieval, plagiarism detection, and text simplification. The aim of the paper was to compile a list of the methods, techniques, and current developments for detecting paraphrases. Detection can be used not only to establish a text's identity and protect its context, but also to provide a metric for evaluating machine translations of a text. Currently available applications fail to verify the authenticity of a text if it is paraphrased and fail to mark it as plagiarised. Text mining, text summarization, plagiarism identification, authorship verification, and question answering all involve the ability to detect identical sentences written in natural language. The aim is to determine whether two sentences are semantically similar. A significant takeaway from this research is that current paraphrase systems function well when put to use. We will use proven conventional algorithms to identify whether the content is a copy of an existing work, and we will use our application to determine whether the content has been paraphrased in some way.

A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics

Language Resources and Evaluation, 2019

In this paper, we propose a hybrid approach for sentence paraphrase identification. The proposal addresses the problem of evaluating sentence-to-sentence semantic similarity when the sentences contain a set of named-entities. The essence of the proposal is to distinguish the computation of the semantic similarity of named-entity tokens from the rest of the sentence text. More specifically, this is based on the integration of word semantic similarity derived from WordNet taxonomic relations, and named-entity semantic relatedness inferred from Wikipedia entity co-occurrences and underpinned by Normalized Google Distance. In addition, the WordNet similarity measure is enriched with word part-of-speech (PoS) conversion aided with a Categorial Variation database (CatVar), which enhances the lexico-semantics of words. We validated our hybrid approach using two different datasets: Microsoft Research Paraphrase Corpus (MSRPC) and TREC-9 Question Variants. In our empirical evaluation, we showed that our system outperforms baselines and most of the related state-of-the-art systems for paraphrase detection. We also conducted a misidentification analysis to disclose the primary sources of our system errors.

Keywords: Paraphrase identification · Named-entity semantic relatedness · WordNet · Wikipedia · Word category subsumption

Much of this work was done while a Ph.D. student at the University of Birmingham.
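The Normalized Google Distance mentioned in this abstract has a standard closed form, which can be sketched as below. This is a minimal illustration only, not the authors' implementation: the function name, the argument names, and the idea of passing raw occurrence counts directly are assumptions made for the example, and the numbers in the usage note are invented.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD formula computed from raw counts.

    fx, fy : number of documents containing term x / term y
    fxy    : number of documents containing both terms
    n      : total number of documents indexed (assumed > min(fx, fy))
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never (co-)occur are treated as maximally distant.
        return math.inf
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller of the two term counts")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

With hypothetical counts, `normalized_google_distance(1000, 100, 10, 1_000_000)` gives roughly 0.5, while two terms that always occur together, as in `normalized_google_distance(100, 100, 100, 10_000)`, give 0. How the distance is turned into a relatedness score and combined with the WordNet measure is not specified in the abstract and is left out here.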

Paraphrase Identification using Semantic Heuristic Features

2012

The Paraphrase Identification (PI) problem is to classify whether or not two sentences are close enough in meaning to be termed paraphrases. PI is an important research dimension with practical applications in Information Extraction (IE), Machine Translation, Information Retrieval, Automatic Identification of Copyright Infringement, Question Answering Systems and Intelligent Tutoring Systems, to name a few. This study presents a novel approach to paraphrase identification using semantic heuristic features, with the aim of improving accuracy over state-of-the-art PI systems. Finally, a comprehensive critical analysis of misclassifications is carried out to provide insightful evidence about the proposed approach and the corpora used in the experiments.

UMCC_DLSI-(EPS): Paraphrases Detection Based on Semantic Distance

This paper describes the specifications and results of the UMCC_DLSI-(EPS) system, which participated in the first Evaluating Phrasal Semantics task of SemEval-2013. Our supervised system uses different kinds of semantic features to train a bagging classifier used to select the correct similarity option. Among these features, we highlight the use of WordNet to extract semantic relations among words and the use of different algorithms to establish semantic similarities. Our system obtains promising results with a precision value around 78% for the English corpus and 71.84% for the Italian corpus.

Paraphrase Identification on the Basis of Supervised Machine Learning Techniques

Lecture Notes in Computer Science, 2006

This paper presents a machine learning approach for paraphrase identification which uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. With the objective to increase the final performance of the system, we scrutinize the influence of the combination of lexical and semantic information, as well as techniques for classifier combination.
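To make the notion of lexical similarity information concrete, the sketch below computes two common lexical features for a sentence pair: Jaccard word overlap and bag-of-words cosine similarity. It is an illustrative assumption, not the attribute set used in the paper; the function names and the simple regex tokenizer are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def jaccard_overlap(a, b):
    # Shared word types divided by all word types in either sentence.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine_similarity(a, b):
    # Cosine of the angle between term-frequency vectors of the two sentences.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For the pair "the cat sat" and "the cat slept", the Jaccard overlap is 0.5 and the cosine similarity is about 0.67. Scores like these would typically be passed as input attributes to a classifier, alongside semantic features.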

Paraphrase Recognition using Neural Network Classification

International Journal of Computer Applications, 2010

Paraphrasing refers to conveying the same content in several ways. The successful recognition of paraphrases is crucial to various natural language processing tasks such as Information Extraction, Document Summarization, Question Answering etc. Several techniques have been employed for paraphrase recognition using lexical, syntactic and semantic features. Many of these systems have been tested on the Microsoft Research Paraphrase Corpus, but their performance leaves scope for further improvement. Since neural network architectures model the human brain structure, which excels at natural language processing tasks, this paper presents a neural network classifier for recognizing paraphrases. A combination of lexical, syntactic and semantic features has been used to train a Back Propagation network. The system can be utilized for detecting similar sentences in applications such as Question Answering and detection of plagiarized content.

Semantic analysis for paraphrase identification using semantic role labeling

2019

With the spread of the internet and smartphones, the digitalization of information content has made document reuse prominent in various complex forms, such as inserting, omitting, and substituting words or changing word order. In particular, when a word in a document is substituted with a similar word, existing morphological similarity measures fail to count it as a match. To resolve this kind of problem, various studies have examined similarity measurement that takes semantic information into account. This study proposes a semantic similarity measure based on the semantic role information of sentences, obtained by semantic role labeling. To assess its performance, the proposed method was compared with the substring similarity method used to measure similarity between existing documents. The proposed method performed similarly to the conventional method on plagiarized documents that were rarely modified, whereas it gave improved results for paraphrased sentences whose structure had been changed.
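The weakness of surface matching described in this abstract can be illustrated with a small sketch. The abstract does not say which substring similarity method was used as the baseline, so Python's `difflib.SequenceMatcher` over word tokens is used here purely as a stand-in, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def surface_similarity(a, b):
    # Compare the two sentences as sequences of lowercase word tokens.
    # ratio() returns 2 * (matched tokens) / (total tokens in both sentences).
    tokens_a = a.lower().split()
    tokens_b = b.lower().split()
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


identical = surface_similarity("the car is fast", "the car is fast")
substituted = surface_similarity("the car is fast", "the automobile is fast")
```

Here `identical` is 1.0 while `substituted` drops to 0.75, even though "car" and "automobile" are near-synonyms and the meaning is unchanged. A purely surface-based measure penalizes such substitutions, which is the gap that measures using semantic information aim to close.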

Semantic Paraphrasing for Information Retrieval and Extraction

Lecture Notes in Computer Science, 2009

The paper is devoted to the development of a system of synonymous and quasi-synonymous paraphrasing and its practical applications, first of all in the domain of search engine optimization and information extraction. This system is part of the ETAP-3 multifunctional NLP environment created by the Laboratory of Computational Linguistics of the Kharkevich Institute for Information Transmission Problems. Combinatorial dictionaries of Russian, English and some other languages and a rule-driven parser constitute the core of ETAP-3 while a variety of generating modules are used in a number of applications. The paraphrase generator, based on the apparatus of lexical functions, is one such module. We describe the general layout of the paraphrase generator and discuss an experiment that demonstrates its potential as a tool for search optimization.

Paraphrase recognition via dissimilarity significance classification

Proceedings of the 2006 Conference on …, 2006

We propose a supervised, two-phase framework to address the problem of paraphrase recognition (PR). Unlike most PR systems that focus on sentence similarity, our framework detects dissimilarities between sentences and makes its paraphrase judgment based on the significance of such dissimilarities. The ability to differentiate significant dissimilarities not only reveals what makes two sentences a nonparaphrase, but also helps to recall additional paraphrases that contain extra but insignificant information. Experimental results show that while being accurate at discerning non-paraphrasing dissimilarities, our implemented system is able to achieve higher paraphrase recall (93%), at an overall performance comparable to the alternatives.

Related papers