NLP Research Papers - Academia.edu (original) (raw)

This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appear in the lines of anime or game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime or game characters using subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF-IDF according to gender, age, and individual anime character, and show that they constitute linguistic speech patterns specific to each feature. Additionally, a classification experiment shows that the model with subword units outperformed the one using the conventional method.
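The subword-unit idea above can be sketched with a toy byte-pair-encoding (BPE) learner; this is a minimal illustration of how frequently co-occurring strings become units, not the paper's actual tokenizer (which the abstract does not specify; SentencePiece-style tools are typical).

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merges from a word list: repeatedly merge the most
    frequent adjacent symbol pair. A toy sketch of subword learning."""
    # Represent each word as a tuple of symbols (characters to start).
    vocab = Counter()
    for w in words:
        vocab[tuple(w)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge everywhere it occurs.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

words = ["low", "lower", "lowest", "low"]
merges = learn_bpe_merges(words, 2)
print(merges)  # → [('l', 'o'), ('lo', 'w')]
```

Frequent character-level units such as the shared stem emerge first, which is exactly what makes subword units useful for out-of-dictionary utterance endings.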

Neuro-linguistic programming assumes that non-verbal language can be decoded through the VAKOG representational system. According to VAKOG theory, each person has a dominant mode for the internal representation of reality (visual, auditory, kinesthetic, olfactory, or gustatory), and if we use words that match this mode, our communication with that person will improve.

How can we use NLP for quarrels, misunderstandings, and communication problems?

Punjabi is the most widely spoken language of Pakistan (Abbas, Chohan, Ahmed, & Kaleem, 2016). Punjabi is an under-developed language, because of which upcoming generations are shifting to other technically and digitally developed languages such as Urdu and English. As a result, a sound shift is being observed in the Punjabi language: sounds that used to be present in Punjabi are now found missing. This raises the concern that the sound shift may result in sound loss and, eventually, language extinction. This study examines sound change in the Punjabi language. On the basis of observation of speech in the surroundings, the researcher hypothesized that speakers who acquired Punjabi as their L1 are able to produce a few distinctive sounds that are not produced by speakers who acquired Urdu as their mother tongue. For this purpose, a corpus of 2 million words was collected and the words including the sounds |n| ‫ن‬ an...

Most coreference resolution models determine if two mentions are coreferent using a single function over a set of constraints or features. This approach can lead to incorrect decisions as lower precision features often overwhelm the smaller number of high precision ones. To overcome this problem, we propose a simple coreference architecture based on a sieve that applies tiers of deterministic coreference models one at a time from highest to lowest precision. Each tier builds on the previous tier's entity cluster output. Further, our model propagates global information by sharing attributes (e.g., gender and number) across mentions in the same cluster. This cautious sieve guarantees that stronger features are given precedence over weaker ones and that each decision is made using all of the information available at the time. The framework is highly modular: new coreference modules can be plugged in without any change to the other modules. In spite of its simplicity, our approach outperforms many state-of-the-art supervised and unsupervised models on several standard corpora. This suggests that sieve-based approaches could be applied to other NLP tasks.
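A minimal sketch of the sieve architecture: deterministic tiers applied from highest to lowest precision over mention clusters, with attributes shared cluster-wide. The two tiny tiers and the mention format below are illustrative assumptions, not the full system described above.

```python
def apply_sieves(mentions, sieves):
    """Run deterministic sieves from highest to lowest precision.
    Each mention starts in its own cluster; each sieve may merge
    clusters, and attributes (e.g. gender) spread cluster-wide."""
    clusters = [{i} for i in range(len(mentions))]

    def merge(ci, cj):
        clusters[ci] |= clusters[cj]
        del clusters[cj]
        # Propagate attributes: any known value spreads to the cluster.
        known = {}
        for i in clusters[ci]:
            for k, v in mentions[i].items():
                if v is not None and k != "text":
                    known.setdefault(k, v)
        for i in clusters[ci]:
            for k, v in known.items():
                if mentions[i].get(k) is None:
                    mentions[i][k] = v

    for sieve in sieves:  # highest precision first
        changed = True
        while changed:
            changed = False
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    if sieve(mentions, clusters[a], clusters[b]):
                        merge(a, b)
                        changed = True
                        break
                if changed:
                    break
    return clusters

def exact_match(mentions, c1, c2):
    """Tier 1: clusters sharing an identical mention string corefer."""
    t1 = {mentions[i]["text"].lower() for i in c1}
    t2 = {mentions[i]["text"].lower() for i in c2}
    return bool(t1 & t2)

def compatible_pronoun(mentions, c1, c2):
    """Tier 2 (lower precision): a pronoun in c2 matching c1's gender."""
    for i in c1:
        for j in c2:
            a, b = mentions[i], mentions[j]
            if b["text"].lower() in {"she", "he"} and a.get("gender") == b.get("gender"):
                return True
    return False

mentions = [
    {"text": "Mary", "gender": "f"},
    {"text": "Mary", "gender": None},
    {"text": "she",  "gender": "f"},
]
result = apply_sieves(mentions, [exact_match, compatible_pronoun])
print(sorted(sorted(c) for c in result))  # → [[0, 1, 2]]
```

Note how the high-precision exact-match tier runs to completion (and fills in the second Mary's gender) before the weaker pronoun tier is allowed to act, which is the core guarantee of the sieve design.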

In this paper, we introduce EVALution 1.0, a dataset designed for the training and the evaluation of Distributional Semantic Models (DSMs). This version consists of almost 7.5K tuples, instantiating several semantic relations between word pairs (including hypernymy, synonymy, antonymy, meronymy). The dataset is enriched with a large amount of additional information (i.e. relation domain, word frequency, word POS, word semantic field, etc.) that can be used for either filtering the pairs or performing an in-depth analysis of the results. The tuples were extracted from a combination of ConceptNet 5.0 and WordNet 4.0, and subsequently filtered through automatic methods and crowdsourcing in order to ensure their quality. The dataset is freely downloadable. An extension in RDF format, including also scripts for data processing, is under development.

In this digital world, artificial intelligence has provided solutions to many problems, including problems related to digital images and operations over extensive sets of images. To analyze an image, we need to extract features from its content. Image description methods involve natural language processing and concepts from computer vision. The purpose of this work is to provide an efficient and accurate description of an unknown image using deep learning methods. We propose a novel, robust generative model that trains a deep neural network to learn image features after extracting information about the content of images, using a novel combination of CNN and LSTM. We trained our model on the MSCOCO dataset, which provides a set of annotations for each image, and after the model was fully automated, we tested it by providing raw images. Several experiments were also performed to check effici...

The present paper describes a three-stage technique to parse Hindi sentences. In the first stage we create a model with the features of the head words of each chunk and their dependency relations; here, the dependency relations are inter-chunk dependency relations. We have experimentally fixed a feature set for learning this model. In the second stage, we extract the intra-chunk dependency relations using a set of rules. The first stage is combined with the second stage to build a two-stage word-level Hindi dependency parser. In the third stage, we formulate some rules based on features and use them to post-process the output given by the two-stage parser.
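The three-stage composition can be sketched structurally. The toy stand-in functions below are assumptions for illustration only: the real stage 1 is a learned model over head-word features, and the real stages 2 and 3 are linguistically motivated rule sets.

```python
def three_stage_parse(sentence_chunks, stage1, stage2_rules, stage3_rules):
    """Compose the three stages described above: learned inter-chunk
    dependencies, rule-based intra-chunk attachment, rule-based
    post-processing. Dependencies are (dependent, head) pairs."""
    # Stage 1: inter-chunk dependencies between chunk head words.
    deps = stage1(sentence_chunks)
    # Stage 2: intra-chunk dependencies via deterministic rules.
    for chunk in sentence_chunks:
        for rule in stage2_rules:
            deps.extend(rule(chunk))
    # Stage 3: rule-based post-processing of the combined parse.
    for rule in stage3_rules:
        deps = rule(deps)
    return deps

# Toy stand-ins: attach every chunk head to the verb chunk (stage 1),
# attach non-head words to their chunk head (stage 2), no-op stage 3.
chunks = [["raam", "ne"], ["kitaab"], ["padhii"]]  # [NP, NP, VG]
stage1 = lambda cs: [(c[0], cs[-1][0]) for c in cs[:-1]]
stage2 = [lambda c: [(w, c[0]) for w in c[1:]]]
stage3 = [lambda deps: deps]
deps = three_stage_parse(chunks, stage1, stage2, stage3)
print(deps)
```

The point of the structure is that the inter-chunk and intra-chunk layers stay independent, so either can be improved without touching the other.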

Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in the Sentence-Phrase, Phrase-Word, and Word-Sense subtasks and second in the Paragraph-Sentence subtask.
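The term-alignment step might look like the following greedy sketch, with a hand-filled `word_sim` table standing in for the LSA-based distributional similarity model (an assumption; the actual SemSim component is far richer).

```python
def align_and_score(text1, text2, word_sim):
    """Greedily align each word of text1 with its best unused
    counterpart in text2 and average the alignment similarities."""
    t1, t2 = text1.lower().split(), text2.lower().split()
    remaining = list(t2)
    total = 0.0
    for w in t1:
        best, best_s = None, 0.0
        for v in remaining:
            # Exact match scores 1.0; otherwise consult the lookup.
            s = 1.0 if w == v else word_sim.get((w, v), word_sim.get((v, w), 0.0))
            if s > best_s:
                best, best_s = v, s
        if best is not None:
            remaining.remove(best)
            total += best_s
    # Normalize by the longer text so length mismatches are penalized.
    return total / max(len(t1), len(t2))

sim = {("cat", "feline"): 0.8}
score = align_and_score("the cat sleeps", "the feline sleeps", sim)
print(round(score, 2))  # → 0.93
```

Normalizing by the longer side is one simple way to handle the "text sequences of different lengths" issue the abstract mentions.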

The paper defines and shows how to use the Minimal Ratio, an exact metric that expresses the ratio between the measured value and the limits of the confidence interval calculated according to the formula Fisher's exact test is based on. The metric is meant to assist with keyword and collocation extraction and with comparing texts or corpora according to word-type distribution or other similar criteria.
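The underlying test can be illustrated with a pure-Python right-tailed Fisher's exact test on word frequencies; the Minimal Ratio metric itself builds a confidence-interval ratio on top of this, which is not reproduced here.

```python
from math import comb

def fisher_right_tail(k, n1, k2, n2):
    """Right-tailed Fisher's exact test for keyword extraction.

    Tests whether a word occurring k times in a corpus of n1 tokens is
    over-represented relative to k2 occurrences in n2 reference tokens,
    i.e. P(X >= k) under the hypergeometric null distribution."""
    K, N = k + k2, n1 + n2  # total occurrences, total tokens
    p = 0.0
    for x in range(k, min(K, n1) + 1):
        p += comb(K, x) * comb(N - K, n1 - x) / comb(N, n1)
    return p

# A word seen 12 times per 1000 tokens vs. 3 per 1000 in a reference.
p = fisher_right_tail(12, 1000, 3, 1000)
print(p < 0.05)  # the word is significantly over-represented
```

Because the test is exact rather than asymptotic, it stays valid for the low-frequency words that dominate keyword lists.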

Low-density languages are also known as lesser-known, poorly-described, less-resourced, minority or less-computerized languages because they have fewer resources available. Collection and annotation of a voluminous corpus for the purpose of NLP applications for these languages prove to be quite challenging. For the development of any NLP application for a low-density language, one needs an annotated corpus and a standard annotation scheme. Because of their non-standard usage in text and other linguistic nuances, these languages pose significant challenges that are both linguistic and technical in nature. The present paper highlights some of the underlying issues and challenges in developing statistical POS taggers, applying SVM and CRF++, for Sambalpuri, a less-resourced Eastern Indo-Aryan language. A corpus of approximately 121K words was collected from the web and converted into Unicode encoding. The whole corpus was annotated under the BIS (Bureau of Indian Standards) annotation scheme devised for Odia under the ILCI (Indian Languages Corpora Initiative) project. Both taggers were trained and tested with approximately 80K and 13K words, respectively. The SVM tagger achieves 83% accuracy, while CRF++ achieves 71.56%, which is lower than the former.
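The train/test evaluation setup can be sketched with a most-frequent-tag baseline standing in for SVM/CRF++; the toy BIS-style tags below are illustrative, not real Sambalpuri data.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_tokens):
    """Most-frequent-tag baseline: each word gets its most common
    training tag; unseen words fall back to the corpus-wide mode."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    fallback = Counter(t for _, t in tagged_tokens).most_common(1)[0][0]
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}, fallback

def accuracy(model, fallback, test_tokens):
    hits = sum(model.get(w, fallback) == t for w, t in test_tokens)
    return hits / len(test_tokens)

# Invented (word, BIS-style tag) pairs for illustration only.
train = [("ghara", "N_NN"), ("jauchi", "V_VM"), ("ghara", "N_NN"),
         ("bhala", "JJ"), ("ghara", "V_VM")]
test = [("ghara", "N_NN"), ("bhala", "JJ"), ("nua", "JJ")]
model, fb = train_unigram_tagger(train)
acc = accuracy(model, fb, test)
print(round(acc, 2))  # → 0.67
```

Such a baseline is a useful sanity check in low-resource settings: a learned tagger like SVM or CRF++ should beat it, and the gap shows how much the contextual features are contributing.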

This paper describes an experimental, strengths-based program for the treatment of substance abusing offenders under criminal justice supervision in the United States Probation Department. The program is based upon new physiological evidence that links addictions to the experience of hope, and consistent research that identifies self-efficacy, futurity and self-esteem as crucial elements in recovery. Rooted in concepts taken from Jungian and Maslowian ideas of the Self, the program uses techniques gleaned from Neuro-linguistic Programming (NLP) and Ericksonian hypnosis to provide a continuing sense of Self and the possibility of positive, self-actualizing futures. This article explores the theoretical background of the program, specific tools employed, program results and suggestions for further research.

The paper deals with the concepts of fragmentation and reconstruction in the field of portraiture. Taking a portrait as a large fragment of information, we look into ways in which it can be optimised and reduced such that it remains valid but becomes more efficient. The paper commences by exploring the concept of the fragment from various facets, including historically, especially from the modernist point of view, and goes forth to investigate various techniques from practices both adjunct and outside of the field of art in order to inform the portraiture process itself on how information can be collected, optimised and presented to the viewer.

(An older .doc version written in Arabic is also available.)
Since 1997, the MS Arabic spell checker has been integrated by Coltec-Egypt into the MS-Office suite, and to this day many Arabic users find it worthless.
In this study, we show why the MS spell checker fails to attract Arabic users. After spell-checking a document (10 pages, 3,300 words in Arabic), the assessment procedure spots 78 false-positive errors. These reveal flaws in the lexical resource: unsystematic lexical coverage of the feminine and the broken plural of nouns and adjectives, and arbitrary coverage of verbs and nouns with prefixed or suffixed particles.
This unsystematic and arbitrary lexical coverage of the language resources pinpoints the absence of a clear definition of a lexical entry and an inadequate design of the related agglutination rules. Finally, this assessment reveals, more generally, the failure of scientific and technological policies in big companies and in research institutions regarding Arabic.

This paper presents methodologies involved in text normalization and diphone preparation for Bangla Text to Speech (TTS) synthesis. A concatenation-based TTS system basically comprises two modules: natural language processing and Digital Signal Processing (DSP). Natural language processing deals with converting text to its pronounceable form, called text normalization, and with the diphone selection method based on the normalized text, called Grapheme to Phoneme (G2P) conversion. Text normalization issues addressed in this paper include tokenization, conjuncts, null-modified characters, numerical words, abbreviations, and acronyms. Issues related to diphone preparation include diphone categorization, corpus preparation, diphone labeling, and diphone selection. Appropriate rules and algorithms are proposed to tackle all of the above issues. We developed a speech synthesizer for Bangla using a diphone-based concatenative approach, which is demonstrated to produce natural-sounding synthetic speech.
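Rule-based text normalization of the kind described might be sketched as follows; the abbreviation table and digit-spelling rule are simplified English placeholders (real Bangla normalization must additionally handle conjuncts, Bangla numerals, and null-modified characters).

```python
import re

# Toy lookup tables; real systems carry much larger, language-specific ones.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
UNITS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine"]

def expand_number(tok):
    """Spell out an integer digit by digit (placeholder rule)."""
    return " ".join(UNITS[int(d)] for d in tok)

def normalize(text):
    """Map each token to its pronounceable form: expand known
    abbreviations, spell out numbers, pass other words through."""
    out = []
    for tok in text.lower().split():
        if tok in ABBREVIATIONS:
            out.append(ABBREVIATIONS[tok])
        elif re.fullmatch(r"\d+", tok):
            out.append(expand_number(tok))
        else:
            out.append(tok)
    return " ".join(out)

print(normalize("Dr. Rahman lives at 42 Mirpur St."))
# → doctor rahman lives at four two mirpur street
```

The normalized string is what a G2P module would then convert into the diphone sequence for concatenation.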

Hate speech and fringe ideologies are social phenomena that thrive online. Members of the political and religious fringe are able, via the Internet, to propagate their ideas with less effort than in traditional media. In this article we attempt to use linguistic cues, such as part-of-speech occurrence, to distinguish the language of fringe groups from strictly informative sources. The aim of this research is to provide a preliminary model for identifying deceptive materials online, examples of which include aggressive marketing and hate speech. For the purposes of this paper we focus on the political aspect. Our research has shown that information about sentence length and the occurrence of adjectives and adverbs can help identify differences between the language of fringe political groups and mainstream media.
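The cues mentioned (sentence length, adjective/adverb occurrence) can be extracted as simple features; the toy word-to-POS lookup below stands in for a real tagger (an assumption for this sketch).

```python
def stylistic_features(sentences, pos_lookup):
    """Compute average sentence length and the share of tokens that
    are adjectives or adverbs, per the cues described above."""
    n_tokens = n_adj_adv = 0
    lengths = []
    for s in sentences:
        toks = s.lower().split()
        lengths.append(len(toks))
        n_tokens += len(toks)
        n_adj_adv += sum(pos_lookup.get(t) in {"ADJ", "ADV"} for t in toks)
    return {
        "avg_sentence_len": sum(lengths) / len(lengths),
        "adj_adv_ratio": n_adj_adv / n_tokens,
    }

# Hypothetical mini POS table standing in for a trained tagger.
pos = {"corrupt": "ADJ", "totally": "ADV", "new": "ADJ"}
feats = stylistic_features(
    ["the system is totally corrupt", "a new law passed"], pos)
print(feats)
```

Feature vectors like these can then feed any standard classifier to separate fringe from mainstream sources.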

The information resulting from the use of an organization's products and services is a valuable resource for business analytics. Therefore, it is necessary to have systems that analyze customer reviews. This article is about categorizing and predicting customer sentiments, and it proposes a new framework for doing so. The customer reviews were collected from an international hotel. In the next step, the customer reviews were processed and then fed into various machine learning algorithms: support vector machine (SVM), artificial neural network (ANN), naive Bayes (NB), decision tree (DT), C4.5, and k-nearest neighbor (K-NN). Among these algorithms, the DT provided the best results. In addition, the most important factors influencing a great customer experience were extracted with the help of the DT. Finally, very interesting results were observed regarding the effect of the number of features on the performance of the machine learning algorithms.
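One of the compared classifiers, naive Bayes, can be sketched in a few lines for the review-classification step; the hotel-review snippets are invented examples, and note that the article itself found the decision tree to perform best.

```python
from collections import Counter
from math import log

def train_nb(docs):
    """Train a word-count naive Bayes sentiment classifier with
    Laplace smoothing, returning a predict(text) function."""
    word_counts, class_counts, vocab = {}, Counter(), set()
    for text, label in docs:
        class_counts[label] += 1
        wc = word_counts.setdefault(label, Counter())
        for w in text.lower().split():
            wc[w] += 1
            vocab.add(w)

    def predict(text):
        total = sum(class_counts.values())
        best, best_lp = None, float("-inf")
        for label in class_counts:
            lp = log(class_counts[label] / total)  # class prior
            n = sum(word_counts[label].values())
            for w in text.lower().split():
                # Laplace smoothing keeps unseen words from zeroing out.
                lp += log((word_counts[label][w] + 1) / (n + len(vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

    return predict

reviews = [("great room clean staff", "pos"),
           ("dirty room rude staff", "neg"),
           ("clean and great view", "pos"),
           ("rude service dirty lobby", "neg")]
predict = train_nb(reviews)
print(predict("clean room great staff"))  # → pos
```

The same processed review vectors could be swapped into a decision tree or SVM for the comparison the article describes.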

The article introduces discourse analysis as a fruitful approach to psychotherapy change-process research. Extracts are presented from a successfully resolved, client-specified, problematic theme that was selected from a successful 8-session psychodynamic-interpersonal therapy of a female client presenting with a major depressive episode. The study offers a heuristic demonstrating (a) how resolution of the client's problem evolved from implications raised by the client's own description of her predicament and involved the therapist's legitimation of a morally defensible account of the client's actions and (b) how cultural meanings can be brought into the consulting room as intimately personal problems. The implications of studying therapeutic change as a discursive activity comprising the use and negotiation of sociocultural meanings are discussed.

In chapter 10, Linda Finlay writes with Anna Madill to describe and explain different ways to analyse data. Four contrasting types of analysis are outlined with practical exemplars: narrative, thematic, discursive and creative. Each type of analysis highlights different aspects and so enables different insights. As such, qualitative analysis is always tentative, partial and emergent.

The theory of Neuro Linguistic Programming focuses on interpersonal relationships and designs strategies focused on the evolution of the human mind, the construction of an effective message, the perception of unconscious physiological... more

I was pleased to be invited to provide this article because it gives me the opportunity to share with my NLP colleagues how useful transactional analysis can be when the two approaches are combined. I often think of it this way: the Enneagram model of nine personality types lets us know the nature of our real essence, transactional analysis helps us understand how we lost touch with that, and NLP helps us make changes to get back to our real selves. All three approaches are of course comprehensive in their own right, so all I can hope to do here is share a few ideas about the links I see between some of the TA and NLP concepts.

In Your Hands: NLP in ELT by Jane Revell and Susan Norman is an attractive book which presents the core concepts and principles of Neuro-linguistic Programming (NLP). For those who may not already know, NLP is an approach that is dedicated to modeling exceptional performance in various fields. In other words, it is the art and science of human excellence. The authors begin with an overview of NLP and suggestions for getting the most out of NLP, the book and learning/teaching success.
Handing Over: NLP-based activities for language learning by Jane Revell and Susan Norman is the practically oriented follow-up to their first book In Your Hands: NLP in ELT. Their latest book will help anyone become more aware of their own and others' greater potential for learning, and specifically answers the question: "Yes, but ... what do I actually do with NLP in the classroom?" (p. 3)

Introduction
On the origins of NLP
A first personal experience with NLP
A first substantive encounter
A first training attempt
A second attempt
NLP: internal criticism
Marketing aspects
NLP and science
Summary and outlook
References and some further information

ED377009 - Ericksonian Approach to Experiential Education, Part 1: Developing the Stance of the Practitioner; Part 2: Tailoring Interventions; Part 3: Applying Specific Ericksonian Techniques.

Text can be analysed by splitting it and extracting keywords. These may be represented as summaries, tabular representations, graphical forms, or images. The large amount of information present in textual format has led to research on extracting text and transforming it from an unstructured to a structured form. The paper presents the importance of Natural Language Processing (NLP) and two of its interesting applications in the Python language: 1. automatic text summarization [domain: newspaper articles]; 2. text-to-graph conversion [domain: stock news]. The main challenge in NLP is natural language understanding, i.e. deriving meaning from human or natural language input, which is done using regular expressions, artificial intelligence, and database concepts. The automatic summarization tool converts newspaper articles into summaries on the basis of word frequency in the text. The text-to-graph converter takes a stock article as input, tokenizes it on various indices (points and percent) and time, and then maps the tokens to a graph. This paper proposes a business solution for users for effective time management.
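The frequency-based summarizer described can be sketched as follows: score each sentence by the corpus frequency of its non-stopword words and keep the top-scoring sentences in their original order (the stopword list and sentence splitter are simplifying assumptions).

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and", "as"}

def summarize(text, n_sentences=1):
    """Extractive summarization by word frequency, as described above."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(s):
        toks = re.findall(r"[a-z']+", s.lower())
        return sum(freq[t] for t in toks if t not in STOPWORDS)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit kept sentences in their original document order.
    return " ".join(s for s in sentences if s in ranked)

article = ("Stocks rose sharply today. The index gained two percent. "
           "Analysts said stocks may keep rising as the index climbs.")
summary = summarize(article)
print(summary)
```

Longer sentences that reuse the document's frequent content words score highest, which is why the final sentence wins here; more refined versions normalize by sentence length to avoid that bias.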

An objection concerning the nature of NLP and other self-help techniques

Sana'ani Dialect to Modern Standard Arabic: Rule-based Direct Machine Translation ICAI'11 Vol I - ISBN #: 1-60132-183-X ICAI'11 Vol II - ISBN #: 1-60132-184-8 ICAI'11 Set - ISBN #: 1-60132-185-6 Authors: Yahya Alamlahi,... more