Parsing Research Papers

Parsing methods attack a wide range of linguistic problems, from understanding mechanisms universal to all languages to describing specific constructions in particular sublanguages. Here we present a low-level, robust parser that is typically run over a large quantity of text to extract simple, common syntactic structures. The parser is applied to a small sample of technical documentation from three different sources. A minimal set of binary dependency relations is defined for evaluating the parser output, and the output of the different versions of the parser implemented is compared to a manually parsed portion of the test collection. In summary, we describe the limitations and strengths of this simple method.

We present a compiler which can be used to automatically obtain efficient Java implementations of parsing algorithms from formal specifications expressed as parsing schemata. The system performs an analysis of the inference rules in the input schemata in order to determine the best data structures and indexes to use, ensuring that the generated implementations are efficient. The system described is general enough to handle all kinds of schemata for different grammar formalisms, such as context-free grammars and tree-adjoining grammars, and it provides an extensibility mechanism allowing the user to define custom notational elements. This compiler has proven very useful for analyzing, prototyping and comparing natural language parsers in real domains, as can be seen in the empirical examples provided at the end of the article.
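As a minimal illustration of schema-driven deduction (not the article's actual compiler), a parsing schema can be executed by an agenda-driven engine whose items mirror the schema's inference rules; the CYK-style sketch below hard-codes a toy grammar in Chomsky normal form:

```python
# Toy CNF grammar: binary rules A -> B C and lexical rules A -> w.
binary = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
lexical = {"the": "Det", "dog": "N", "cat": "N", "sees": "V"}

def parse(words):
    """Agenda-driven deduction over items (category, start, end)."""
    chart = set()
    agenda = [(lexical[w], i, i + 1) for i, w in enumerate(words)]  # axioms
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        # Try to combine the new item with everything already derived.
        for other in list(chart):
            for (b, i, k), (c, k2, j) in ((item, other), (other, item)):
                if k == k2 and (b, c) in binary:
                    new = (binary[(b, c)], i, j)
                    if new not in chart:
                        agenda.append(new)
    return ("S", 0, len(words)) in chart

print(parse("the dog sees the cat".split()))  # True
```

A real schemata compiler would generate the item indexes from the rule shapes instead of scanning the whole chart as this sketch does.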

The «Оценка методов АОТ» forum ("Evaluation of Automatic Text Processing Methods", http://ru-eval.ru) is a new initiative whose goal is the independent evaluation of the methods and algorithms behind Russian-language linguistic resources. The article describes the principles and procedure of the forum tracks, the participants, the test collection, the organisation of the expert assessment, and the results obtained.

Parsing Arabic phrases is a crucial requirement for many applications such as question answering and machine translation. The calculus of pregroups was introduced by Lambek as algebraic computational machinery for the grammatical analysis of natural languages. Pregroup grammars have been used to analyse sentence structure in many European languages such as English, as well as non-European languages such as Japanese. For Arabic, Lambek employed the notions of pregroups to analyse grammatical structures such as verb conjugation, tense modifiers and equational sentences. This work develops the initial phase of an efficient automatic pregroup grammar parser that uses a linear approach to analyse the verbal phrases of Modern Standard Arabic (MSA). The proposed system starts by building an Arabic lexicon containing all possible categories of Arabic verbs, then analyses the input Arabic verbal phrase to check whether it is well formed, using a linear parsing algorithm based on pregroup grammar rules.
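The pregroup reduction such a parser performs can be sketched with a left-to-right stack algorithm applying the contractions x x^r → 1 and x^l x → 1; the type assignment below is an illustrative English-style example, not the paper's Arabic lexicon:

```python
# Simple types as (base, adjoint): -1 = left adjoint, 0 = plain, 1 = right adjoint.
def reduces_to(types, target=("s", 0)):
    """Left-to-right stack reduction: adjacent (x, n)(x, n+1) contracts to 1."""
    stack = []
    for t in types:
        if stack and stack[-1][0] == t[0] and stack[-1][1] + 1 == t[1]:
            stack.pop()          # contraction x^(n) x^(n+1) -> 1
        else:
            stack.append(t)
    # The phrase is well-formed iff everything reduces to the sentence type.
    return stack == [target]

# Hypothetical assignment: "John" : n, "sleeps" : n^r s
print(reduces_to([("n", 0), ("n", 1), ("s", 0)]))  # True
```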

Students' obligations involve writing a large number of homework assignments and term papers, usually submitted in electronic form. Checking papers for plagiarism is not an easy task: their sheer quantity prevents teachers and professors from checking all of them by hand. There is therefore a need for a system that performs this task automatically. This paper describes the principles behind one such system. Text contained in papers written by students, as well as text found on the internet, is converted into n-gram models, which are stored and later used for comparison with newly generated ones. A potential application of this system is at the Faculty of Electronic Engineering in Niš, Department of Computer Science, where it can be used to check student papers written in Serbian.
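One common way to compare the n-gram models of two documents (a sketch, not necessarily the system's exact measure) is the Jaccard overlap of their word n-gram sets:

```python
def ngrams(text, n=3):
    """Word n-grams of a document, lower-cased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard overlap of the two documents' n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

doc = "the quick brown fox jumps over the lazy dog"
copy = "the quick brown fox leaps over the lazy dog"
print(round(similarity(doc, copy), 2))  # 0.4
```

A similarity above a chosen threshold flags a candidate pair for manual inspection.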

Unfortunately, Log Management and SIEM are perceived as if they were the same thing. Many even regard SIEM as a subset, or a slightly specialised form, of log management. Both views are mistaken. Worse, those who hold them negatively affect corporate security policies and end up practising a reductive style of security-policy management.

In this research, we build an initial model for semantic parsing of simple Vietnamese sentences. With such a model, we can analyse simple Vietnamese sentences to determine their semantic structures, represented in a form that we define. We therefore address two tasks: first, building a taxonomy of Vietnamese nouns and using it to define the feature structures of nouns and verbs; second, defining syntactic and semantic unification rules for Vietnamese phrases, clauses and sentences in order to build a Unification-Based Vietnamese Grammar. This grammar has been used to build a semantic parser for single Vietnamese sentences, which has been evaluated experimentally, achieving precision and recall both above 84%.
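The unification step such rules rely on can be sketched as recursive unification of feature structures represented as nested dictionaries (a generic illustration, not the authors' grammar):

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts; None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            if k in out:
                u = unify(out[k], v)
                if u is None:
                    return None       # conflicting values: unification fails
                out[k] = u
            else:
                out[k] = v            # feature only in b: just carry it over
        return out
    return a if a == b else None      # atomic values must match exactly

noun = {"cat": "N", "agr": {"num": "sg"}}
verb_subj = {"cat": "N", "agr": {"num": "sg", "per": "3"}}
print(unify(noun, verb_subj))  # {'cat': 'N', 'agr': {'num': 'sg', 'per': '3'}}
```

A rule such as S → NP VP then succeeds only if the agreement features of NP and VP unify.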

We report on a fuzzy logic-based language understanding system applied to speech recognition. This system acquires conceptual knowledge from corpus data and organizes that knowledge into fuzzy logic inference rules. The system parses speech recognition results into conceptual structures in a robust manner, and is thus able to tolerate noise caused by speech recognition errors. We discuss the fuzzy inference rule learning method and explain its organization. Experimental results that demonstrate the ability of the system to deal with complex speech input instances are reported.
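A toy version of rule-based fuzzy concept spotting (the rules and membership degrees below are invented for illustration, not learned from a corpus as in the system) might look like:

```python
# Each rule maps a set of evidence words to a concept with a membership
# degree; degrees for the same concept combine with max (fuzzy OR), so
# weaker fallback rules still fire when some evidence words are missing.
RULES = {
    "DESTINATION": [({"to", "boston"}, 0.9), ({"boston"}, 0.5)],
    "DEPARTURE":   [({"from", "denver"}, 0.9), ({"denver"}, 0.5)],
}

def infer(words):
    words = set(words)
    scores = {}
    for concept, clauses in RULES.items():
        degree = 0.0
        for evidence, weight in clauses:
            if evidence <= words:          # all evidence words present
                degree = max(degree, weight)
        if degree > 0:
            scores[concept] = degree
    return scores

print(infer("i want to fly to boston from denver".split()))
# {'DESTINATION': 0.9, 'DEPARTURE': 0.9}
```

The graded degrees are what make the approach robust: a recognizer that drops "from" still yields DEPARTURE, only with lower confidence.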

Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze, understand, and generate the languages that humans use naturally to address computers.

This paper presents a full procedure for the development of a segmented, POS-tagged and chunk-parsed corpus of Old Tibetan. As an extremely low-resource language, Old Tibetan poses non-trivial problems in every step towards the development of a searchable treebank. We demonstrate, however, that a carefully developed, semi-supervised method of optimising and extending existing tools for Classical Tibetan, as well as creating specific ones for Old Tibetan, can address these issues. We thus also present the very first Tibetan Treebank in a variety of formats to facilitate research in the fields of NLP, historical linguistics and Tibetan Studies.

The 3rd International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Natural Language Computing and Computational Linguistics. The Conference looks for significant contributions to all major fields of Natural Language Processing and Computational Linguistics, in both theoretical and practical aspects. Authors are solicited to contribute to the Conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of Computer Science and Information Technology.

The correct placement of punctuation characters is, in many languages including Czech, driven by complex guidelines. Although these guidelines draw on morphological, syntactic and semantic information, state-of-the-art systems for punctuation detection and correction are limited to simple rule-based backbones. In this paper we present a syntax-based approach utilizing the Czech parser synt. This parser uses an adapted chart parsing technique to build the chart structure for a sentence; synt can then process the chart and provide several kinds of output information. The implemented punctuation detection technique uses the synt output in the form of automatic and unambiguous extraction of optimal syntactic structures from the sentence (noun phrases, verb phrases, clauses, relative clauses or inserted clauses). Using this feature it is possible to obtain information about syntactic structures related to expected punctuation placement. We also present experiments showing that this method covers most syntactic phenomena needed for punctuation detection or correction.


Parsing Arabic sentences is considered a necessary precondition in many applications that rely on natural language processing (NLP) techniques, such as automatic translation, information retrieval, and automatic summarization. The study of diacritical marks plays an important role in the formation of meaning, because parsing helps to understand the meaning and the relationships between sentence parts.
In this paper, an intelligent hybrid Arabic parser relying on a Genetic Algorithm (GA) and an expert system containing the grammar of the Arabic language has been designed and implemented. The text is segmented into the group of sentences that compose it. Initial solutions (chromosomes) are encoded, an initial population is generated, and genetic operations are applied. A search and inference engine applying a hybrid control structure that combines forward chaining and backward chaining has been designed. The results of the morphological and syntactic analyzers are used to evaluate solutions; evaluation is done through a fitness function to reach the optimal solution, which is the true parse of the sentence's words. The parser has been tested on many Arabic sentences, and the results have been evaluated using the precision criterion. The results of this new system show that it is able to parse Arabic sentences correctly and with high accuracy, opening a broad horizon for the understanding and automatic processing of Arabic text.


This article is the French translation of a series of three short articles originally published in English, then translated into Arabic and Indonesian on LinkedIn. These essays take stock of computer scientists' work, after 25 years, in the field of Arabic computational morphology. We have organised our discussion around the following questions:
1. Why have computer scientists failed to produce accurate Arabic linguistic resources?
2. Do computer scientists understand traditional Arabic morphology in depth?
3. Can computer scientists rethink Arabic morphology by thinking outside the box?

Any program written in a high-level programming language must be translated into object code before it can be executed. A compiler has various stages or passes, each of significant importance; parsing, or syntactic analysis, is the process of analyzing a string of input symbols, whether in natural language, a programming language or a data structure, so that it conforms to the rules of a formal grammar. This paper explains the kinds of parsers used to produce executable code, and focuses on a comparison between top-down and bottom-up parsing approaches. Each approach has significant advantages and drawbacks, as well as its own bottlenecks.
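The top-down family can be illustrated with a tiny recursive-descent recognizer for the grammar E → T ('+' T)*, T → digit (a generic sketch, not tied to any particular compiler):

```python
# Top-down (recursive-descent) recognizer: one function per nonterminal,
# each consuming tokens left to right and predicting what must come next.
def parse_expr(tokens):
    pos = 0

    def term():                 # T -> digit
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return True
        return False

    def expr():                 # E -> T ('+' T)*
        nonlocal pos
        if not term():
            return False
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            if not term():
                return False
        return True

    return expr() and pos == len(tokens)

print(parse_expr(list("1+2+3")))  # True
print(parse_expr(list("1++2")))   # False
```

A bottom-up (e.g. LR) parser would instead shift tokens onto a stack and reduce them to nonterminals once a complete right-hand side has been seen.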

The 8th International Conference on Natural Language Computing (NATL 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Natural Language Computing. The Conference looks for significant contributions to all major fields of Natural Language Computing, in both theoretical and practical aspects. Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the following areas, but not limited to them.

Parsing is a step toward understanding a natural language: it identifies the words in a sentence and the grammatical relations between them. Statistical parsers require a set of annotated data, called a treebank, to learn the grammar of a language and to apply the learnt model to new, unseen data. Such annotated data is not available for all languages, and its development is very time-consuming, tedious, and expensive. In this dissertation, we propose a method for treebanking from scratch using machine learning methods.
We first propose a bootstrapping approach to initialize the data annotation process, aiming to reduce the human intervention needed to annotate the data. After developing a small data set, we use it to train a statistical parser. This small data set suffers from data sparseness at the lexical and syntactic-construction levels, so a parser trained on this amount of data might perform poorly in a real application. To resolve the data sparsity problem at the lexical level, we propose an unsupervised word clustering approach that provides a more coarse-grained representation of the lexical items. To resolve it at the syntactic-construction level, we propose active learning, a promising supervised method for seeking informative samples in a data pool. Data annotated through active learning helps a learner reach performance similar to that of a learner trained on the complete set of annotated data; active learning is therefore a great help in reducing the amount of annotated data required.
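Uncertainty-based sample selection of the kind described can be sketched as margin sampling over a pool of model predictions (the class probabilities below are invented for illustration):

```python
# Pool-based active learning by margin sampling: query the items whose
# two most probable labels are closest, i.e. where the model is least sure.
def select_batch(pool_probs, k=2):
    """Pick the k pool items with the smallest top-two probability margin."""
    def margin(probs):
        top = sorted(probs, reverse=True)
        return top[0] - top[1]
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:k]

pool = [
    [0.9, 0.05, 0.05],   # confident: large margin, not informative
    [0.4, 0.35, 0.25],   # most uncertain: query the annotator first
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]
print(select_batch(pool))  # [1, 2]
```

The selected items are handed to the human annotator, added to the treebank, and the parser is retrained, closing the loop.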

The 3rd International Conference on NLP & Information Retrieval (NLPI 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Natural Language Computing and Information Retrieval. Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the following areas, but not limited to them.

The 3rd International Conference on Advanced Natural Language Processing (AdNLP 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Natural Language Computing and its advances. Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in NLP.

Today, the automatic generation of questions is a problem considered by many researchers. Automatic question generation from text has two major activity domains: dialogue/interactive question answering systems and educational assessment. In this paper, a proposed system is designed, implemented and tested to automate the question generation process. The proposed system generates questions by selecting one sentence at a time, extracting sections of the source sentence, and then applying transformation rules or patterns to construct a question. It uses a pure syntactic pattern-matching approach to generate content-related questions in order to improve the independent study of any textual material. The proposed system is powered by the open-source OpenNLP statistical parser to generate the questions using a pattern-matching strategy. The system is able to learn dynamically, is easy to use, and accepts a large number of rules.
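A single hypothetical transformation rule of the kind such a system applies ('SUBJ is COMPL.' → 'What is SUBJ?') can be sketched as:

```python
import re

# One illustrative transformation rule (invented, not the system's actual
# rule set): a copular declarative sentence becomes a "What is ...?" question.
def generate_question(sentence):
    m = re.match(r"(?P<subj>.+?) is (?P<compl>.+)\.$", sentence)
    if not m:
        return None                       # rule does not apply
    subj = m.group("subj")
    subj = subj[0].lower() + subj[1:]     # de-capitalise the sentence-initial word
    return f"What is {subj}?"

print(generate_question("The capital of France is Paris."))
# What is the capital of France?
```

In the real system the match would be against the parser's syntactic tree rather than a surface regular expression, which is why a statistical parser is needed at all.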

Researchers of many nations have developed automatic speech recognition (ASR) to demonstrate their national progress in information and communication technology for their languages. This work aims to improve ASR performance for the Myanmar language by varying Convolutional Neural Network (CNN) hyperparameters such as the number of feature maps and the pooling size. CNNs can reduce spectral variations and model the spectral correlations that exist in the signal, thanks to their locality and pooling operations; the impact of these hyperparameters on CNN accuracy in ASR tasks is therefore investigated. A 42-hour data set is used as training data, and ASR performance is evaluated on two open test sets: web news and recorded data. As Myanmar is a syllable-timed language, a syllable-based ASR system was built and compared with a word-based one. As a result, it achieved a 16.7% word error rate (WER) and an 11.5% syllable error rate (SER) on TestSet1, and 21.83% WER and 15.76% SER on TestSet2.
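Word and syllable error rates are the same edit-distance measure computed over different units; a minimal sketch:

```python
def wer(reference, hypothesis):
    """Error rate: edit distance over reference length. The whitespace-split
    units may be words or, for a syllable-timed language, syllables."""
    r, h = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over the two token sequences.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(r)][len(h)] / len(r)

print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
```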

Two eye-tracking experiments were designed to investigate a novel temporal ambiguity between object relative clauses (object RCs; 'the claim that John made is false') and complement clauses of a noun (CCs; 'the claim that John made a mistake...') in Italian and English. This study has three main goals: the first is to assess whether a temporary ambiguity between an RC and a CC structure gives rise to a garden path effect; the second is to consider the potential implications of this effect in relation to current parsing theories and determine whether it is compatible with the predictions drawn from the family of reanalysis-based two-stage models (a.o. Frazier 1987; Traxler, Pickering & Clifton 1998; Van Gompel, Pickering & Traxler 1999); the third is to evaluate competing syntactic analyses of CC structures. A more traditional analysis of CCs is compared with a recent proposal presented in Cecchetto & Donati (2011) and Donati & Cecchetto (2015). We show that only the latter account is consistent with our experimental findings.

The Marpa recognizer is described. Marpa is a practical and fully implemented algorithm for the recognition, parsing and evaluation of context-free grammars. The Marpa recognizer is the first to unite the improvements to Earley's algorithm found in Joop Leo's 1991 paper with those in Aycock and Horspool's 2002 paper. New with Marpa is that full knowledge of the state of the parse, including the list of acceptable tokens, is available when tokens are scanned. Advantageous for error detection, this foreknowledge also allows "Ruby Slippers" parsing: alteration of the input in reaction to the parser's expectations.
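For reference, a minimal Earley recognizer (without Leo's or Aycock and Horspool's improvements, and ignoring nullable rules) shows the items Marpa builds on; the grammar here is a toy example:

```python
# Minimal Earley recognizer. Grammar: S -> S '+' n | n
GRAMMAR = {"S": [["S", "+", "n"], ["n"]]}

def recognize(tokens, start="S"):
    # An Earley item is (lhs, rhs tuple, dot position, origin set index).
    sets = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        sets[0].add((start, tuple(rhs), 0, 0))
    for i in range(len(tokens) + 1):
        agenda = list(sets[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in GRAMMAR:                          # predict
                    for prod in GRAMMAR[sym]:
                        new = (sym, tuple(prod), 0, i)
                        if new not in sets[i]:
                            sets[i].add(new); agenda.append(new)
                elif i < len(tokens) and tokens[i] == sym:  # scan
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                           # complete
                for l2, r2, d2, o2 in list(sets[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in sets[i]:
                            sets[i].add(new); agenda.append(new)
    return any(it == (start, tuple(r), len(r), 0)
               for r in GRAMMAR[start] for it in sets[len(tokens)])

print(recognize(["n", "+", "n"]))  # True
```

The set of items whose dot precedes a terminal in `sets[i]` is exactly the "list of acceptable tokens" the abstract mentions, which is what makes Ruby Slippers input alteration possible.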

This paper presents the development of an open-source Spanish Dependency Grammar implemented in the FreeLing environment. This grammar was designed as a resource for NLP applications that require a step further in automatic natural language analysis, as is the case for Spanish-to-Basque translation. The development of wide-coverage rule-based grammars using linguistic knowledge helps extend the existing collection of Spanish deep parsers, which is sometimes limited. The Spanish FreeLing Dependency Grammar, named EsTxala, provides deep and robust parse trees, solving attachments for any structure and assigning syntactic functions to dependencies. These steps are handled by hand-written rules based on linguistic knowledge. As a result, the FreeLing Dependency Parser gives a unique analysis as a dependency tree for each sentence analyzed. Since it is a resource open to the scientific community, exhaustive grammar evaluation is being done to determine its accuracy as well as strategies for its...