Meihua Chen - Academia.edu

Papers by Meihua Chen

Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

arXiv (Cornell University), Oct 5, 2020

Many English-as-a-second-language learners have trouble using near-synonym words (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners' performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior to achieve the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning.

Fake News Classification Based on Content Level Features

Applied Sciences, 2022

Due to the openness and easy accessibility of online social media (OSM), anyone can easily contribute a simple paragraph of text to express their opinion on an article that they have seen. Without access control mechanisms, it has been reported that there are many suspicious messages and accounts spreading across multiple platforms. Accordingly, identifying and labeling fake news is a demanding problem due to the massive amount of heterogeneous content. In essence, the functions of machine learning (ML) and natural language processing (NLP) are to enhance, speed up, and automate the analytical process. Therefore, this unstructured text can be transformed into meaningful data and insights. In this paper, the combination of ML and NLP is implemented to classify fake news based on an open, large and labeled corpus on Twitter. In this case, we compare several state-of-the-art ML and neural network models based on content-only features. To enhance classification performance, before the ...

Towards a Better Learning of Near-Synonyms

Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion, 2017

Language learners are confused by near-synonyms and often look for answers from the Web. However, there is little to aid them in sorting through the overwhelming load of information that is offered. In this paper, we propose a new research problem: suggesting example sentences for learning word distinctions. We focus on near-synonyms as the first step. Two kinds of one-class classifiers, the GMM and BiLSTM models, are used to solve fill-in-the-blank (FITB) questions and further to select example sentences which best differentiate groups of near-synonyms. Experiments are conducted on both an open benchmark and a private dataset for the FITB task. Experiments show that the proposed approach yields an accuracy of 73.05% and 83.59% respectively, comparable to state-of-the-art multi-class classifiers. A learner study further evaluates the suggested example sentences by learning effectiveness and demonstrates that the proposed model is indeed more effective for learning near-synonyms than the resource-based models.

Extracting Formulaic Expressions and Grammar and Edit Patterns to Assist Academic Writing

Proceedings of the Conference EUROPHRAS 2017 - Computational and Corpus-based Phraseology: Recent Advances and Interdisciplinary Approaches, Volume II (short papers, posters and student workshop papers), 2017

We present a method for extracting formulaic expressions, grammar patterns, and editing rules from a given corpus to assist learners in learning to write at the level required in English for Academic Purposes. In our method, sentences in a given corpus are parsed into chunks of base phrases, with the arguments sense-disambiguated to derive syntactic and semantic grammar patterns. The method involves executing shallow parsing, transforming phrases into grammar patterns, and filtering and ranking grammar patterns for each headword. We applied the proposed method to a corpus annotated with writing errors and their corrections to derive editing rules. Experiments based on a large-scale academic English corpus and WikEd Error Corpus showed that the proposed method produces reasonably correct grammar patterns as well as edit rules. Thus, the method has the potential to assist learners in writing and self-editing.

Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Many English-as-a-second-language learners have trouble using near-synonym words (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners' performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior to achieve the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning.

From Receptive to Productive: Learning to Use Confusing Words through Automatically Selected Example Sentences

arXiv (Cornell University), Jun 6, 2019

Knowing how to use words appropriately has been a key to improving language proficiency. Previous studies typically discuss how students learn receptively to select the correct candidate from a set of confusing words in the fill-in-the-blank task where specific context is given. In this paper, we go one step further, assisting students to learn to use confusing words appropriately in a productive task: sentence translation. We leverage the GiveMe-Example system, which suggests example sentences for each confusing word, to achieve this goal. In this study, students learn to differentiate the confusing words by reading the example sentences, and then choose the appropriate word(s) to complete the sentence translation task. Results show students made substantial progress in terms of sentence structure. In addition, highly proficient students better managed to learn confusing words. In view of the influence of the first language on learners, we further propose an effective approach to improve the quality of the suggested sentences.

Augmentable Paraphrase Extraction Framework

International Joint Conference on Natural Language Processing, Oct 1, 2013

Paraphrase extraction relying on a single factor such as distribution similarity or translation similarity might lead to the loss of some linguistic properties. In this paper, we propose a paraphrase extraction framework, which accommodates various linguistically motivated factors to optimize the quality of paraphrase extraction. The major contributions of this study lie in the augmentable paraphrasing framework and the three kinds of factors conducive to both semantic and syntactic correctness. A manual evaluation showed that our model achieves more successful results than the state-of-the-art methods.

Extending Bilingual WordNet via Hierarchical Word Translation Classification

Pacific Asia Conference on Language, Information, and Computation, Dec 1, 2009

We introduce a method for learning to assign word senses to translation pairs. In our approach, this sense assignment or disambiguation problem is transformed into one on how to navigate through a sense network like WordNet aimed at distinguishing the more adequate senses from others. The method involves automatically constructing classification models for branching nodes in the network, and automatically learning to reject less probable senses, based on the translation characteristics of word senses and semantically related word groups (e.g., lexicographer files) respectively. At run-time, translation pairs are expanded with their synonyms and sense ambiguity is resolved using a greedy algorithm choosing the most likely branches based on the trained classification models. Evaluation shows that our method significantly outperforms the strong baseline of assigning the most frequent sense to the translation pairs and effectively determines suitable word senses for given translation pairs, suggesting the possibility of employing our method as a computer-assisted tool for speeding up the process of lexicography or of using our method to assist machine translation systems in word selection.

Bilingual Keyword Extraction and its Educational Application

We introduce a method that extracts keywords in one language with the help of another. The method involves estimating preferences for topical keywords and fusing language-specific word statistics. At run-time, we transform parallel articles into word graphs, build cross-lingual edges for word statistics integration, and exploit PageRank with word keyness information for keyword extraction. We apply our method to keyword analysis and language learning. Evaluation shows that keyword extraction benefits from cross-language information and language learners benefit from our keywords in a reading comprehension test.

Computational method for collocation and phrase learning

We introduce a method for learning to find the representative syntax-based context of a given collocation/phrase. In our approach, grammatical patterns are extracted for query terms aimed at accelerating lexicographers' and language learners' navigation through the word usage and learning process. The method involves automatically lemmatizing, part-of-speech tagging and shallowly parsing the sentences of a large-sized general corpus, and automatically constructing inverted files for quick search. At run-time, contextual grammar ...

A Computer-Assisted Translation and Writing System

ACM Transactions on Asian Language Information Processing, 2013

We introduce a method for learning to predict text and grammatical construction in a computer-assisted translation and writing framework. In our approach, predictions are offered on the fly to help the user make appropriate lexical and grammar choices during the translation of a source text, thus improving translation quality and productivity. The method involves automatically generating general-to-specific word usage summaries (i.e., writing suggestion module), and automatically learning high-confidence word- or phrase-level translation equivalents (i.e., translation suggestion module). At runtime, the source text and its translation prefix entered by the user are broken down into n-grams to generate grammar and translation predictions, which are further combined and ranked via translation and language models. These ranked prediction candidates are iteratively and interactively displayed to the user in a pop-up menu as translation or writing hints. We present a prototype writing as...

Helping Our Own: NTHU NLPLAB System Description

Grammatical error correction has been an active research area in the field of Natural Language Processing. In this paper, we integrated four distinct learning-based modules to correct determiner and preposition errors in learners' writing. Each module focuses on a particular type of error. Our modules were tested on well-formed data and learners' writing. The results show that our system achieves high recall while preserving satisfactory precision.

GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning

We introduce a method for learning to find the representative syntax-based context of a given collocation/phrase. In our approach, grammatical patterns are extracted for query terms aimed at accelerating lexicographers' and language learners' navigation through the word usage and learning process. The method involves automatically lemmatizing, part-of-speech tagging and shallowly parsing the sentences of a large-sized general corpus, and automatically constructing inverted files for quick search. At run-time, contextual grammar patterns are retrieved and presented to users with their corresponding statistical analyses. We present a prototype system, GRASP (grammar- and syntax-based pattern-finder), that applies the method to computer-assisted language learning. Preliminary results show that the extracted patterns not only resemble phrases in grammar books (e.g., make up one's mind) but also assist the process of language learning and sentence composition/translation.

An online lexical tutor for promoting formulaic language acquisition

The issue of formulaic language in L2 acquisition has attracted the interest of researchers recently, as language learners are often reported to have problems with formulaic language. With this in mind, we developed a formulaic sequence reference system, GRASP, which provides learners with a comprehensive view of phrase usages and aims for native-like fluency in writing. We conducted n-gram computation of likelihood for the key phrase and its neighboring words, which can extend the phrase by giving its preceding, following, and in-between usage patterns. For the collocation “reach agreement”, GRASP characterizes its surrounding contexts using the patterns “reach ARTICLE ADJ agreement” and “reach ~ agreement PREP DETERMINER”. Each pattern has its corresponding lexical usages, such as “reach a preliminary agreement” and “reach ~ agreement on the” respectively, with example sentences. Such information lexically and syntactically depicts how the formulaic sequences are commonly used in context. Evaluation of GRASP was conducted with 150 Chinese-speaking EFL college freshmen in an Asian country. Encouragingly, the results show that GRASP boosted participants’ achievements one and a half times as much as the gain using the traditional dictionary, and they were satisfied, as shown in the responses to a survey. Overall, GRASP is promising and effective in assisting language learners in formulaic language learning.

From Receptive to Productive: Learning to Use Confusing Words through Automatically Selected Example Sentences

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, 2019

Knowing how to use words appropriately has been a key to improving language proficiency. Previous studies typically discuss how students learn receptively to select the correct candidate from a set of confusing words in the fill-in-the-blank task where specific context is given. In this paper, we go one step further, assisting students to learn to use confusing words appropriately in a productive task: sentence translation. We leverage the GiveMe-Example system, which suggests example sentences for each confusing word, to achieve this goal. In this study, students learn to differentiate the confusing words by reading the example sentences, and then choose the appropriate word(s) to complete the sentence translation task. Results show students made substantial progress in terms of sentence structure. In addition, highly proficient students better managed to learn confusing words. In view of the influence of the first language on learners, we further propose an effective approach to improve the quality of the suggested sentences.

Phrasal Paraphrase Learning: Exploring an Effective Strategy to Consolidate Vocabulary Knowledge

Establishing the interests of the subjects of relations and specifying the object of a crime. Abstract. The article examines the significance of the interests of the subjects of relations for establishing the object of a crime. The concept "object of a crime as social relations" is universal but has a drawback: as a phenomenon, social relations are intangible, and as a concept they are abstract. This poses difficulties for those who apply the law. In the author's view, one way to specify the object of a crime is to consider it through the interests of subjects, understood as opportunities to act or to remain in a certain state. Criminal law, abstracting from specific, particular cases, establishes protection for the potential opportunities of citizens, the realization of which contributes to the development of society. There is no point in setting social relations against interests, since relations arise and develop so that subjects can realize their interests. Interest acquires particular importance when establishing social harm to social relations that are expressed in the activity of their subjects, for example relations in the sphere of economic activity. The negative consequences of affecting these relations appear not immediately but after a certain time, which makes it difficult to establish awareness of social danger. In the author's view, to specify the subject matter of foresight, those who apply the law should establish the subject's awareness not of the social danger but of the social significance of their actions, and their foresight not of socially dangerous consequences but of the possibility of harm to the interests of the individual, society, and the state.

A Cross-Lingual Pattern Retrieval Framework

Polibits, 2011

We introduce a method for learning to grammatically categorize and organize the contexts of a given query. In our approach, grammatical descriptions, from general word groups to specific lexical phrases, are imposed on the query's contexts aimed at accelerating lexicographers' and language learners' navigation through and GRASP upon the word usages. The method involves lemmatizing, part-of-speech tagging and shallowly parsing a general corpus and constructing its inverted files for monolingual queries, and word-aligning parallel texts and extracting and pruning translation equivalents for cross-lingual ones. At run-time, grammar-like patterns are generated, organized to form a thesaurus index structure on query words' contexts, and presented to users along with their instantiations. Experimental results show that the extracted predominant patterns resemble phrases in grammar books and that the abstract-to-concrete context hierarchy of querying words effectively assists the process of language learning, especially in sentence translation or composition.

Index terms: Grammatical constructions, lexical phrases, context, language learning, inverted files, phrase pairs, cross-lingual pattern retrieval.

Using a Paraphrase Reference Tool to Improve EFL Learners’ Writing Skills

Paraphrasing, or restating information in other words, is an important writing skill for the academic genre. Mastering paraphrasing skills helps language learners write and self-edit their works. However, there has been little research on developing automatic reference tools to assist language learners’ paraphrasing for better writing quality. In the light of this pressing need, we developed an automatic paraphrase reference tool, PREFER (PREFabricate Expression Recognizer) to help English learners vary their expressions and further improve their writing skills. PREFER is designed to automatically generate and display phrasal paraphrases for EFL learners’ reference (with Chinese as their first language). The method involves using phrasal translations in a bilingual parallel corpus and machine translation techniques. We infer the semantic equivalence between English phrases if they are aligned to the same Chinese phrase (i.e., the “pivot”). For example, given a phrase “on the whole”,...

The Spiral Spin State in LiCu2O2

Journal of the Physical Society of Japan, 2014

An Automatic Reference Aid for Improving EFL Learners' Formulaic Expressions in Productive Language Use

IEEE Transactions on Learning Technologies, 2014

Formulaic language is important to language acquisition; however, English language learners are often reported to have problems with formulaic expressions. Several lists of formulaic sequences have been proposed, mainly for developing teaching and testing materials. However, their limited numbers and insufficient usage information seem unable to benefit formulaic language use. To address these issues, we have developed GRASP, a reference aid for formulaic expressions, to promote learners' productive competence. Users are allowed multi-word inputs to target their desired phrases or collocations. Utilizing natural language processing techniques, our system categorizes and displays the structures and sequences in a hierarchical way. The corresponding example sentences are also provided. The formulaic structures serve as a quick access index. The formulaic sequences and corpus examples illustrate real-world language use. Importantly, automatic summarization from language data lends support to the idea of data-driven learning. A single-group pre-posttest design was adopted to assess the effectiveness of GRASP on 150 Chinese-speaking college freshmen. The results indicated that our reference aid made a substantial contribution to students' performance on formulaic expression use in a sentence completion task, compared with the existing tools. Notably, the less proficient students showed marked improvement.

Research paper thumbnail of Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

arXiv (Cornell University), Oct 5, 2020

Many English-as-a-second language learners have trouble using near-synonym words (e.g., small vs.... more Many English-as-a-second language learners have trouble using near-synonym words (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners' performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior to achieve the best performance in both fill-inthe-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find out example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning.

Research paper thumbnail of Fake News Classification Based on Content Level Features

Applied Sciences, 2022

Due to the openness and easy accessibility of online social media (OSM), anyone can easily contri... more Due to the openness and easy accessibility of online social media (OSM), anyone can easily contribute a simple paragraph of text to express their opinion on an article that they have seen. Without access control mechanisms, it has been reported that there are many suspicious messages and accounts spreading across multiple platforms. Accordingly, identifying and labeling fake news is a demanding problem due to the massive amount of heterogeneous content. In essence, the functions of machine learning (ML) and natural language processing (NLP) are to enhance, speed up, and automate the analytical process. Therefore, this unstructured text can be transformed into meaningful data and insights. In this paper, the combination of ML and NLP are implemented to classify fake news based on an open, large and labeled corpus on Twitter. In this case, we compare several state-of-the-art ML and neural network models based on content-only features. To enhance classification performance, before the ...

Research paper thumbnail of Towards a Better Learning of Near-Synonyms

Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion, 2017

Language learners are confused by near-synonyms and often look for answers from the Web. However,... more Language learners are confused by near-synonyms and often look for answers from the Web. However, there is little to aid them in sorting through the overwhelming load of information that is offered. In this paper, we propose a new research problem: suggesting example sentences for learning word distinctions. We focus on near-synonyms as the first step. Two kinds of one-class classifiers, the GMM and BiLSTM models, are used to solve fill-in-the-blank (FITB) questions and further to select example sentences which best differentiate groups of near-synonyms. Experiments are conducted on both an open benchmark and a private dataset for the FITB task. Experiments show that the proposed approach yields an accuracy of 73.05% and 83.59% respectively, comparable to state-of-the-art multi-class classifiers. Learner study further shows the results of the example sentence suggestion by the learning effectiveness and demonstrates the proposed model indeed is more effective in learning near-synonyms compared to the resource-based models.

Research paper thumbnail of Extracting Formulaic Expressions and Grammar and Edit Patterns to Assist Academic Writing

Proceedings of the Conference EUROPHRAS 2017 - Computational and Corpus-based Phraseology: Recent Advances and Interdisciplinary Approaches, Volume II (short papers, posters and student workshop papers), 2017

We present a method for extracting formulaic expressions, grammar patterns, and editing rules fro... more We present a method for extracting formulaic expressions, grammar patterns, and editing rules from a given corpus to assist learners in learning to write at the level required in English for Academic Purposes. In our method, sentences in a given corpus are parsed into chunks of base phrases, with the arguments sense disambiguated to derive syntactic and semantic grammar patterns. The method involves executing shallow parsing, transforming phrases into grammar patterns, and filtering and ranking grammar patterns for each headword. We applied the proposed method to a corpus annotated with writing errors and their corrections to derive editing rules. Experiments based on a large-scale academic English corpus and WikEd Error Corpus showed that the proposed method produces reasonable correct grammar patterns as well as edit rules. Thus, the method has the potential to assist learners in writing and self-editing.

Research paper thumbnail of Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Many English-as-a-second language learners have trouble using near-synonym words (e.g., small vs.... more Many English-as-a-second language learners have trouble using near-synonym words (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but have difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners' performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverages entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior to achieve the best performance in both fill-inthe-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find out example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning.

Research paper thumbnail of From Receptive to Productive: Learning to Use Confusing Words through Automatically Selected Example Sentences

arXiv (Cornell University), Jun 6, 2019

Knowing how to use words appropriately has been a key to improving language proficiency. Previous studies typically discuss how students learn receptively to select the correct candidate from a set of confusing words in the fill-in-the-blank task where specific context is given. In this paper, we go one step further, assisting students to learn to use confusing words appropriately in a productive task: sentence translation. We leverage the GiveMe-Example system, which suggests example sentences for each confusing word, to achieve this goal. In this study, students learn to differentiate the confusing words by reading the example sentences, and then choose the appropriate word(s) to complete the sentence translation task. Results show students made substantial progress in terms of sentence structure. In addition, highly proficient students better managed to learn confusing words. In view of the influence of the first language on learners, we further propose an effective approach to improve the quality of the suggested sentences.

Augmentable Paraphrase Extraction Framework

International Joint Conference on Natural Language Processing, Oct 1, 2013

Paraphrase extraction relying on a single factor such as distribution similarity or translation similarity might lead to the loss of some linguistic properties. In this paper, we propose a paraphrase extraction framework, which accommodates various linguistically motivated factors to optimize the quality of paraphrase extraction. The major contributions of this study lie in the augmentable paraphrasing framework and the three kinds of factors conducive to both semantic and syntactic correctness. A manual evaluation showed that our model achieves more successful results than the state-of-the-art methods.

Extending Bilingual WordNet via Hierarchical Word Translation Classification

Pacific Asia Conference on Language, Information, and Computation, Dec 1, 2009

We introduce a method for learning to assign word senses to translation pairs. In our approach, this sense assignment or disambiguation problem is transformed into one of navigating through a sense network like WordNet, with the aim of distinguishing the more adequate senses from others. The method involves automatically constructing classification models for branching nodes in the network, and automatically learning to reject less probable senses, based on the translation characteristics of word senses and semantically related word groups (e.g., lexicographer files) respectively. At run-time, translation pairs are expanded with their synonyms and sense ambiguity is resolved using a greedy algorithm that chooses the most likely branches based on the trained classification models. Evaluation shows that our method significantly outperforms the strong baseline of assigning the most frequent sense to the translation pairs and effectively determines suitable word senses for given translation pairs, suggesting the possibility of employing our method as a computer-assisted tool for speeding up the process of lexicography or of using our method to assist machine translation systems in word selection.

Bilingual Keyword Extraction and its Educational Application

We introduce a method that extracts keywords in one language with the help of another. The method involves estimating preferences for topical keywords and fusing language-specific word statistics. At run-time, we transform parallel articles into word graphs, build cross-lingual edges for word statistics integration, and exploit PageRank with word keyness information for keyword extraction. We apply our method to keyword analysis and language learning. Evaluation shows that keyword extraction benefits from cross-language information and that language learners benefit from our keywords in a reading comprehension test.

Computational method for collocation and phrase learning

We introduce a method for learning to find the representative syntax-based context of a given collocation/phrase. In our approach, grammatical patterns are extracted for query terms aimed at accelerating lexicographers' and language learners' navigation through the word usage and learning process. The method involves automatically lemmatizing, part-of-speech tagging and shallowly parsing the sentences of a large-sized general corpus, and automatically constructing inverted files for quick search. At run-time, contextual grammar ...

A Computer-Assisted Translation and Writing System

ACM Transactions on Asian Language Information Processing, 2013

We introduce a method for learning to predict text and grammatical construction in a computer-assisted translation and writing framework. In our approach, predictions are offered on the fly to help the user make appropriate lexical and grammar choices during the translation of a source text, thus improving translation quality and productivity. The method involves automatically generating general-to-specific word usage summaries (i.e., writing suggestion module), and automatically learning high-confidence word- or phrase-level translation equivalents (i.e., translation suggestion module). At runtime, the source text and its translation prefix entered by the user are broken down into n-grams to generate grammar and translation predictions, which are further combined and ranked via translation and language models. These ranked prediction candidates are iteratively and interactively displayed to the user in a pop-up menu as translation or writing hints. We present a prototype writing as...

Helping Our Own: NTHU NLPLAB System Description

Grammatical error correction has been an active research area in the field of Natural Language Processing. In this paper, we integrated four distinct learning-based modules to correct determiner and preposition errors in learners' writing. Each module focuses on a particular type of error. Our modules were tested on well-formed data and learners' writing. The results show that our system achieves high recall while preserving satisfactory precision.

GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning

We introduce a method for learning to find the representative syntax-based context of a given collocation/phrase. In our approach, grammatical patterns are extracted for query terms aimed at accelerating lexicographers' and language learners' navigation through the word usage and learning process. The method involves automatically lemmatizing, part-of-speech tagging and shallowly parsing the sentences of a large-sized general corpus, and automatically constructing inverted files for quick search. At run-time, contextual grammar patterns are retrieved and presented to users with their corresponding statistical analyses. We present a prototype system, GRASP (grammar- and syntax-based pattern-finder), that applies the method to computer-assisted language learning. Preliminary results show that the extracted patterns not only resemble phrases in grammar books (e.g., make up one's mind) but also assist the process of language learning and sentence composition/translation.

An online lexical tutor for promoting formulaic language acquisition

The issue of formulaic language in L2 acquisition has attracted the interest of researchers recently, as language learners are often reported to have problems with formulaic language. With this in mind, we developed a formulaic sequence reference system, GRASP, which provides learners with a comprehensive view of phrase usages and aims for native-like fluency in writing. We conducted n-gram computation of likelihood for the key phrase and its neighboring words, which can extend the phrase by giving its preceding, following, and in-between usage patterns. For the collocation “reach agreement”, for example, GRASP characterizes its surrounding contexts using the patterns “reach ARTICLE ADJ agreement” and “reach ~ agreement PREP DETERMINER”. Each pattern has its corresponding lexical usages, such as “reach a preliminary agreement” and “reach ~ agreement on the” respectively, with example sentences. Such information lexically and syntactically depicts how the formulaic sequences are commonly used in context. Evaluation of GRASP was conducted with 150 Chinese-speaking EFL college freshmen in an Asian country. Encouragingly, the results show that GRASP boosted participants’ achievements one and a half times as much as the gain using the traditional dictionary, and they were satisfied, as shown in the responses to a survey. Overall, GRASP is promising and effective in assisting language learners in formulaic language learning.
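
The mapping from a lexical usage such as “reach a preliminary agreement” to the pattern “reach ARTICLE ADJ agreement” can be illustrated with a minimal sketch. This is not the GRASP implementation: the tiny `TAGS` lexicon and the `abstract_pattern` helper are hypothetical stand-ins for a real part-of-speech tagger, and only the category labels come from the abstract.

```python
# Hypothetical mini lexicon mapping words to category labels.
# A real system would obtain these labels from a part-of-speech tagger.
TAGS = {
    "a": "ARTICLE", "an": "ARTICLE", "the": "ARTICLE",
    "preliminary": "ADJ", "final": "ADJ", "formal": "ADJ",
}


def abstract_pattern(tokens, keywords):
    """Keep the query keywords and replace other known words with their labels."""
    return " ".join(
        tok if tok in keywords else TAGS.get(tok, tok) for tok in tokens
    )


pattern = abstract_pattern(
    "reach a preliminary agreement".split(), {"reach", "agreement"}
)
print(pattern)
```

Different lexical usages that share a structure collapse to the same pattern, which is what lets a pattern serve as an index over its example sentences.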

From Receptive to Productive: Learning to Use Confusing Words through Automatically Selected Example Sentences

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, 2019

Knowing how to use words appropriately has been a key to improving language proficiency. Previous studies typically discuss how students learn receptively to select the correct candidate from a set of confusing words in the fill-in-the-blank task where specific context is given. In this paper, we go one step further, assisting students to learn to use confusing words appropriately in a productive task: sentence translation. We leverage the GiveMe-Example system, which suggests example sentences for each confusing word, to achieve this goal. In this study, students learn to differentiate the confusing words by reading the example sentences, and then choose the appropriate word(s) to complete the sentence translation task. Results show students made substantial progress in terms of sentence structure. In addition, highly proficient students better managed to learn confusing words. In view of the influence of the first language on learners, we further propose an effective approach to improve the quality of the suggested sentences.

Phrasal Paraphrase Learning: Exploring an Effective Strategy to Consolidate Vocabulary Knowledge

Establishing the Interests of the Parties to Relations and Specifying the Object of a Crime. Abstract. The article examines the significance of the interests of the parties to relations for establishing the object of a crime. The concept of "the object of a crime as social relations" is universal but has a shortcoming: as a phenomenon, social relations are intangible, and as a concept, they are abstract. This creates difficulties for those who apply the law. In the author's view, one way to specify the object of a crime is to consider it through the interests of the parties, understood as opportunities to act or to remain in a certain state. Criminal law, abstracting from concrete, particular cases, establishes protection for citizens' potential opportunities, the realization of which contributes to the development of society. There is no point in setting social relations against interests, since relations arise and develop so that a party can realize its interests. Interest takes on particular significance in establishing social harm to social relations that are expressed in the activity of their parties, for example relations in the sphere of economic activity. The negative consequences of affecting these relations do not appear immediately but after a certain time, which makes it difficult to establish awareness of social danger. In the author's view, to specify the subject of foresight, those who apply the law should establish the person's awareness not of the social danger but of the social significance of their actions, and their foresight not of socially dangerous consequences but of the possibility of causing harm to the interests of the individual, society, and the state.

A Cross-Lingual Pattern Retrieval Framework

Polibits, 2011

We introduce a method for learning to grammatically categorize and organize the contexts of a given query. In our approach, grammatical descriptions, from general word groups to specific lexical phrases, are imposed on the query's contexts, aimed at accelerating lexicographers' and language learners' navigation through, and grasp of, the word usages. The method involves lemmatizing, part-of-speech tagging and shallowly parsing a general corpus and constructing its inverted files for monolingual queries, and word-aligning parallel texts and extracting and pruning translation equivalents for cross-lingual ones. At run-time, grammar-like patterns are generated, organized to form a thesaurus index structure on query words' contexts, and presented to users along with their instantiations. Experimental results show that the extracted predominant patterns resemble phrases in grammar books and that the abstract-to-concrete context hierarchy of query words effectively assists the process of language learning, especially in sentence translation or composition. Index terms: grammatical constructions, lexical phrases, context, language learning, inverted files, phrase pairs, cross-lingual pattern retrieval.

Using a Paraphrase Reference Tool to Improve EFL Learners’ Writing Skills

Paraphrasing, or restating information in other words, is an important writing skill for the academic genre. Mastering paraphrasing skills helps language learners write and self-edit their works. However, there has been little research on developing automatic reference tools to assist language learners’ paraphrasing for better writing quality. In the light of this pressing need, we developed an automatic paraphrase reference tool, PREFER (PREFabricate Expression Recognizer) to help English learners vary their expressions and further improve their writing skills. PREFER is designed to automatically generate and display phrasal paraphrases for EFL learners’ reference (with Chinese as their first language). The method involves using phrasal translations in a bilingual parallel corpus and machine translation techniques. We infer the semantic equivalence between English phrases if they are aligned to the same Chinese phrase (i.e., the “pivot”). For example, given a phrase “on the whole”,...
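
The pivot idea described in this abstract can be sketched in a few lines. This is a rough illustration under stated assumptions, not the PREFER implementation: the `pivot_paraphrases` helper is hypothetical, the aligned phrase pairs are invented toy data rather than entries from the authors' corpus, and a real system would also score and filter candidates, which is omitted here.

```python
from collections import defaultdict
from itertools import permutations


def pivot_paraphrases(phrase_pairs):
    """Treat English phrases aligned to the same Chinese pivot as paraphrases."""
    by_pivot = defaultdict(set)
    for english, chinese in phrase_pairs:
        by_pivot[chinese].add(english)

    paraphrases = defaultdict(set)
    for english_group in by_pivot.values():
        # Every ordered pair within a pivot group is a paraphrase candidate.
        for source, candidate in permutations(english_group, 2):
            paraphrases[source].add(candidate)
    return {phrase: sorted(others) for phrase, others in paraphrases.items()}


# Invented (English, Chinese) phrase alignments, for illustration only.
pairs = [
    ("on the whole", "整體而言"),
    ("in general", "整體而言"),
    ("by and large", "整體而言"),
    ("as a result", "因此"),
    ("therefore", "因此"),
]
result = pivot_paraphrases(pairs)
print(result["on the whole"])
```

With this toy data, looking up “on the whole” returns the other English phrases that share its pivot, and phrases aligned only to a different pivot are not suggested.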

The Spiral Spin State in LiCu2O2

Journal of the Physical Society of Japan, 2014

An Automatic Reference Aid for Improving EFL Learners' Formulaic Expressions in Productive Language Use

IEEE Transactions on Learning Technologies, 2014

Formulaic language is important to language acquisition; however, English language learners are often reported to have problems with formulaic expressions. Several lists of formulaic sequences have been proposed, mainly for developing teaching and testing materials. However, their limited numbers and insufficient usage information seem unable to benefit formulaic language use. To address these issues, we have developed GRASP, a reference aid for formulaic expressions, to promote learners' productive competence. Users are allowed multi-word inputs to target their desired phrases or collocations. Utilizing natural language processing techniques, our system categorizes and displays the structures and sequences in a hierarchical way. The corresponding example sentences are also provided. The formulaic structures serve as a quick access index. The formulaic sequences and corpus examples illustrate real-world language use. Importantly, automatic summarization from language data lends support to the idea of data-driven learning. A single-group pre-posttest design was adopted to assess the effectiveness of GRASP on 150 Chinese-speaking college freshmen. The results indicated that our reference aid made a substantial contribution to students' performance on formulaic expression use in a sentence completion task, compared with the existing tools. Notably, the less proficient students showed marked improvement.