Muhua Zhu - Academia.edu
Papers by Muhua Zhu
We participated in the Third International Chinese Word Segmentation Bakeoff. Specifically, we evaluated our Chinese word segmenter NEUCipSeg in the closed track, on all four corpora, namely Academia Sinica (AS), City University of Hong Kong (CITYU), Microsoft Research (MSRA), and University of Pennsylvania/University of Colorado (UPENN). Based on Support Vector Machines (SVMs), a basic segmenter is designed that treats Chinese word segmentation as a character-based tagging problem. Moreover, we propose post-processing rules tailored to the properties of the output produced by the basic segmenter. Our system achieved good rankings on all four corpora.
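A minimal sketch of the character-based tagging reduction mentioned above. The BIES tag scheme is a common choice and an assumption here, not necessarily the one NEUCipSeg uses:

```python
# Reduce word segmentation to per-character tagging: each character of a
# segmented sentence gets a B/I/E/S tag (Begin/Inside/End/Single).
def words_to_tags(words):
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append((word, "S"))       # single-character word
        else:
            tags.append((word[0], "B"))    # word-initial character
            for ch in word[1:-1]:
                tags.append((ch, "I"))     # word-internal character
            tags.append((word[-1], "E"))   # word-final character
    return tags

print(words_to_tags(["我们", "参加", "了", "评测"]))
# [('我', 'B'), ('们', 'E'), ('参', 'B'), ('加', 'E'), ('了', 'S'), ('评', 'B'), ('测', 'E')]
```

A tagger (here an SVM over character context features) predicts these tags for unsegmented text, and the tag sequence is decoded back into words.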
Natural Language Processing and Chinese Computing, 2018
In this paper, we propose a retrieval and knowledge-based question answering system for the competition task of NLPCC 2017. On the question side, our system uses a ranking model to score candidate entities and detect the topic entity of a question. Then similarities between the question and candidate relation chains are computed, based on which candidate answer entities are ranked. By returning the highest-scored answer entity, our system achieves an F1-score of 41.96% on the NLPCC 2017 test set. Our current system focuses on solving single-relation questions, but it can be extended to answering multiple-relation questions.
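A toy sketch of the chain-ranking step: candidate relation chains are scored against the question, and the answer entity reached through the best chain is returned. The token-overlap cosine below is only a placeholder for the paper's similarity model:

```python
# Rank candidate relation chains by similarity to the question; return the
# answer entity reached through the best-scoring chain.
from collections import Counter
import math

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) \
         * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def best_answer(question_tokens, candidates):
    # candidates: list of (relation_chain_tokens, answer_entity)
    return max(candidates, key=lambda c: cosine(question_tokens, c[0]))[1]

cands = [(["place", "of", "birth"], "Hunan"),
         (["date", "of", "birth"], "1893")]
print(best_answer(["what", "place", "was", "he", "born", "in"], cands))  # Hunan
```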
Due to the scarcity of annotated data, Abstract Meaning Representation (AMR) research is relatively limited and challenging for languages other than English. Given the availability of an English AMR dataset and English-to-X parallel datasets, in this paper we propose a novel cross-lingual pre-training approach via multi-task learning (MTL) for both zero-shot AMR parsing and AMR-to-text generation. Specifically, we consider three types of relevant tasks, including AMR parsing, AMR-to-text generation, and machine translation. We hope that knowledge gained while learning English AMR parsing and text generation can be transferred to the counterparts of other languages. With properly pre-trained models, we explore four different fine-tuning methods, i.e., vanilla fine-tuning with a single task, one-for-all MTL fine-tuning, targeted MTL fine-tuning, and teacher-student-based MTL fine-tuning. Experimental results on AMR parsing and text generation of multiple non-English languages demonstrate...
Due to different types of inputs, diverse text generation tasks may adopt different encoder-decoder frameworks. Thus most existing approaches that aim to improve the robustness of certain generation tasks are input-relevant and may not work well for other generation tasks. Alternatively, in this paper we present a universal approach to enhance the language representation for text generation on top of generic encoder-decoder frameworks. This is done at two levels. First, we introduce randomness by randomly masking some percentage of tokens on the decoder side when training the models. In this way, instead of using the ground-truth history context, we use its corrupted version to predict the next token. Then we propose an auxiliary task to properly recover those masked tokens. Experimental results on several text generation tasks including machine translation (MT), AMR-to-text generation, and image captioning show that the proposed approach can significantly improve over competiti...
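A minimal sketch of the decoder-side corruption described above; the mask rate and the token-level bookkeeping are assumptions, not the paper's exact setup:

```python
# Randomly mask a fraction of ground-truth decoder history tokens. The model
# predicts next tokens from the corrupted history, and an auxiliary loss
# recovers the masked tokens.
import random

MASK = "<mask>"

def corrupt_history(tokens, mask_rate=0.15, rng=random):
    corrupted, recover_targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            corrupted.append(MASK)
            recover_targets.append(tok)   # supervised by the auxiliary task
        else:
            corrupted.append(tok)
            recover_targets.append(None)  # no auxiliary loss at this position
    return corrupted, recover_targets

random.seed(1)
print(corrupt_history(["the", "boy", "wants", "to", "go"]))
```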
ArXiv, 2021
Recent studies on Knowledge Base Question Answering (KBQA) have shown great progress on this task via better question understanding. Previous works on encoding questions mainly focus on word sequences, but seldom consider the information from syntactic trees. In this paper, we propose an approach to learn syntax-based representations for KBQA. First, we encode path-based syntax by considering the shortest dependency paths between keywords. Then, we propose two encoding strategies to model the information of whole syntactic trees to obtain tree-based syntax. Finally, we combine both path-based and tree-based syntax representations for KBQA. We conduct extensive experiments on a widely used benchmark dataset, and the experimental results show that our syntax-aware systems can make full use of syntax information in different settings and achieve state-of-the-art performance on KBQA.
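Shortest dependency paths of the kind used for the path-based syntax can be extracted by treating the parse as an undirected graph; a sketch with networkx on a toy parse (the sentence and arcs are illustrative, not from the paper's dataset):

```python
# Shortest dependency path between two keywords, with the dependency tree
# viewed as an undirected graph.
import networkx as nx

# (head, dependent) arcs for: "who directed the film Titanic"
arcs = [("directed", "who"), ("directed", "film"),
        ("film", "the"), ("film", "Titanic")]

graph = nx.Graph(arcs)
print(nx.shortest_path(graph, source="who", target="Titanic"))
# ['who', 'directed', 'film', 'Titanic']
```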
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020
Fine-grained entity typing (FET) is a fundamental task for various entity-leveraging applications. Although great progress has been made, existing systems still have challenges in handling noisy samples introduced into training data by distant supervision methods. To address this noise, previous studies either focus on processing the clean samples (i.e., those with only one label) and noisy samples (i.e., those with multiple labels) with different strategies, or on filtering noisy labels based on the assumption that the distantly-supervised label set certainly contains the correct type label. In this paper, we propose a probabilistic automatic relabeling method which treats all training samples uniformly. Our method aims to estimate the pseudo-truth label distribution of each sample, and the pseudo-truth distribution is treated as part of the trainable parameters, which are jointly updated during the training process. The proposed approach does not rely on any prerequisite or extra supervision, m...
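A minimal PyTorch sketch of the joint update: the per-sample pseudo-truth distribution lives in trainable logits optimized together with the typing model. The shapes and the KL objective are assumptions, not the paper's exact formulation:

```python
# Per-sample pseudo-truth label distributions as trainable parameters,
# updated jointly with a (stand-in) typing model.
import torch
import torch.nn.functional as F

n_samples, n_types, dim = 100, 50, 32
model = torch.nn.Linear(dim, n_types)
pseudo_logits = torch.nn.Parameter(torch.zeros(n_samples, n_types))
optimizer = torch.optim.Adam(list(model.parameters()) + [pseudo_logits], lr=1e-3)

features = torch.randn(n_samples, dim)
pred_log_probs = F.log_softmax(model(features), dim=-1)
pseudo_truth = F.softmax(pseudo_logits, dim=-1)   # the relabeled distribution

# Fit model predictions to the jointly learned pseudo-truth labels.
loss = F.kl_div(pred_log_probs, pseudo_truth, reduction="batchmean")
optimizer.zero_grad()
loss.backward()
optimizer.step()
```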
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Fully supervised neural approaches have achieved significant progress in the task of Chinese word segmentation (CWS). Nevertheless, the performance of supervised models tends to drop dramatically when they are applied to out-of-domain data. Performance degradation is caused by the distribution gap across domains and the out-of-vocabulary (OOV) problem. In order to simultaneously alleviate these two issues, this paper proposes to couple distant annotation and adversarial training for cross-domain CWS. For distant annotation, we rethink the essence of "Chinese words" and design an automatic distant annotation mechanism that does not need any supervision or pre-defined dictionaries from the target domain. The approach can effectively discover domain-specific words and distantly annotate raw texts for the target domain. For adversarial training, we develop a sentence-level training procedure to perform noise reduction and maximize the utilization of source-domain information. Experiments on multiple real-world datasets across various domains show the superiority and robustness of our model, significantly outperforming previous state-of-the-art cross-domain CWS methods.
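Adversarial training of this kind is commonly implemented with a domain discriminator behind a gradient reversal layer; a sketch of that standard building block, with no claim that the paper uses exactly this construction:

```python
# Gradient reversal layer: identity on the forward pass, negated gradient on
# the backward pass, pushing the encoder toward domain-invariant features.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

features = torch.randn(4, 16, requires_grad=True)
discriminator = torch.nn.Linear(16, 2)      # source vs. target domain
logits = discriminator(GradReverse.apply(features, 1.0))
logits.sum().backward()                     # gradients into `features` are reversed
```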
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
Recent studies on AMR-to-text generation often formalize the task as a sequence-to-sequence (seq2seq) learning problem by converting an Abstract Meaning Representation (AMR) graph into a word sequence. Graph structures are further modeled in the seq2seq framework in order to utilize the structural information in the AMR graphs. However, previous approaches only consider the relations between directly connected concepts while ignoring the rich structure in AMR graphs. In this paper we eliminate this strong limitation and propose a novel structure-aware self-attention approach to better model the relations between indirectly connected concepts in the state-of-the-art seq2seq model, i.e., the Transformer. In particular, a few different methods are explored to learn structural representations between two concepts. Experimental results on English AMR benchmark datasets show that our approach significantly outperforms the state of the art with 29.66 and 31.82 BLEU scores on LDC2015E86 and LDC2017T10, respectively. To the best of our knowledge, these are the best results achieved so far by supervised models on these benchmarks.
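In the spirit of relation-aware self-attention, the structural representation between a concept pair can enter the attention score on the key side; a single-head sketch, where the dimensions and the additive form are simplifying assumptions:

```python
# Structure-aware attention score: a learned embedding of the structural
# relation between concepts i and j is added to the key of j.
import torch

n, d, n_relations = 5, 16, 8
q = torch.randn(n, d)                          # one query per concept
k = torch.randn(n, d)                          # one key per concept
rel_emb = torch.nn.Embedding(n_relations, d)
rel_ids = torch.randint(n_relations, (n, n))   # relation between each pair (i, j)

# score(i, j) = q_i . (k_j + r_ij) / sqrt(d)
scores = (q.unsqueeze(1) * (k.unsqueeze(0) + rel_emb(rel_ids))).sum(-1) / d ** 0.5
attn = torch.softmax(scores, dim=-1)           # (n, n) attention weights
```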
Computational Intelligence, 2016
Shift-reduce parsing enjoys the property of efficiency because of the use of efficient parsing algorithms like greedy/deterministic search and beam search. In addition, shift-reduce parsing is much simpler and easier to implement compared with other parsing algorithms. In this article, we explore constituent boundary information to improve the performance of shift-reduce phrase-structure parsing. In previous work, constituent boundary information has been used successfully to speed up chart parsers. However, whether it is useful for improving parsing accuracy has not been investigated. We propose two different models to capture constituent boundary information, based on which two sets of novel features are designed for a shift-reduce parser. The first model is a boundary prediction model that uses a classifier to predict the boundaries of constituents; we use automatically parsed data to train the classifier. The second is a Tree Likelihood Model that measures the validity of a constituent by its likelihood, calculated on automatically parsed data. Experimental results show that our proposed method outperforms a strong baseline by 0.8% and 1.6% in F-score on English and Chinese data, respectively, achieving competitive parsing accuracies on Chinese (84.8%) and English (90.8%). To our knowledge, this is the first time constituent boundary information has been used to advance the state of the art in shift-reduce phrase-structure parsing.
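The Tree Likelihood Model idea in miniature: score a candidate constituent by how often the same production occurs in auto-parsed data. The relative-frequency estimate below is an assumption about the exact statistic:

```python
# Likelihood of a candidate constituent, estimated by relative frequency of
# its production over automatically parsed data.
from collections import Counter

auto_parsed = [("NP", ("DT", "NN")), ("NP", ("DT", "JJ", "NN")),
               ("VP", ("VB", "NP")), ("NP", ("DT", "NN"))]
production_counts = Counter(auto_parsed)
label_totals = Counter(label for label, _ in auto_parsed)

def likelihood(label, children):
    total = label_totals[label]
    return production_counts[(label, children)] / total if total else 0.0

print(likelihood("NP", ("DT", "NN")))  # 2/3
```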
Proceedings of the AAAI Conference on Artificial Intelligence, 2019
The study of consumer psychology reveals two categories of consumption decision procedures: compensatory rules and non-compensatory rules. Existing recommendation models based on latent factor models assume that consumers follow compensatory rules, i.e., they evaluate an item over multiple aspects and compute a weighted and/or summated score which is used to derive the rating or ranking of the item. However, it has been shown in the consumer behavior literature that consumers adopt non-compensatory rules more often than compensatory rules. Our main contribution in this paper is to study the unexplored area of utilizing non-compensatory rules in recommendation models. Our general assumptions are: (1) there are K universal hidden aspects, and in each evaluation session only one aspect is chosen as the prominent aspect according to user preference; (2) evaluations over prominent and non-prominent aspects are non-compensatory. Evaluation is mainly based on item performance on...
Proceedings of the AAAI Conference on Artificial Intelligence, 2019
Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt models such as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce, where the slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of the E-commerce knowledge base, we propose a novel multi-task model with cascade and residual connections, which jointly learns segment tagging, named entity tagging, and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model has a 14.6% advantage in F1 score over strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracies on a standard dataset. Furthermore, an online test deployed on such a dominant E-commerce platform shows a 130% improvement on ...
ACM Transactions on Asian and Low-Resource Language Information Processing, 2016
Semantic parsing maps a sentence in natural language into a structured meaning representation. Previous studies show that semantic parsing with synchronous context-free grammars (SCFGs) achieves favorable performance over most other alternatives. Motivated by the observation that the performance of semantic parsing with SCFGs is closely tied to the translation rules, this article explores extending the translation rules with high quality and increased coverage in three ways. First, we examine the difference between word alignments for semantic parsing and statistical machine translation (SMT) to better adapt word alignment in SMT to semantic parsing. Second, we introduce both structure- and syntax-informed nonterminals, better guiding the parsing in favor of well-formed structure, instead of using an uninformed nonterminal in SCFGs. Third, we address the unknown-word translation issue via synthetic translation rules. Last but not least, we use a filtering approach to improve performance v...
Proceedings of the CoNLL-16 shared task, 2016
This paper describes our submission to the CoNLL-2016 shared task (Xue et al., 2016) on end-to-end Chinese shallow discourse parsing. We decompose the end-to-end process into four steps. Firstly, we define a syntactically heuristic algorithm to identify elementary discourse units (EDUs) and further to recognize valid EDU pairs. Secondly, we recognize explicit discourse connectives. Thirdly, we link each explicit connective to valid EDU pairs to obtain explicit discourse relations; the valid EDU pairs not linked to any explicit connective become non-explicit discourse relations. Finally, we assign each discourse relation, either explicit or non-explicit, a discourse sense. Our system is evaluated on the closed track of the CoNLL-2016 shared task and achieves 35.54% and 23.46% in F1-measure on the official test set and blind test set, respectively.
Proceedings of the CoNLL-16 shared task, 2016
This paper describes the English shallow discourse parsing system submitted by the natural language processing (NLP) group of Soochow University (SoNLP-DP) to the CoNLL-2016 shared task. Our system classifies discourse relations into explicit and non-explicit relations and uses a pipeline platform to conduct every subtask, forming an end-to-end shallow discourse parser on the Penn Discourse Treebank (PDTB). Our system is evaluated on the CoNLL-2016 shared task closed track and achieves 24.31% and 28.78% in F1-measure on the official blind test set and test set, respectively.
This paper proposes a method to improve shift-reduce constituency parsing by using lexical dependencies. The lexical dependency information is obtained from a large amount of auto-parsed data that is generated by a baseline shift-reduce parser on unlabeled data. We then incorporate a set of novel features defined on this information into the shift-reduce parsing model. The features help to disambiguate action conflicts during decoding. Experimental results show that the new features achieve absolute improvements over a strong baseline by 0.9% and 1.1% on English and Chinese, respectively. Moreover, the improved parser outperforms all previously reported shift-reduce constituency parsers.
Proceedings of ACL-IJCNLP 2015 System Demonstrations, 2015
We present a new toolkit, NiuParser, for Chinese syntactic and semantic analysis. It can handle a wide range of Natural Language Processing (NLP) tasks in Chinese, including word segmentation, part-of-speech tagging, named entity recognition, chunking, constituent parsing, dependency parsing, and semantic role labeling. The NiuParser system runs fast and shows state-of-the-art performance on several benchmarks. Moreover, it is very easy to use for both research and industrial purposes. Advanced features include Software Development Kit (SDK) interfaces and a multi-thread implementation for system speed-up.
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration - NEWS '09, 2009
This paper presents an approach to translating Chinese organization names into English based on correlative expansion. Firstly, candidate translations are generated using a statistical translation method, and several correlative named entities for the input are retrieved from a correlative named entity list. Secondly, three kinds of expansion methods are used to generate expanded queries. Finally, these queries are submitted to a search engine, and the refined translation results are mined and re-ranked using the returned web pages. Experimental results show that this approach outperforms the compared system in overall translation accuracy.
Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state of the art. One important reason is the existence of unary nodes in phrase-structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process which eliminates size differences between action sequences in beam search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
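The core of the extension is to make competing action sequences the same length; a sketch that pads finished sequences with a dummy action so beam candidates stay comparable (the action names are illustrative, not the parser's actual inventory):

```python
# Pad shorter action sequences with a dummy IDLE action so that all beam
# candidates have the same length despite differing numbers of unary actions.
IDLE = "IDLE"

def pad_sequences(action_sequences):
    max_len = max(len(seq) for seq in action_sequences)
    return [seq + [IDLE] * (max_len - len(seq)) for seq in action_sequences]

seqs = [["SHIFT", "SHIFT", "REDUCE-NP"],
        ["SHIFT", "UNARY-NP", "SHIFT", "UNARY-NP", "REDUCE-NP"]]
for s in pad_sequences(seqs):
    print(s)
```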
International Journal of Computer Processing of Languages, 2005
In this paper, we present a novel model for improving the performance of Domain Dictionary-based text categorization. The proposed model is named the Self-Partition Model (SPM). SPM groups candidate words into predefined clusters, which are generated according to the structure of the Domain Dictionary. Using these learned clusters as features, we propose a novel text representation. The experimental results show that the text categorization system based on the proposed representation performs better than the Domain Dictionary-based text categorization system. It also performs better than a Bag-of-Words system when the number of features is small and the training corpus is small.