Zhanming Jie - Academia.edu

Papers by Zhanming Jie

Proceedings of the AAAI Conference on Artificial Intelligence

Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several downstream natural language processing (NLP) tasks such as relation extraction and event extraction. Dependency trees, on the other hand, also convey crucial semantic-level information. It has been shown previously that such information can be used to improve the performance of NER. In this work, we investigate how to better utilize the structured information conveyed by dependency trees to improve the performance of NER. Specifically, unlike existing approaches which only exploit dependency information for designing local features, we show that certain global structured information of the dependency trees can be exploited when building NER models where such information can provide guided learning and inference. Through extensive experiments, we show that our proposed novel dependency-guid...

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Solving math word problems requires deductive reasoning over the quantities in the text. Various recent research efforts mostly relied on sequence-to-sequence or sequence-to-tree models to generate mathematical expressions without explicitly performing relational reasoning between quantities in the given context. While empirically effective, such approaches typically do not provide explanations for the generated expressions. In this work, we view the task as a complex relation extraction problem, proposing a novel approach that presents explainable deductive reasoning steps to iteratively construct target expressions, where each step involves a primitive operation over two quantities defining their relation. Through extensive experiments on four benchmark datasets, we show that the proposed model significantly outperforms existing strong baselines. We further demonstrate that the deductive procedure not only presents more explainable steps but also enables us to make more accurate predictions on questions that require more complex reasoning.
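To make the idea of step-by-step deduction concrete, here is a minimal sketch of how a target expression could be built from steps that each apply one primitive operation to two quantities. It illustrates only the output format described in the abstract, not the paper's model: the operator set, the `run_deduction` helper, and the example problem are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Primitive binary operations. The exact operator set is an assumption.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

# One deductive step: (operator, index of left quantity, index of right quantity).
Step = Tuple[str, int, int]


def run_deduction(quantities: Sequence[float], steps: Sequence[Step]) -> Tuple[float, List[str]]:
    """Apply each step to two quantities from the pool and append the result
    to the pool, so that later steps can reuse intermediate results."""
    pool = list(quantities)
    trace = []
    for op, i, j in steps:
        result = OPS[op](pool[i], pool[j])
        trace.append(f"q{i} {op} q{j} = {result:g}")
        pool.append(result)
    return pool[-1], trace


# Hypothetical problem: 3 boxes hold 4 apples each, and 2 apples are eaten.
answer, trace = run_deduction([3, 4, 2], [("*", 0, 1), ("-", 3, 2)])
print(answer)
print(trace)
```

Each entry in `trace` names the two quantities involved and the relation between them, which is the kind of explanation a single flat expression does not expose.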

arXiv (Cornell University), Apr 12, 2021

It has been shown that named entity recognition (NER) could benefit from incorporating the long-distance structured information captured by dependency trees. We believe this is because both types of features, the contextual information captured by the linear sequences and the structured information captured by the dependency trees, may complement each other. However, existing approaches largely focused on stacking the LSTM and graph neural networks such as graph convolutional networks (GCNs) for building improved NER models, where the exact interaction mechanism between the two different types of features is not very clear, and the performance gain does not appear to be significant. In this work, we propose a simple and robust solution to incorporate both types of features with our Synergized-LSTM (Syn-LSTM), which clearly captures how the two types of features interact. We conduct extensive experiments on several standard datasets across four languages. The results demonstrate that the proposed model achieves better performance than previous approaches while requiring fewer parameters. Our further analysis demonstrates that our model can capture longer dependencies compared with strong baselines.

We consider multilingual semantic parsing, the task of simultaneously parsing semantically equivalent sentences from multiple different languages into their corresponding formal semantic representations. Our model is built on top of the hybrid tree semantic parsing framework, where natural language sentences and their corresponding semantics are assumed to be generated jointly from an underlying generative process. We first introduce a variant of the joint generative process, which essentially gives us a new semantic parsing model within the framework. Based on the different models that can be developed within the framework, we then investigate several approaches for performing the multilingual semantic parsing task. We present our evaluations on a standard dataset annotated with sentences in multiple languages coming from different language families.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Dependency parse trees are helpful for discovering the opinion words in aspect-based sentiment analysis (ABSA) (Huang and Carley, 2019). However, the trees obtained from off-the-shelf dependency parsers are static, and could be sub-optimal in ABSA. This is because the syntactic trees are not designed for capturing the interactions between opinion words and aspect words. In this work, we aim to shorten the distance between aspects and corresponding opinion words by learning an aspect-centric tree structure. The aspect and opinion words are expected to be closer along such a tree structure compared to the standard dependency parse tree. The learning process allows the tree structure to adaptively correlate the aspect and opinion words, enabling us to better identify the polarity in the ABSA task. We conduct experiments on five aspect-based sentiment datasets, and the proposed model significantly outperforms recent strong baselines. Furthermore, our thorough analysis demonstrates that the average distance between aspect and opinion words is shortened by at least 19% on the standard SemEval Restaurant14 (Pontiki et al., 2014) dataset.

* Work done when visiting SUTD. † Corresponding author. Accepted as a long paper in the main conference of EMNLP 2021 (Conference on Empirical Methods in Natural Language Processing).

[Figure: two trees over "Loving the harry potter movie marathon...". (a) Dependency parse tree from spaCy: Dist(Loving, harry) = 4, Dist(Loving, potter) = 3. (b) Learned aspect-centric tree (ours): Dist(Loving, harry) = 1, Dist(Loving, potter) = 2.]
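The distances quoted for the example sentence count the edges on the path between two words in a tree. The sketch below computes that quantity with a breadth-first search over a head-index array. The two head arrays are assumptions: the first is one plausible dependency parse and the second one hypothetical aspect-centric tree, both chosen so that they reproduce the distances quoted above. Neither is taken from the paper, and `tree_distance` is an illustrative helper, not the authors' code.

```python
from collections import deque
from typing import Dict, List


def tree_distance(heads: List[int], src: int, dst: int) -> int:
    """Number of edges between tokens src and dst in a tree where
    heads[i] is the index of token i's head and -1 marks the root."""
    # Build an undirected adjacency list from the head pointers.
    adj: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)

    # Breadth-first search from src until dst is reached.
    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("src and dst are not connected")


tokens = ["Loving", "the", "harry", "potter", "movie", "marathon"]

# Assumed parser-style tree: Loving <- marathon <- movie <- potter <- harry.
parser_heads = [-1, 5, 3, 4, 5, 0]
# Hypothetical aspect-centric tree: harry attaches directly to Loving.
learned_heads = [-1, 5, 0, 2, 5, 0]

loving, harry, potter = 0, 2, 3
print(tree_distance(parser_heads, loving, harry), tree_distance(parser_heads, loving, potter))
print(tree_distance(learned_heads, loving, harry), tree_distance(learned_heads, loving, potter))
```

Averaging this distance over all aspect and opinion word pairs in a dataset would give the kind of statistic the abstract reports, under whatever pairing the authors actually used.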

ArXiv, 2020

Existing works on KG-to-text generation take as input a few RDF triples or key-value pairs conveying the knowledge of some entities to generate a natural language description. Existing datasets, such as WikiBIO, WebNLG, and E2E, basically have a good alignment between an input triple/pair set and its output text. However, in practice, the input knowledge could be more than enough, because the output description may only want to cover the most significant knowledge. In this paper, we introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text. Our dataset involves exploring large knowledge graphs (KG) to retrieve abundant knowledge of various types of main entities, which makes the current graph-to-sequence models suffer severely from the problems of information loss and parameter explosion while generating the description text. We address these challenges by proposing a multi-graph structure that is able to represent the original...

We present the industry dataset information and experimental details of the main paper (Jie et al., 2019) in this supplementary material.

Natural Language Processing and Chinese Computing, 2020

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Previous works on knowledge-to-text generation take as input a few RDF triples or key-value pairs conveying the knowledge of some entities to generate a natural language description. Existing datasets, such as WIKIBIO, WebNLG, and E2E, basically have a good alignment between an input triple/pair set and its output text. However, in practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge. In this paper, we introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text. Our dataset involves retrieving abundant knowledge of various types of main entities from a large knowledge graph (KG), which makes the current graph-to-sequence models suffer severely from the problems of information loss and parameter explosion while generating the descriptions. We address these challenges by proposing a multi-graph structure that is able to represent the original graph information more comprehensively. Furthermore, we also incorporate aggregation methods that learn to extract the rich graph information. Extensive experiments demonstrate the effectiveness of our model architecture.

* Liying Cheng is under the Joint Ph.D. Program between Alibaba and Singapore University of Technology and Design. † Dekun Wu was a visiting student at SUTD. Yan Zhang and Zhanming Jie were interns at Alibaba.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

Dependency tree structures capture long-distance and syntactic relationships between words in a sentence. The syntactic relations (e.g., nominal subject, object) can potentially infer the existence of certain named entities. In addition, the performance of a named entity recognizer could benefit from the long-distance dependencies between the words in dependency trees. In this work, we propose a simple yet effective dependency-guided LSTM-CRF model to encode the complete dependency trees and capture the above properties for the task of named entity recognition (NER). The data statistics show strong correlations between the entity types and dependency relations. We conduct extensive experiments on several standard datasets and demonstrate the effectiveness of the proposed model in improving NER and achieving state-of-the-art performance. Our analysis reveals that the significant improvements mainly result from the dependency relations and long-distance interactions provided by dependency trees.

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

We propose a novel dependency-based hybrid tree model for semantic parsing, which converts natural language utterances into machine-interpretable meaning representations. Unlike previous state-of-the-art models, the semantic information is interpreted as the latent dependency between the natural language words in our joint representation. Such dependency information can capture the interactions between the semantics and natural language words. We integrate a neural component into our model and propose an efficient dynamic-programming algorithm to perform tractable inference. Through extensive experiments on the standard multilingual GeoQuery dataset with eight languages, we demonstrate that our proposed approach is able to achieve state-of-the-art performance across several languages. Analysis also justifies the effectiveness of using our new dependency-based representation.

Physical Review E, 2018

The method of choice to study one-dimensional strongly interacting many-body quantum systems is based on matrix product states and operators. Such a method allows one to explore the most relevant, and numerically manageable, portion of an exponentially large space. It also allows one to accurately describe correlations between distant parts of a system, an important ingredient to account for the context in machine learning tasks. Here we introduce a machine learning model in which matrix product operators are trained to implement sequence-to-sequence prediction, i.e., given a sequence at a time step, it allows one to predict the next sequence. We then apply our algorithm to cellular automata (for which we show exact analytical solutions in terms of matrix product operators), and to nonlinear coupled maps. We show advantages of the proposed algorithm when compared to conditional random fields and bidirectional long short-term memory neural networks. To highlight the flexibility of the algorithm, we also show that it can readily perform classification tasks.

IEEE Transactions on Multimedia, 2015

Billions of user-shared images are generated by individuals in many social networks today, and this particular form of user data is widely accessible to others due to the nature of online social sharing. When user social graphs are only accessible to exclusive parties, these user-shared images prove to be an easier and effective alternative to discover user connections. This work investigated over 360,000 user-shared images from two social networks, Skyrock and 163 Weibo, in which 3 million follower/followee relationships are involved. It is observed that the shared images from users with a follower/followee relationship show relatively higher similarities. A multimedia big data system that utilizes this observed phenomenon is proposed as an alternative to user-generated tags and social graphs for follower/followee recommendation and gender identification. To the best of our knowledge, this is the first attempt in this field to prove and formulate such a phenomenon for mass user-shared images along with more practical prediction methods. These findings are useful for information or services recommendations in any social network with intensive image sharing, as well as for other interesting personalization applications, particularly when there is no access to those exclusive user social graphs.

2015 IEEE Fourth Symposium on Network Cloud Computing and Applications (NCCA), 2015

Recently, Bag-of-Features Tagging has been proven to be an alternative to discover user connections from user-shared images in social networks. This approach uses unsupervised clustering to classify the user-shared images and then correlate similar users, which is computationally intensive for real-world applications. This paper introduces a cloud-assisted framework to improve the efficiency and scalability of Bag-of-Features Tagging. The framework distributes the computation of the unsupervised clustering, the profile learning process, and the similarity calculation. The experiment shows how a scalable cloud-assisted framework outperforms a stand-alone machine with different parameters on a real social network dataset, Skyrock.

2015 IEEE First International Conference on Big Data Computing Service and Applications, 2015

Predicting the virality of content is attractive for many applications in today's big data era. Previous works mostly focus on final popularity, but predicting the time at which content gets popular (virality timing) is essential for applications such as viral marketing. This work proposes a community-aware iterative algorithm to predict the virality timing of content in social media using big data of user dynamics in social cascades and community structure in social networks. From the continuously generated big data, the algorithm uses the increasing amount of data to make self-corrections on the virality timing prediction and improve its prediction. Experimental results on viral stories from a social network, Digg, show that the proposed algorithm is able to predict virality timing effectively, with the prediction error bounded within 30% with 20% of data.

Proceedings of the 2019 Conference of the North, 2019

Supervised approaches to named entity recognition (NER) are largely developed based on the assumption that the training data is fully annotated with named entity information. However, in practice, annotated data can often be imperfect, with one typical issue being that the training data may contain incomplete annotations. We highlight several pitfalls associated with learning under such a setup in the context of NER and identify limitations associated with existing approaches, proposing a novel yet easy-to-implement approach for recognizing named entities with incomplete data annotations. We demonstrate the effectiveness of our approach through extensive experiments.

2015 IEEE First International Conference on Big Data Computing Service and Applications, 2015
