A General and Multi-lingual Phrase Chunking Model Based on Masking Method

Abstract

Several phrase chunkers have been proposed in recent years. Some state-of-the-art chunkers achieve better performance by integrating external resources, e.g., parsers or additional training data, or by combining multiple learners. However, in many languages and domains such external materials are not readily available, and combining multiple learners increases the cost of both training and testing. In this paper, we propose a masking method to improve chunking accuracy. Experimental results show that our chunker achieves better performance than other deep parsers and chunkers. On the CoNLL-2000 data set, our system achieves an F-score of 94.12. For the base-chunking task, it reaches an F-score of 92.95. When ported to Chinese, the F-score on the base-chunking task is 92.36. Our chunker is also quite efficient: completely chunking a 50K-word document takes about 50 seconds.
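The F-scores above follow the CoNLL-2000 convention: a predicted chunk counts as correct only if it exactly matches a gold chunk's boundaries and type. As an illustrative sketch (not the authors' code), chunk-level F1 over BIO-tagged sequences can be computed as follows; the function names are hypothetical:

```python
def extract_chunks(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence."""
    chunks, start, ctype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last chunk
        # A chunk ends at a B- tag, an O tag, or an I- tag of a different type.
        if tag.startswith("B-") or tag == "O" or \
           (tag.startswith("I-") and tag[2:] != ctype):
            if start is not None:
                chunks.add((start, i, ctype))
                start, ctype = None, None
        # A chunk starts at a B- tag (or a stray I- tag with no open chunk).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ctype = i, tag[2:]
    return chunks

def chunk_f1(gold, pred):
    """F1 over exactly matching chunks, as in CoNLL-2000 evaluation."""
    g, p = extract_chunks(gold), extract_chunks(pred)
    correct = len(g & p)
    prec = correct / len(p) if p else 0.0
    rec = correct / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, if the gold tags are `["B-NP", "I-NP", "O", "B-VP"]` and a system predicts only the NP chunk, precision is 1.0, recall is 0.5, and F1 is about 0.67.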




Author information

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, National Central University, No. 300, Jhong-Da Rd., Jhongli City, Taoyuan County, 32001, Taiwan, R.O.C.
    Yu-Chieh Wu & Chia-Hui Chang
  2. Department of Computer Science and Information Engineering, Ming Chuan University, No.5, De-Ming Rd, Gweishan District, Taoyuan, 333, Taiwan, R.O.C.
    Yue-Shi Lee

Authors

  1. Yu-Chieh Wu
  2. Chia-Hui Chang
  3. Yue-Shi Lee

Editor information

Editors and Affiliations

  1. National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
    Alexander Gelbukh

Rights and permissions

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, YC., Chang, CH., Lee, YS. (2006). A General and Multi-lingual Phrase Chunking Model Based on Masking Method. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_17


