A General and Multi-lingual Phrase Chunking Model Based on Masking Method

Abstract

Several phrase chunkers have been proposed in recent years. Some state-of-the-art chunkers achieve better performance by integrating external resources, e.g., parsers or additional training data, or by combining multiple learners. However, in many languages and domains such external materials are not readily available, and combining multiple learners increases the cost of both training and testing. In this paper, we propose a masking method to improve chunking accuracy. Experimental results show that our chunker achieves better performance than other deep parsers and chunkers. On the CoNLL-2000 data set, our system achieves an F-score of 94.12. For the base-chunking task, it reaches an F-score of 92.95. When ported to Chinese, the F-score on the base-chunking task is 92.36. Our chunker is also quite efficient: completely chunking a 50K-word document takes about 50 seconds.
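The F-scores above follow the CoNLL-2000 convention: a predicted chunk counts as correct only if it exactly matches a gold chunk's boundaries and type. As an illustrative sketch (not the authors' code), chunk-level F1 over BIO-tagged sequences can be computed as follows; the function names are hypothetical:

```python
def extract_chunks(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence."""
    chunks, start, ctype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last chunk
        # A chunk ends at a B- tag, an O tag, or an I- tag of a different type.
        if tag.startswith("B-") or tag == "O" or \
           (tag.startswith("I-") and tag[2:] != ctype):
            if start is not None:
                chunks.add((start, i, ctype))
                start, ctype = None, None
        # A chunk starts at a B- tag (or a stray I- tag with no open chunk).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ctype = i, tag[2:]
    return chunks

def chunk_f1(gold, pred):
    """F1 over exactly matching chunks, as in CoNLL-2000 evaluation."""
    g, p = extract_chunks(gold), extract_chunks(pred)
    correct = len(g & p)
    prec = correct / len(p) if p else 0.0
    rec = correct / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, if the gold tags are `["B-NP", "I-NP", "O", "B-VP"]` and a system predicts only the NP chunk, precision is 1.0, recall is 0.5, and F1 is about 0.67.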




Author information

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, National Central University, No. 300, Jhong-Da Rd., Jhongli City, Taoyuan County, 32001, Taiwan, R.O.C.
    Yu-Chieh Wu & Chia-Hui Chang
  2. Department of Computer Science and Information Engineering, Ming Chuan University, No.5, De-Ming Rd, Gweishan District, Taoyuan, 333, Taiwan, R.O.C.
    Yue-Shi Lee

Authors

  1. Yu-Chieh Wu
  2. Chia-Hui Chang
  3. Yue-Shi Lee

Editor information

Editors and Affiliations

  1. National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
    Alexander Gelbukh

Rights and permissions

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, YC., Chang, CH., Lee, YS. (2006). A General and Multi-lingual Phrase Chunking Model Based on Masking Method. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_17


