Leveraging Wikipedia knowledge to classify multilingual biomedical documents - PubMed
Comparative Study
Marcos Antonio Mouriño García et al. Artif Intell Med. 2018 Jun.
Abstract
This article presents a classifier that leverages Wikipedia knowledge to represent documents as vectors of concept weights, and analyses its suitability for classifying biomedical documents written in any language when it is trained only on English documents. We propose the cross-language concept matching technique, which relies on Wikipedia interlanguage links to convert concept vectors between languages. The performance of the classifier is compared with that of a classifier based on machine translation and two classifiers based on MetaMap. To perform the experiments, we created two multilingual corpora. The first, Multi-Lingual UVigoMED (ML-UVigoMED), is composed of 23,647 Wikipedia documents about biomedical topics written in English, German, French, Spanish, Italian, Galician, Romanian, and Icelandic. The second, English-French-Spanish-German UVigoMED (EFSG-UVigoMED), is composed of 19,210 biomedical abstracts extracted from MEDLINE and written in English, French, Spanish, and German. The proposed approach outperforms every state-of-the-art classifier in the benchmark. We conclude that leveraging Wikipedia knowledge is of great advantage in the multilingual classification of biomedical documents.
Keywords: Biomedical document classification; Hybrid word-concept document representation; Multilingual text classification; Wikipedia Miner semantic annotator; Wikipedia-based bag of concepts document representation.
Copyright © 2018 Elsevier B.V. All rights reserved.
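The cross-language concept matching idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the link table `ES_TO_EN`, the function name `match_concepts`, the example weights, and the policy of dropping unlinked concepts and summing weights that collide on one target concept are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy interlanguage links (Spanish title -> English title).
# Real links would come from Wikipedia; these entries are illustrative only.
ES_TO_EN = {
    "Diabetes mellitus": "Diabetes",
    "Insulina": "Insulin",
    "Páncreas": "Pancreas",
}


def match_concepts(concept_weights, links):
    """Map a concept vector into the target language via interlanguage links.

    Assumptions (not taken from the article): concepts without a link are
    dropped, and weights of source concepts that share a target are summed.
    """
    mapped = defaultdict(float)
    for concept, weight in concept_weights.items():
        target = links.get(concept)
        if target is not None:
            mapped[target] += weight
    return dict(mapped)


# A Spanish document represented as concept -> weight (made-up values).
spanish_doc = {"Diabetes mellitus": 0.8, "Insulina": 0.5, "Glucemia": 0.3}
english_doc = match_concepts(spanish_doc, ES_TO_EN)
print(english_doc)  # {'Diabetes': 0.8, 'Insulin': 0.5}
```

Once a document's concept vector is expressed in English concepts, it could in principle be passed to a classifier trained only on English documents, which is the scenario the abstract evaluates.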