Improving the representation and conversion of mathematical formulae by considering their textual context
Moritz Schubotz et al. TUGboat (Provid). 2018 May.
Abstract
Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial for communicating information, e.g., in scientific papers, and for performing computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and the content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task, consisting of a newly created test collection, an extensive, manually curated gold standard, and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators, and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.
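To make the distinction between presentation and content concrete, the following minimal Python sketch shows one presentation MathML tree for f(x) alongside two possible content MathML trees (function application versus multiplication). The helper `content_for` and its keyword rule are hypothetical and purely illustrative; they are not the approach evaluated in the paper, only a hint at why textual context can disambiguate a conversion.

```python
import xml.etree.ElementTree as ET

# Presentation MathML: describes only how f(x) looks.
PRESENTATION = "<math><mrow><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow></math>"

# Two content MathML readings of the same rendering.
CONTENT_APPLICATION = "<math><apply><ci>f</ci><ci>x</ci></apply></math>"
CONTENT_PRODUCT = "<math><apply><times/><ci>f</ci><ci>x</ci></apply></math>"


def content_for(context: str) -> str:
    """Hypothetical toy rule: read f as a function only if the text says so."""
    if "function f" in context.lower():
        return CONTENT_APPLICATION
    return CONTENT_PRODUCT


def tags(xml: str) -> list[str]:
    """Return the element names of a MathML string in document order."""
    return [element.tag for element in ET.fromstring(xml).iter()]


if __name__ == "__main__":
    print(tags(PRESENTATION))
    print(tags(content_for("where the function f maps x to its square")))
    print(tags(content_for("where f is a scaling factor")))
```

The presentation tree is identical in both cases; only the surrounding sentence changes which content tree is selected.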
Figures
Figure 1:
Graphical user interface to support the creation of our gold standard. The interface provides several TeX input fields (left) and a mathematical expression tree rendered by the VMEXT visualization tool (right).
Figure 2:
Overview of the structural tree edit distances (using r = 0, i = d = 1) between the MathML trees generated by the conversion tools and the gold standard MathML trees.
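As a rough illustration of the metric named in this caption, the sketch below computes an ordered tree edit distance with configurable costs, assuming that r, i, and d denote the rename, insert, and delete costs. It uses a plain memoized forest recursion, which is not necessarily the algorithm used by the authors, and the helper `node` and the toy trees are hypothetical. With r = 0, relabeling is free, so only structural differences contribute to the distance.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree as (label, children), with children a tuple of trees."""
    return (label, tuple(children))


def tree_edit_distance(t1, t2, r=0, i=1, d=1):
    """Ordered tree edit distance with rename cost r, insert cost i, delete cost d."""

    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f1:
            # Insert the rightmost root of f2; its children join the forest.
            _, kids = f2[-1]
            return forest_dist(f1, f2[:-1] + kids) + i
        if not f2:
            # Delete the rightmost root of f1; its children join the forest.
            _, kids = f1[-1]
            return forest_dist(f1[:-1] + kids, f2) + d
        (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
        rename = 0 if label1 == label2 else r
        return min(
            forest_dist(f1[:-1] + kids1, f2) + d,
            forest_dist(f1, f2[:-1] + kids2) + i,
            # Match the two rightmost roots and recurse into their subtrees.
            forest_dist(f1[:-1], f2[:-1]) + forest_dist(kids1, kids2) + rename,
        )

    return forest_dist((t1,), (t2,))


if __name__ == "__main__":
    flat = node("mrow", node("mi"), node("mo"), node("mi"))
    nested = node("mrow", node("mi"), node("mo"),
                  node("mrow", node("mi"), node("mo"), node("mi")))
    print(tree_edit_distance(flat, nested))  # three nodes must be inserted
```

The recursion revisits many sub-forests, so this form only suits small expression trees; a production implementation would typically use a dedicated algorithm such as Zhang-Shasha.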
Figure 3:
Time in seconds (logarithmic scale) required by each tool to parse the 305 gold standard LaTeX expressions.
Figure 4:
Mathematical language processing is the task of mapping textual descriptions to components of mathematical formulae (Part-of-Math tagging).
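A toy sketch of this mapping, assuming that descriptions follow the simple textual pattern "<identifier> is the <description>", might look as follows. The function name `extract_descriptions` and the regular expression are hypothetical; real mathematical language processing relies on far more robust linguistic analysis than a single pattern.

```python
import re

# Hypothetical pattern: a single-letter identifier followed by "is the <description>".
# The description ends at a comma, the word "and", a period, or the end of the text.
DEFINITION = re.compile(r"\b([A-Za-z]) is the ([a-z ]+?)(?=,| and |\.|$)")


def extract_descriptions(text: str) -> dict[str, str]:
    """Map each identifier to the description found next to it in the text."""
    return {identifier: description for identifier, description in DEFINITION.findall(text)}


if __name__ == "__main__":
    sentence = "In E = mc^2, E is the energy, m is the mass and c is the speed of light."
    for identifier, description in extract_descriptions(sentence).items():
        print(f"{identifier} -> {description}")
```

Each extracted pair could then be attached to the matching identifier node of the formula tree.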
Similar articles
- Do the Math: Making Mathematics in Wikipedia Computable.
Greiner-Petter A, Schubotz M, Breitinger C, Scharpf P, Aizawa A, Gipp B. IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4384-4395. doi: 10.1109/TPAMI.2022.3195261. Epub 2023 Mar 7. PMID: 35914035
- Error correction of semantic mathematical expressions based on bayesian algorithm.
Wang X, Yang F, Liu H, Shi Q. Math Biosci Eng. 2022 Mar 25;19(6):5428-5445. doi: 10.3934/mbe.2022255. PMID: 35603363
- Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Crider K, Williams J, Qi YP, Gutman J, Yeung L, Mai C, Finkelstain J, Mehta S, Pons-Duran C, Menéndez C, Moraleda C, Rogers L, Daniels K, Green P. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. PMID: 36321557 Free PMC article.
- Path-based knowledge reasoning with textual semantic information for medical knowledge graph completion.
Lan Y, He S, Liu K, Zeng X, Liu S, Zhao J. BMC Med Inform Decis Mak. 2021 Nov 29;21(Suppl 9):335. doi: 10.1186/s12911-021-01622-7. PMID: 34844576 Free PMC article. Review.
- [Estimate of fetal long bone length in early pregnancy: comparison between mathematical formulae].
Rosati P, Guariglia L. Minerva Ginecol. 2000 Jun;52(6):229-33. PMID: 11085045 Review. Italian.
Cited by
- Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems.
Greiner-Petter A, Schubotz M, Cohl HS, Gipp B. ASLIB J Inf Manag. 2019;71(3):10.1108/AJIM-08-2018-0185. doi: 10.1108/AJIM-08-2018-0185. PMID: 34603731 Free PMC article.