Word Embeddings, Analogies, and Machine Learning: Beyond King - Man + Woman = Queen
Abstract
Solving word analogies became one of the most popular benchmarks for word embeddings, on the assumption that linear relations between word pairs (such as man:king :: woman:queen) are indicative of the quality of the embedding. We question this assumption by showing that information not detected by the linear offset may still be recoverable by a more sophisticated search method, and thus is actually encoded in the embedding. The general problem with the linear offset is its sensitivity to the idiosyncrasies of individual words. We show that simple averaging over multiple word pairs improves over the state of the art. A further improvement in accuracy (up to 30% for some embeddings and relations) is achieved by combining cosine similarity with an estimation of the extent to which a candidate answer belongs to the correct word class. In addition to this practical contribution, this work highlights the problem of the interaction between word embeddings and analogy retrieval algorithms, and its implications for the evaluation of word embeddings and the use of analogies in extrinsic tasks.
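For concreteness, the two offset-based retrieval methods contrasted in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes embeddings are stored in a plain dict of numpy vectors, and the function names (`three_cos_add`, `three_cos_avg`) are ours.

```python
import numpy as np

def _unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def three_cos_add(emb, a, b, c):
    """Classic linear offset (3CosAdd): answer = argmax_d cos(d, b - a + c),
    excluding the question words themselves."""
    target = _unit(emb[b] - emb[a] + emb[c])
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = float(target @ _unit(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

def three_cos_avg(emb, pairs, c):
    """Averaged offset (3CosAvg): replace the single-pair offset b - a with
    the mean offset over many example pairs, damping the idiosyncrasies
    of any individual word pair."""
    offset = np.mean([emb[b] - emb[a] for a, b in pairs], axis=0)
    target = _unit(offset + emb[c])
    exclude = {c} | {w for pair in pairs for w in pair}
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in exclude:
            continue
        sim = float(target @ _unit(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```

Averaging the offset over many pairs is the "simple averaging" step that the abstract credits with improving over the state of the art.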
FAQs
What are the advantages of LRCos over traditional methods like 3CosAdd?
LRCos significantly outperforms 3CosAdd, achieving up to 34% higher accuracy on derivational relations. Because it learns from many example pairs, it is less sensitive to the idiosyncrasies of individual words and adapts better to varied word classes.
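LRCos combines a logistic-regression estimate of class membership with cosine similarity to the source word, as its name suggests. Below is a minimal sketch of that idea, assuming the same dict-of-vectors store as above; the negative-sampling scheme, hyperparameters, and the `lrcos` name are our illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lrcos(emb, pairs, c, n_neg=500, seed=0):
    """LRCos-style retrieval: score each candidate d by
    P(d belongs to the target word class) * cos(d, c).

    emb:   dict of word -> 1-D numpy vector (hypothetical store)
    pairs: training pairs (a, b); the b-words define the target class,
           e.g. all capitals for the country:capital relation
    c:     the source word of the question "c : ?"
    """
    vocab = list(emb.keys())
    pos = list({b for _, b in pairs})
    # Random vocabulary words stand in as negative examples for the class
    # (assumes the vocabulary is much larger than n_neg).
    rng = np.random.default_rng(seed)
    neg = rng.choice([w for w in vocab if w not in pos],
                     size=n_neg, replace=False)

    X = np.array([emb[w] for w in [*pos, *neg]])
    y = np.array([1] * len(pos) + [0] * len(neg))
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    exclude = {c} | {w for pair in pairs for w in pair}
    cand = [w for w in vocab if w not in exclude]
    M = np.array([emb[w] for w in cand])
    cos = (M / np.linalg.norm(M, axis=1, keepdims=True)) @ _unit(emb[c])
    p_class = clf.predict_proba(M)[:, 1]
    return cand[int(np.argmax(p_class * cos))]
```

Multiplying the class-membership probability by the cosine similarity is what lets the method reject candidates that are close to the source word but do not belong to the target word class.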
How does LRCos improve accuracy on complex linguistic relations?
LRCos uses supervised learning from multiple example pairs, which improves performance on difficult derivational and grammatical relations. It achieves an average accuracy of 47.7% on BATS, where 3CosAdd reaches only 28.1%.
What impact does training set size have on the performance of LRCos?
LRCos performance saturates at around 50 training pairs on average; additional data beyond this threshold does not significantly improve accuracy for Russian morphological categories.
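A saturation point like this can be checked by sweeping the training-set size against a fixed held-out set. A hypothetical sketch, reusing the `lrcos` helper above:

```python
def learning_curve(emb, train_pairs, test_pairs, sizes=(5, 10, 25, 50, 100)):
    """Accuracy of lrcos as a function of the number of training pairs.
    For each size n, train on the first n pairs and ask lrcos to
    recover b from a for every held-out question (a, b)."""
    curve = {}
    for n in sizes:
        subset = train_pairs[:n]
        hits = sum(lrcos(emb, subset, a) == b for a, b in test_pairs)
        curve[n] = hits / len(test_pairs)
    return curve
```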
How do different embeddings affect the performance of LRCos and 3CosAdd?
The gains from LRCos vary across embeddings: GloVe, for instance, shows only a modest improvement over 3CosAdd, while SVD-based models gain more than 15%.
What limitations does LRCos exhibit in analogy detection?
LRCos still struggles with lexicographic relations, where it underperforms established methods; further optimization is needed to improve accuracy in these harder cases.