Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All