Bridging the Gap for Tokenizer-Free Language Models

Character-Level Language Modeling with Deeper Self-Attention

Mandy Guo

Proceedings of the AAAI Conference on Artificial Intelligence, 2019

Improving language models by retrieving from trillions of tokens

Roman Ring

arXiv, 2021

BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks

Julien Kloetzer

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Overview of the progression of state-of-the-art language models

TELKOMNIKA JOURNAL

TELKOMNIKA Telecommunication Computing Electronics and Control, 2024

Language Modeling with Deep Transformers

Albert Zeyer

Interspeech 2019

Comparative Analysis of Transformer-Based Language Models

Computer Science & Information Technology (CS & IT) Computer Science Conference Proceedings (CSCP)

Improving N-gram Language Models with Pre-trained Deep Transformer

Yutong Pang

2019

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Mohammad Hossein Sekhavat

arXiv, 2024

Large Margin Neural Language Model

Jiaji Huang

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Enhancing recurrent neural network-based language models by word tokenization

Shahenda Sarhan

Human-centric Computing and Information Sciences, 2018

Language-Independent Text Tokenization Using Unsupervised Deep Learning

Aladdin Hafez

Intelligent Automation & Soft Computing

Scaling recurrent neural network language models

Tony Robinson

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015

On Losses for Modern Language Models

Stéphane Aroca-Ouellette, Frank Rudzicz

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Pre-training Polish Transformer-Based Language Models at Scale

Rafał Poświata

Artificial Intelligence and Soft Computing, 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Raul Puri

arXiv, 2019

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Elton Zhang

2022

End-to-End Transformer-Based Models in Textual-Based NLP

Abir Rahali

AI

AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models

Sanchari Sen

arXiv, 2020

Improving the training and evaluation efficiency of recurrent neural network language models

Xunying Liu, Mark Gales

A Systematic Review of Transformer-Based Pre-Trained Language Models through Self-Supervised Learning

Evans Kotei

Information

Overview of the Transformer-based Models for NLP Tasks

Anthony Gillioz

Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, 2020

Survey of Neural Text Representation Models

Ana Meštrović

Information, 2020

A Tensorized Transformer for Language Modeling

Xindian Ma

arXiv, 2019

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Noa Nabeshima

arXiv, 2021

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Kenton Murray

Proceedings of the 3rd Workshop on Neural Generation and Translation

One billion word benchmark for measuring progress in statistical language modeling

Tony Robinson

Sequence-to-Sequence Lexical Normalization with Multilingual Transformers

Liviu P. Dinu

Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Word-Phrase-Entity Recurrent Neural Networks for Language Modeling

Sarangarajan Parthasarathy

Interspeech 2016, 2016

Long-span language modeling for speech recognition

Sarangarajan Parthasarathy

arXiv, 2019

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Susannah Young

arXiv, 2021

DeepNorm: A Deep Learning Approach to Text Normalization

Shaurya Rohatgi

arXiv, 2017

AttViz: Online exploration of self-attention for transparent neural language modeling

Senja Pollak

arXiv, 2020
