ConvBERT: Improving BERT with Span-based Dynamic Convolution

Related papers:

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Melison Dylan.

LNLF-BERT: Transformer for Long Document Classification with Multiple Attention Levels. Linh Manh Pham. IEEE Access, 2024.

SesameBERT: Attention for Anywhere. Hsiang Chih Cheng. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 2020.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Sench Galiedon.

Question Answering Using Hierarchical Attention on Top of BERT Features. Reham Osama. Proceedings of the 2nd Workshop on Machine Reading for Question Answering.

RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling. Herman Sugiharto. arXiv, 2023.

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference. Vladimir Araujo. Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP.

Sequential Attention Module for Natural Language Processing. Lianxin Jiang. arXiv, 2021.

Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet. Victor Makarenkov, 2020.

On the Prunability of Attention Heads in Multilingual BERT. Madhura Pande. arXiv, 2021.

ThisIsCompetition at SemEval-2019 Task 9: BERT is unstable for out-of-domain samples. Changki Lee. Proceedings of the 13th International Workshop on Semantic Evaluation.

Do Attention Heads in BERT Track Syntactic Dependencies? Shikha Bordia. arXiv, 2019.

Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling. Tianyi Zhou. arXiv, 2018.

Character-Level Language Modeling with Deeper Self-Attention. Mandy Guo. Proceedings of the AAAI Conference on Artificial Intelligence, 2019.

Attention Is All You Need. Brittney Shi, Illia Polosukhin.

BERT Probe: A python package for probing attention based robustness evaluation of BERT models. Mahnoor Shahid. Software Impacts.

TiltedBERT: Resource Adjustable Version of BERT. Mohammad Sharifkhani, 2022.

SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions. Vinod Ganesan. arXiv, 2021.

Scalable Attentive Sentence Pair Modeling via Distilled Sentence Embedding. Itzik Malkiel. Proceedings of the AAAI Conference on Artificial Intelligence, 2020.

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling. Tianyi Zhou. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018.

Span Selection Pre-training for Question Answering. Alfio Gliozzo. arXiv, 2019.

Question Answering with Self-Attention. George Sarmonikas, 2020.

Span-Based Neural Buffer: Towards Efficient and Effective Utilization of Long-Distance Context for Neural Sequence Models. Kaisheng Yao. Proceedings of the AAAI Conference on Artificial Intelligence.

BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks. Julien Kloetzer. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers).

RoBERTa: A Robustly Optimized BERT Pretraining Approach. Naman Goyal. arXiv, 2019.

The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT. Madhura Pande, 2021.

Improving the BERT model for long text sequences in question answering domain. Mareeswari Venkatachala. International Journal of Advances in Applied Sciences (IJAAS), 2023.

GiBERT: Enhancing BERT with Linguistic Information using a Lightweight Gated Injection Method. Maria Liakata. Findings of the Association for Computational Linguistics: EMNLP 2021.

HUBERT Untangles BERT to Improve Transfer across NLP Tasks. Paul Smolensky. arXiv, 2019.

AttViz: Online exploration of self-attention for transparent neural language modeling. Senja Pollak. arXiv, 2020.

Long-span language modeling for speech recognition. Sarangarajan Parthasarathy. arXiv, 2019.

ABC: Attention with Bounded-memory Control. Noah Smith. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

Attention-Based Convolutional Neural Network for Machine Comprehension. Danial Chakma. Proceedings of the Workshop on Human-Computer Question Answering, 2016.

CalBERT - Code-Mixed Adaptive Language Representations Using BERT. Ashwini M Joshi, 2022.

AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models. Jonáš Kulhánek. Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, 2021.