ConvBERT: Improving BERT with Span-based Dynamic Convolution
Related papers
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
arXiv, 2019
LNLF-BERT: Transformer for Long Document Classification with Multiple Attention Levels
IEEE Access, 2024
SesameBERT: Attention for Anywhere
IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 2020
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Proceedings of NAACL-HLT, 2019
Question Answering Using Hierarchical Attention on Top of BERT Features
Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019
RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling
arXiv, 2023
DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, 2022
Sequential Attention Module for Natural Language Processing
arXiv, 2021
Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet
2020
On the Prunability of Attention Heads in Multilingual BERT
arXiv, 2021
ThisIsCompetition at SemEval-2019 Task 9: BERT is unstable for out-of-domain samples
Proceedings of the 13th International Workshop on Semantic Evaluation, 2019
Do Attention Heads in BERT Track Syntactic Dependencies?
arXiv, 2019
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling
arXiv, 2018
Character-Level Language Modeling with Deeper Self-Attention
Proceedings of the AAAI Conference on Artificial Intelligence, 2019
BERT Probe: A python package for probing attention based robustness evaluation of BERT models
Software Impacts, 2023
TiltedBERT: Resource Adjustable Version of BERT
2022
SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions
arXiv, 2021
Scalable Attentive Sentence Pair Modeling via Distilled Sentence Embedding
Proceedings of the AAAI Conference on Artificial Intelligence, 2020
Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018
Span Selection Pre-training for Question Answering
arXiv, 2019
Question Answering with Self-Attention
2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
arXiv, 2019
Improving the BERT model for long text sequences in question answering domain
Mareeswari Venkatachala
International Journal of Advances in Applied Sciences (IJAAS), 2023
GiBERT: Enhancing BERT with Linguistic Information using a Lightweight Gated Injection Method
Findings of the Association for Computational Linguistics: EMNLP, 2021
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
arXiv, 2019
AttViz: Online exploration of self-attention for transparent neural language modeling
arXiv, 2020
Long-span language modeling for speech recognition
arXiv, 2019
ABC: Attention with Bounded-memory Control
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Attention-Based Convolutional Neural Network for Machine Comprehension
Proceedings of the Workshop on Human-Computer Question Answering, 2016
CalBERT - Code-Mixed Adaptive Language Representations Using BERT
2022