ConvBERT: Improving BERT with Span-based Dynamic Convolution

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Sench Galiedon

LNLF-BERT: Transformer for Long Document Classification with Multiple Attention Levels

Linh Manh Pham

IEEE Access, 2024

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

Melison Dylan

SesameBERT: Attention for Anywhere

Hsiang Chih Cheng

2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 2020

Question Answering Using Hierarchical Attention on Top of BERT Features

Reham Osama

Proceedings of the 2nd Workshop on Machine Reading for Question Answering

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference

Vladimir Araujo

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling

Herman Sugiharto

arXiv, 2023

Sequential Attention Module for Natural Language Processing

Lianxin Jiang

arXiv, 2021

Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet

Victor Makarenkov

2020

Do Attention Heads in BERT Track Syntactic Dependencies?

Shikha Bordia

arXiv, 2019

Span Selection Pre-training for Question Answering

Alfio Gliozzo

arXiv, 2019

Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling

Tianyi Zhou

arXiv, 2018

ThisIsCompetition at SemEval-2019 Task 9: BERT is unstable for out-of-domain samples

Changki Lee

Proceedings of the 13th International Workshop on Semantic Evaluation

On the Prunability of Attention Heads in Multilingual BERT

Madhura Pande

arXiv, 2021

Character-Level Language Modeling with Deeper Self-Attention

Mandy Guo

Proceedings of the AAAI Conference on Artificial Intelligence, 2019

Scalable Attentive Sentence Pair Modeling via Distilled Sentence Embedding

Itzik Malkiel

Proceedings of the AAAI Conference on Artificial Intelligence, 2020

Question Answering with Self-Attention

George Sarmonikas

2020

SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions

Vinod Ganesan

arXiv, 2021

BERT Probe: A python package for probing attention based robustness evaluation of BERT models

Mahnoor Shahid

Software Impacts

Span-Based Neural Buffer: Towards Efficient and Effective Utilization of Long-Distance Context for Neural Sequence Models

Kaisheng Yao

Proceedings of the AAAI Conference on Artificial Intelligence

TiltedBERT: Resource Adjustable Version of BERT

Mohammad Sharifkhani

2022

Attention Is All You Need

Brittney Shi, Illia Polosukhin

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Tianyi Zhou

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks

Julien Kloetzer

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT

Madhura Pande

2021

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Naman Goyal

arXiv, 2019

Attention-Based Convolutional Neural Network for Machine Comprehension

Danial Chakma

Proceedings of the Workshop on Human-Computer Question Answering, 2016

HUBERT Untangles BERT to Improve Transfer across NLP Tasks

Paul Smolensky

arXiv, 2019

Improving the BERT model for long text sequences in question answering domain

Mareeswari Venkatachala

International Journal of Advances in Applied Sciences (IJAAS), 2023

Ensemble ALBERT and RoBERTa for Span Prediction in Question Answering

Sony Bachina

2021

ABC: Attention with Bounded-memory Control

Noah Smith

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

GiBERT: Enhancing BERT with Linguistic Information using a Lightweight Gated Injection Method

Maria Liakata

Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention

Andreas Stolcke

2016

Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering

Changmao Li

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Structured Attention Networks

Carl Denton

arXiv, 2017
