The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT
Related papers
Do Attention Heads in BERT Track Syntactic Dependencies?
arXiv, 2019
How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
2020
On the Prunability of Attention Heads in Multilingual BERT
arXiv, 2021
What Does BERT Look at? An Analysis of BERT’s Attention
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2019
BERT Probe: A python package for probing attention based robustness evaluation of BERT models
Software Impacts
Exploring the Role of Transformers in NLP: From BERT to GPT-3
IRJET, 2023
Does BERT really agree? Fine-grained Analysis of Lexical Dependence on a Syntactic Task
Findings of the Association for Computational Linguistics: ACL 2022
LNLF-BERT: Transformer for Long Document Classification with Multiple Attention Levels
IEEE Access, 2024
A Primer in BERTology: What We Know About How BERT Works
Transactions of the Association for Computational Linguistics, 2020
Representation biases in sentence transformers
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023
Using Roark-Hollingshead Distance to Probe BERT’s Syntactic Competence
2022
Morphosyntactic probing of multilingual BERT models
Natural Language Engineering
The Universe of Utterances According to BERT
IWCS, 2023
TiltedBERT: Resource Adjustable Version of BERT
2022
The argument-adjunct distinction in BERT: A FrameNet-based investigation
IWCS, 2023
Look at that! BERT can be easily distracted from paying attention to morphosyntax
2021
Word-order typology in Multilingual BERT: A case study in subordinate-clause detection
Proceedings of SIGTYP Workshop, 2022
An exploratory study on code attention in BERT
Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension
What Does BERT Learn about the Structure of Language?
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Augmenting BERT Carefully with Underrepresented Linguistic Features
arXiv, 2020
Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet
2020
Exploring Linguistic Properties of Monolingual BERTs with Typological Classification among Languages
arXiv, 2023
ConvBERT: Improving BERT with Span-based Dynamic Convolution
2020
Exploring Neural Language Models via Analysis of Local and Global Self-Attention Spaces
2021
AILAB-Udine@SMM4H 22: Limits of Transformers and BERT Ensembles
arXiv, 2022
On the evolution of syntactic information encoded by BERT’s contextualized representations
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Proceedings of the First Workshop on Insights from Negative Results in NLP, 2020
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
SesameBERT: Attention for Anywhere
2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 2020
Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
2021
An Interpretability Illusion for BERT
2021
On Robustness of Finetuned Transformer-based NLP Models
arXiv, 2023
End-to-End Transformer-Based Models in Textual-Based NLP
AI
On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2021