BERT Algorithm used in Google Search
Related papers
A Review on BERT and Its Implementation in Various NLP Tasks
Advances in Computer Science Research, 2023
We present a detailed study of BERT, which stands for 'Bidirectional Encoder Representations from Transformers'. Natural Language Processing is an important field in the development of intelligent systems. Various tasks require understanding the correct meaning of a sentence in order to produce the right output. Languages are difficult for computers to understand because their meaning shifts with context. BERT is considered a revolution in making computers understand the context of text, which is the biggest hurdle in natural language processing tasks. It learns a language and its context in a way that closely resembles how the human brain understands the meaning of a sentence. It is unique in its ability to learn from both the left and right context of a specific word in a sentence. The evolution of BERT has marked a new era in the perception and understanding of natural languages and may lead computers to grasp natural language with better comprehension. The purpose of this paper is to provide a better understanding of the BERT language model and its implementation in various NLP tasks.
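A minimal sketch of the bidirectional behaviour this abstract describes, assuming the Hugging Face `transformers` library and the publicly available `bert-base-uncased` checkpoint: the prediction for the masked word is informed by context on both sides.

```python
# Masked-language-model sketch: BERT uses left and right context jointly.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Both the left context ("The bank raised its interest") and the right
# context ("last quarter") inform the prediction for the [MASK] token.
text = "The bank raised its interest [MASK] last quarter."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_index].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))  # likely fillers such as "rates"
```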
Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet
2020
One of the challenges in the NLP field is training large classification models, a task that is both difficult and tedious. It is even harder when GPU hardware is unavailable. The increased availability of pre-trained and off-the-shelf word embeddings, models, and modules aims to ease the process of training large models and achieving competitive performance. We explore the use of off-the-shelf BERT models, share the results of our experiments, and compare them to those of LSTM networks and simpler baselines. We show that the complexity and computational cost of BERT is no guarantee of enhanced predictive performance on the classification tasks at hand.
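A minimal sketch of the kind of comparison the abstract reports, assuming scikit-learn and the Hugging Face `pipeline` API; the dataset, labels, and the default sentiment checkpoint are illustrative, not the paper's setup.

```python
# Off-the-shelf BERT pipeline versus a simple TF-IDF + logistic-regression baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from transformers import pipeline

train_texts = ["great product", "terrible service", "works fine", "broke in a day"]
train_labels = [1, 0, 1, 0]
test_texts = ["really great", "awful experience"]
test_labels = [1, 0]

# Simple baseline: often surprisingly competitive on small, clean datasets.
vec = TfidfVectorizer().fit(train_texts)
clf = LogisticRegression().fit(vec.transform(train_texts), train_labels)
baseline_acc = accuracy_score(test_labels, clf.predict(vec.transform(test_texts)))

# Off-the-shelf BERT-family sentiment pipeline (no task-specific training here).
bert = pipeline("sentiment-analysis")  # downloads a default fine-tuned checkpoint
bert_preds = [1 if r["label"] == "POSITIVE" else 0 for r in bert(test_texts)]
bert_acc = accuracy_score(test_labels, bert_preds)

print(f"baseline={baseline_acc:.2f}  bert={bert_acc:.2f}")
```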
TiltedBERT: Resource Adjustable Version of BERT
2022
In this paper, we propose a novel adjustable fine-tuning method that improves the training and inference time of the BERT model on downstream tasks. In the proposed method, we first detect the more important word vectors in each layer using our proposed redundancy metric and then eliminate the less important word vectors with our proposed strategy. The word vector elimination rate in each layer is controlled by the Tilt-Rate hyper-parameter, and the model learns to work with considerably fewer Floating Point Operations (FLOPs) than the original BERT-base model. Our proposed method does not need any extra training steps, and it can be generalized to other transformer-based models. We perform extensive experiments showing that the word vectors in higher layers have an impressive amount of redundancy that can be eliminated to decrease training and inference time. Experimental results on extensive sentiment analysis, classification and regression...
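A toy sketch of the general idea of dropping a fraction of word vectors in a layer under a Tilt-Rate-style hyper-parameter. The redundancy score used below (similarity to the mean token vector) is a stand-in for illustration only, not the paper's metric or strategy.

```python
# Illustrative token-vector pruning controlled by a tilt_rate hyper-parameter.
import torch

def prune_tokens(hidden_states: torch.Tensor, tilt_rate: float) -> torch.Tensor:
    """hidden_states: (seq_len, hidden_dim); keep the (1 - tilt_rate) least redundant tokens."""
    mean_vec = hidden_states.mean(dim=0, keepdim=True)
    redundancy = torch.nn.functional.cosine_similarity(hidden_states, mean_vec, dim=-1)
    keep = max(1, int(hidden_states.size(0) * (1.0 - tilt_rate)))
    keep_idx = redundancy.topk(keep, largest=False).indices.sort().values
    return hidden_states[keep_idx]

layer_output = torch.randn(128, 768)            # e.g. one sequence after a BERT layer
pruned = prune_tokens(layer_output, tilt_rate=0.5)
print(pruned.shape)                             # roughly half the tokens remain
```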
A Primer in BERTology: What We Know About How BERT Works
Transactions of the Association for Computational Linguistics, 2020
Transformer-based models have pushed the state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.
Exploring the Role of Transformers in NLP: From BERT to GPT-3
IRJET, 2023
The paper "Exploring the Role of Transformers in NLP: From BERT to GPT-3" provides an overview of the role of Transformers in NLP, with a focus on BERT and GPT-3. It covers topics such as the Role of Transformers in BERT, Transformer Encoder Architecture BERT, and Role of Transformers in GPT-3, Transformers in GPT-3 Architecture, Limitations of Transformers, Transformer Neural Network Design, and Pre-Training Process. The paper also discusses attention visualization and future directions for research, including developing more efficient models and integrating external knowledge sources. It is a valuable resource for researchers and practitioners in NLP, particularly the attention visualization section
User Generated Data: Achilles' heel of BERT
ArXiv, 2020
Owing to BERT's phenomenal success on various NLP tasks and benchmark datasets, industry practitioners have started to experiment with incorporating BERT into applications that solve industry use cases. Industrial NLP applications are known to deal with much noisier data than benchmark datasets. In this work, we systematically show that when the text data is noisy, there is a significant degradation in the performance of BERT. While this work is motivated by our business use case of building NLP applications for user-generated text data, which is known to be very noisy, our findings are applicable across various use cases in the industry. Specifically, we show that BERT's performance on fundamental tasks like sentiment analysis and textual similarity drops significantly as we introduce noise in the data in the form of spelling mistakes and typos. For our experiments, we use three well-known datasets, IMDB movie reviews, SST-2 and STS-B, to measure the performance. ...
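A minimal sketch of the kind of noise experiment the abstract describes: inject character-level typos into inputs and compare sentiment predictions on clean versus noisy text. The typo function, example sentences, and the default pipeline checkpoint are illustrative assumptions, not the paper's exact procedure.

```python
# Compare an off-the-shelf sentiment model on clean text versus text with typos.
import random
from transformers import pipeline

def add_typos(text: str, rate: float = 0.1) -> str:
    """Randomly drop characters to simulate spelling mistakes."""
    return "".join(ch for ch in text if random.random() > rate)

classifier = pipeline("sentiment-analysis")  # off-the-shelf checkpoint

clean = ["The movie was absolutely wonderful.", "This was a waste of time."]
noisy = [add_typos(t, rate=0.15) for t in clean]

for c, n in zip(clean, noisy):
    print(c, "->", classifier(c)[0]["label"])
    print(n, "->", classifier(n)[0]["label"])  # labels may flip as noise increases
```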
DeText: A Deep Text Ranking Framework with BERT
2020
Ranking is the most important component in a search system. Most search systems deal with large amounts of natural language data, hence an effective ranking system requires a deep understanding of text semantics. Recently, deep learning based natural language processing (deep NLP) models have generated promising results on ranking systems. BERT is one of the most successful models that learn contextual embedding, which has been applied to capture complex query-document relations for search ranking. However, this is generally done by exhaustively interacting each query word with each document word, which is inefficient for online serving in search product systems. In this paper, we investigate how to build an efficient BERT-based ranking model for industry use cases. The solution is further extended to a general ranking framework, DeText, that is open sourced and can be applied to various ranking productions. Offline and online experiments of DeText on three real-world search systems present significant...
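A minimal sketch of the two ranking styles the abstract contrasts: a cross-encoder that jointly encodes query and document (every query word interacts with every document word, which is costly at serving time) versus a bi-encoder that embeds them separately so document vectors can be pre-computed. Checkpoint names, pooling, and the untrained scoring head are illustrative assumptions; a production ranker would of course be fine-tuned.

```python
# Cross-encoder versus bi-encoder scoring of a query-document pair.
import torch
from transformers import AutoModel, AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# Cross-encoder: query and document are encoded together in one pass.
cross = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)
pair = tok("best running shoes", "Our trail shoes are built for long distances.", return_tensors="pt")
with torch.no_grad():
    cross_score = cross(**pair).logits.squeeze()

# Bi-encoder: encode query and document independently, then compare embeddings.
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    out = encoder(**tok(text, return_tensors="pt")).last_hidden_state
    return out.mean(dim=1)  # simple mean pooling

with torch.no_grad():
    bi_score = torch.cosine_similarity(embed("best running shoes"),
                                       embed("Our trail shoes are built for long distances."))

print(float(cross_score), float(bi_score))
```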
BERT-QPP: Contextualized Pre-trained transformers for Query Performance Prediction
Proceedings of the 30th ACM International Conference on Information & Knowledge Management
Query Performance Prediction (QPP) is focused on estimating the difficulty of satisfying a user query for a certain retrieval method. While most state-of-the-art QPP methods are based on term frequency and corpus statistics, more recent work in this area has started to explore the utility of pretrained neural embeddings, neural architectures and contextual embeddings. Such approaches extract features from pretrained or contextual embeddings in order to train a supervised performance predictor. In this paper, we adopt contextual embeddings to perform performance prediction, but distinguish ourselves from the state of the art by proposing to directly fine-tune a contextual embedding, i.e., BERT, specifically for the task of query performance prediction. As such, our work allows the fine-tuned contextual representations to estimate the performance of a query based on the association between the representation of the query and the retrieved documents. We compare the performance of our approach with the state of the art on the MS MARCO passage retrieval corpus and its three associated query sets: (1) the MS MARCO development set, (2) TREC DL 2019, and (3) TREC DL 2020. We show that our approach not only shows significantly improved prediction performance compared to all state-of-the-art methods but also, unlike past neural predictors, has significantly lower latency, making it possible to use in practice.
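A minimal sketch of the fine-tuning setup the abstract describes: BERT takes a query paired with a retrieved document and regresses a performance score for that query. The example pairs, target scores, and hyper-parameters are illustrative, not the paper's configuration.

```python
# Fine-tune BERT as a regressor over (query, retrieved passage) pairs.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Each example: (query, top retrieved passage, observed retrieval effectiveness).
examples = [
    ("what causes tides", "Tides are caused by the gravitational pull of the moon.", 0.82),
    ("jkl asdf query", "Unrelated passage about cooking pasta.", 0.05),
]

model.train()
for query, passage, score in examples:
    batch = tok(query, passage, truncation=True, padding=True, return_tensors="pt")
    target = torch.tensor([[score]])
    loss = torch.nn.functional.mse_loss(model(**batch).logits, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```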
Contextual Embeddings-Based Web Page Categorization Using the Fine-Tune BERT Model
Symmetry
The World Wide Web has revolutionized the way we live, causing the number of web pages to increase exponentially. The web provides access to a tremendous amount of information, so it is difficult for internet users to locate accurate and useful information on the web. In order to categorize pages accurately based on the queries of users, methods of categorizing web pages need to be developed, and the text content of web pages plays a significant role in this categorization. When a word's position within a sentence changes the interpretation of that sentence, the phenomenon is called polysemy. In web page categorization, the polysemy property causes ambiguity and is referred to as the polysemy problem. This paper proposes a fine-tuned model to solve the polysemy problem, using contextual embeddings created by the symmetry multi-head encoder layer of the Bidirectional Encoder Representations from Transformers (BERT). The effectiveness of the proposed model...
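A minimal sketch of the contextual-embedding property this abstract relies on: the same surface word ("bank") receives different BERT vectors in different sentences, which is what lets a fine-tuned classifier disambiguate page text. The model name and example sentences are illustrative assumptions.

```python
# Show that BERT assigns context-dependent vectors to the same word.
import torch
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v_money = word_vector("she deposited cash at the bank", "bank")
v_river = word_vector("they had a picnic on the river bank", "bank")
print(torch.cosine_similarity(v_money, v_river, dim=0))  # clearly below 1.0
```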
On the Study of Transformers for Query Suggestion
ACM Transactions on Information Systems, 2022
When conducting a search task, users may find it difficult to articulate their need, even more so when the task is complex. To help them complete their search, search engines usually provide query suggestions. A good query suggestion system requires modeling user behavior during the search session. In this article, we study multiple Transformer architectures applied to the query suggestion task and compare them with recurrent neural network (RNN)-based models. We experiment with Transformer models using different tokenizers, different encoders (large pretrained models or fully trained ones), and two kinds of architectures (flat or hierarchical). We study the performance and the behaviors of these various models, and observe that Transformer-based models outperform RNN-based ones. We show that while the hierarchical architectures exhibit very good performance for query suggestion, the flat models are more suitable for complex and long search tasks. Finally, we investigate the flat ...