Emad Kebriaei - Academia.edu

Papers by Emad Kebriaei

UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Detecting which parts of a sentence contribute to that sentence's toxicity, rather than providing a sentence-level verdict of hatefulness, would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents the methodology and results of our team, UTNLP, in SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best-performing setting. The experiments start with keyword-based models and are followed by attention-based, named entity-based, transformer-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition's evaluation phase.

2 Related Work

In this section we provide a brief overview of studies on hate and toxic speech detection, followed by work on span detection in different sub-fields.

2.1 Hate Speech

Hate speech is defined as "any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other
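
Span-level systems like these are usually scored by comparing predicted and gold character offsets within each post, rather than a single label per post. The following is a minimal sketch of a per-post F1 averaged over posts; it is not the official SemEval scorer, the function names are hypothetical, and the convention that a post with no gold spans and no predicted spans scores 1.0 is an assumption.

```python
from typing import Iterable, List, Set


def span_f1(predicted: Set[int], gold: Set[int]) -> float:
    """F1 between predicted and gold toxic character offsets for one post."""
    # Assumed convention: no gold spans and no predicted spans is a perfect match.
    if not predicted and not gold:
        return 1.0
    overlap = len(predicted & gold)
    # Covers both "only one side is empty" and "nothing overlaps".
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(predictions: Iterable[Iterable[int]],
                 golds: Iterable[Iterable[int]]) -> float:
    """Average the per-post F1 over a collection of posts."""
    scores: List[float] = [span_f1(set(p), set(g)) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)
```

For the post "you are a stupid idiot", predicting offsets 10 to 21 ("stupid idiot") when only offsets 10 to 15 ("stupid") are gold gives precision 0.5, recall 1.0, and an F1 of about 0.667: over-extending a span is penalized, but less than missing it entirely.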

Deep Sentiment Analysis using a Graph-based Text Representation

Social media brings about new ways of communication among people and is influencing trading strategies in the market. The popularity of social networks produces a large collection of unstructured data, such as text and images, in a variety of disciplines like business and health. Text is the main element of social media, and it poses a set of challenges for traditional information retrieval and natural language processing tools. Informal language, spelling errors, abbreviations, and special characters are typical in social media posts. These features lead to a prohibitively large vocabulary size for text mining methods. Another problem with traditional social text mining techniques is that they fail to take semantic relations into account, which is essential in applications such as event detection, opinion mining, and news recommendation. This paper sets out to employ a network-based viewpoint on text documents and investigate the usefulness of graph representation t...

Improved model-based clustering using evolutionary optimization

As an optimization strategy, Maximum Likelihood's ultimate goal is to fit a statistical model to a specific dataset: the method adjusts the parameters of a statistical model, given a dataset or a known distribution, so that the model can “describe” each data sample and estimate the others. Clustering can be based on probability models to handle missing values. This provides insights into when the data should conform to the model and has led to the development of new clustering methods such as Expectation Maximization (EM), which is based on the principle of Maximum Likelihood of unobserved variables in finite mixture models. Evolutionary algorithms are relied upon to further improve optimization; in this paper, the Big-Bang Big-Crunch evolutionary algorithm is used to boost model-based clustering, and experimental results on real datasets show its superiority over typical model-based clustering methods.
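
To make the idea concrete, here is a simplified sketch that is not the paper's method: it uses Big-Bang Big-Crunch (BB-BC) to search directly for the means of a one-dimensional, equal-weight Gaussian mixture by maximizing its log-likelihood, instead of combining BB-BC with EM. The fixed shared standard deviation, population size, iteration count, the 1/step shrinking radius, and all function names are illustrative assumptions. The Big Crunch step computes a centre of mass weighted by 1/cost, and each Big Bang scatters new candidates around it.

```python
import math
import random


def log_likelihood(data, means, sigma=1.0):
    """Log-likelihood of 1-D data under an equal-weight Gaussian mixture.

    Simplifying assumption: every component shares a fixed standard
    deviation `sigma`, so only the component means are free parameters.
    """
    k = len(means)
    norm = 1.0 / (k * sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in data:
        density = sum(norm * math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def bbbc_fit_means(data, k=2, pop_size=30, iterations=50, seed=0):
    """Hypothetical Big-Bang Big-Crunch search for the k mixture means."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    span = hi - lo

    # Initial Big Bang: candidates spread uniformly over the data range.
    # Means are kept sorted so relabelled duplicates such as (-3, 3) and
    # (3, -3) do not cancel out when averaged in the Big Crunch.
    population = [sorted(rng.uniform(lo, hi) for _ in range(k)) for _ in range(pop_size)]
    best, best_ll = None, -math.inf

    for step in range(1, iterations + 1):
        # Cost = negative log-likelihood. With sigma = 1 every mixture
        # density is below 1, so each cost is strictly positive.
        costs = []
        for cand in population:
            ll = log_likelihood(data, cand)
            if ll > best_ll:
                best, best_ll = list(cand), ll
            costs.append(-ll)

        # Big Crunch: centre of mass, weighting each candidate by 1 / cost.
        inv = [1.0 / c for c in costs]
        total = sum(inv)
        centre = [sum(w * cand[j] for w, cand in zip(inv, population)) / total
                  for j in range(k)]

        # Big Bang: scatter around the centre, radius shrinking as 1 / step,
        # clipped to the data range.
        population = [
            sorted(min(hi, max(lo, c + rng.gauss(0.0, 1.0) * span / step)) for c in centre)
            for _ in range(pop_size)
        ]

    return best, best_ll
```

Sorting each candidate's means is a small but necessary design choice: without it, equally good candidates with swapped labels would average to a point between the clusters, and the Big Crunch would pull the search toward a poor solution.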

ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization

arXiv, 2021

Abstractive text summarization is one of the areas influenced by the emergence of pre-trained language models. Current pre-training works in abstractive summarization reward summaries that share more words with the main text and pay less attention to the semantic similarity between generated sentences and the original document. We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue. In ARMAN, salient sentences from a document are selected according to a modified semantic score to be masked and form a pseudo summary. To summarize more accurately and more like human writers, we apply a modified sentence reordering. We evaluated our proposed models on six downstream Persian summarization tasks. Experimental results show that our proposed model achieves state-of-the-art performance on all six summarization tasks, as measured by ROUGE and BERTScore. Our models also outperform prior works in textu...
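
The "select salient sentences, mask them, and use them as a pseudo summary" step can be sketched as follows. This sketch deliberately uses a lexical unigram-overlap score, i.e. the word-overlap style of selection the abstract says earlier pre-training relied on; ARMAN's modified semantic score would replace `unigram_f1`. The function names, the 30% selection ratio, and the `<mask>` token are hypothetical choices for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-cased word tokens; Python's regex word class is Unicode-aware,
    so Persian script is tokenized as well as Latin script."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate, reference):
    """Clipped unigram-overlap F1, a simple ROUGE-1-like lexical score."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_pseudo_summary(sentences, ratio=0.3, mask_token="<mask>"):
    """Mask the top-scoring sentences; return (masked_document, pseudo_summary)."""
    n = len(sentences)
    k = max(1, round(n * ratio))
    scores = []
    for i, sentence in enumerate(sentences):
        # Score each sentence against the rest of the document.
        rest = [tok for j, other in enumerate(sentences) if j != i for tok in tokens(other)]
        scores.append(unigram_f1(tokens(sentence), rest))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])  # keep the original document order
    chosen_set = set(chosen)
    masked = [mask_token if i in chosen_set else s for i, s in enumerate(sentences)]
    pseudo_summary = " ".join(sentences[i] for i in chosen)
    return masked, pseudo_summary
```

For the three-sentence document "The cat sat on the mat." / "The cat purred." / "The mat was red.", the first sentence shares the most words with the other two, so it is masked and becomes the pseudo summary the decoder must reconstruct.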

Leveraging Deep Graph-Based Text Representation for Sentiment Polarity Applications

Expert Systems with Applications

Over the last few years, machine learning over graph structures has brought significant improvements to text mining applications such as event detection, opinion mining, and news recommendation. One of the primary challenges in this regard is constructing a graph that encodes the features of textual data for effective machine learning. In addition, exploring and exploiting semantic relations is regarded as a principal step in text mining applications. However, most traditional text mining methods perform rather poorly at employing such relations. In this paper, we propose a sentence-level graph-based text representation that includes stop words in order to capture semantic and term relations. We then apply a representation learning approach to the combined graphs of sentences to extract latent, continuous features of the documents. Finally, the learned document features are fed into a deep neural network for the sentiment classification task. The experimental results demonstrate that the proposed method substantially outperforms related sentiment analysis approaches on several benchmark datasets. Furthermore, our method generalizes to different datasets without any dependency on pre-trained word embeddings.
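
One simple way to picture a sentence-level graph that keeps stop words is a word co-occurrence graph built per sentence and then merged per document. The sketch below is an assumption-laden illustration, not the paper's construction: the sliding-window co-occurrence rule, the window size, the naive sentence splitter, and the function names are all hypothetical, and the representation-learning and neural-network stages are omitted.

```python
import re
from collections import Counter


def sentence_graph(sentence, window=2):
    """Undirected, weighted co-occurrence edges for one sentence.

    Stop words are deliberately kept, so words such as "not" stay linked
    to the terms they modify.
    """
    words = re.findall(r"\w+", sentence.lower())
    edges = Counter()
    for i, word in enumerate(words):
        # Link each word to the following (window - 1) words.
        for other in words[i + 1 : i + window]:
            if other != word:  # skip self-loops
                edges[tuple(sorted((word, other)))] += 1
    return edges


def document_graph(text, window=2):
    """Merge per-sentence graphs by summing edge weights.

    Edges never cross a sentence boundary because each sentence is
    processed separately (naive split on ., ! and ?).
    """
    graph = Counter()
    for sentence in re.split(r"[.!?]+", text):
        if sentence.strip():
            graph.update(sentence_graph(sentence, window))
    return graph
```

For "The movie was not good. The plot was not bad.", the edge between "not" and "was" gets weight 2, "not" stays linked to both "good" and "bad", and there is no edge between "good" and the second sentence's "the". Removing stop words first would have discarded the negation entirely, which matters for sentiment polarity.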