Ghassan Kanaan - Profile on Academia.edu

Papers by Ghassan Kanaan

A Review Study for Arabic Machine Learning and Deep Learning Methods

2022 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS)

Offensive Language Detection in Social Networks for Arabic Language Using Clustering Techniques

With the advent of social networks, users have obtained a golden opportunity to express their opinions using text and multimedia. However, some users have abused these platforms through acts such as Cyber-Bullying and Cyber-Harassment. Despite the various negative health and social effects, work on detecting these acts is still limited, especially in non-English languages. In Arabic, few works have studied this phenomenon, and those works had limited datasets. As the number of available training datasets is limited, it is still hard to train classifiers to detect these acts. Therefore, clustering has been put forward as an alternative solution to this difficulty. In this work, we propose the use of clustering to detect Cyber-Bullying and Cyber-Harassment. We adopted various clustering algorithms, including K-Means and Expectation Maximization (EM). Moreover, we used various natural language processing (NLP) tools for this objective. The results illustrate that the...

Arbitrary Passage Retrieval Based on Fixed-Length Window Using Arabic Documents

Deanship of Scientific Research and Graduate Studies, Yarmouk University, 2009

Retrieval systems accumulate a great range of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of different types of documents expose shortcomings in current approaches to ranking schemes. The use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings. Passage ranking provides suitable units of text to be returned to the user, avoids the difficulties of comparing documents of different lengths, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. This paper proposes a new method of passage retrieval, called fixed-length arbitrary passage retrieval, for Arabic documents. This method has been discussed, implemented, and evaluated. The experimental results show that ranking with fixed-length arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with other passage types.

Arab Academy for Banking and Financial Sciences Amman Jordan

Word sense ambiguity is widespread in all natural languages; a word may carry several distinct meanings. Humans can figure out the suitable meaning according to the context in which the word occurs. The Arabic language is highly polysemous; in many situations it is necessary to disambiguate the word senses. This paper studies and compares the performance of a search engine before and after expanding the query through Interactive Word Sense Disambiguation (WSD). We found that expanding polysemous query terms by adding more specific synonyms narrows the search to the specific targeted request and thus causes both precision and recall to increase; on the other hand, expanding the query with a more general (polysemous) synonym broadens the search, which causes precision to decrease.

Evaluating Machine Translations from Arabic into English and Vice Versa

International Research Journal of Electronics and Computer Engineering, 2017

Machine translation (MT) allows direct communication between two persons without the need for a third party or a pocket dictionary, which could bring a significant and performative improvement. Since most traditional translation methods are word-sensitive, it is very important to consider word order in addition to word selection when evaluating any machine translation. To evaluate MT performance, it is necessary to observe the output of the machine translation tool with respect to word order, word selection, and sentence length. However, carrying out a good evaluation with respect to all of these points is very challenging. In this paper, we first summarize various approaches to evaluating machine translation. We then propose a practical solution that uses a tool called iBLEU to evaluate the accuracy of well-known MT tools (i.e. Google, Bing, Systranet and Babylon). Based on this solution, we further discuss the performance ranking of these tools in both directions, Arabic to English and English to Arabic. After extensive testing, we can determine which direction gives more accurate translations for the selected MT tools. Finally, we found Google to be the best-performing system and Systranet the worst.

Improved hierarchical classifiers for multi-way sentiment analysis

Int. Arab J. Inf. Technol., 2017

Sentiment Analysis (SA) is a field in computational linguistics concerned with determining the sentiment conveyed in a piece of text towards certain entities (such as people, organizations, products, services, events, etc.) using NLP tools. The considered sentiments can be as simple as positive vs. negative. A more fine-grained approach known as Multi-Way Sentiment Analysis (MWSA) is based on ranking systems, such as the 5-star ranking system. In such systems, rankings close to each other can be confusing; thus, some researchers have suggested that using Hierarchical Classifiers (HCs) can yield better results compared with traditional Flat Classifiers (FCs). Unlike FCs, which try to address the entire classification problem at once, HCs employ some kind of tree structure where the nodes are simple “core” classifiers customized to address a subset of the classification problem. This study aims to explore extensively the use of HCs to address MWSA by studying six different hierarchies. ...

Comparison Between Inverted and Signature Files Based on Arabic Documents

The purpose of this research is to give an overview of inverted files and signature files built on an Arabic document collection, to set out the points of comparison between the two techniques, and to report the performance of each technique on every comparison point. The most common measures of system performance used to compare information retrieval mechanisms are time, space, and recall/precision. The shorter the response time and the smaller the space used, the better the system is considered to be [1], so our comparison points include space overhead, search time, and average recall/precision. In this research, two indices are built: an inverted file and a signature file. To measure the performance of each one, a retrieval system must be built to compare the results of using these indices. A collection of 242 Arabic abstracts from the proceedings of the Saudi Arabian National Computer Conferences has been used in the two systems, and a collection...

Arbitrary Passage Retrieval Based on Fixed-Length Window using Arabic Documents

Retrieval systems accumulate a great range of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of different types of documents expose shortcomings in current approaches to ranking schemes. The use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings. Passage ranking provides suitable units of text to be returned to the user, avoids the difficulties of comparing documents of different lengths, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. This paper proposes a new method of passage retrieval, called fixed-length arbitrary passage retrieval, for Arabic documents. This method has been discussed, implemented, and evaluated. The experimental results show that ranking with fixed-length arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking s...

Chapter IV Enhanced Information Retrieval Evaluation between Pseudo Relevance Feedback and Query Similarity Relevant Documents Methodology Applied on Arabic Text

Information retrieval systems utilize user feedback for generating optimal queries with respect to a particular information need. However, the methods that have been developed in IR for generating these queries do not retain information gathered from previous search processes, and hence cannot use such information in new search processes. Thus, a new search process cannot profit from the results of previous ones. Web information retrieval systems should be able to maintain results from previous search processes, thus learning from previous queries and improving overall retrieval quality. In this chapter, we use the similarity of a new query to previously learned queries. We then expand the new query by extracting terms from documents which have been judged as relevant to these previously

Arabic Text Categorization: A Comparison Survey

2021 International Conference on Information Technology (ICIT), 2021

Text categorization is gaining significance given the abundance of text continually added to the web. The lack of large, free Arabic datasets makes Arabic text more difficult to classify. This paper reviews a number of text classification papers and compares the datasets they used, the techniques they applied, and the best results they reached with the different methodologies implemented.

Enhanced Arabic Information Retrieval by Using Arabic Slang Language

Modern Applied Science, 2019

Slang has become the most used form of language in most countries. It has almost become the first language of social media, websites and daily conversations. Moreover, it is used in many conferences to clarify information and convey their purpose. This wide spread of slang across the world, and in Jordan in particular, indicates that it is important to know the meanings of Jordanian slang vocabulary. In this research, we created a system framework that allows users to retrieve Arabic information using queries written in slang. The framework is built on a context-free grammar that converts slang to classical Arabic and vice versa. We apply it to the colloquial slang of northern Jordan, specifically Irbid, Ajloun, Jerash, Mafraq and AlRamtha city. We also build a special file for non-Arabic words and stop words. After we made an evaluation for the system relying...

Retrieving Arabic Textual Documents Based on Queries Written in Bahraini Slang Language

Modern Applied Science, 2019

Nowadays, the most used language is the colloquial language, not the classical language. It is widely used in many nations. The Kingdom of Bahrain has had the largest share in the spread of the colloquial language, which has become the language of trade and of social communication. It has become so popular that it dominates daily conversation. In this research, we create an algorithm to enhance information retrieval in the Arabic slang of the Gulf. The algorithm applies Bahraini-specific rules to convert queries from Bahraini slang to classical Arabic, and we apply it to the Bahraini colloquial language. After evaluating the system on three measures (recall, precision, and F-measure), we observed a precision of about 0.64 for both slang and classical searches, which gives a great indication that the system supports searching in Bahraini ...

Emotion analysis of Arabic articles and its impact on identifying the author's gender

2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 2015

The Gender Identification (GI) problem is concerned with determining the gender of the author of a given text based on its contents. The GI problem is one of the authorship profiling problems, which have a wide range of applications in fields such as marketing and security. Due to its importance, extensive research efforts have been invested in the GI problem for different languages. Unfortunately, the same cannot be said about the Arabic language, despite its strategic importance and wide use. In this work, we explore the GI problem for Arabic text as a supervised learning problem. Specifically, we consider and compare two approaches to feature extraction. The first is the Bag-Of-Words (BOW) approach, while the second is based on computing features related to sentiments and emotions. One goal of this work is to test the validity of the common stereotype that female authors tend to write in a more emotional way than male authors. Our results show that there is no conclusive evidence that this is true for our dataset.

An Improved Algorithm for the Extraction of Triliteral Arabic Roots

European Scientific Journal, Jan 31, 2014

Stemming in the Arabic language is the extraction of the root form of the verb by removing inflectional affixes and derivational morphemes. Stemming is a common form of language processing in information retrieval systems. It is similar to the morphological processing used in natural language processing, but has somewhat different aims. Stemming is used to reduce word forms to a common form; it is the process of removing all affixes from a word to extract its root. This paper describes a stemming algorithm that has been developed for the Arabic language. The algorithm utilizes an important morphological aspect of the Arabic language. It examines the word letter by letter, starting from the last letter of the word and moving to the first, and extracts its root. The algorithm correctly stems most Arabic words that are derived from roots, and achieves a high rate of accuracy. The algorithm has been tested on a corpus of 242 abstracts of Arabic documents from the

Extracting Named Entities Using Named Entity Recognizer and Generating Topics Using Latent Dirichlet Allocation Algorithm for Arabic News Articles

This paper explains how to extract named entities and topics from news articles in the Arabic language. Due to the lack of high-quality tools for Named Entity Recognition (NER) and topic identification for Arabic, we have built an Arabic NER (RenA) and an Arabic topic extraction tool using the popular LDA algorithm (ALDA). NER involves extracting information and identifying types, such as name, organization, and location. LDA works by applying statistical methods to vector representations of collections of documents. Though there are effective tools for NER and LDA for English, these are not directly applicable to Arabic. Accordingly, we developed new methods and tools (i.e., RenA and ALDA). To allow assessment of these, and comparison with other methods and tools, we built a baseline corpus to be used in NER evaluation, with help from volunteer graduate students who understand Arabic. RenA produces good results, with accurate Name, Organization, and Location extraction from news articles collected from online resources. We compared the RenA results with a popular Arabic NER, and achieved an enhancement. We also carried out an experiment to evaluate ALDA, again involving volunteer graduate students who understand Arabic. ALDA showed very good results in terms of topic extraction from Arabic news articles, achieving high accuracy, based on an experimental evaluation with participants using a Likert scale.

Approaches to Retrieve Verses of the Holy Quran Based on Full Meaning

2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, 2013

Most information retrieval researchers and specialists in NLP have focused on employing the concepts of the Holy Quran in their field of work as a step towards combining modern technology and the understanding of the Quran itself. Retrieving information from the Holy Quran in response to a particular query has been considered a very significant field in research and analysis. Many researchers have devised special search options to retrieve verses from the Holy Quran, such as exact matching of the query, matching of the query's stem, and matching of the query's synonyms based on certain thesauruses. In this study, the researchers adopt special search options for retrieving verses from the Holy Quran based on single-word queries, according to the full meaning of the Quran's verses in addition to the meaning of the search query. In other words, the researchers add exact matching of the search query as a default search option in order to retrieve the verses that match the search query precisely. The researchers also add matching of the query's synonyms as a search option, to give the user the ability to add any synonym that can enhance the retrieval process. The synonym selection is not based on a thesaurus but is just an additional search option that can aid the information retrieval process. In order to find the Quran verses that are related to the search query without containing the word or its synonyms exactly, the researchers add new special search options based on the full meaning of the verses. These new search options are based on special indices for each verse in the Holy Quran covering the topic of the verse and the explanation of its full meaning. The researchers select two books of Holy Quran explanation that give the full meaning of each verse: "Tafseer Klmat Al-Quran Tafseer w Bayan" and "Tafseer Al-Jalalayn".

Different Classification Algorithms Based on Arabic Text Classification: Feature Selection Comparative Study

International Journal of Advanced Computer Science and Applications, 2015

Feature selection is necessary for effective text classification. Dataset preprocessing is essential for sound results and effective performance. This paper investigates the effectiveness of using feature selection. We compare the performance of different classifiers in different situations, using feature selection with stemming and without stemming. The evaluation used a BBC Arabic dataset, and different classification algorithms were used: decision tree (D.T), K-nearest neighbors (KNN), the Naïve Bayesian (NB) method and the Naïve Bayes Multinomial (NBM) classifier. The experimental results are presented in terms of precision, recall, F-measure, accuracy and time to build the model.

Building an effective rule-based light stemmer for Arabic language to improve search effectiveness

2008 International Conference on Innovations in Information Technology, 2008

... [3] Hayder K. Al Ameed, Shaikha O. Al Ketbi, Amna A. Al Kaabi, Khadija S. Al Shebli, Naila F. Al Shamsi, Noura H. Al Nuaimi, Shaikha S. Al Muhairi, Arabic Light Stemmer: A New Enhanced Approach, Software Engineering Dept., College of Information Technology, UAE ...

A Comparison between Interactive and Automatic Query Expansion Applied on Arabic Language

2007 Innovations in Information Technologies (IIT), 2007

Much attention has been paid to the relative effectiveness of interactive query expansion (IQE) versus automatic query expansion (AQE). This research shows that the automatic query expansion (collection dependent) strategy gives better performance than no query expansion. The percentage of queries that are improved by the AQE strategy is 57%, with average precision equal to 43.2. Compared against AQE (collection

Impact of Stemmer on Arabic Text Retrieval

Information Retrieval Technology, 2014

Stemming is the process of reducing inflected words to their stem or root form, generally a written word form. Arabic is one of the most highly inflected of the world's languages. Stemming improves retrieval performance by reducing word variants and increasing the similarity between related words. An Arabic Information Retrieval (AIR) system can therefore use stemming algorithms to retrieve a greater number of documents related to the user's query. The aim of this paper is to evaluate the impact of three different Arabic stemmers (i.e. the "Information Science Research Institute" (ISRI) stemmer, the morphological and syntax-based lemmatization "Educated Text Stemmer" (ETS), and the Light10 stemmer) on Arabic information retrieval performance. We used the Linguistic Data Consortium (LDC) Arabic Newswire data set as the benchmark dataset. In the evaluation of the three stemmers, the best performance in terms of mean average precision was achieved by the Light10 stemmer.

Research paper thumbnail of A Review Study for Arabic Machine Learning and Deep Learning Methods

A Review Study for Arabic Machine Learning and Deep Learning Methods

2022 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS)

Research paper thumbnail of Offensive Language Detection in Social Networks for Arabic Language Using Clustering Techniques

With the advent of social networks, the users have obtained a golden opportunity to express their... more With the advent of social networks, the users have obtained a golden opportunity to express their opinions using text and multimedia. However, some users abused these platforms by introducing acts such as Cyber-Bullying and Cyber-Harassment. Despite the various negative health and social effects, the works proposed toward the detection of these acts are still limited, especially in non-English languages. In Arabic, few works studied this phenomenon. These works had limited datasets. As the number of available training datasets are limited, it is still hard to train classifiers to detect these acts. Therefore, clustering has posed as an alternative solution to tackle this difficulty. In this work, we propose the use of clustering to detect Cyber-Bullying and Cyber-Harassment. We adopted various clustering algorithms including K-Means and Expectation Maximization (EM). Moreover, we used various natural language processing (NLP) tools for this objective. The results illustrate that the...

Research paper thumbnail of Arbitrary Passage Retreval Based on Fixed - Length Window Using Arabic Documents

عمادة البحث العلمي والدراسات العليا - جامعة اليرموك, 2009

Retrieval systems accumulate a great range of documents, from abstracts, newspaper articles, and ... more Retrieval systems accumulate a great range of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of different types of documents represent shortcomings in current approaches toward ranking schemes. The use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings. Passage ranking provides suitable units of text, to be returned to the user, can avoid the difficulties of comparing documents of different lengths, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. This paper proposes a new method of passage retrieval called fixed length arbitrary passage retrieval for Arabic documents. This method has been discussed, implemented, and evaluated. The experiment results show that ranking with fixed arbitrary passage gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with other passage types.

Research paper thumbnail of Arab Academy for Banking and Financial Sciences Amman Jordan

Word sense ambiguity is widely spread in all natural languages; a word may carry several distinct... more Word sense ambiguity is widely spread in all natural languages; a word may carry several distinct meanings. Human can figure out the suitable meaning according to the context in which the word occurs. The Arabic language is highly polysemous; in many situations we find it extremely necessary to disambiguate the word senses. This paper studies and compares the performance of a search engine before and after expanding the query through Interactive Word Sense Disambiguation (WSD). We found that expanding polysemous query terms by adding more specific synonyms will narrow the search into the specific targeted request and thus causes both precision and recall to increase; on the other hand, expanding the query with a more general (polysemous) synonym will broaden the search which would cause the precision to decrease.

Research paper thumbnail of Evaluating Machine Translations from Arabic into English and Vice Versa

International Research Journal of Electronics and Computer Engineering, 2017

Machine translation (MT) allows direct communication between two persons without the need for the... more Machine translation (MT) allows direct communication between two persons without the need for the third party or via dictionary in your pocket, which could bring significant and per formative improvement. Since most traditional translational way is a word-sensitive, it is very important to consider the word order in addition to word selection in the evaluation of any machine translation. To evaluate the MT performance, it is necessary to dynamically observe the translation in the machine translator tool according to word order, and word selection and furthermore the sentence length. However, applying a good evaluation with respect to all previous points is a very challenging issue. In this paper, we first summarize various approaches to evaluate machine translation. We propose a practical solution by selecting an appropriate powerful tool called iBLEU to evaluate the accuracy degree of famous MT tools (i.e. Google, Bing, Systranet and Babylon). Based on the solution structure, we further discuss the performance order for these tools in both directions Arabic to English and English to Arabic. After extensive testing, we can decide that any direction gives more accurate results in translation based on the selected machine translations MTs. Finally, we proved the choosing of Google as best system performance and Systranet as the worst one.

Research paper thumbnail of Improved hierarchical classifiers for multi-way sentiment analysis

Improved hierarchical classifiers for multi-way sentiment analysis

Int. Arab J. Inf. Technol., 2017

Sentiment Analysis (SA) is field in computational linguistics concerned with determining the sent... more Sentiment Analysis (SA) is field in computational linguistics concerned with determining the sentiment conveyed in a piece of text towards certain entities (such as people, organizations, products, services, events, etc.) using NLP tools. The considered sentiments can be as simple as positive vs. negative. A more fine-grained approach known as Multi-Way Sentiment Analysis (MWSA) is based on ranking systems, such as the 5-star ranking system. In such systems, rankings close to each other can be confusing; thus, some researchers have suggested that using Hierarchical Classifiers (HCs) can yield better results compared with traditional Flat Classifier (FCs). Unlike FCs, which try to address the entire classification problem at once, HCs employ some kind of tree structures where the nodes are simple “core” classifiers customized to address a subset of the classification problem. This study aims to explore extensively the use of HCs to address MWSA by studying six different hierarchies. ...

Research paper thumbnail of Comparison Between Inverted and Signature Files Based on Arabic Documents

The purpose of this research is to give an idea about inverted files and signature files based on... more The purpose of this research is to give an idea about inverted files and signature files based on Arabic documents collection, and to give the comparison points between the two techniques and the performance of the two techniques on each of the comparison points. The most common measures of system performance used to compare the information retrieval mechanisms are time, space, and recall/precision evaluation measurements. The shorter the response time is, the smaller the space used, the better system is considered to be [1], so our comparisons point will include space overhead, search time, and average recall/precision . In this research, two indices will be built, inverted-file and signature-file. However, to measure the performance of each one, a retrieval system must be built to compare the results of using these indices. A collection of 242 Arabic Abstracts from the proceeding of the Saudi Arabian National Computer Conferences have been used in the two systems, and a collection...

Arbitrary Passage Retrieval Based on Fixed-Length Window using Arabic Documents

Retrieval systems accumulate a great range of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of different types of documents expose shortcomings in current approaches to ranking. The use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings. Passage ranking provides suitable units of text to be returned to the user, can avoid the difficulties of comparing documents of different lengths, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. This paper proposes a new method of passage retrieval called fixed length arbitrary passage retrieval for Arabic documents. This method has been discussed, implemented, and evaluated. The experimental results show that ranking with fixed arbitrary passage gives substantial improvements in retrieval effectiveness over traditional document ranking s...
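
The idea of ranking documents by fixed-length passages can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the window may start at any word position, the passage score is a plain count of query-term occurrences (a stand-in for a real ranking function), and the names `passages`, `passage_score`, and `rank_by_best_passage` are hypothetical.

```python
def passages(words, window, step=1):
    """Yield (start, passage) for every fixed-length window position."""
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, words[start:start + window]


def passage_score(passage, query_terms):
    """Count query-term occurrences in the passage (assumed scoring)."""
    return sum(1 for word in passage if word in query_terms)


def rank_by_best_passage(docs, query, window=5):
    """Rank documents by the score of their best fixed-length passage.

    Returns a list of (score, doc_id, passage_start) tuples, best first.
    """
    query_terms = set(query.split())
    ranked = []
    for doc_id, text in docs.items():
        words = text.split()
        # Ties on score are broken in favour of the earliest passage.
        best_score, neg_start = max(
            (passage_score(p, query_terms), -start)
            for start, p in passages(words, window)
        )
        ranked.append((best_score, doc_id, -neg_start))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Toy documents with placeholder tokens (hypothetical data).
docs = {
    "long": "a b c d e f query terms g h i j k l m n",
    "short": "query x y",
}
ranked = rank_by_best_passage(docs, "query terms", window=4)
```

Because every document is represented by a passage of the same length, a long document is not penalized or favoured for its size, which is the motivation the abstract gives for passage ranking.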

Chapter IV Enhanced Information Retrieval Evaluation between Pseudo Relevance Feedback and Query Similarity Relevant Documents Methodology Applied on Arabic Text

Information retrieval systems utilize user feedback for generating optimal queries with respect to a particular information need. However, the methods that have been developed in IR for generating these queries do not memorize information gathered from previous search processes, and hence cannot use such information in new search processes. Thus, a new search process cannot profit from the results of the previous processes. Web Information Retrieval systems should be able to maintain results from previous search processes, thus learning from previous queries and improving overall retrieval quality. In this chapter, we are using the similarity of a new query to previously learned queries. We then expand the new query by extracting terms from documents, which have been judged as relevant to these previously

Arabic Text Categorization: A Comparison Survey

2021 International Conference on Information Technology (ICIT), 2021

Text categorization is becoming more significant given the large volume of text continually added to the web. The lack of large, free Arabic datasets makes Arabic text more difficult to classify. This paper reviews a selection of text classification papers and compares the datasets they used, the techniques they applied, and the best results they reached with the different methodologies that have been implemented.

Enhanced Arabic Information Retrieval by Using Arabic Slang Language

Modern Applied Science, 2019

Slang has become the most widely used form of language in most countries. It has almost become the first language of social media, websites, and daily conversation. Moreover, it is used in many conferences to clarify information and convey its intended purpose. This wide spread of slang around the world, and in Jordan in particular, indicates that it is important to know the meanings of Jordanian slang vocabulary. In this research, we created a system framework that allows users to retrieve Arabic information using queries written in slang. The framework is built mainly on a context-free grammar that converts slang to classical Arabic and vice versa. We apply it to the colloquial slang of northern Jordan, specifically Irbid, Ajloun, Jerash, Mafraq, and AlRamtha city. We also create a special file for non-Arabic words and for stop words. After we made an evaluation for the system relying...

Retrieving Arabic Textual Documents Based on Queries Written in Bahraini Slang Language

Modern Applied Science, 2019

Nowadays, colloquial language is used more widely than classical language, and it is common in many nations. The Kingdom of Bahrain has had the largest share in the spread of colloquial language, which has become the language of trade and of social communication. It has become so popular that it now dominates daily conversation. In this research, we create an algorithm to enhance information retrieval for the Arabic slang of the Gulf. The algorithm applies special Bahraini rules to convert queries from Bahraini slang to classical Arabic, and we apply it to the Bahraini colloquial language. After evaluating the system on three main measures (recall, precision, and F-measure), we found that precision is about 0.64 for both slang and classical searches, which gives a strong indication that the system supports searching in Bahraini ...

Emotion analysis of Arabic articles and its impact on identifying the author's gender

2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 2015

The Gender Identification (GI) problem is concerned with determining the gender of the author of a given text based on its contents. The GI problem is one of the authorship profiling problems, which have a wide range of applications in various fields such as marketing and security. Due to its importance, extensive research efforts have been invested in the GI problem for different languages. Unfortunately, the same cannot be said about the Arabic language despite its strategic importance and widespread use. In this work, we explore the GI problem for Arabic text as a supervised learning problem. Specifically, we consider and compare two approaches for feature extraction. The first one is the Bag-Of-Words (BOW) approach while the second one is based on computing features related to sentiments and emotions. One goal of this work is to test the validity of the common stereotype that female authors tend to write in a more emotional way than male authors. Our results show that there is no conclusive evidence that this is true for our dataset.

An Improved Algorithm for the Extraction of Triliteral Arabic Roots

European Scientific Journal, Jan 31, 2014

Stemming in the Arabic language is extracting the root form of the verb, removing inflectional affixes and derivational morphemes. Stemming is a common form of language processing in information retrieval systems. It is similar to the morphological processing used in natural language processing, but has somewhat different aims. Stemming is used to reduce variant word forms to a common form. Stemming is the process of removing all affixes from a word to extract its root. This paper describes a stemming algorithm that has been developed for the Arabic language. The algorithm utilizes an important morphological aspect of the Arabic language. The algorithm examines the word and extracts its root. It examines the word letter by letter starting from the end of the word, i.e., from the last letter of the word to the first. The algorithm correctly stems most Arabic words that are derived from roots, and achieves a high rate of accuracy. The algorithm has been tested on a corpus of 242 abstracts of Arabic documents from the

Extracting Named Entities Using Named Entity Recognizer and Generating Topics Using Latent Dirichlet Allocation Algorithm for Arabic News Articles

This paper explains how to extract named entities and topics from Arabic news articles. Due to the lack of high quality tools for Named Entity Recognition (NER) and topic identification for Arabic, we have built an Arabic NER (RenA) and an Arabic topic extraction tool using the popular LDA algorithm (ALDA). NER involves extracting information and identifying types, such as name, organization, and location. LDA works by applying statistical methods to vector representations of collections of documents. Though there are effective tools for NER and LDA for English, these are not directly applicable to Arabic. Accordingly, we developed new methods and tools (i.e., RenA and ALDA). To allow assessment of these, and comparison with other methods and tools, we built a baseline corpus to be used in NER evaluation, with help from volunteer graduate students who understand Arabic. RenA produces good results, with accurate Name, Organization, and Location extraction from news articles collected from online resources. We compared the RenA results with a popular Arabic NER, and achieved an enhancement. We also carried out an experiment to evaluate ALDA, again involving volunteer graduate students who understand Arabic. ALDA showed very good results in terms of topic extraction from Arabic news articles, achieving high accuracy, based on an experimental evaluation with participants using a Likert scale.

Approaches to Retrieve Verses of the Holy Quran Based on Full Meaning

2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, 2013

Most information retrieval researchers and specialists in NLP have focused on employing the concepts of the Holy Quran in their field of work as a step towards combining modern technology and the understanding of the Quran itself. Retrieving information from the Holy Quran in response to a particular query is considered a very significant field of research and analysis. Many researchers have devised special search options to retrieve verses from the Holy Quran, such as exact matching of the query, matching of the query's stem, and matching of the query's synonyms based on certain thesauruses. In this study, the researchers adopt special search options for retrieving verses from the Holy Quran based on single-word queries, according to the full meaning of the Quran's verses in addition to the meaning of the search query. In other words, the researchers add exact matching of the search query as the default search option in order to retrieve the verses that match the search query precisely. The researchers then add matching of the query's synonyms as a search option, giving the user the ability to add any synonym that can enhance the retrieval process. The synonym selection is not based on a thesaurus; it is just an additional search option that can aid in enhancing the information retrieval process. In order to find the Quran verses that are related to the search query without containing the word or its synonyms exactly, the researchers add new special search options based on the full meaning of the verses. These new search options are based on special indices that record, for each verse in the Holy Quran, the topic of the verse and the explanation of its full meaning. The researchers select two books of Quran explanation that give the full meaning of each verse: "Tafseer Klmat Al-Quran Tafseer w Bayan" and "Tafseer Al-Jalalayn".

Different Classification Algorithms Based on Arabic Text Classification: Feature Selection Comparative Study

International Journal of Advanced Computer Science and Applications, 2015

Feature selection is necessary for effective text classification. Dataset preprocessing is essential to produce sound results and effective performance. This paper investigates the effectiveness of using feature selection. We compare the performance of different classifiers in different situations, using feature selection with and without stemming. The evaluation used a BBC Arabic dataset and several classification algorithms: decision tree (DT), K-nearest neighbors (KNN), Naïve Bayesian (NB), and Naïve Bayes Multinomial (NBM). The experimental results are presented in terms of precision, recall, F-measure, accuracy, and time to build the model.
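
The evaluation measures named above can be computed as in the following minimal sketch. It assumes a single positive class evaluated against all others and the balanced F-measure (F1); the function name `classification_metrics` and the toy labels are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(true_labels, predicted_labels, positive):
    """Return precision, recall, F1, and accuracy for one positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels (hypothetical data).
true_labels = ["sport", "sport", "news", "news"]
predicted = ["sport", "news", "news", "news"]
metrics = classification_metrics(true_labels, predicted, positive="sport")
```

For a multi-class dataset, one common choice is to call this once per category and average the results, though the abstract does not say which averaging the authors used.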

Building an effective rule-based light stemmer for Arabic language to improve search effectiveness

2008 International Conference on Innovations in Information Technology, 2008

... [3] Hayder K. Al Ameed, Shaikha O. Al Ketbi, Amna A. Al Kaabi, Khadija S. Al Shebli, Naila F. Al Shamsi, Noura H. Al Nuaimi, Shaikha S. Al Muhairi, Arabic Light Stemmer: A New Enhanced Approach, Software Engineering Dept., College of Information Technology, UAE ...

A Comparison between Interactive and Automatic Query Expansion Applied on Arabic Language

2007 Innovations in Information Technologies (IIT), 2007

Much attention has been paid to the relative effectiveness of interactive query expansion (IQE) versus automatic query expansion (AQE). This research has shown that the automatic query expansion (collection dependent) strategy gives better performance than no query expansion. The percentage of queries that are improved by the AQE strategy is 57%, with average precision equal to 43.2. Compared against AQE (collection

Impact of Stemmer on Arabic Text Retrieval

Information Retrieval Technology, 2014

Stemming is the process of reducing inflected words to their stem, base, or root form, generally a written word form. Arabic is one of the most highly inflected languages in the world. Stemming improves retrieval performance by reducing word variants and increasing the similarity between related words. An Arabic Information Retrieval (AIR) system can therefore use stemming algorithms to retrieve a greater number of documents related to the user's query. The aim of this paper is to evaluate the impact of three different Arabic stemmers (i.e., the "Information Science Research Institute" (ISRI) stemmer, the morphological and syntax-based lemmatization "Educated Text Stemmer" (ETS), and the Light10 stemmer) on Arabic Information Retrieval performance. We used the Linguistic Data Consortium (LDC) Arabic Newswire data set as the benchmark dataset. In the evaluation of the three stemmers, the best performance in terms of mean average precision was achieved by the Light10 stemmer.
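
As a rough illustration of what light stemming does, the sketch below strips one common prefix and then any matching suffixes from a word. It is a simplified example and not the Light10, ISRI, or ETS stemmer evaluated in the paper: the affix lists, the minimum stem length, and the name `light_stem` are assumptions chosen for illustration.

```python
# Assumed affix lists for illustration; published light stemmers may differ.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM_LENGTH = 2  # assumed minimum number of letters left after stripping


def light_stem(word):
    """Strip at most one prefix, then each matching suffix in list order."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LENGTH:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            word = word[:-len(suffix)]
    return word


# Word variants that should map to the same index term.
stem_definite = light_stem("الكتاب")   # "the book"
stem_plain = light_stem("كتاب")        # "book"
stem_plural = light_stem("والمعلمون")  # "and the teachers"
stem_feminine = light_stem("مكتبة")    # "library"
```

Mapping the definite and plain forms of a word to the same stem is what lets a query match more related documents, which is the retrieval effect the evaluation above measures.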