Suleiman H Mustafa | Yarmouk University (original) (raw)

Papers by Suleiman H Mustafa

Journal of the Association for Information Science and Technology, 2005

In this article, a word-oriented approximate string matching approach for searching Arabic text i... more In this article, a word-oriented approximate string matching approach for searching Arabic text is presented. The distance between a pair of words is determined on the basis of aligning the two words by using occurrence heuristic tables. Two words are considered related if they have the same morphological or lexical basis. The heuristic reports an approximate match if common letters agree in order and noncommon letters represent valid affixes. The heuristic was tested by using four different alignment strategies: forward, backward, combined forward–backward, and combined backward– forward. Using the error rate and missing rate as performance indicators, the approach was successful in providing more than 80p correct matches. Within the conditions of the experiments performed, the results indicated that the combined forward–backward strategy seemed to exhibit the best performance. Most of the errors were caused by multiple-letter occurrences and by the presence of weak letters in case...

Int. Arab J. Inf. Technol., 2016

The researchers in the field of Arabic sentiment analysis (SA) need a relatively big standard Ara... more The researchers in the field of Arabic sentiment analysis (SA) need a relatively big standard Arabic sentiment analysis corpus to conduct their studies. There are a number of existing Arabic datasets; however they suffer from certain limitations such as the small number of reviews or topics they contain, the restriction to Modern Standard Arabic (MSA), or not being publicly available. Therefore, this study aims to establish a flexible and relatively big standard Arabic sentiment analysis corpus that can be considered as a pillar and cornerstone to build larger Arabic corpora. In addition to MSA, this corpus contains reviews written in the five main Arabic dialects (Egyptian, Levantine, Arabian Peninsula, Mesopotamian, and Maghrebi group). Furthermore, this corpus has other five types of reviews (English, mixed MSA & English, French, mixed MSA & Emoticons, and mixed Egyptian & Emoticons). This corpus is released for free to be used by researchers in this field, where it is characterized by its flexibility in allowing the users to add, remove, and revise its contents. The total number of topics and reviews of this initial copy is 250 and 1,442, respectively. The collected topics are distributed equally among five domains (classes): Economy, Food-Life style, Religion, Sport, and Technology, where each domain has 50 topics. This corpus is built manually to ensure the highest quality to the researchers in this field.

Measuring text similarity has been studied for a long time due to its importance in many applicat... more Measuring text similarity has been studied for a long time due to its importance in many applications in natural language processing and related areas such as Web-based document searching. One such possible application which is investigated in this paper is determining the similarity between course descriptions of the same subject for credit transfer among various universities or similar academic programs. In this paper, three different bi-gram techniques have been used to calculate the similarity between two or more Arabic documents which take the form of course descriptions. One of the techniques uses the vector model to represent each document in a way that each bi-gram is associated with a weight that reflects the importance of the bi-gram in the document. Then the cosine similarity is used to compute the similarity between the two vectors. The other two techniques are: word-based and whole document-based evaluation techniques. In both techniques, the Dice's similarity meas...

2016 7th International Conference on Computer Science and Information Technology (CSIT), 2016

Many Governments have realized the importance of information and communication technology (ICT) t... more Many Governments have realized the importance of information and communication technology (ICT) to improve the delivery of information and services to citizens and business. The fame of internet use, and the wide spread of mobile and portable means made the access to the governmental services easy and reachable at any time for a wide sector of citizens. People with disabilities have all the right to benefit from the e-services provided by the governmental websites equally with other citizens. To do so e-government websites must conform with the Web Content Accessibility Guidelines (WCAG). This paper was undertaken with three main purposes in mind, First: identifying the accessibility level of Jordanian e-Government websites in conformance with web content accessibility Guidelines (WCAG 1.0) at three level of priority, Second: assessing the efforts that have been devoted by the e-government websites to improve accessibility of Jordanian citizens with disability by comparing the recen...

2019 International Arab Conference on Information Technology (ACIT), 2019

Measuring software quality attributes helps in determining the degree of the quality of the softw... more Measuring software quality attributes helps in determining the degree of the quality of the software system. Among the various software attributes, cohesion is considered one of the most important software design concerns. In this study, the focus was on investigating the impact of transitive or indirect relations between classes on measuring cohesion. It was assumed that software metrics which take into account transitive relations would provide cohesion values equal or greater than metrics which consider direct relations. Four metrics are covered in this study, including: TCC (Tight Class Cohesion), LCC (Loose Class Cohesion), LCC-D (Lack of Class Cohesion - Direct), and LCC-I (Lack of Class Cohesion-Indirect). Several programs in C# were selected and a tool was developed for calculating these metrics and their values were compared and correlated. The findings of the study show that the assumptions of this study were not valid for certain cases. The values of TCC and LCC were iden...

This study was based on a major assumption that the lexical structure of Arabic textual words inv... more This study was based on a major assumption that the lexical structure of Arabic textual words involves semantic content that could be used to determine the class of a given word and its functional features within a given text. Hence, the purpose of the study was to explore the extent at which we can rely on word structure to determine word class without the need for using language glossaries and word lists or using the textual context. The results indicate that the morphological structure of Arabic textual word was helpful in achieving a rate of success approaching 79% of the total number of words in the sample used in the study. In certain cases, the approach adopted in the investigation was not adequate for class tagging due to two major reasons, the first of which was the absence of prefixes and suffixes and the second was the incapability of distinguishing affixes from original letters. It was concluded that the approach adopted in this study should be supplemented by using othe...

Due to the affix structure of words in Arabic, a given word may take different forms in different... more Due to the affix structure of words in Arabic, a given word may take different forms in different contexts. These variants may not be recognized as semantically equivalent in IR without some processing, Stemming is one of the most commonly used techniques for doing so. The research reported in this paper evaluates the retrieval effectiveness of four different stemming algorithms for Arabic information retrieval systems, including those reported by Khoja, Taghva, Mustafa, and Aljlayl and compare their performance with no stemming. The first two are considered heavy stemmers, while the others are classified as light stemmers. The evaluation was based on a set of 477 documents on medical herbs comprising more than 95856 Arabic words. The index terms were prepared by an expert in the field. Three performance metrics were used in this study, including average precision at recall 10, P@5, and R-precision. The results indicated that all stemmers significantly outperformed zero stemming. Ho...

This paper reports the results of investigating the impact of query length on the performance of ... more This paper reports the results of investigating the impact of query length on the performance of Arabic retrieval. Thirty queries were used in the investigation, each of which was phrased in three different types of length: short, medium, and longer, giving ninety different queries. A Corpus of one thousand documents on herbal medication was used and expert judgments were used to determine document relevance to each query. The main finding of this research is that using shorter queries improves both precision and recall. Due to the absence of other results to compare with and the lack of agreement on how length affects retrieval, it has been concluded that the results should be viewed in light of the type of dataset used and how queries were formulated and categorized.

Although a number of attempts have been made to develop a stemming formalism for the Arabic langu... more Although a number of attempts have been made to develop a stemming formalism for the Arabic language, most of these attempts have focused merely on the lexical structure of words as modeled by the Arabic grammatical and morphological lexical rules. This paper discusses the merits of light stemming for Arabic data and presents a simple light stemming strategy that has been developed on the basis of an analysis of actual occurrence of suffixes and prefixes in real texts. The performance of this stemming strategy has been compared with that of a heavier stemming strategy that takes into consideration most grammatical prefixes and suffixes. The results indicate that only a few of the prefixes and suffixes have an impact on the correctness of stems generated. Light stemming has exhibited superior performance than heavy stemming in terms of over-stemming and under-stemming measures. It has been shown that the two stemming strategies are significantly different in retrieval performance.

The aim of this research study was to evaluate the websites of Jordan's universities from the... more The aim of this research study was to evaluate the websites of Jordan's universities from the usability perspective. Two online automated tools, namely: html toolbox and web page analyze were used along with a questionnaire directed towards users of these websites. Tools were used to measure the websites internal attributes which can not be perceived by users, such as html code error, download time, and size of html page. The questionnaire was developed and designed based on 23 usability criteria divided into 5 categories. Each category deals with one usability aspect. The results showed that the overall usability level of the studied Websites is acceptable. However, there are some weaknesses in some aspects of the design, interface, and performances. Suggestions are provided in the study to enhance the usability of these websites.

Journal of King Saud University - Computer and Information Sciences, 2002

In this paper, the Arabic lexicon has been investigated in the context of relational database the... more In this paper, the Arabic lexicon has been investigated in the context of relational database theory. A feature analysis of lexical entities has been carried out which shows that lexical attributes can be classified into five categories comprising nineteen attributes, including form attributes, morphological attributes, functional attributes, meaning attributes, and referential attributes. Based on this analysis, eleven database relations have been identified which form the backbone of an Arabic lexical database, including: words, roots, forms, infinitives, verbs, nouns, plurals, particles, meanings, lexical functions, and cross-references. The design ideas discussed in this paper were tested using a sample of lexical items selected from a modern printed dictionary. The results of developing an experimental lexical database indicate that the relational approach provides an efficient method for storing and retrieving Arabic lexical information. It should be mentioned, however, that several problems were encountered when the printed data was translated into a database form. Some of these problems are inherent in the Arabic lexicon itself, while others are due to the way by which lexical information is presented by paper-based dictionaries.

Journal of the American Society for Information Science and Technology, 2005

In this article, a word-oriented approximate string matching approach for searching Arabic text i... more In this article, a word-oriented approximate string matching approach for searching Arabic text is presented. The distance between a pair of words is determined on the basis of aligning the two words by using occurrence heuristic tables. Two words are considered related if they have the same morphological or lexical basis. The heuristic reports an approximate match if common letters agree in order and noncommon letters represent valid affixes. The heuristic was tested by using four different alignment strategies: forward, backward, combined forward-backward, and combined backwardforward. Using the error rate and missing rate as performance indicators, the approach was successful in providing more than 80% correct matches. Within the conditions of the experiments performed, the results indicated that the combined forward-backward strategy seemed to exhibit the best performance. Most of the errors were caused by multiple-letter occurrences and by the presence of weak letters in cases in which the shared core consisted of one or two letters.

Journal of Software Engineering and Applications, 2011

Service-Oriented Architecture (SOA) is becoming the dominant approach for developing and organizi... more Service-Oriented Architecture (SOA) is becoming the dominant approach for developing and organizing distributed enterprise-wide applications. Although the concepts of SOA have been extensively described in the literature and industry, the effects of adopting SOA on software quality are still unclear. The aim of the paper is to analyze how adopting SOA can affect software quality as opposed to the Object-Oriented (OO) paradigm and expose the differential implications of adopting both paradigms on software quality. The paper provides a brief introduction of the architectural differences between the Service-Oriented (SO) and OO paradigms and a description of internal software quality metrics used for the comparison. The effects and differences are exposed by providing a case study architected for both paradigms. The quantitative measure concluded in the paper showed that a software system developed using SOA approach provides higher reusability and lower coupling among software modules, but at the same time higher complexity than those of the OO approach. It was also found that some of the existing OO software quality metrics are inapplicable to SOA software systems. As a consequence, new metrics need to be developed specifically to SOA software systems.

Journal of Information & Knowledge Management, 2005

WSPC Journals Online,WorldSciNet.

Information Processing & Management, 2005

This work assesses the performance of two N -gram matching techniques for Arabic root-driven stri... more This work assesses the performance of two N -gram matching techniques for Arabic root-driven string searching: contiguous N -grams and hybrid N -grams, combining contiguous and non-contiguous. The two techniques were tested using three experiments involving different levels of textual word stemming, a textual corpus containing about 25 thousand words (with a total size of about 160KB), and a set of 100 query textual words. The results of the hybrid approach showed significant performance improvement over the conventional contiguous approach, especially in the cases where stemming was used. The present results and the inconsistent findings of previous studies raise some questions regarding the efficiency of pure conventional N -gram matching and the ways in which it should be used in languages other than English.

Computer Standards & Interfaces, 2002

This paper presents the results of investigating the impact of variations found in character codi... more This paper presents the results of investigating the impact of variations found in character coding schemes on the performance of string hashing. The investigation involved three types of Arabic strings (single words, personal names, and document titles) and four different Arabic coding schemes. The results were examined in three different respects: collision rates, arithmetic code redundancy, and the contribution of

Journal of the Association for Information Science and Technology, 2005

Int. Arab J. Inf. Technol., 2016

2016 7th International Conference on Computer Science and Information Technology (CSIT), 2016

2019 International Arab Conference on Information Technology (ACIT), 2019

Journal of King Saud University - Computer and Information Sciences, 2002

Journal of the American Society for Information Science and Technology, 2005

In this article, a word-oriented approximate string matching approach for searching Arabic text i... more In this article, a word-oriented approximate string matching approach for searching Arabic text is presented. The distance between a pair of words is determined on the basis of aligning the two words by using occurrence heuristic tables. Two words are considered related if they have the same morphological or lexical basis. The heuristic reports an approximate match if common letters agree in order and noncommon letters represent valid affixes. The heuristic was tested by using four different alignment strategies: forward, backward, combined forward-backward, and combined backwardforward. Using the error rate and missing rate as performance indicators, the approach was successful in providing more than 80% correct matches. Within the conditions of the experiments performed, the results indicated that the combined forward-backward strategy seemed to exhibit the best performance. Most of the errors were caused by multiple-letter occurrences and by the presence of weak letters in cases in which the shared core consisted of one or two letters.

Journal of Software Engineering and Applications, 2011

Journal of Information & Knowledge Management, 2005

WSPC Journals Online,WorldSciNet.

Information Processing & Management, 2005

Computer Standards & Interfaces, 2002