Ghassan Kanaan - Academia.edu
Papers by Ghassan Kanaan
Recent years have witnessed many intrusion attempts and instances of illegal access to databases belonging to governmental institutions and private organizations that depend on computers in their daily work. Such incidents have drawn attention to the issue of data security and protection. The paper describes a new security scheme to protect computer data from unauthorized people who attempt to access and interpret it. The new security scheme is based on a new algorithm that enciphers the letters of a text with a different key for each letter, derived from the letter's position in the plaintext, which enhances data security. The strength of the proposed cipher algorithm is that each plaintext letter can map to multiple ciphertext letters.
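The position-dependent keying described in this abstract can be sketched as a toy shift cipher. The key schedule below (a base key plus the letter index) is an assumption for illustration, not the paper's actual algorithm:

```python
def encrypt(plaintext, base_key=3):
    """Shift each letter by a position-dependent key (base_key + index).

    Because the shift grows with position, repeated plaintext letters
    map to different ciphertext letters."""
    out = []
    for i, ch in enumerate(plaintext):
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + base_key + i) % 26 + base))
        else:
            out.append(ch)  # leave spaces and punctuation unchanged
    return ''.join(out)

def decrypt(ciphertext, base_key=3):
    """Invert encrypt() by subtracting the same position-dependent key."""
    out = []
    for i, ch in enumerate(ciphertext):
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base - base_key - i) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)
```

Since the shift depends on position, the two letters of `encrypt("aa")` come out different — the multi-ciphertext property the abstract highlights.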
Proceedings of the Third International Conference on Software and Data Technologies Special Session on Applications in Banking and Finance, 2008
Feature subset selection (FSS) is an important step for effective text classification (TC) systems. This paper describes a novel FSS method based on Ant Colony Optimization (ACO) and the Chi-square statistic. The proposed method adopts the Chi-square statistic as heuristic information and uses the effectiveness of a Support Vector Machine (SVM) text classifier as guidance for selecting better features for the selected categories. Compared to six classical FSS methods, our proposed ACO-based FSS algorithm achieved better TC effectiveness. Evaluation used an in-house Arabic TC corpus, and the experimental results are presented in terms of the macro-averaged F1 measure.
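The Chi-square heuristic used to guide the ACO search can be computed from a 2x2 term/category contingency table; a minimal sketch (the counts and their role as heuristic information are inferred from the abstract, not taken from the paper):

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category contingency table.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0
```

A high score means the term's presence is strongly associated with the category, so an ACO ant would be biased toward including it in the candidate subset.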
Journal of Information Security and Applications, 2017
The Gender Identification (GI) problem is concerned with determining the gender of a given text's author. It has a wide range of academic and commercial applications in various fields, including literature, security, forensics, and electronic markets and trading. To address this problem, researchers have proposed that the writing styles of authors of the same gender share certain aspects, which can be captured by certain stylometric features (SF). Another approach focuses mainly on keyword occurrences in each document; this is known as the Bag-Of-Words (BOW) approach. In this work, we study and compare both approaches and focus on the Arabic language, for which this problem is still largely understudied despite its importance. To the best of our knowledge, no previous work has considered these approaches for the GI problem on Arabic text. The comparison is carried out under different settings, and the results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings. In fact, the best accuracy levels obtained by the SF and BOW approaches on our in-house dataset are 80.4% and 73.9%, respectively.
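The Bag-Of-Words baseline mentioned here amounts to mapping each document to a vector of term counts over a shared vocabulary; a minimal sketch (whitespace tokenization is a simplifying assumption):

```python
from collections import Counter

def bow_vectors(docs):
    """Map each document to a count vector over the shared vocabulary.

    Returns the sorted vocabulary and one count vector per document;
    these vectors are what a downstream classifier would consume."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([counts.get(w, 0) for w in vocab])
    return vocab, vectors
```

A stylometric-feature representation would replace these raw keyword counts with style measurements (e.g., sentence length or function-word rates), which is why it needs far fewer dimensions to train.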
Encyclopedia of Social Network Analysis and Mining, 2014
Part-of-Speech tagging is the process of assigning grammatical part-of-speech tags to words based on their context. Many automated tagging systems have been developed for English, other Western languages, and some Asian languages, achieving accuracy rates ranging from 95% to 98%. A tagged corpus carries more useful information than an untagged one, so it can be used to extract grammatical and linguistic information from the corpus, which in turn serves many applications such as building dictionaries and grammars of a language from real language data. Tagged corpora are also useful for detailed quantitative analysis of text. In this project, we describe a system for recognizing nouns, verbs, particles, and proper names in Arabic text through a combination of high-precision morphological analysis and a subsequent component that recognizes the named entities. Although highly deterministic and context-independent, the morphological analysis component removes a great deal of morpho-lexical ambiguity; it also has the side effect of demonstrating that the true difficulties of Arabic morphological ambiguity may be limited to specific contexts. We show that morphological information is crucially important for effective Arabic name recognition. We tested our system on vowelized and non-vowelized text documents and achieved an accuracy rate of about 93%. We also identify the sources of error and how this accuracy rate can be improved.
International Journal of Advanced Computer Science and Applications, 2016
Sentiment Analysis (SA) is one of the hottest fields in data mining (DM) and natural language processing (NLP). The goal of SA is to extract the sentiment conveyed in a certain text based on its content. While most current works focus on the simple problem of determining whether the sentiment is positive or negative, Multi-Way Sentiment Analysis (MWSA) focuses on sentiments conveyed through a rating or scoring system (e.g., a 5-star scoring system). In such scoring systems, the sentiments conveyed in two reviews of close scores (such as 4 stars and 5 stars) can be very similar, creating an added challenge compared to traditional SA. One intuitive way of handling this challenge is a divide-and-conquer approach in which the MWSA problem is divided into a set of sub-problems, allowing the use of customized classifiers to differentiate between reviews of close scores. A hierarchical classification structure can be used with this approach, where each node represents a different classification sub-problem and the decision from it may lead to the invocation of another classifier. In this work, we show how this divide-and-conquer hierarchical structure of classifiers can generate better results than existing flat classifiers for the MWSA problem. We focus on the Arabic language for several reasons, including the importance of this language and the scarcity of prior works and available tools for it. To the best of our knowledge, very few papers have been published on MWSA of Arabic reviews. One notable work is that of Aly and Atiya, in which the authors collected a large-scale Arabic Book Reviews (LABR) dataset and made it publicly available. Unfortunately, the baseline experiments on this dataset had very low accuracy. We present two different hierarchical structures and compare their accuracies with the flat structure using different core classifiers.
The comparison is based on standard accuracy measures such as precision and recall in addition to using the mean squared error (MSE) as a more accurate measure given the fact that not all misclassifications are the same. The results show that, in general, hierarchical classifiers give significant improvements (of more than 50% in certain cases) over flat classifiers.
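The divide-and-conquer hierarchy can be sketched as a small tree of decision nodes, each routing a review to a child node or a final score. The deciders below are hypothetical threshold rules standing in for trained classifiers, and the 3-star branch is omitted for brevity; this is not one of the paper's two structures:

```python
class Node:
    """A node in a hierarchical classifier: a decision function routes
    an example to one of several children (a sub-node or a final label)."""
    def __init__(self, decide, children):
        self.decide = decide      # maps example -> branch key
        self.children = children  # branch key -> Node or final label

    def predict(self, x):
        branch = self.children[self.decide(x)]
        return branch.predict(x) if isinstance(branch, Node) else branch

# Toy two-level hierarchy for a 5-star scale; each decide() would be a
# trained classifier in practice, not a fixed threshold.
low  = Node(lambda x: x["positivity"] < 0.25, {True: 1, False: 2})
high = Node(lambda x: x["positivity"] < 0.75, {True: 4, False: 5})
root = Node(lambda x: x["positivity"] < 0.5,  {True: low, False: high})
```

The key point is that each inner node only has to separate a narrow band of scores, so its classifier can specialize on exactly the confusions (4 vs. 5, 1 vs. 2) that defeat a flat classifier.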
International Journal of Information Retrieval Research, 2012
In this study the authors evaluate and compare the storage efficiency of different sparse matrix storage structures as index structures for an Arabic text collection, along with their corresponding sparse matrix-vector multiplication algorithms for performing query processing in an Information Retrieval (IR) system. The study covers six sparse matrix storage structures: Coordinate Storage (COO), Compressed Sparse Row (CSR), Compressed Sparse Column (CSC), Block Coordinate (BCO), Block Sparse Row (BSR), and Block Sparse Column (BSC). Evaluation depends on the storage space requirements of each structure and the efficiency of the query processing algorithm. The experimental results demonstrate that CSR is more efficient than the other sparse matrix storage structures: it requires the least amount of disk space and performs best in terms of query processing time.
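CSR's space advantage comes from storing only the non-zero values plus per-row offsets; a minimal sketch of the layout and of the matrix-vector product that drives query scoring (the dense input here is a toy term-document matrix, not the paper's data):

```python
def to_csr(dense):
    """Compress a dense matrix into CSR arrays (values, col_idx, row_ptr).

    row_ptr[r]..row_ptr[r+1] delimits the non-zeros of row r, so empty
    entries cost nothing."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x touching only the stored non-zeros, one pass per row."""
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y
```

With rows as documents and `x` as a query vector, `csr_matvec` computes all document scores while skipping every zero entry, which is why CSR wins on both space and query time for sparse term-document matrices.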
2009 Sixth International Conference on Computer Graphics, Imaging and Visualization, 2009
Stemming is one of many tools used in information retrieval to combat the vocabulary mismatch problem, in which query words do not match document words. Stemming in the Arabic language does not fit the usual mold: stemming in most research on other languages so far depends only on eliminating prefixes and suffixes from the word, but Arabic words …
Journal of Computer Science, 2006
The development of an efficient compression scheme for processing the Arabic language is a difficult task. This paper applies dynamic Huffman coding with variable-length bit coding to data compression of Arabic text. Experimental tests were performed on both Arabic and English text, and a comparison was made to measure the compression efficiency on each. A comparison was also made between the compression rate and the size of the file to be compressed. It was found that as the file size increases, the compression ratio decreases for both Arabic and English text. The experimental results show that the average message length and the compression efficiency are better for Arabic text than for English text. The results also show that the main factor significantly affecting the compression ratio and the average message length is the frequency of the symbols in the text.
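The "average message length" metric can be illustrated with a static Huffman tree built from symbol frequencies (the paper uses dynamic Huffman coding; the static variant below is a simplification to show how symbol frequency drives the metric):

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Build a static Huffman tree and return {symbol: code length in bits}."""
    freq = Counter(text)
    if len(freq) == 1:                     # degenerate one-symbol alphabet
        return {next(iter(freq)): 1}
    # Heap entries: (frequency, unique tiebreaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

def average_code_length(text):
    """Frequency-weighted mean codeword length, in bits per symbol."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    total = sum(freq.values())
    return sum(freq[s] * lengths[s] for s in freq) / total
```

Skewed symbol frequencies yield shorter average codes, which is the mechanism behind the abstract's finding that symbol frequency is the main factor in the compression ratio.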
Journal of Computer Science, 2006
A mobile Ad-hoc NETwork (MANET) is a wireless network composed of mobile nodes that are dynamically and randomly located, such that the interconnections between nodes can change on a continual basis. To facilitate communication within the network, a routing protocol is used to discover routes between nodes. The primary goal of such a routing protocol is correct and efficient route establishment between a pair of nodes so that messages can be delivered in a timely manner; route construction and maintenance should be done with a minimum of overhead and bandwidth consumption. ABR is a source-initiated protocol that works on the assumption of a stable route from the source to the destination node. When the destination node moves, route maintenance is performed in a backtracking scheme starting from the node immediately upstream of the destination. If this process backtracks more than halfway to the source, it is discontinued and a new route request is initiated from the source. If the source node moves, it invokes a full route reconstruction, because ABR is a source-initiated protocol. This study presents an enhanced method for route reconstruction when the source, an intermediate, or the destination node changes its location, by giving a more active role to the moving node in maintaining the established route.
2009 International Conference on Information Management and Engineering, 2009
Meta-search is the application of data fusion to document retrieval: a metasearch engine takes as input the N ranked lists output by each of N search engines in response to a given query; as output, it …
Soft Computing Applications and Intelligent Systems, 2013
Stemming is the process of reducing inflected words to their stem, base, or root form. For highly inflected languages like Arabic, stemming improves retrieval performance by reducing word variants. This paper investigates the effectiveness of stop-word lists combined with light stemming for Arabic information retrieval (a General stopwords list, the Khoja stopwords list, and a Combined stopwords list). The vector space model with a popular weighting scheme was examined. The idea is to combine the General and Khoja stopword lists with light stemming to enhance performance, and to compare their effects on retrieval. The Linguistic Data Consortium (LDC) Arabic Newswire data set was used. The best performance was achieved with the Combined stopwords list together with light stemming.
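Light stemming plus stop-word removal can be sketched as follows; the affix and stop-word lists below are small illustrative subsets, not the General or Khoja lists used in the paper:

```python
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]   # illustrative subset, longest first
SUFFIXES = ["ات", "ون", "ين", "ها", "ية", "ه"]       # illustrative subset

STOPWORDS = {"في", "من", "على", "إلى", "عن"}          # illustrative subset

def light_stem(word):
    """Strip at most one common prefix and one common suffix (light
    stemming), keeping at least three letters of the word."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word

def preprocess(tokens):
    """Drop stop words, then light-stem what remains."""
    return [light_stem(t) for t in tokens if t not in STOPWORDS]
```

Unlike root stemming, light stemming never digs below the surface form, which is why it pairs well with stop-word lists: both simply shrink the variant space without risking over-conflation.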
2010 2nd IEEE International Conference on Information Management and Engineering, 2010
Data-fusion techniques have been investigated by many researchers and have been used in implementing several information retrieval systems. Introducing a new or improved data-fusion algorithm is an active research area. We propose a framework for the analysis and improvement of data-fusion algorithms. This framework serves, first, as a supportive tool for researchers designing a new data-fusion algorithm, by providing extensive analysis and refinement of their new fusion algorithms; second, it can help researchers understand, analyze, and improve existing data-fusion algorithms.
International Journal of Computer Processing of Languages, 2004
We selected the 242 Arabic abstracts used by (Hmeidi and Kanaan, 1997), all of which involve computer science and information systems. We designed and built a new system to compare two different retrieval tasks: ad-hoc retrieval and filtering. We define the ad-hoc and filtering retrieval systems and illustrate the development strategy for each, then compare both tasks using recall/precision evaluation, system usability, searching domain, ranking, construction complexity, and methodology. From this experiment, we conclude that ad-hoc retrieval is better than filtering retrieval, while also taking into account the advantages of using filtering services in the information retrieval process. The objective of this research is to automate the process of examining documents by computing comparisons between the representation of the information need (the queries) and the representation of the documents. We also automate the process of representing information needs as user profiles by computing their comparison with the representation of documents. The automated process is considered successful when it produces results similar to those produced by human comparison of the documents themselves with the actual information need. As a result, we compare the ad-hoc and filtering retrieval tasks and conclude the differences between them in terms of the information retrieval process.
2009 International Conference on Information Management and Engineering, 2009
International Journal of Advanced Computer Science and Applications, 2013
The aim of this paper is to uncover the reasons behind the so-called total failure of the e-government project in Jordan. A review of the published papers in this context revealed that neither citizens nor employees understand the current status of this program. The majority of these papers measure the quality of the e-services presented by e-government. However, according to the Minister of Communication and Information Technology (MOCIT), only three e-services had been provided by this program up to the writing of this paper; moreover, the minister decided to freeze the current work on the e-government program. These facts drove the authors to conduct this research. A general review of the existing literature concerning e-government implementation in Jordan was carried out, and then qualitative research was used to uncover the reasons behind the failure of the e-government program in Jordan. The collected data were then analysed using Strauss and Corbin's method of grounded theory. This paper shows that the Jordanian government needs to exert strenuous efforts to move from the first stage of e-government implementation to an interactive one after fourteen years of launching the program, considering that only three e-services were offered as of October 2013. The reasons behind the failure of e-government in Jordan are also identified.
International Journal of Computer Applications, 2013
By utilizing a search engine for the inquired object, the user may find what he is looking for. However, on average, the number of words a user supplies for a query is only two or three [23], which causes a number of problems. To overcome such problems, various query expansion techniques have been developed, yet none of them can claim to offer an optimal solution, especially for the Arabic language with its complex morphological structure. The main objective of this paper is therefore to optimize Arabic queries using a comprehensive combination of these expansion techniques, in order to enhance the query expansion process and retrieve the maximum number of documents relevant to the Arabic user's query. The paper found that the developed system improved recall and precision over the separate techniques applied individually. This method gains the benefits of both expansion approaches, interactive and automatic, because the inquired object is automatically expanded while users are discretely engaged in query expansion.
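A basic automatic expansion step can be sketched as appending candidate terms after the originals; the synonym dictionary below is a hypothetical stand-in for the paper's combined expansion sources:

```python
def expand_query(query_terms, synonyms):
    """Append expansion candidates after the original terms, deduplicated,
    so the original terms keep their leading positions in the query."""
    expanded = list(query_terms)
    seen = set(query_terms)
    for t in query_terms:
        for s in synonyms.get(t, []):
            if s not in seen:
                expanded.append(s)
                seen.add(s)
    return expanded
```

An interactive variant would show `expanded` to the user for confirmation before retrieval, which is how the two approaches can be combined.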
2007 Innovations in Information Technologies (IIT), 2007
The paper presents an enhanced, effective, and simple approach to text classification. The approach uses an algorithm to automatically classify documents. The main idea of the algorithm is to select feature words from each document; those words cover all the ideas in the …
2007 Innovations in Information Technologies (IIT), 2007
The paper describes a new stemmer algorithm to find the roots and patterns of Arabic words based on the locations of excessive letters. The algorithm locates triliteral, quadriliteral, and pentaliteral roots. It is written with the goal of supporting natural language processing programs such as parsers and information retrieval systems. The algorithm has been …
Egyptian Computer Science Journal, 2005
Proceedings. 2004 International Conference on Information and Communication Technologies: From Theory to Applications, 2004.
Summary form only given. We present a new stemming algorithm to extract quadriliteral Arabic roots. The algorithm starts by excluding the prefixes and then checks the word's characters starting from the last letter backward to the first. A temporary matrix is used to store the suffix letters of the Arabic word, and another matrix is used to store the roots. The partitioning process is preceded by removing the particle from the source word. Checking the letters of any word includes checking whether the tested letter is included within the …
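The affix-stripping idea can be sketched as reducing a word to a four-letter core; the affix lists below are illustrative assumptions, not the paper's actual matrices, and the backward letter checks are collapsed into simple suffix/prefix tests:

```python
QUAD_PREFIXES = ["يت", "مت", "ال", "ي", "ت", "م"]   # illustrative, longest first
QUAD_SUFFIXES = ["ون", "ات", "ين", "ة"]             # illustrative

def quad_root(word):
    """Strip one suffix, then repeatedly strip prefixes, keeping at least
    four letters; return the 4-letter core as the candidate quadriliteral
    root, or None if no such core is found."""
    for s in QUAD_SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 4:
            word = word[:-len(s)]
            break
    while len(word) > 4:
        stripped = False
        for p in QUAD_PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= 4:
                word = word[len(p):]
                stripped = True
                break
        if not stripped:
            break
    return word if len(word) == 4 else None
```

For example, stripping the suffix and prefix from يترجمون leaves the quadriliteral root ترجم, while a bare four-letter word such as دحرج is returned unchanged.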
Recent periods witnessed lots of intrusion trials, illegal access for data basis belonging to gov... more Recent periods witnessed lots of intrusion trials, illegal access for data basis belonging to governmental institutions and private organizations that depend on computers in their daily work. Such incidents drove attention to the issue of data security and protection. The paper describes a new security scheme to protect the computer data from unauthorized people to access and interpret, once managed to access it. The new security scheme based on a new algorithm to cipher the letters of the text with different keys for each letter based on the letter’s position in the plaintext that will enhance the data security. The strength of the proposed cipher algorithm is that there are multiple ciphertext letters for each plaintext letter.
Proceedings of the Third International Conference on Software and Data Technologies Special Session on Applications in Banking and Finance, 2008
Feature subset selection (FSS) is an important step for effective text classification (TC) system... more Feature subset selection (FSS) is an important step for effective text classification (TC) systems. This paper describes a novel FSS method based on Ant Colony Optimization (ACO) and Chi-square statistic. The proposed method adapted Chi-square statistic as heuristic information and the effectiveness of Support Vector Machines (SVMs) text classifier as a guidance to better selecting features for selective categories. Compared to six classical FSS methods, our proposed ACO-based FSS algorithm achieved better TC effectiveness. Evaluation used an in-house Arabic TC corpus. The experimental results are presented in term of macro-averaging F 1 measure.
Journal of Information Security and Applications, 2017
The Gender Identification (GI) problem is concerned with determining the gender of a given text's... more The Gender Identification (GI) problem is concerned with determining the gender of a given text's author. It has a wide range of academic/commercial applications in various fields including literature, security, forensics, electronic markets and trading, etc. To address this problem, researchers have proposed that the writing styles of authors of the same gender share certain aspects, which can be captured by certain stylometric features (SF). Another approach to address this problem focuses mainly on keywords occurrences in each document. This is known as the Bag-Of-Words (BOW) approach. In this work, we study and compare both approaches and focus on the Arabic language for which this problem is still largely understudied despite its importance. To the best of our knowledge, no previous work has considered these approaches for the GI problem of Arabic text. The comparison is carried out under different settings and the results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings. In fact, the best accuracy levels obtained by the SF and BOW approaches on our in-house dataset are 80.4% and 73.9%, respectively.
Encyclopedia of Social Network Analysis and Mining, 2014
Part-of-Speech tagging is the process of assigning grammatical part-of-speech tags to words based... more Part-of-Speech tagging is the process of assigning grammatical part-of-speech tags to words based on their context. Many automated tagging systems have been developed for English and many other western languages, and for some Asian languages, and have achieved accuracy rates ranging from 95% to 98%. A tagged corpus has more useful information than untagged corpus; so, tagged corpus can be used to extract grammatical and linguistic information from the corpus. Then, it can be used for many applications such as creating dictionaries and grammars of a language using real language data. Tagged corpora are also useful for detailed quantitative analysis of text. In this project, we have described a system for recognizing names, verbs, particles, and proper names in Arabic language text through a combination of high precision morphological analysis and a subsequent component which recognizes the named entities. Although highly deterministic and not taking account of context, the morphological analysis component removes a great deal of morpho-lexical ambiguity; yet, has the side-effect of demonstrating that the true difficulties in Arabic morphological ambiguity might be limited to specific contexts. We have shown that morphological information is crucially important to effective Arabic name recognition. We have tested our system by using some vowelized and non-vowelized text documents, and achieved an accuracy rate of about 93%. We have stated the factors of errors and how this accuracy rate can be enhanced.
International Journal of Advanced Computer Science and Applications, 2016
Sentiment Analysis (SA) is one of hottest fields in data mining (DM) and natural language process... more Sentiment Analysis (SA) is one of hottest fields in data mining (DM) and natural language processing (NLP). The goal of SA is to extract the sentiment conveyed in a certain text based on its content. While most current works focus on the simple problem of determining whether the sentiment is positive or negative, Multi-Way Sentiment Analysis (MWSA) focuses on sentiments conveyed through a rating or scoring system (e.g., a 5-star scoring system). In such scoring systems, the sentiments conveyed in two reviews of close scores (such as 4 stars and 5 stars) can be very similar creating an added challenge compared to traditional SA. One intuitive way of handling this challenge is via a divide-and-conquer approach where the MWSA problem is divided into a set of sub-problems allowing the use of customized classifiers to differentiate between reviews of close scores. A hierarchical classification structure can be used with this approach where each node represents a different classification sub-problem and the decision from it may lead to the invocation of another classifier. In this work, we show how the use of this divide-and-conquer hierarchical structure of classifiers can generate better results than the use of existing flat classifiers for the MWSA problem. We focus on the Arabic language for many reasons such as the importance of this language and the scarcity of prior works and available tools for it. To the best of our knowledge, very few papers have been published on MWSA of Arabic reviews. One notable work is that of Ali and Atiya, in which the authors collected a large scale Arabic Book Reviews (LABR) dataset and made it publicly available. Unfortunately, the baseline experiments on this dataset had very low accuracy. We present two different hierarchical structures and compare their accuracies with the flat structure using different core classifiers. 
The comparison is based on standard accuracy measures such as precision and recall in addition to using the mean squared error (MSE) as a more accurate measure given the fact that not all misclassifications are the same. The results show that, in general, hierarchical classifiers give significant improvements (of more than 50% in certain cases) over flat classifiers.
International Journal of Information Retrieval Research, 2012
In the authors’ study they evaluate and compare the storage efficiency of different sparse matrix... more In the authors’ study they evaluate and compare the storage efficiency of different sparse matrix storage structures as index structure for Arabic text collection and their corresponding sparse matrix-vector multiplication algorithms to perform query processing in any Information Retrieval (IR) system. The study covers six sparse matrix storage structures including the Coordinate Storage (COO), Compressed Sparse Row (CSR), Compressed Sparse Column (CSC), Block Coordinate (BCO), Block Sparse Row (BSR), and Block Sparse Column (BSC). Evaluation depends on the storage space requirements for each storage structure and the efficiency of the query processing algorithm. The experimental results demonstrate that CSR is more efficient in terms of storage space requirements and query processing time than the other sparse matrix storage structures. The results also show that CSR requires the least amount of disk space and performs the best in terms of query processing time compared with the ot...
2009 Sixth International Conference on Computer Graphics, Imaging and Visualization, 2009
Stemming is one of many tools used in information retrieval to combat the vocabulary mismatch pro... more Stemming is one of many tools used in information retrieval to combat the vocabulary mismatch problem, in which query words do not match document words. Stemming in the Arabic language does not fit into the usual mold, because stemming in most research in other languages so far depends only on eliminating prefixes and suffixes from the word, but Arabic words
Journal of Computer Science, 2006
The development of an efficient compression scheme to process the Arabic language represents a di... more The development of an efficient compression scheme to process the Arabic language represents a difficult task. This paper employs the dynamic Huffman coding on data compression with variable length bit coding, on the Arabic language. Experimental tests have been performed on both Arabic and English text. A comparison is made to measure the efficiency of compressing data results on both Arabic and English text. Also a comparison is made between the compression rate and the size of the file to be compressed. It has been found that as the file size increases, the compression ratio decreases for both Arabic and English text. The experimental results show that the average message length and the efficiency of compression on Arabic text is better than the compression on English text. Also, results show that the main factor which significantly affects compression ratio and average message length is the frequency of the symbols on the text.
Journal of Computer Science, 2006
A mobile Ad-hoc NETwork (MANET) is wireless network composed of mobile nodes that are dynamically... more A mobile Ad-hoc NETwork (MANET) is wireless network composed of mobile nodes that are dynamically and randomly located in such a manner that the interconnections between nodes are capable of changing on a continual basis. In order to facilitate communication within the network, a routing protocol is used to discover routes between nodes. The primary goal of such an ad-hoc network routing protocol is correct and efficient route establishment between a pair of nodes so that messages may be delivered in a timely manner. Route construction and maintenance should be done with a minimum of overhead and bandwidth consumption. The ABR is a source-initiated protocol and is working on the assumption of stable route from the source to the destination node. Maintenance for the route when the destination node moves will be performed in backtracking scheme starting from the immediate upstream node from the destination. If this process results in backtracking more than halfway to the source, it will discontinue and a new route request will be initiated from the source. In the case if the Source Node moves, then the Source Node will invoke a route reconstruction because the ABR is source-initiated protocol. This study presents an enhanced method for the route reconstruction in case the source, the intermediate, or the destination node changes its location by giving more active role to the moving node in maintaining the established route.
2009 International Conference on Information Management and Engineering, 2009
... Meta-search is the application of data fusion to document retrieval. A metasearch engine takes as input the N ranked lists output by each of N search engines in response to a given query; as output, it ... [4] Javed Aslam and Mark Montague, Models for Metasearch, in Proc. ...
Soft Computing Applications and Intelligent Systems, 2013
Stemming is the process of reducing inflected words to their stem, base, or root form. For a highly inflected language like Arabic, stemming improves retrieval performance by reducing word variants. This paper investigates the effectiveness of stopword lists combined with light stemming for Arabic information retrieval, comparing a general stopword list, the Khoja stopword list, and a combined stopword list. The vector space model, the popular weighting scheme, was used. The idea is to combine the general and Khoja stopword lists with light stemming to enhance performance, and to compare their effects on retrieval. The Linguistic Data Consortium (LDC) Arabic Newswire data set was used. The best performance was achieved with the combined stopword list and light stemming.
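A minimal sketch of the preprocessing pipeline the paper studies, with tiny illustrative affix and stopword lists (these are not the actual light-stemming affix tables or the Khoja list):

```python
# Illustrative samples only, not the real light-stemming/Khoja resources.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]
STOPWORDS = {"في", "من", "على", "إلى", "عن"}

def light_stem(word):
    """Strip one longest matching prefix, then one longest matching
    suffix, keeping at least two letters of the word."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[:-len(s)]
            break
    return word

def preprocess(tokens):
    """Stopword removal followed by light stemming, as in the pipeline
    the paper evaluates."""
    return [light_stem(t) for t in tokens if t not in STOPWORDS]
```

Mapping surface variants such as "المكتبات" to a common light stem is what reduces the vocabulary the vector space model has to weight.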
2010 2nd IEEE International Conference on Information Management and Engineering, 2010
Data-fusion techniques have been investigated by many researchers and used to implement several information retrieval systems, and introducing a new or improved data-fusion algorithm remains an active research area. We propose a framework for the analysis and improvement of data-fusion algorithms. The framework serves two purposes: first, it is a supportive tool for researchers designing a new data-fusion algorithm, providing extensive analysis and refinement of the new algorithm; second, it helps researchers understand, analyze, and improve existing data-fusion algorithms.
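As an example of the kind of algorithm such a framework would analyze, the sketch below implements classical CombSUM (a standard fusion method, not the paper's framework): each document's normalized scores from the individual engines are summed and the fused list re-ranked.

```python
def comb_sum(ranked_lists):
    """CombSUM fusion: sum each document's scores across input systems.

    ranked_lists: list of {doc_id: score} dicts, one per search engine,
    with scores assumed already normalized to a common [0, 1] range.
    Returns doc ids sorted by fused score, best first.
    """
    fused = {}
    for scores in ranked_lists:
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=fused.get, reverse=True)
```

Documents retrieved by several systems accumulate score from each, which is the "chorus effect" that makes fusion outperform the individual inputs.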
International Journal of Computer Processing of Languages, 2004
We selected the 242 Arabic abstracts used by Hmeidi and Kanaan (1997), all of which concern computer science and information systems. We designed and built a new system to compare two different retrieval tasks: ad-hoc retrieval and filtering retrieval. We define both tasks, illustrate the development strategy for each system, and compare the two tasks using recall/precision evaluation, system usability, searching domain, ranking, construction complexity, and methodology. From this experiment, we conclude that ad-hoc retrieval performs better than filtering retrieval, while also taking into account the advantages of filtering services in the information retrieval process. The objective of this research is to automate the examination of documents by computing comparisons between the representation of the information need (the queries) and the representation of the documents. We also automate the representation of information needs as user profiles by computing their comparison with the representation of documents. The automated process is considered successful when it produces results similar to those produced by human comparison of the documents themselves with the actual information need. As a result, we compare the ad-hoc and filtering retrieval tasks and draw out the differences between them in terms of the information retrieval process.
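The recall/precision evaluation used for this comparison reduces, per query, to the set-based computation below (a standard definition, not code from the paper):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    precision = fraction of retrieved documents that are relevant;
    recall    = fraction of relevant documents that were retrieved.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these values over the query set gives the recall/precision figures on which the ad-hoc versus filtering comparison rests.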
2009 International Conference on Information Management and Engineering, 2009
... European Conf. on Machine Learning, LNAI 4701, Springer Berlin / Heidelberg, 2007, pp. 616-623. [4] Javed Aslam and Mark Montague, Models for Metasearch, In Proc. ACM SIGIR 2001 Conf., ACM Press, New Orleans, Louisiana, 2001, pp. 276-284. ...
International Journal of Advanced Computer Science and Applications, 2013
The aim of this paper is to uncover the reasons behind what has been called the total failure of the e-government project in Jordan. A review of the published papers in this context revealed that both citizens and employees do not understand the current status of the program. The majority of these papers measure the quality of the e-services presented by e-government; however, according to the Minister of Communication and Information Technologies (MOCIT), only three e-services had been provided by the program up to the time of writing this paper, and he decided to freeze current work on the e-government programme. These facts drove the authors to conduct this research. A general review of the existing literature on e-government implementation in Jordan was carried out, and then a qualitative study was conducted to uncover the reasons behind the failure of the e-government program in Jordan. The collected data were analysed using Strauss and Corbin's grounded-theory method. This paper illustrates that the Jordanian government needs to exert strenuous efforts to move from the first stage of e-government implementation to an interactive one after fourteen years of launching the program, considering that only three e-services had been presented up to October 2013. Reasons behind the failure of e-government in Jordan are also identified.
International Journal of Computer Applications, 2013
By utilizing a search engine, the user may find what he is looking for. However, on average, a user's query contains only two or three words [23], which often causes problems. To overcome such problems, various query expansion techniques have been developed, but none of them claims to offer an optimal solution, especially for Arabic, because of its complex morphological structure. The main objective of this paper is therefore to optimize Arabic queries using a comprehensive combination of these expansion techniques, in order to enhance the query expansion process and retrieve the maximum number of documents relevant to the Arabic user's query. The paper found that the developed system improved recall and precision over pairs of separate techniques. The method gains the benefits of both interactive and automatic query expansion, because the query is expanded automatically while users are discreetly engaged in the expansion.
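A minimal sketch of the automatic side of such expansion; the `related` association map is hypothetical and would, in practice, be built from a thesaurus or from co-occurrence statistics over the collection:

```python
def expand_query(terms, related, max_new=3):
    """Automatic query expansion (sketch): append up to max_new related
    terms per original term, skipping duplicates.

    related: precomputed {term: [associated terms]} map (hypothetical
    here; a real system derives it from collection statistics or a
    thesaurus).
    """
    expanded = list(terms)
    seen = set(terms)
    for t in terms:
        for r in related.get(t, [])[:max_new]:
            if r not in seen:
                expanded.append(r)
                seen.add(r)
    return expanded
```

Interactive expansion would then let the user accept or reject the appended terms before the expanded query is submitted.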
2007 Innovations in Information Technologies (IIT), 2007
The paper presents an enhanced, effective and simple approach to text classification. The approach uses an algorithm to automatically classify documents. The main idea of the algorithm is to select feature words from each document; those words cover all the ideas in the ...
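One simple way to realize the feature-word idea (a stand-in, not the paper's exact algorithm) is to take each document's most frequent terms as its feature words and assign the category whose feature set overlaps them most:

```python
from collections import Counter

def feature_words(doc_tokens, k=5):
    """Top-k most frequent tokens serve as the document's feature words."""
    return {t for t, _ in Counter(doc_tokens).most_common(k)}

def classify(doc_tokens, category_features, k=5):
    """Assign the category whose feature-word set overlaps most with
    the document's feature words.

    category_features: {category: set of feature words}, assumed to
    have been extracted beforehand from training documents.
    """
    feats = feature_words(doc_tokens, k)
    return max(category_features,
               key=lambda c: len(feats & category_features[c]))
```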
2007 Innovations in Information Technologies (IIT), 2007
The paper describes a new stemmer algorithm to find the roots and patterns of Arabic words based on the locations of excessive (added) letters. The algorithm locates tri-literal, quadri-literal, and penta-literal roots. It is written with the goal of supporting natural language processing programs such as parsers and information retrieval systems. The algorithm has been
Egyptian Computer Science Journal, 2005
Proceedings. 2004 International Conference on Information and Communication Technologies: From Theory to Applications, 2004.
Abstract Summary form only given. We present a new stemming algorithm to extract quadri-literal Arabic roots. The algorithm starts by excluding the prefixes and then checks the word characters from the last letter backward to the first. A temporary matrix is used to store the suffix letters of the Arabic word, and another matrix is used to store the roots. The partition process is preceded by removing the particle from the source word. Checking the letters of any word includes checking whether the tested letter is included within the ...
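The prefix-then-backward-suffix scan can be sketched as below; the affix lists are small illustrative samples, not the paper's tables, and the matrix bookkeeping of the original algorithm is collapsed into plain string stripping.

```python
PREFIXES = ["وال", "ال", "و", "ف", "ب", "ل"]   # sample particle/prefix list
SUFFIXES = ["ات", "ان", "ون", "ين", "ة", "ه"]  # sample suffix list

def quad_root(word):
    """Strip prefixes, then strip suffixes working back from the last
    letter, until exactly four letters remain; return None if no
    quadri-literal root can be exposed this way."""
    changed = True
    while changed and len(word) > 4:
        changed = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= 4:
                word = word[len(p):]
                changed = True
                break
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= 4:
                word = word[:-len(s)]
                changed = True
                break
    return word if len(word) == 4 else None
```

For example, stripping the definite article and the feminine suffix from "الدحرجة" exposes the four-letter root "دحرج", while a tri-literal word yields no quadri-literal root.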