Ahmed Guessoum - Academia.edu

Papers by Ahmed Guessoum

ANETAC: Arabic Named Entity Transliteration and Classification Dataset

arXiv (Cornell University), Jul 6, 2019

In this paper, we make freely accessible ANETAC, our English-Arabic named entity transliteration and classification dataset, which we built from freely available parallel translation corpora. The dataset contains 79,924 instances; each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class, which can be a Person, a Location, or an Organization. The ANETAC dataset is mainly aimed at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification. This dataset was developed and used as part of a previous research study by Hadj Ameur et al. [1].

Ontological Relation Classification Using WordNet, Word Embeddings and Deep Neural Networks

Modelling and Implementation of Complex Systems, 2020

Learning ontological relations is an important step on the way to automatically developing ontologies. This paper introduces a novel way to exploit the combination of WordNet [16], pre-trained word embeddings, and deep neural networks for the task of ontological relation classification. The data from WordNet and the knowledge encapsulated in the pre-trained word vectors are combined into an enriched dataset. In this dataset, each pair of terms linked in WordNet through some ontological relation is represented by the word embeddings of the two terms. A Deep Neural Network uses this dataset to learn the classification of ontological relations based on the word embeddings. The implementation of this approach has yielded encouraging results, which should help the ontology learning research community develop tools for ontological relation extraction.

Grammatical Evolution Association Rule Mining to Detect Gene-Gene Interaction

Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms, 2014

An important goal of human genetics is to identify DNA sequence variations that increase or decrease specific disease susceptibility. Complex interactions among genes and environmental factors are known to play a role in common human disease etiology. Methods for association rule mining (ARM) are highly successful, especially because they produce rules that are easily interpretable. This has made them widely used in various domains. During the different stages of the knowledge discovery process, several problems are faced. It turns out that the search characteristics of Evolutionary Algorithms make them well suited to solving this kind of problem. In this study, we introduce GEARM, a novel approach for discovering association rules using Grammatical Evolution. We present the approach and evaluate it on simulated data that represent epistasis models. We show that this method improves the performance of gene-gene interaction detection.

An Automatic Approach for WordNet Enrichment Applied to Arabic WordNet

This paper introduces an automatic method to extend existing WordNets via machine translation. Our proposal relies on the hierarchical skeleton of the English Princeton WordNet (PWN) as a backbone to extend their taxonomies. It is applied to the Arabic WordNet (AWN) to enrich it by adding new synsets, and also by providing vocalizations and usage examples for each inserted lemma. Around 12000 new potential synsets can be added to AWN with a precision of at least 93%. As such, the coverage of AWN in terms of synsets can be increased from 11269 to around 24000, a very promising achievement on the path of enriching the Arabic WordNet.

Ontology learning: Grand tour and challenges

Computer Science Review, 2021

Ontologies are at the core of the semantic web. As knowledge bases, they are very useful resources for many artificial intelligence applications. Ontology learning, as a research area, proposes techniques to automate several tasks of the ontology construction process to simplify the tedious work of manually building ontologies. In this paper we present the state of the art of this field. Different classes of approaches are covered (linguistic, statistical, and machine learning), including some recent ones (deep-learning-based approaches). In addition, some relevant solutions (frameworks), which offer strategies and built-in methods for ontology learning, are presented. A descriptive summary is made to point out the capabilities of the different contributions based on criteria that have to do with the produced ontology components and the degree of automation. We also highlight the challenge of evaluating ontologies to make them reliable, since it is not a trivial task in this field; it actually represents a research area in its own right. Finally, we identify some unresolved issues and open questions.

Arabic Machine Translation: A survey of the latest trends and challenges

Computer Science Review, 2020

Given that Arabic is one of the most widely used languages in the world, the task of Arabic Machine Translation (MT) has recently received a great deal of attention from the research community. Indeed, the amount of research that has been devoted to this task has led to some important achievements and improvements. However, the current state of Arabic MT systems has not reached the quality achieved for some other languages. Thus, much research work is still needed to improve it. This survey paper introduces the Arabic language, its characteristics, and the challenges involved in its translation. It provides the reader with a full summary of the important research studies that have been accomplished with regard to Arabic MT, along with the most important tools and resources that are available for building and testing new Arabic MT systems. Furthermore, the survey paper discusses the current state of Arabic MT and provides some insights into possible future research directions.

Recommendation of users in social networks: A semantic and social based classification approach

Expert Systems, 2020

Recently, the study of social network‐based recommender systems has become an active research topic. The integration of the social relationships that exist between users can improve the accuracy of recommendation results, since the users' preferences are similar to or influenced by those of their connected friends. We focus in this article on the recommendation of users in social networks. Our approach is based on semantic and social representations of the users' profiles. We have formalized and illustrated these two dimensions using the Yelp social network. The novelty of our approach concerns the modelling of the credibility of the user, through his/her trust and commitment in the social network. Moreover, in order to optimize the performance of the recommendation process, we have used two classification techniques: an unsupervised technique that uses the K‐means algorithm (applied initially to all users); and a supervised technique that uses the K‐Nearest Neighbours algorithm (applied...

Improving Arabic neural machine translation via n-best list re-ranking

Machine Translation, 2019

Even though the rise of the Neural Machine Translation (NMT) paradigm has brought a great deal of improvement to the machine translation field, the current translation results are still not perfect. One of the main reasons for this imperfection is the complexity of the decoding task. Indeed, the problem of finding the one best translation from the space of all possible translations was and still is a challenging problem. One of the most successful ways to address it is via n-best list re-ranking, which attempts to reorder the n-best decoder translations according to some defined features. In this paper, we propose a set of new re-ranking features that can be extracted directly from the parallel corpus without needing any external tools. The feature set that we propose takes into account lexical, syntactic, and even semantic aspects of the n-best list translations. We also present a method for feature weight optimization that uses a Quantum-behaved Particle Swarm Optimization (QPSO) algorithm. Our system has been evaluated on multiple English-to-Arabic and Arabic-to-English machine translation test sets, and the obtained re-ranking results yield noticeable improvements over the baseline NMT systems.

Sentiment Analysis of Users on Social Networks: Overcoming the challenge of the Loose Usages of the Algerian Dialect

Procedia Computer Science, 2018

Sentiment Analysis (SA) focuses on the study and analysis of people's opinions, sentiments and emotions based on written language. It is currently a very active research area in NLP. The growth of Web 2.0 has given all internet users the additional power of “interactivity, interoperability, and collaboration” [1]. This has rapidly opened the door for the development of social media and the large interaction between users in all walks of life. Social media platforms are currently exploited by many companies as a major channel to advertise and sell products. As such, tools are clearly needed to analyse people's opinions on and reviews of the various products, feedback on events, etc. In recent years, researchers on Arabic NLP have made good efforts to tackle the problem of SA. These efforts have been more focused during the last couple of years on Arabic dialects and, to a lesser extent, on the dialects of the Maghreb region, even less on the Algerian Dialect (AlgD). The processing of this dialect is made even more complex by the frequent code switching by its speakers between Arabic and Latin letters. Facebook being widely used in the Arab world, and in Algeria more specifically, we are interested in this paper in the SA of Algerian users' comments on various Facebook pages. A painstaking pre-processing of a corpus of such comments is done, and two neural network models, MLP and CNN, are trained to classify comments as negative, neutral or positive. Although the dialect is complex, we have obtained an 81.6% accuracy with the MLP network and 89.5% accuracy with the CNN. We find this a very encouraging result.

Arabic Machine Transliteration using an Attention-based Encoder-decoder Model

Procedia Computer Science, 2017

A statistical approach for the induction of a grammar of Arabic

2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 2015

Over the last decade, a lot of research has focused on Arabic Natural Language Processing (ANLP). Various approaches and techniques have been used to develop ANLP tools. Some of these are rule-based while others are statistical or machine-learning-based. However, the development of some ANLP tools depends on the availability of a good Arabic grammar which covers the entire language. It turns out that the Arabic grammar used by most of the developed approaches was hand-crafted and most often extracted from short sentences. This manual development process is painstaking and time-consuming, while the developed grammar cannot describe the entire Arabic language.

TALAA-ASC: A sentence compression corpus for Arabic

2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 2015

A lot of work has been done on sentence compression for many languages other than Arabic. Unfortunately, little effort has been devoted to Arabic sentence compression. One of the reasons behind this is the absence of Arabic sentence compression corpora. In order to build and evaluate sentence compression systems, parallel corpora consisting of source sentences and their corresponding compressions are needed. In this paper, we present TALAA-ASC, the first Arabic sentence compression corpus. We present the methodology we followed in order to construct the corpus. We also give the different statistics and analyses that we have performed on this corpus.

Drug Repurposing by Optimizing Mining of Genes Target Association

Computational Intelligence Methods for Bioinformatics and Biostatistics, 2015

A major alternative strategy for the pharmacology industry is to find new uses for approved drugs. Many studies show that the target binding of a drug often affects more than the intended disease-related genes, leading to unexpected outcomes; if the perturbed genes are related to other diseases, this makes it possible to reposition an existing drug. Thus, we focus on finding hidden relations between drug targets and disease-related genes in order to find new hypotheses of new drug-disease pairs. Association rule mining is a very well-known data mining technique that is widely used for the discovery of interesting relations in large data sets. In this study we applied a new computational intelligence approach to 288 drugs and 267 diseases, forming 5018 known drug-disease pairs. Our method, based on the learned rule sets representing hidden relationships among gene targets using Grammatical Evolution (GEARM), was able to discover interesting pairs of drugs and diseases, some of which were previously reported in the literature, while others can serve as new hypotheses.

A Knowledge-Based Approach to Goal Recognition

We tackle in this paper the problem known as plan recognition. Our aim is to re-examine the problem, paying particular attention to the underlying reasoning processes. We rename the problem as goal recognition, explain the details of the reasoning steps that are required for it, and argue in particular that forward reasoning is the central process in goal recognition, though it needs to be supported by other forms of reasoning such as abduction. We also sketch algorithms for the implementation of the various processes, give an integrated algorithm, and illustrate it with various examples that highlight some of the problems that can be encountered, such as that of inconsistency between the observations and the available knowledge. 1 Introduction During the last decade, a great deal of effort was devoted to Natural Language Understanding (computer) systems. In this endeavour, researchers have tried to analyse and model the way human beings communicate. Of particular interest was the...

A Methodology for a Semi-Automatic Evaluation of the Lexicons of Machine Translation Systems

Machine Translation

The lexicon is a major part of any Machine Translation (MT) system. If the lexicon of an MT system is not adequate, this will affect the quality of the whole system. Building a comprehensive lexicon, i.e., one with a high lexical coverage, is a major activity in the process of developing a good MT system. As such, the evaluation of the lexicon of an MT system is clearly a pivotal issue for the process of evaluating MT systems. In this paper, we introduce a new methodology that was devised to enable developers and users of MT systems to evaluate their lexicons semi-automatically. This new methodology is based on the idea of the importance of a specific word or, more precisely, word sense, to a given application domain. This importance, or weight, determines how the presence of such a word in, or its absence from, the lexicon affects the MT system's lexical quality, which in turn will naturally affect the overall output quality. The method, which adopts a black-box approach to eva...

A Neural-Network-Based Arabic Morphological Analyzer

Aicha Boutorh, Ahmed Guessoum: Grammatical Evolution Association Rule Mining to Detect Gene-Gene Interaction

A Supervised Approach to Arabic Text Summarization Using AdaBoost

Advances in Intelligent Systems and Computing, 2015

In recent years, research in text summarization has become very active for many languages. Unfortunately, looking at the effort devoted to Arabic text summarization, we find that much less attention has been paid to it. This paper presents a Machine Learning-based approach to Arabic text summarization which uses AdaBoost. This technique is employed to predict whether a new sentence is likely to be included in the summary or not. In order to evaluate the approach, we have used a corpus of Arabic articles. This approach was compared against other Machine Learning approaches, and the results obtained show that the approach we suggest using AdaBoost outperforms other existing approaches.

Social Validation of Solutions in the Context of Online Communities

IFIP Advances in Information and Communication Technology, 2015

Online Communities are considered a new organizational structure that allows individuals and groups of persons to collaborate and share their knowledge and experiences. These members need technological support in order to facilitate their learning activities (e.g. during a problem solving process). We address in this paper the problem of social validation, our aim being to support members of Online Communities of Learners in validating the proposed solutions. Our approach is based on the members' evaluations: we apply three machine learning techniques, namely a Genetic Algorithm, Artificial Neural Networks and the Naïve Bayes approach. The main objective is to determine a validity rating of a given solution. A preliminary experimentation of our approach within a Community of Learners whose main objective is to collaboratively learn the Java language shows that Neural Networks represent the most suitable approach in this context.

Rule-based Grammatical Evolution For Drug Repositioning

Research paper thumbnail of ANETAC: Arabic Named Entity Transliteration and Classification Dataset

arXiv (Cornell University), Jul 6, 2019

In this paper, we make freely accessible ANETAC 1 our English-Arabic named entity transliteration... more In this paper, we make freely accessible ANETAC 1 our English-Arabic named entity transliteration and classification dataset that we built from freely available parallel translation corpora. The dataset contains 79, 924 instances, each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration and c is its class that can be either a Person, a Location, or an Organization. The ANETAC dataset is mainly aimed for the researchers that are working on Arabic named entity transliteration, but it can also be used for named entity classification purposes. This dataset was developed and used as part of a previous research study done by Hadj Ameur et al. [1].

Research paper thumbnail of Ontological Relation Classification Using WordNet, Word Embeddings and Deep Neural Networks

Modelling and Implementation of Complex Systems, 2020

Learning ontological relations is an important step on the way to automatically developing ontolo... more Learning ontological relations is an important step on the way to automatically developing ontologies. This paper introduces a novel way to exploit WordNet [16], the combination of pre-trained word embeddings and deep neural networks for the task of ontological relation classification. The data from WordNet and the knowledge encapsulated in the pre-trained word vectors are combined into an enriched dataset. In this dataset a pair of terms that are linked in WordNet through some ontological relation are represented by their word embeddings. A Deep Neural Network uses this dataset to learn the classification of ontological relations based on the word embeddings. The implementation of this approach has yielded encouraging results, which should help the ontology learning research community develop tools for ontological relation extraction.

Research paper thumbnail of Grammatical Evolution Association Rule Mining to Detect Gene-Gene Interaction

Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms, 2014

An important goal of human genetics is to identify DNA sequence variations that increase or decre... more An important goal of human genetics is to identify DNA sequence variations that increase or decrease specific disease susceptibility. Complex interactions among genes and environmental factors are known to play a role in common human disease etiology. Methods for association rule mining (ARM) are highly successful; especially that they produce rules which are easily interpretable. This has made them widely used in various domains. During the different stages of the knowledge discovery process, several problems are faced. It turns out that, the search characteristics of Evolutionary Algorithms make them suited to solve this kind of problems. In this study, we introduce GEARM, a novel approach for discovering association rules using Grammatical Evolution. We present the approach and evaluate it on simulated data that represents epistasis models. We show that this method improves the performance of gene-gene interaction detection.

Research paper thumbnail of An Automatic Approach for WordNet Enrichment Applied to Arabic WordNet

This paper introduces an automatic method to extend existing WordNets via machine translation. Ou... more This paper introduces an automatic method to extend existing WordNets via machine translation. Our proposal relies on the hierarchical skeleton of the English Princeton WordNet (PWN) as a backbone to extend their taxonomies. Our proposal is applied to the Arabic WordNet (AWN) to enrich it by adding new synsets, and also by providing vocalizations and usage examples for each inserted lemma. Around 12000 new potential synsets can be added to AWN with a precision of at least \(93\%\). As such the coverage of AWN in terms of synsets can be increased from 11269 to around 24000 a very promising achievement on the path of enriching the Arabic WordNet.

Research paper thumbnail of Ontology learning: Grand tour and challenges

Computer Science Review, 2021

Ontologies are at the core of the semantic web. As knowledge bases, they are very useful resource... more Ontologies are at the core of the semantic web. As knowledge bases, they are very useful resources for many artificial intelligence applications. Ontology learning, as a research area, proposes techniques to automate several tasks of the ontology construction process to simplify the tedious work of manually building ontologies. In this paper we present the state of the art of this field. Different classes of approaches are covered (linguistic, statistical, and machine learning), including some recent ones (deep-learning-based approaches). In addition, some relevant solutions (frameworks), which offer strategies and built-in methods for ontology learning, are presented. A descriptive summary is made to point out the capabilities of the different contributions based on criteria that have to do with the produced ontology components and the degree of automation. We also highlight the challenge of evaluating ontologies to make them reliable, since it is not a trivial task in this field; it actually represents a research area on its own. Finally, we identify some unresolved issues and open questions.

Research paper thumbnail of Arabic Machine Translation: A survey of the latest trends and challenges

Computer Science Review, 2020

Given that Arabic is one of the most widely used languages in the world, the task of Arabic Machi... more Given that Arabic is one of the most widely used languages in the world, the task of Arabic Machine Translation (MT) has recently received a great deal of attention from the research community. Indeed, the amount of research that has been devoted to this task has led to some important achievements and improvements. However, the current state of Arabic MT systems has not reached the quality achieved for some other languages. Thus, much research work is still needed to improve it. This survey paper introduces the Arabic language, its characteristics, and the challenges involved in its translation. It provides the reader with a full summary of the important research studies that have been accomplished with regard to Arabic MT along with the most important tools and resources that are available for building and testing new Arabic MT systems. Furthermore, the survey paper discusses the current state of Arabic MT and provides some insights into possible future research directions.

Research paper thumbnail of Recommendation of users in social networks: A semantic and social based classification approach

Expert Systems, 2020

Recently, the study of social network‐based recommender systems has become an active research top... more Recently, the study of social network‐based recommender systems has become an active research topic. The integration of the social relationships that exist between users can improve the accuracy of recommendation results since the users' preferences are similar or influenced by their connected friends. We focus in this article on the recommendation of users in social networks. Our approach is based on semantic and social representations of the users' profiles. We have formalized and illustrated these two dimensions using the Yelp social network. The novelty of our approach concerns the modelling of the credibility of the user, through his/her trust and commitment in the social network. Moreover, in order to optimize the performance of the recommendation process, we have used two classification techniques: an unsupervised technique that uses the K‐means algorithm (applied initially to all users); and a supervised technique that uses the K‐Nearest Neighbours algorithm (applied...

Research paper thumbnail of Improving Arabic neural machine translation via n-best list re-ranking

Machine Translation, 2019

Even though the rise of the Neural Machine Translation (NMT) paradigm has brought a great deal of... more Even though the rise of the Neural Machine Translation (NMT) paradigm has brought a great deal of improvement to the machine translation field, the current translation results are still not perfect. One of the main reasons for this imperfection is the decoding task complexity. Indeed, the problem of finding the one best translation from the space of all possible translations was and still is a challenging problem. One of the most successful ways to address it is via n-best list re-ranking which attempts to reorder the n-best decoder translations according to some defined features. In this paper, we propose a set of new re-ranking features that can be extracted directly from the parallel corpus without needing any external tools. The features set that we propose takes into account lexical, syntactic, and even semantic aspects of the n-best list translations. We also present a method for feature weights optimization that uses a Quantum-behaved Particle Swarm Optimization (QPSO) algorithm. Our system has been evaluated on multiple English-to-Arabic and Arabic-to-English machine translation test sets, and the obtained re-ranking results yield noticeable improvements over the baseline NMT systems.

Research paper thumbnail of Sentiment Analysis of Users on Social Networks: Overcoming the challenge of the Loose Usages of the Algerian Dialect

Procedia Computer Science, 2018

Abstract Sentiment Analysis (SA) focuses on the study and analysis of peoples’ opinions, sentimen... more Abstract Sentiment Analysis (SA) focuses on the study and analysis of peoples’ opinions, sentiments and emotions based on written language. It is currently a very active research area in NLP. The growth of Web 2.0 has given all internet users the additional power of “interactivity, interoperability, and collaboration [1]. This has rapidly opened the door for the development of social media and the large interaction between users in all walks of life. Social media platforms are currently exploited by many companies as a major channel to advertise and sell products. As such, tools are clearly needed to analyse peoples’ opinions on and reviews of the various products, feedback on events, etc. In recent years, researchers on Arabic NLP have made some good effort tackling the problem of SA. These efforts have been more focused during the last couple of years on Arabic dialects and, to a lesser extent, on the dialects of the Maghreb region, even less on the Algerian Dialect (AlgD). The processing of this dialect is made even more complex with the frequent code switching by its speakers between Arabic and Latin letters. Facebook being widely used in the Arab world, and in Algeria more specifically, we are interested in this paper in the SA of Algerian users’ comments on various Facebook pages. A painstaking pre-processing of a corpus of such comments is done, and two neural network models, MLP and CNN, are trained to classify comments as negative, neutral or positive. Though a complex dialect, we have obtained an 81.6% accuracy with the MLP network and 89.5% accuracy with the CNN. We find this as a very encouraging result.

Research paper thumbnail of Arabic Machine Transliteration using an Attention-based Encoder-decoder Model

Procedia Computer Science, 2017

Ti t l e Ar a bi c m a c hi n e t r a n slit e r a tio n u si n g a n a t t e n tio n-b a s e d e... more Ti t l e Ar a bi c m a c hi n e t r a n slit e r a tio n u si n g a n a t t e n tio n-b a s e d e n c o d e r-d e c o d e r m o d el

Research paper thumbnail of A statistical approach for the induction of a grammar of Arabic

2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 2015

Over the last decade, a lot of research has focused on Arabic Natural Language Processing (ANLP).... more Over the last decade, a lot of research has focused on Arabic Natural Language Processing (ANLP). Various approaches and techniques have been used to develop ANLP tools. Some of these are rule-based while others are statistical or machine-learning-based. However, the development of some ANLP tools depends on the availability of a good Arabic grammar which covers the entire language. It turns out that the Arabic grammar used by most of the developed approaches was hand-crafted and most often extracted from short sentences. This manual development process is painstaking and time consuming while the developed grammar cannot describe the entire Arabic language.

Research paper thumbnail of TALAA-ASC: A sentence compression corpus for Arabic

2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 2015

A lot of work on sentence compression has been carried out for many languages other than Arabic. Unfortunately, little effort has been devoted to Arabic sentence compression. One of the reasons for this is the absence of Arabic sentence compression corpora. In order to build and evaluate sentence compression systems, parallel corpora consisting of source sentences and their corresponding compressions are needed. In this paper, we present TALAA-ASC, the first Arabic sentence compression corpus. We present the methodology we followed in order to construct the corpus. We also give the different statistics and analyses that we have performed on this corpus.

Research paper thumbnail of Drug Repurposing by Optimizing Mining of Genes Target Association

Computational Intelligence Methods for Bioinformatics and Biostatistics, 2015

A major alternative strategy for the pharmacology industry is to find new uses for approved drugs. Many studies show that the target binding of a drug often affects more than the intended disease-related genes, leading to unexpected outcomes; if the perturbed genes are related to other diseases, the existing drug can be repositioned. Thus, we focus on finding hidden relations between drug targets and disease-related genes in order to generate new hypotheses of drug-disease pairs. Association rule mining is a well-known data mining technique that is widely used for the discovery of interesting relations in large data sets. In this study we applied a new computational intelligence approach to 288 drugs and 267 diseases, forming 5018 known drug-disease pairs. Our method, based on the learned rule sets representing hidden relationships among gene targets using Grammatical Evolution (GEARM), was able to discover interesting pairs of drugs and diseases, some of which were previously reported in the literature while others can serve as new hypotheses.

Research paper thumbnail of A Knowledge-Based Approach to Goal Recognition

In this paper we tackle the problem known as plan recognition. Our aim is to re-examine the problem, paying particular attention to the underlying reasoning processes. We rename the problem as goal recognition, explain the details of the reasoning steps that are required for it, and argue in particular that forward reasoning is the central process in goal recognition, though it needs to be supported by other forms of reasoning such as abduction. We also sketch algorithms for the implementation of the various processes, give an integrated algorithm, and illustrate it with various examples that highlight some of the problems that can be encountered, such as that of inconsistency between the observations and the available knowledge. 1 Introduction During the last decade, a great deal of effort was devoted to Natural Language Understanding (computer) systems. In this endeavour, researchers have tried to analyse and model the way human beings communicate. Of particular interest was the...

Research paper thumbnail of A Methodology for a Semi-Automatic Evaluation of the Lexicons of Machine Translation Systems

Machine Translation

The lexicon is a major part of any Machine Translation (MT) system. If the lexicon of an MT system is not adequate, this will affect the quality of the whole system. Building a comprehensive lexicon, i.e., one with a high lexical coverage, is a major activity in the process of developing a good MT system. As such, the evaluation of the lexicon of an MT system is clearly a pivotal issue for the process of evaluating MT systems. In this paper, we introduce a new methodology that was devised to enable developers and users of MT systems to evaluate their lexicons semi-automatically. This new methodology is based on the idea of the importance of a specific word or, more precisely, word sense, to a given application domain. This importance, or weight, determines how the presence of such a word in, or its absence from, the lexicon affects the MT system's lexical quality, which in turn will naturally affect the overall output quality. The method, which adopts a black-box approach to eva...

Research paper thumbnail of A Neural-Network-Based Arabic Morphological Analyzer

Research paper thumbnail of Aicha Boutorh, Ahmed Guessoum: Grammatical Evolution Association Rule Mining to Detect Gene-Gene Interaction

Research paper thumbnail of A Supervised Approach to Arabic Text Summarization Using AdaBoost

Advances in Intelligent Systems and Computing, 2015

In recent years, research in text summarization has become very active for many languages. Unfortunately, much less attention has been paid to Arabic text summarization. This paper presents a Machine Learning-based approach to Arabic text summarization which uses AdaBoost. This technique is employed to predict whether a new sentence is likely to be included in the summary or not. In order to evaluate the approach, we have used a corpus of Arabic articles. This approach was compared against other Machine Learning approaches, and the results obtained show that the AdaBoost-based approach we suggest outperforms other existing approaches.

Research paper thumbnail of Social Validation of Solutions in the Context of Online Communities

IFIP Advances in Information and Communication Technology, 2015

Online Communities are considered a new organizational structure that allows individuals and groups of persons to collaborate and share their knowledge and experiences. These members need technological support in order to facilitate their learning activities (e.g. during a problem-solving process). We address in this paper the problem of social validation, our aim being to support members of Online Communities of Learners in validating the proposed solutions. Our approach is based on the members' evaluations: we apply three machine learning techniques, namely a Genetic Algorithm, Artificial Neural Networks and the Naïve Bayes approach. The main objective is to determine a validity rating of a given solution. A preliminary experiment with our approach within a Community of Learners whose main objective is to collaboratively learn the Java language shows that Neural Networks represent the most suitable approach in this context.

Research paper thumbnail of Rule-based Grammatical Evolution For Drug Repositioning