Khaled Elghamry | Ain Shams University

Papers by Khaled Elghamry

الحرباء: لغة العنف في خطاب داعش باللغتين العربية والإنجليزية على شبكة الإنترنت [The Chameleon: The Language of Violence in ISIS Discourse in Arabic and English on the Internet]

A Fine-Grained Emotion Lexicon for Arabic

The Egyptian Journal of Language Engineering, 2015

A list of Arabic adjectives that users on Twitter use to express their emotions. The source and more details on how this list was constructed can be found in: Elghamry, Khaled. "MASHAEIR: Bootstrapping a Multi-Dialect Fine-Grained Emotion Thesaurus for Arabic Using Twitter". The Egyptian Journal of Language Engineering 2.2 (2015): 10-21.

Arabic Broken Plurals List

MASHAEIR: Bootstrapping a Multi-Dialect Fine-Grained Emotion Thesaurus for Arabic Using Twitter

The Egyptian Journal of Language Engineering, 2015

A Bilingual Approach for Arabic Paraphrases Acquisition: Preliminary Experiments

This paper presents preliminary experiments on a bilingual approach for Arabic paraphrase acquisition, research motivated by the importance of paraphrasing for overcoming data sparseness and for many NLP applications such as Question Answering (QA) and Information Retrieval (IR). The proposed approach develops an unsupervised bilingual algorithm to acquire Arabic paraphrases at the phrase level, which is rather more challenging than elementary word-level paraphrasing and is less efficiently handled by current Arabic paraphrasing systems. Preliminary results show that our approach manages to get term variations (orthographic, lexical and syntactic) for ~70% of 4000 randomly selected phrases.

A Corpus-based Arabic Valency Dictionary: The Case of Fighting Verbs

The Egyptian Journal of Language Engineering, Apr 10, 2017

Empirical pedagogical dictionaries aim at defining words in their context and presenting corpus-based evidence for each word. They are meant to teach language learners how to use a word correctly. Valency, which describes the arguments of a verb syntactically and semantically, is of unique importance to pedagogical dictionaries. Unfortunately, Arabic lacks corpus-based valency resources. Thus, this paper proposes a monolingual corpus-based valency dictionary for Arabic learners, covering fighting verbs. The dictionary explores the valency of fighting verbs in Sketch Engine's uploaded Arabic TenTen corpus. The compilation method depends both on the automatic word sketch function, to identify the lexico-syntactic patterns of verbs, and on three-layer manual annotation of corpus-driven examples, to consolidate the results. Each verb entry in the dictionary displays the (a) number, (b) phrase type, (c) semantic role and (d) grammatical function of its arguments, and (e) the definition of its different senses. At least three annotated examples are provided for each verb sense to illustrate its usage authentically. The dictionary, integrating semantic and syntactic information, facilitates effective learning of new Arabic vocabulary.
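The entry layout listed in this abstract (number, phrase type, semantic role and grammatical function of the arguments, plus a definition and at least three examples per sense) can be pictured as a small data structure. The following is only a sketch: the class names, field names and the English sample values are hypothetical and are not taken from the dictionary itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    phrase_type: str            # e.g. "NP", "PP" (hypothetical labels)
    semantic_role: str          # e.g. "Agent", "Patient"
    grammatical_function: str   # e.g. "subject", "object"


@dataclass
class Sense:
    definition: str
    arguments: List[Argument]
    examples: List[str] = field(default_factory=list)

    @property
    def valency(self) -> int:
        """Number of arguments the verb takes in this sense."""
        return len(self.arguments)

    def has_enough_examples(self, minimum: int = 3) -> bool:
        """The abstract states at least three annotated examples per sense."""
        return len(self.examples) >= minimum


# Hypothetical English-glossed sense, for illustration only.
fight = Sense(
    definition="to take part in a violent struggle against someone",
    arguments=[
        Argument("NP", "Agent", "subject"),
        Argument("NP", "Patient", "object"),
    ],
    examples=["The army fought the invaders."],
)
print(fight.valency, fight.has_enough_examples())
```

Deriving the argument count from the argument list, rather than storing it separately, keeps the two from drifting apart.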

A Prototype Theory-Based Study of Crimes in the English and Arabic Societies Using Web-as-Corpus

The Egyptian Journal of Language Engineering, Sep 15, 2017

Detecting Veracity in Selected Speeches of Egyptian Presidents (1956-2015) and American Presidents (1981-2015): A Psycholinguistic Corpus-based Study

The Egyptian Journal of Language Engineering, Apr 11, 2018

Periphery Discourse: An Alternative Media Eye on the Geographical, Social and Media Peripheries in Egypt's Spring

Mediterranean Politics, 2015

The growing literature on the use of social media for social protests in general, and during the Arab Spring in particular, has largely failed to show a periphery-inclusive perspective. This article employs statistical data on the use of alternative media outlets (Facebook, Twitter, blogs and YouTube) in Egypt's spring to show how an alternative media structure was expanding which not only empowered social and geographic peripheral actors but was, in turn, also empowered by their contributions. YouTube videos and Twitter messages from peripheral areas exposed police brutality towards protestors in the backstreets that could otherwise have gone unnoticed, and saved lives in isolated areas in Egypt. Social media thus gained critical mass and expanded to the point that it had an overflow effect from the virtual sphere to the real world. Contrasting the roles of alternative and state-run media machines in different phases of the revolution, the article traces how peripheries could challenge the existing opportunity structure through alternative media, but also how their role contracted again after the revolution reached its peak.

Machine Translation Oriented Syntactic Normalization of Noun Phrases in Arabic

It has been shown that syntactic normalization boosts text summarization and improves both precision and recall in information retrieval. This paper shows that syntactic normalization can also improve the performance of machine translation systems. It presents a method for identifying and normalizing the structural variations of the nominal 'Construct State', also known as iDafa, in Arabic. This type of noun phrase has the structure Noun1 Noun2, and is highly frequent in Arabic. The paper (i) describes the structural variations of this construction, (ii) describes the suggested method for identifying these variations, and (iii) tests the effect of normalizing these variants on the performance of Arabic-to-English machine translation systems. The results showed a 5-point improvement in MT performance. To the best of the author's knowledge, this is the first attempt at text normalization in Arabic and at measuring its effect on Arabic MT.

Using the Web in Building a Corpus-Based Hypernymy-Hyponymy Lexicon with Hierarchical Structure for Arabic

The hypernymy-hyponymy links form the backbone of the noun hierarchy in a semantic lexicon. Carrying out this task manually is labor-intensive and time-consuming, and could lead to inconsistencies and to problems in coverage, updating, and scaling up. This paper shows how a corpus-based hypernym-hyponym lexicon with partial hierarchical structure for Arabic can be created directly from the Web with minimal human supervision. The creation method bootstraps the acquisition process by searching the Web for the lexico-syntactic pattern "بعض x مثل y1 … yn" (some x such as y1, …, yn). The results reported in this paper show the effectiveness of the suggested method and (when compared to the current version of the Arabic WordNet) raise some important theoretical as well as practical issues on different levels and directions in (Web-) corpus-based approaches to linguistic knowledge acquisition, in general, and to semantic lexicon acquisition, in particular.
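As a rough illustration of how the pattern "بعض x مثل y1 … yn" can yield hypernym-hyponym pairs, the sketch below applies a regular expression to already-retrieved text snippets. It is a simplification under stated assumptions, not the paper's method: the function name is hypothetical, the hypernym is taken to be a single word, list items are split on the Arabic comma or on a word-initial و, and no Web search or filtering step is shown.

```python
import re
from collections import defaultdict

# "some X such as Y1, Y2 and Y3": group 1 is the hypernym (assumed to be one word),
# group 2 is the raw hyponym list up to the end of the sentence.
PATTERN = re.compile(r"بعض\s+(\S+)\s+مثل\s+([^.؟!\n]+)")

# Naive item separators: the Arabic comma, or a conjunction و attached to the
# next word. Words that genuinely begin with و would be split wrongly.
SEPARATORS = re.compile(r"\s*،\s*|\s+و")


def extract_hyponyms(snippets):
    """Map each hypernym to the set of hyponyms found after it in the pattern."""
    lexicon = defaultdict(set)
    for snippet in snippets:
        for match in PATTERN.finditer(snippet):
            hypernym = match.group(1)
            for item in SEPARATORS.split(match.group(2)):
                item = item.strip()
                if item:
                    lexicon[hypernym].add(item)
    return dict(lexicon)


# Toy snippet: "some fruits such as apples, oranges and bananas."
lexicon = extract_hyponyms(["بعض الفواكه مثل التفاح، البرتقال والموز."])
print(lexicon)
```

A real pipeline would also need to decide where the hyponym list ends within a longer sentence, which this sketch does not attempt.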

Arabic Anaphora Resolution: A Distributional, Monolingual and Bilingual Approach

This paper presents an algorithm for Anaphora Resolution (AR) in Arabic. The paper is motivated by the poor performance of current Arabic-English Machine Translation (MT) systems in terms of AR and the fact that AR is an understudied issue in Arabic Natural Language Processing (ANLP). The algorithm suggested follows a distributional, monolingual and bilingual bootstrapping approach to acquire AR-related features that cannot be provided by monolingual resources, using a second language (here English). ...

A Web-Based Approach for Arabic PP Attachment

Proceedings of the 6th International Conference on Informatics and Systems, 2008

This paper presents a web-based algorithm for Arabic PP attachment. The paper is motivated by the importance of PP attachment for NLP applications and tasks and by the poor performance of current Arabic parsers. The algorithm uses web frequencies to measure the collocational association between the PP and the candidate binders; the binder with the highest association is selected as the correct one. The algorithm achieves a performance rate of ≈82%, which is higher than the baseline used (≈79%).
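The selection step described in this abstract (score each candidate binder by its association with the PP, then keep the highest) can be sketched as follows. The abstract does not say which association measure is used, so pointwise mutual information is substituted here purely as an assumption; the function names, the count dictionaries and the toy English figures are all hypothetical stand-ins for real web hit counts.

```python
import math


def pmi(pair_count, binder_count, pp_count, total):
    """Pointwise mutual information; -inf when any count is zero."""
    if pair_count == 0 or binder_count == 0 or pp_count == 0:
        return float("-inf")
    return math.log2(pair_count * total / (binder_count * pp_count))


def choose_binder(pp, candidates, counts, pair_counts, total):
    """Return the candidate binder with the highest association with the PP."""
    def score(binder):
        return pmi(
            pair_counts.get((binder, pp), 0),
            counts.get(binder, 0),
            counts.get(pp, 0),
            total,
        )
    return max(candidates, key=score)


# Toy counts (invented for illustration, not real web frequencies).
counts = {"ate": 1000, "pizza": 800, "with a fork": 200}
pair_counts = {("ate", "with a fork"): 50, ("pizza", "with a fork"): 5}
best = choose_binder("with a fork", ["ate", "pizza"], counts, pair_counts, total=1_000_000)
print(best)
```

With these toy counts the verb wins because it co-occurs with the PP far more often than chance would predict, relative to the noun.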

Arabic anaphora resolution using the Web as corpus

Proceedings of the 7th Conference on Language Engineering (CLE’07), Dec 5, 2007

This paper presents a dynamic algorithm for Anaphora Resolution (AR) in Arabic unrestricted texts. The poor performance of current Arabic/English Machine Translation (MT) systems in terms of AR and the fact that AR is an understudied issue in Arabic Natural Language Processing (ANLP) are the main motivations for this paper. The algorithm suggested follows a statistical approach to AR and makes use of the web as corpus to overcome the inherent problem of statistical ...

Cue-based bootstrapping of Arabic semantic features

JADT, 2008

Motivated by the fact that semantic features are understudied in Arabic Natural Language Processing (ANLP) in spite of being essential for some Natural Language Processing (NLP) tasks such as Anaphora Resolution (AR), Word Sense Disambiguation (WSD) and Prepositional Phrase (PP) attachment, this paper presents a cue-based algorithm to build an Arabic lexicon that tackles such semantic features. The lexicon, whose entries are extracted from the World Wide Web (WWW) using bilingual and monolingual ...

A generalized cue-based approach to the automatic acquisition of subcategorization frames

Ph.D. thesis, Indiana University, 2004.

A Bilingual Approach for Arabic Paraphrases Acquisition: Preliminary Experiments

This paper presents preliminary experiments on a bilingual approach for Arabic paraphrase acquisition, research motivated by the importance of paraphrasing for overcoming data sparseness and for many NLP applications such as Question Answering (QA) and Information Retrieval (IR). The proposed approach develops an unsupervised bilingual algorithm to acquire Arabic paraphrases at the phrase level, which is rather more challenging than elementary word-level paraphrasing and is less efficiently handled by current Arabic paraphrasing systems. ...

Arabic Broken Plurals List

التقرير التأسيسي للمحتوى الرقمي العربي (الواقع- الدلالات- التحديات) [The Foundational Report on Arabic Digital Content (Reality, Implications, Challenges)]

Mapping the Sunna-Shia Hate Speech on Arabic Twitter Using Sentiment Analysis Techniques

Arabilight: A Corpus-Based Model for Spoken Modern Standard Arabic

The 34th Annual Symposium on Arabic Linguistics, Tucson, Arizona, 2020

One of the recommendations for the conversational agents currently being developed for Arabic is to use a language variant that sounds educated yet friendly and natural. Modern Standard Arabic (MSA) has been adopted so far as the initial candidate for such a task. One of the issues here is how to handle word-final vowels in spoken MSA. There are two extremes in this respect: [a] to drop these vowels across the board, and [b] to pronounce these vowels regardless of the context. Building on previous studies (e.g., Hallberg 2016 and Bassiouney 2010), this paper argues that real-world corpora support neither of these two scenarios. The contribution of this paper is two-fold. It presents a corpus-based analysis of the realization of word-final vowels. Using these results, it proposes and describes Arabilight, a conversation-friendly model for spoken MSA. This model is based on what we call the “Sufficient Necessary Minimum”. The authors also discuss possible applications of this model in learning and teaching Arabic.
The corpus was collected from: [a] official news channels from all Arab countries, [b] channels covering pan-Arab issues such as Aljazeera, AlArabiya and Skynews, and [c] the Arabic-speaking versions of France 24, CNN, BBC, Euronews and CNBC. The total sample contains 224 minutes, divided equally between male and female news anchors.
The sample was manually transcribed by the authors. Each word was labelled for the presence or absence of a word-final vowel that marks the mood or case of that word, given its context. Long and short pauses were also marked. Each news piece was labelled with the anchor’s gender and nationality.
The preliminary results show that the full use of case endings seems to be an idealized scenario that exists only in Arabic grammar textbooks. Dropping case across the board is likewise not borne out by real-world corpora. For example, our analysis shows that Iraqi anchors pronounce word-final vowels the least (42%), and Tunisian anchors the most (61%). The effect of topic and gender on this ratio is currently under investigation.
The transcribed corpus was statistically analyzed to identify the contexts where word-final vowels were consistently pronounced by the anchors from every source in the corpus. These include, for example, “tanween nasb”, the vowel marking case or mood in words immediately followed by “alif wasl”, and the final vowels in perfective verbs. The authors argue that these contexts represent the necessary minimum of pronouncing these vowels that is sufficient for the realization of a spoken MSA that sounds educated yet friendly and natural. We call it Arabilight.
The paper also discusses the advantages of using this model in teaching and learning spoken MSA. It could help learners and teachers focus mainly on the necessary minimum of the grammar rules for word-final vowel realization, thereby reducing the learner’s ‘fear’ of grammar, increasing their motivation, and reducing the time teachers spend teaching an idealized textbook grammar model.
This model is part of an ongoing project concerned with the corpus-based construction of resources and models for Arabic that also includes Mashhoor, a learner’s corpus-based familiarity dictionary for Arabic, and Darajaat, a graded dictionary for Arabic learning and teaching. We plan to improve our model using more data from other MSA sources.
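The two analyses described in the Arabilight abstract, the realization rate per nationality and the contexts in which the final vowel is pronounced by anchors from every source, can be sketched over a token-level annotation table. This is a minimal sketch under assumptions: the record layout, the context labels and the toy data are hypothetical, and "consistently pronounced" is read here as "never dropped, and attested as pronounced in every source", which the abstract does not spell out.

```python
from collections import defaultdict


def realization_rates(tokens):
    """Share of tokens with a pronounced word-final vowel, per nationality."""
    totals, hits = defaultdict(int), defaultdict(int)
    for t in tokens:
        totals[t["nationality"]] += 1
        hits[t["nationality"]] += int(t["realized"])
    return {n: hits[n] / totals[n] for n in totals}


def consistent_contexts(tokens):
    """Contexts never dropped and pronounced at least once in every source."""
    sources = {t["source"] for t in tokens}
    realized_in = defaultdict(set)
    dropped = set()
    for t in tokens:
        if t["realized"]:
            realized_in[t["context"]].add(t["source"])
        else:
            dropped.add(t["context"])
    return {c for c, s in realized_in.items() if s == sources and c not in dropped}


# Toy annotations (invented); "other" is a placeholder context label.
tokens = [
    {"source": "A", "nationality": "Iraqi", "context": "tanween_nasb", "realized": True},
    {"source": "A", "nationality": "Iraqi", "context": "other", "realized": False},
    {"source": "B", "nationality": "Tunisian", "context": "tanween_nasb", "realized": True},
    {"source": "B", "nationality": "Tunisian", "context": "other", "realized": True},
]
rates = realization_rates(tokens)
minimum = consistent_contexts(tokens)
print(rates, minimum)
```

On real data a tolerance threshold would probably be more practical than the strict "never dropped" rule used here.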

Naguib: A Model for the Computational Analysis of Arabic Novels

Experimental Arabic Linguistics, 2021

In this talk we present Naguib (after the Arab literary Nobel Laureate, Naguib Mahfouz), a model for a fine-grained computational analysis of Arabic literary texts, with a special focus on novels. The main idea in Naguib is that a literary text is essentially a sequence of linguistic signals selected by the author to perform a set of narrative functions, such as settings, affect, tone, mood, character, theme, plot and dialogue. The core of the model is a novel-oriented lexical-conceptual taxonomy that maps the textual signals into these narrative functions. In this talk, we present the details of this model and the results of its application in analyzing a sample of Arabic novels and identifying the lexical-conceptual and cultural DNA of the given text. This is work in progress with the ultimate purpose of providing a robust suite of tools for researchers in the domains of literary and cultural studies and related fields for a deeper and richer analysis of Arabic literary texts on a much larger scale.

Detecting and Measuring Hate Speech in Arabic Social Media

The 34th Annual Symposium on Arabic Linguistics, 2020

There has been increasing interest recently in detecting and monitoring hate speech on social media. However, research on the subject so far is very limited, and almost non-existent for Arabic-speaking social media. This paper describes [1] the steps for using a sample Twitter corpus in Arabic to bootstrap a hate speech lexicon for Arabic, [2] a novel method for calculating a “seriousness” score for hate terms that predicts the likelihood of this hate speech leading to violent acts in reality, and [3] the steps for using this lexicon in a template-based fashion to identify potential targets of hate speech on Arabic-speaking Twitter.

MASHHOOR: A Learner's Familiarity Dictionary for Arabic Using the Web as Corpus

6th Florida Linguistics Yearly Meeting (FLYM 2020), University of Florida, 2020

Word familiarity is needed in language learning, conversational agents, translation and localization content, word perception and readability studies (Seraye 2016, Al-Khalifa et al. 2010). Existing resources for Arabic are either subjective (Hasmam et al. 2016) or based on simple frequency dictionaries that do not take into consideration the distribution of words in different types of corpora and in different Arabic-speaking countries (Buckwalter and Parkinson 2014). This paper presents Mashhoor (an Arabic word for 'well-known'), a dictionary that provides words with their corpus-based familiarity scores (a la Nusbaum et al. 1984). Our suggested familiarity score is a function of the different aspects of word frequency and regional distribution in a large corpus, and the changes in these frequencies over time. This paper is part of a larger ongoing project, "Thamaraat" (an Arabic word for 'fruits'), concerned with developing automated methods and tools for constructing lexical resources that would reduce time and effort in syllabus design and meet the everyday needs of both learners and teachers of Arabic. Our suggested methods and tools can be easily used to build similar resources for other languages. This paper uses a corpus crawled from 623 Arabic-speaking websites, selected so that it covers all Arab countries. The preprocessing of the corpus was limited to removing encodings and punctuation. It contains 28.5 million documents and about 6.9 billion tokens. This corpus was used to compute the familiarity scores for each word form in the corpus, using a function based on the following values: [a] the overall frequency of the word in the corpus, [b] its overall frequency in the corpus from a given country, weighted proportionally to the population size of the country, and [c] its frequency over time.
To measure the different weights for words, we used TF-IDF (term frequency-inverse document frequency), a statistic used in information retrieval that is intended to reflect how important a word is to a document in a corpus. The output of applying our equation to the corpus is a dictionary with two main familiarity scores for each word: the first indicates the familiarity of the word within a given Arab country, and the other reflects its familiarity among Arabic speakers in general. For example, our familiarity scores reveal that the word "صاحب" (fs: 0.97) is more familiar than the word "صديق" (fs: 0.86); both are Arabic words for "friend". One possible application of this dictionary is to help familiarize learners of Arabic with words that are common in different cultural contexts as well as in different regions of the Arab world. It will also help the teacher design the syllabus based on a clear measure of word commonality and familiarity, optimizing the process of aligning lexical difficulty with the proper proficiency level. Our plan is to extend our corpus and familiarity computation method to include data from Facebook and Twitter in the last 10 years, on the assumption that the frequency of words in these channels is a good indicator of their overall familiarity.
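For readers unfamiliar with TF-IDF, the weighting mentioned in the Mashhoor abstract, the sketch below computes the textbook version of the statistic on a toy corpus. It illustrates only the standard formula (relative term frequency times the logarithm of the inverse document frequency); how the paper combines it with the three frequency values [a] to [c] is not given in the abstract and is not reproduced here. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_name: {word: tf-idf}} for a dict of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return scores


# Toy corpus (invented): two tiny "documents".
docs = {
    "doc_a": ["book", "book", "pen"],
    "doc_b": ["book", "ink"],
}
scores = tf_idf(docs)
print(scores)
```

A word that occurs in every document receives a score of zero under this formula, which is one reason TF-IDF alone would not serve as a familiarity score.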

The Future of the Humanities: Challenges and Proposed Solutions

Symposium "The Future of the Humanities in Egypt", 2013

Towards an Arab Observatory for Hate Speech on Social Media

Arabic Broken Plurals List

Arabic Broken Plurals List Two .. Plain Text

Arabic Broken Plurals List One .. Plain Text

A Lexical-Syntactic Solution to the Problem of Broken Plural in Arabic

Mosaic: Resources for the Study of Variation in Modern Standard and Dialectal Arabic

Tools and resources for constructing and examining linguistic variation in Modern Standard and Dialectal Arabic

A corpus-driven study of Emotions Prototypicality in Arabic and English

Using the principles of prototype theory, this study explores the prototypes of emotions in online expert and folkloric writings in English and Arabic. It uses the web as a corpus to investigate the inductive pattern "emotion* such as" in three web domains (".com", ".org" and ".edu") in Arabic and English. The collected data are analyzed with the AntConc software, and statistical tests are computed to measure how universal the conceptualization of emotions is across the Arabic- and English-speaking worlds. The study provides quantitative evidence of emotion universality at the superordinate level. Cross-cultural difference is also probed, to a lesser extent, with respect to the polarity of the prevailing sentiment. The study moreover proposes, for the first time, a web-based scaled Arabic lexicon of sentiment, which can be used in sentiment computation and opinion mining. Results reveal that organizational cognition of emotion prototypes is societally oriented, whereas folkloric cognition of the same concept yields individual-based prototypes. Regarding cross-cultural difference, the English-speaking world expresses positive emotions more than the Arabic-speaking world does. At the broader level of hypernyms, the emotion prototypes suggested by Shaver et al. (1987) have been enriched and updated.
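The study itself reports using AntConc, so the following is only an illustrative sketch of how the English pattern "emotion* such as" could be matched programmatically to collect the terms listed after it. The function name, the regular expressions, and the first-word heuristic are assumptions made for this example; the Arabic pattern would need its own expression.

```python
import re

# Matches "emotion", "emotions", "emotional", ... followed by "such as",
# then captures the enumeration up to the next sentence-level punctuation mark.
PATTERN = re.compile(r"\bemotion\w*\s+such\s+as\s+([^.;:!?\n]+)", re.IGNORECASE)

# Separators between enumerated items: commas and the conjunctions "and"/"or".
SEPARATORS = re.compile(r",|\band\b|\bor\b")


def extract_emotion_terms(text: str) -> list:
    """Return the terms enumerated after each 'emotion* such as' match, in order."""
    terms = []
    for match in PATTERN.finditer(text):
        for item in SEPARATORS.split(match.group(1)):
            words = item.strip().lower().split()
            if words:
                # Simplification: keep only the first word of each item,
                # which drops trailing context such as "are universal".
                terms.append(words[0])
    return terms


if __name__ == "__main__":
    sample = (
        "Basic emotions such as anger, fear, and joy are universal. "
        "An emotion such as love or sadness is complex."
    )
    print(extract_emotion_terms(sample))
```

Counting the returned terms, for example with `collections.Counter`, would give a rough frequency ranking of candidate prototypes per web domain.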

Burg Mighizil (برج مغِيزِل): Social Media and the Seasons of Companionship, Addiction, and Migration

April 2016 saw a sudden increase in the number of young Egyptians who fled to Europe, and the numbers have kept rising since then. Some Egyptian villages have become almost empty of their young people, and the sinking of boats carrying irregular migrants has become routine news. Burg Mighizil is one of the villages in Egypt's Kafr El-Sheikh Governorate where this has happened, and more than four thousand people from the governorate are in European prisons on charges of smuggling migrants. So what does social media have to do with all this? There are so far more than one hundred social media sites and apps in the world, the best known being Facebook, Instagram, Twitter, and TikTok. These sites began to appear at the end of the twentieth century, when the network "SixDegrees" emerged in 1997. Facebook launched in 2004 and things turned into something like a flood; with nearly three billion users, Facebook became the largest, the most dangerous, and the most influential. These sites developed at a speed whose consequences no one was prepared to face: neither governments nor institutions were ready to deal with the end of their monopoly on information, nor were peoples ready to exercise the new power or to adapt to it psychologically. Researchers are racing after these sites to study them and understand their serious effects. So what are the consequences? Social media companies may be the only ones who know what they want and how to achieve it. These companies know that venting has a magic, that companionship has a hold, and that admiration and social acceptance have the effect of addiction, and they know exactly how to "engineer" all of that. The inventor and science fiction writer Arthur C. Clarke says: "Any sufficiently advanced technology is indistinguishable from magic!" What these companies do is just that, and perhaps more! So how do these companies do it? This book is an attempt to answer these questions.

Amun's Prophecy: The Internet from the Cold War to Fourth-Generation Warfare and Intergenerational Strife

The ancient Egyptian myth says that when Thoth invented writing, he proposed to Amun that the Egyptians use it to increase their wisdom and strengthen their memory. But Amun warned him: your invention will make learners neglect their memory, and it will offer them not truth but a semblance of it; they will appear to be people of knowledge when in reality they know nothing, and they will become noisier and louder, and their company will be tiresome. Amun's prophecy and fears were small, sized to the symbols of writing, but with the spread of the internet these fears multiplied, and new fears emerged on the scale of the internet, social media sites, and the millions who use them!

The internet began in the US Department of Defense during the Cold War, then left the military arsenal for millions of people and changed the rules of the game in economics, politics, security, and authority over information, culture, and knowledge. Social media sites then appeared and did what they did to people's psyches and to the etiquette of dialogue among them, turning human relationships and feelings into a set of buttons and symbols.

One camp says that the internet came to empower the masses and displace the elite. Others say that this will harm the quality of culture and knowledge, and will also help ignite strife between generations and fragment societies, in what is known as fourth-generation warfare.

The major powers recognized the military capabilities of the internet, so they set about spying on its users and using it to mislead public opinion with images, rumors, and half-truths.

The equation of the internet, in short, is that it has made it easy for us to know much, see much, say much, communicate much, and ask much and be answered; the price is that we have become less patient, less focused, more extreme, and less tasteful. Technology in general gives humanity much with one hand and takes much from it with the other. The ancient Greek writer Sophocles said: "Nothing great enters human life without a great curse with it!"

This book is about the internet, this "great thing", and about the "great curse" that came with it!

Arabic Broken Plurals

Gurt, 2008

Lists of Arabic Broken Plural Forms