Jim Yaghi - Academia.edu

Papers by Jim Yaghi

T-Code Compression for Arabic Computational Morphology

It is impossible to perform root-based searching, concordancing, and grammar checking in Arabic without a method to match words with roots and vice versa. A comprehensive word list is essential for incremental searching, predictive SMS messaging, and spell checking, but due to the derivational and inflectional nature of Arabic, a comprehensive word list is taxing on storage space and access speed. This paper describes a method for compactly storing and efficiently accessing an extensive dictionary of Arabic words by their morphological properties and roots. Compression of the dictionary is based on T-Code encoding, which follows the Huffman encoding model. The special characteristics inherent in the recursive augmentation method with which codes are created allow compact storage on disk and in memory. They also facilitate the efficient use of bandwidth for Arabic text transmission over intranets and the Internet.
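The recursive augmentation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly cited definition of T-augmentation: given a prefix-free set S, a chosen codeword p, and an expansion parameter k, the new set consists of p repeated i times followed by each other codeword (for i from 0 to k), plus p repeated k+1 times. The function names `t_augment` and `is_prefix_free`, the binary alphabet, and the choice of prefixes are illustrative assumptions, not the paper's implementation.

```python
def t_augment(codes, prefix, k=1):
    """Return the T-augmentation of a prefix-free code set.

    Assumed definition: { prefix*i + c : c in codes, c != prefix, 0 <= i <= k }
    together with prefix*(k+1).
    """
    if prefix not in codes:
        raise ValueError("prefix must be a codeword of the current set")
    rest = [c for c in codes if c != prefix]
    augmented = {prefix * i + c for i in range(k + 1) for c in rest}
    augmented.add(prefix * (k + 1))
    # Sort by length, then lexicographically, for a stable listing.
    return sorted(augmented, key=lambda c: (len(c), c))


def is_prefix_free(codes):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


# Start from the binary alphabet and augment twice (hypothetical prefix choices).
level0 = ["0", "1"]
level1 = t_augment(level0, "0")   # ['1', '00', '01']
level2 = t_augment(level1, "1")   # ['00', '01', '11', '100', '101']

print(level1)
print(level2)
print(is_prefix_free(level2))     # True
```

Each augmentation step keeps the set prefix-free while adding longer codewords, so frequent dictionary items can be assigned the shorter codes, in the same spirit as Huffman coding.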

WORKSHOP PROGRAM

acl.ldc.upenn.edu

8:30 – 9:00 Computer Processing of Arabic Script-based Languages: Current State and Future Directions Ali Farghaly ... 9:00 – 9:30 Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools Mohamed Maamouri and Ann Bies ... 9:30 – 10:00 Preliminary Lexical Framework for English-Arabic Semantic Resource Construction Anne R. Diekema ... 10:00 – 10:30 The Architecture of a Standard Arabic Lexical Database: Some Figures, Ratios and Categories from the DIINAR.1 Source Program Ramzi Abbès, Joseph Dichy and Mohamed Hassoun

A Computer Aided Lexicography Tool for Making Dictionaries on Historical Principles

Despite the long-standing tradition of lexicography that Arabic prides itself on, the language does not have a dictionary that states the origin of words and traces their development across time. Several recent attempts have failed, resulting in frustration and in the conclusion that the task is daunting. The main reason for failure is the sheer volume of work required. In this paper, we present a computational tool that would facilitate the compilation of an Arabic dictionary on historical principles. There are no openly available tools for Arabic dictionary making; if they do exist, they are jealously guarded for their commercial value; hence, they are unavailable to scholars who might want to take part in the grand endeavor of building an etymological Arabic dictionary. This research shall make its tool available to the open source community to encourage further development and refinement. The computational tool can also be used in the development of computer-assisted language learning software. Concordances, for example, are by-products of this research, yet they are invaluable to the teaching of grammar and morphology; they encourage learning by discovery.

Tracking Morphophonemic Transformation in Arabic Word Generation and Root Extraction

Performing root-based searching, concordancing, and grammar checking in Arabic requires an efficient method for matching stems with roots and vice versa. Such mapping is complicated by the hundreds of manifestations of the same root. An algorithm based on the generation method used by native speakers is proposed here to provide a mapping from roots to stems. Verb roots are classified by the types of their radicals and the stems they generate. Roots are moulded with morphosemantic and morphosyntactic patterns to generate stems modified for tense, voice, and mode, and affixed for different subject number, gender, and person. The surface forms of applicable morphophonemic transformations are then derived using finite state machines. This paper defines what is meant by ‘stem’, describes a stem generation engine that the authors developed, and outlines how a generated stem database is compiled for all Arabic verbs.
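The root-and-pattern generation described in this abstract can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: roots are written in Latin transliteration, the digits 1 to 3 in a pattern mark radical slots, and a single string rewrite rule stands in for the finite state machines the abstract mentions. The names `fill_pattern`, `surface`, and `PATTERNS` are hypothetical and do not come from the authors' engine.

```python
# Hypothetical pattern table: digits mark where the root radicals are inserted.
PATTERNS = {
    "perfect_active": "1a2a3a",
    "perfect_passive": "1u2i3a",
    "imperfect_active": "ya12u3u",
}


def fill_pattern(root, pattern):
    """Mould a hyphen-separated root (e.g. 'k-t-b') into a pattern."""
    radicals = root.split("-")
    return "".join(
        radicals[int(ch) - 1] if ch.isdigit() else ch for ch in pattern
    )


def surface(stem):
    """Apply one simplified morphophonemic rule: 'awa' contracts to a long vowel.

    Assumption: this single rule is a stand-in for a full rule set and will
    overgeneralize outside the hollow-verb case shown here.
    """
    return stem.replace("awa", "ā")


# Sound root k-t-b: the underlying and surface forms coincide.
print(surface(fill_pattern("k-t-b", PATTERNS["perfect_active"])))    # kataba
print(surface(fill_pattern("k-t-b", PATTERNS["perfect_passive"])))   # kutiba
print(surface(fill_pattern("k-t-b", PATTERNS["imperfect_active"])))  # yaktubu

# Hollow root q-w-l: the underlying form 'qawala' surfaces as 'qāla'.
print(surface(fill_pattern("q-w-l", PATTERNS["perfect_active"])))    # qāla
```

Separating pattern filling from the surface rule mirrors the two stages the abstract names: stems are generated first, and morphophonemic transformations are applied afterwards.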

A Framework for Benchmarking Arabic Verb Morphological Tools

Innovations in E-learning, Instruction Technology, Assessment, and Engineering Education, 2007

Arabic morphology tools are numerous, but there have been no standard tests of performance with which success and extent of coverage can be gauged. Much of the testing has been done by developers in accordance with ad hoc standards of their own. Although many claim success, users remain skeptical of the efficiency and level of coverage. In this study, we

Proceedings of the Australasian Language Technology Workshop 2003

A key feature of this year's workshop is a special session of invited speakers from industry. In this 'industry session', which aims to bridge the gap between academia and industry, members of different companies talk about how they apply language technology (research) in their work. Another exciting feature of this year's workshop is the Language Technology Programming Competition. It is formatted as a "shared task": all participants compete to solve the same problem. The problem highlights an active area of research and programming in language technology. Details of the shared task are published in the proceedings and also presented in a special session (along with the winner of the competition).

Systematic verb stem generation for Arabic

Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages - Semitic '04, 2004

Performing root-based searching, concordancing, and grammar checking in Arabic requires an efficient method for matching stems with roots and vice versa. Such mapping is complicated by the hundreds of manifestations of the same root. Unlike many other attempts at encompassing the entirety of the problem, an algorithm is proposed here for attempting to simulate human thought processes. Roots are classified by their types of letters and by the types of stems they generate. The roots are moulded with morphosemantic and morphosyntactic patterns to generate stems modified for tense, voice, mode, number, and gender. The surface forms of applicable morphophonemic transformations are then derived using finite state machines. This paper defines what is meant by ‘stem’, describes a stem generation engine that the authors developed, and outlines how a generated stem database is compiled for all Arabic verbs.
