Khalid Shakir Hussein | Thi-Qar University


Papers by Khalid Shakir Hussein

Authorship Verification in Arabic Using Function Words: A Controversial Case Study of Imam Ali's Book Peak of Eloquence

International Journal of Humanities and Arts Computing

This paper addresses the viability of two multivariate methods (Principal Components Analysis and Cluster Analysis) in verifying the disputed authorship of a famous Arabic religious book, Nahjul-Balagha (Peak of Eloquence). This book occupies an exceptional position in the long-running debates between the two main Islamic sects, Sunni and Shia, and therefore poses a serious challenge to the viability of multivariate techniques in resolving certain types of historical and sectarian conflict and controversy. Furthermore, verifying the authorship of this book offers a good opportunity to find out whether certain quantitative attribution techniques hold across languages such as English and Arabic. Function words are targeted in this paper as possible indicators of the author's identity; accordingly, a set of Arabic function words is tested using WordSmith Tools (version 5). It turned out that the multivariate…

625761.pdf

Corpus linguistics provides the methodology to extract meaning from texts.

Lexical-Richness Curve as a Marker in Genre Analysis: A Corpus-based Study

This paper investigates 80 computerized textual samples belonging to four genres: popular science, fiction, research articles and editorials, all downloaded from the World Wide Web. The WordSmith Tools (5.0) package is used to calculate type and token frequencies for each sample. These frequencies are then used, with the aid of Excel spreadsheets, to plot a lexical-richness curve for each group of samples belonging to one genre. This curve can be used efficiently to bring out statistical distinctions between genres: editorials are lexically the richest and research articles the poorest, while fiction and popular science come somewhere in between, with a vocabulary size that cannot compete with the editorials but is rather larger than that of the research articles.
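The type-and-token counting behind such a curve can be sketched in a few lines of Python. This is a minimal illustration, not the paper's procedure: it assumes naive lowercased whitespace tokenization, whereas WordSmith Tools applies its own tokenization rules.

```python
def lexical_richness_curve(text, step=1000):
    """Return (tokens seen, distinct types seen) at every `step` tokens.

    Plotting these points gives a lexical-richness curve: the more
    slowly the curve flattens, the richer the sample's vocabulary.
    """
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve
```

Running each genre's samples through such a function and charting the resulting points (e.g. in Excel, as the paper does) yields the kind of curve the abstract describes.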

THE POTENTIALITIES OF CORPUS-BASED TECHNIQUES FOR ANALYZING LITERATURE

This paper presents an attempt to explore the analytical potential of five corpus-based techniques: concordances, frequency lists, keyword lists, collocate lists, and dispersion plots. The basic question addressed is what these techniques contribute to a more objective and insightful knowledge of the way literary meanings are encoded and of the way literary language is organized. Three sizable English novels (Joyce's Ulysses, Woolf's The Waves, and Faulkner's As I Lay Dying) are subjected to corpus-linguistic analysis. It is only by virtue of corpus-based techniques that such huge amounts of literary data become analyzable; otherwise, the data would remain no more than a few lines of poetry or short excerpts of narrative. The corpus-based techniques presented throughout this paper contribute, more or less, to a rigorous interpretation of literary texts, far from the intuitive approaches usually employed in traditional stylistics.

Plagiarism and Patchwriting Detection in EFL Students' Graduation Research Writing

Research on Humanities and Social Sciences, 2015

This study aims at detecting plagiarism and patchwriting in Iraqi EFL students' graduation research papers. To accomplish this aim, five graduation research papers were analyzed. Findings indicate that when writing from sources, Iraqi EFL students either copy extensively from the source (plagiarism) or stitch sentences together to form a paragraph (patchwriting); instead of writing about the source, they find themselves copying from it. The researchers find that this misuse of sources has many causes, revealed by a questionnaire administered during the study to 20 Iraqi fourth-year university students, exploring why they copy directly from their sources. The figures indicate that 60% of the students reported a lack of knowledge of the techniques that can be used when writing from sources, such as summarizing and paraphrasing. This lack of knowledge plays an essential role in the difficulty students face in the research writing process. To help students avoid plagiarism and patchwriting, further research is urgently needed so that Iraqi students become more aware of their misuse of sources and of the improper techniques they employ in writing their graduation research papers.
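The verbatim copying the study describes can also be screened for automatically. The sketch below is not the authors' method (their analysis was manual, supported by a questionnaire); it is one simple, commonly used check: the fraction of a paper's word n-grams that also appear verbatim in a candidate source text.

```python
def ngram_overlap(paper, source, n=5):
    """Fraction of the paper's word n-grams found verbatim in the source.

    A high score suggests extensive copying ("plagiarism"); a moderate
    score built from many short matching runs is consistent with
    sentence-stitching ("patchwriting"). Thresholds are a judgment call.
    """
    def ngrams(text):
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    paper_grams = ngrams(paper)
    if not paper_grams:
        return 0.0
    return len(paper_grams & ngrams(source)) / len(paper_grams)
```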

An Experiment in Plagiarism Detection in Academic Research Articles Using Attributional Techniques

Research on Humanities and Social Sciences, 2014

There are certain overlapping aspects that bring plagiarism detection and authorship attribution together in one basket. Suspected cases of plagiarism can be interpreted as special cases of disputed or misattributed authorship, so the same techniques used to resolve doubtful cases of attribution can be used to investigate any potential plagiarism. Principal components analysis and cluster analysis (henceforth PCA and CA) are among the popular statistical techniques used in various attributional scenarios. These two techniques are used throughout this paper to explore the patterns of function words displayed in seventeen samples of one specific genre, academic research articles. A survey is conducted over various cases of academic attribution: academic writing in English as a first language and as a second language, and even research articles with mixed authorship. Function words are targeted in this paper as possible indicators of the author's identity; accordingly, a set of English function words is tested using WordSmith Tools (version 5.0). It turned out that the multivariate techniques (represented by PCA and CA) are most likely robust for addressing the type of issues raised about plagiarism and authorship attribution. Besides, the statistical patterns of function-word usage appeared to be relevant markers for various scenarios of potential plagiarism. This could explain the three different clusters plotted in the data environment: Halliday's samples, the Philippine author's samples together with his collaboratively authored ones, and the suspected Iraqi samples, which represented a highly probable case of plagiarism.
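Both PCA and CA operate on the same input: one vector of function-word frequencies per text sample. A minimal sketch of that feature-extraction step, using a hypothetical six-word list standing in for the paper's actual function-word set:

```python
import math

# Illustrative subset only; the paper's actual function-word list is not given here.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that"]

def function_word_profile(text, fw=FUNCTION_WORDS):
    """Frequency of each function word per 1,000 tokens of the sample."""
    tokens = text.lower().split()  # naive tokenization (assumption)
    return [1000 * tokens.count(w) / len(tokens) for w in fw]

def euclidean(p, q):
    """Distance between two profiles. The pairwise distance matrix is the
    input to cluster analysis; PCA projects the raw profile vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Samples by the same author tend to sit close together in this space, which is what the clusters reported in the abstract reflect.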

The Comparative Power of Type/Token and Hapax legomena/Type Ratios: A Corpus-based Study of Authorial Differentiation

Advances in Language and Literary Studies, Jun 1, 2014

This paper presents an attempt to verify the comparative power of two statistical features: the Type/Token and Hapax legomena/Token ratios (henceforth TTR and HTR). A corpus of ten novels is compiled, and sixteen samples (each 5,000 tokens in length) are taken randomly out of these novels as representative blocks. The researchers observe the way TTR and HTR behave in discriminating four novelists: Joyce, Woolf, Faulkner and Hemingway. Compared to traditional statistical features (e.g. average word length, average sentence length), TTR and HTR are by far more competent at capturing the distinctive quantitative behavior of each novelist. It turns out that TTR and HTR contribute, more or less, to a sort of statistical identity which can be used for a vivid comparison and discrimination of the four novelists involved in this paper. Nevertheless, HTR proves more viable than TTR in achieving the discriminating task.
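Both ratios are straightforward to compute. The sketch follows the abstract, which defines HTR over tokens (the title's "Hapax legomena/Type" differs); equal-sized blocks matter, since both ratios fall as sample length grows, which is presumably why the paper fixes every sample at 5,000 tokens.

```python
from collections import Counter

def ttr_htr(tokens):
    """Type/Token ratio and Hapax-legomena/Token ratio for one sample.

    Hapax legomena are types occurring exactly once. Comparisons are
    only meaningful between samples of equal length.
    """
    counts = Counter(t.lower() for t in tokens)
    hapax = sum(1 for c in counts.values() if c == 1)
    return len(counts) / len(tokens), hapax / len(tokens)
```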

ELR Journal

This paper investigates a number of text corpora belonging to different genres and applies various statistical methods to features extracted from them. Through this empirical analysis we can identify internal criteria which support the assignment of genres to texts based on external ones. (Table 5 of the paper lists the top 50 key content-words, with their keyness scores, in academic, newspaper and literature texts, p < 0.05.)
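Keyness scores of the kind listed in that table are conventionally computed with Dunning's log-likelihood statistic, which compares a word's frequency in the study corpus against a reference corpus; a score above 3.84 corresponds to p < 0.05. A sketch of the standard two-corpus formula (not necessarily the exact settings used in the paper):

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Keyness of a word occurring freq_a times in a corpus of size_a
    tokens and freq_b times in a reference corpus of size_b tokens,
    via Dunning's log-likelihood ratio."""
    expected_a = size_a * (freq_a + freq_b) / (size_a + size_b)
    expected_b = size_b * (freq_a + freq_b) / (size_a + size_b)
    ll = 0.0
    if freq_a:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll
```

A word scoring above the 3.84 cut-off is "key" at p < 0.05; ranking words by this score produces a keyword list like Table 5.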

Measuring Lexical Richness Through Type-Token Curve: a Corpus-based Analysis of Arabic and English Texts

WordSmith Tools (5.0) is used to analyze samples of texts from different genres written by eight different authors. These texts are grouped into two corpora, Arabic and English. The Arabic corpus includes textual samples from the Qur'an, Al-Sahifa al-Sajjadiyya (a prayer manual), Modern Standard Arabic and Eliot's Adam Bede. Each textual sample is statistically analyzed to find out its lexical richness, or vocabulary size: the number of tokens (total running words) and the number of types (distinct vocabulary words) are counted for each sample, and the two numbers are plotted against each other using Microsoft Office Excel diagrams. The resulting curves in both corpora give a vivid idea of the lexical richness of each textual sample. They open an active avenue for comparing the different authors in terms of their vocabulary size and the point at which they begin to exhaust their linguistic repertoire through repetition. The curves for Imam Ali's Nahjul-Balagha (Arabic corpus) and Conrad's Heart of Darkness (English corpus) rise high, reaching the maximum; by contrast, the Qur'anic verses and The New Testament have the lowest curves, owing to the ritualistic quality of their texts.

A Corpus-based Stylistic Analysis of Body-Soul and Heaviness-Lightness Metaphors in Kundera's Novel The Unbearable Lightness of Being

This paper represents an attempt at a corpus-based stylistic analysis of two conceptual metaphors in Milan Kundera's novel The Unbearable Lightness of Being. Soul-body and lightness-heaviness metaphors are foregrounded as central themes throughout the novel, and the way they are used indicates an insightful employment of metaphor as a cognitive tool which empowers language users with a capacity for conceptualizing different experiences. The researcher adopts conceptual metaphor theory to produce a conceptual analysis incorporating Leech's semantic componential analysis within the overall analytic procedure. Several techniques for creatively manipulating the cognitive level of language are identified, such as conceptual switching, conceptual extension, and conceptual fusion, and these are deployed in the novel with different degrees of metaphorical creativity. Conceptual switching may be simple, but it is very active in deviating from the conventional conceptual system. Conceptual extension marks the minute elaborations that conventional metaphors undergo, extending the limits of cognitive conceptualization. As for conceptual fusion, it proves interestingly powerful in producing aggregations of metaphorical mappings.
