Langa Khumalo | University of KwaZulu-Natal

Papers by Langa Khumalo

Looking beyond Meaning in the Advanced Ndebele Dictionary 102-111

Lexikos, Oct 20, 2011

It is an established view in lexicography that the most important function of early dictionaries was to provide information on the meaning of words of a particular language. Over the years, tendencies have emerged with modern dictionaries providing detailed linguistic information, resulting in more informative dictionaries. This article discusses the presentation of grammatical information, pronunciation, tone marking and usage labels and the structure and content of the back matter in the prospective Advanced Ndebele Dictionary (henceforth AND), which will be a successor to Isichazamazwi SesiNdebele (2001), the first-ever monolingual dictionary in Ndebele. It is therefore the inclusion of this additional information that is examined in this article. The AND is still restricted to the planning stages. The work that has been done on the dictionary has been confined to academic articles about the dictionary's structure and content. The current article is a third instalment on the AND, following on Khumalo 2003 and 2007.

Defining Formats and Corpus-based Examples in the General Ndebele Dictionary, Isichazamazwi SesiNdebele

Lexikos, Oct 20, 2011

In this article the writer evaluates the defining formats that were used in defining headwords in the first monolingual General Ndebele Dictionary, Isichazamazwi SesiNdebele (ISN). The emphasis in the ISN was on the concept of user-friendliness. The article establishes that defining formats in the ISN are a judicious mixture mainly of the defining formats of the Collins Birmingham University International Language Database (COBUILD) and of what has been referred to as traditional formats. The first part of this article is an analysis of the decisions taken by the ISN editors in formulating their defining formats. It assesses the COBUILD defining principle vis-à-vis its application in defining headwords in the ISN and the impact of this principle on the user-friendliness of the dictionary. It further discusses other formats, including the decision to retain traditional defining formats for defining headwords. One of the traditional defining styles agreed upon was that the editors were to give the hypernym in the case of semantic sets, and then to identify the concept being defined by specifying aspects that distinguish it from others of its type. The second part of the article evaluates the importance and use of the corpus in providing both definitions and examples for the ISN. However, it is further argued that since a corpus has to be "representative" in terms of size in order to be appropriately used as a basis for such corpus-based dictionaries, the ISN editors, whose corpus was relatively small, could not avoid relying on intuitive knowledge in constructing some examples.

Editorial: The Role of Language in Human Existence, Education, Innovation and Research, and the Intellectualisation of African Languages

Alternation Interdisciplinary Journal for the Study of the Arts and Humanities in Southern Africa

Embracing the mobile phone technology: its social and linguistic impact with special reference to Zimbabwean Ndebele

A Corpus-based Critical Discourse Analysis of Gender Sensitivity in isiZulu: Towards an isiZulu Gender Dictionary

Alternation Interdisciplinary Journal for the Study of the Arts and Humanities in Southern Africa

There is a paucity of specialized dictionaries for African languages. In this paper we argue for a gender dictionary in isiZulu using a study that evaluates gender sensitivity in isiZulu. This study uses a judicious mixture of Corpus Linguistics (CL) and Critical Discourse Analysis (CDA) in the analysis of gender sensitivity in the Zulu language and culture. Using these two strands, the study critically analyses language and gender issues in isiZulu. The study is framed against the patriarchal background of the Zulu culture, whose prejudice against the female gender is expressed through a unique linguistic custom called isiHlonipho (Dowling 1988; Rudwick & Shange 2006; Finlayson 1982). CDA developed from Critical Linguistics as a theory that focuses on discourse as a 'form of social practice' (Fairclough & Wodak, 1997:258). CDA explores the relationship between language, power, and society with a critical focus on the crucial role that context plays in discourse. While this framework has been criticised for its lack of objectivity on the part of the analyst, with regard to the preferred choice of examples that suit the assumptions of the analyst, and for its focus on fragmented texts rather than full texts, in this study it is used to frame power relations in language use, particularly when it refers to gender in isiZulu. To mitigate the evident limitations of CDA, this study uses CL to provide sufficient data and context. From the evidence adduced from the corpora used in this study, we motivate for a gender dictionary in isiZulu.

The effects of a corpus on isiZulu spellcheckers based on N-grams

Correct spelling contributes to good content accessibility and readability for textual documents. However, there are few spellcheckers for Bantu languages such as isiZulu, the major language in South Africa. The objective of this research is to investigate development of spellcheckers for isiZulu and, more generally, an approach that can be reused across Bantu languages. To fill this gap in an extensible way, we used data-driven statistical language models with trigrams and quadrigrams. The models were trained on three different isiZulu corpora, being Ukwabelana, a selection of the isiZulu National Corpus, and a small corpus of news items. The system performed better with trigrams than with quadrigrams, and performance depended on the training and testing corpora. When the system was trained with old text (the Bible in isiZulu), it did not perform well when tested with the two corpora that contain more recent texts, such as the constitution and news items. The highest accuracy obtained was 89%. Given that data-driven statistical language models constitute a language-independent approach, we conclude that data-driven spellcheckers for all Bantu languages are indeed feasible. They are, however, sensitive to the training and testing data. The data-driven approach is less resource-intensive than manual specification of rules, and therefore realising spellcheckers for Bantu languages is now practically within reach. The potential societal impact of spellchecker-supported tools and apps is incalculable.

Evaluation of the Effects of a Spellchecker on the Intellectualisation of IsiZulu

Alternation, Dec 31, 2017

Through its bilingual language policy and plan that recognises English and isiZulu as official languages of the University of KwaZulu-Natal (UKZN), UKZN has aggressively promoted the intellectualisation of isiZulu as an effective strategy in advancing indigenous, under-resourced African languages as vehicles for innovation, science, and technology research in Higher Education and Training institutions. UKZN recently launched human language technologies (HLTs) in isiZulu as enablers towards the intellectualisation of the language. One of these is an isiZulu spellchecker, which was trained on an organic isiZulu National Corpus. We evaluate the isiZulu spellchecker's effects on the intellectualisation of isiZulu. Two surveys, consisting of relevant questions and the System Usability Scale, were conducted with the target end-users, and the words added to the spellchecker were analysed. It is evident that the spellchecker has had a positive impact on the work of target end-users, who also perceive it as an enabler in the intellectualisation of isiZulu. The survey responses show modest success for a first version of the tool. The analysis of the words added to the spellchecker indicates that new words are being added to the isiZulu lexicon.

On the verbalization patterns of part-whole relations in isiZulu

In the highly multilingual setting in South Africa, developing computational tools to support the 11 official languages will facilitate effective communication. There is a pressing need to develop these tools for healthcare applications and doctor-patient interaction. An important component in this setup is generating sentences in the language isiZulu, which involves part-whole relations to communicate, for instance, which part of one's body hurts. From an NLG viewpoint, the main challenge is the fluid use of terminology and the consequent complex agreement system inherent in the language, which is further complicated by phonological conditioning in the linguistic realisation stage. Using a combined approach of examples and various literature, we devised verbalisation patterns for both meronymic and mereological relations, being structural/general parthood, involvement, containment, membership, subquantities, participation, and constitution. All patterns were then converted into algorithms and have been implemented as a proof-of-concept.

Toward Verbalizing Ontologies in isiZulu

Lecture Notes in Computer Science, 2014

IsiZulu is one of the eleven official languages of South Africa and roughly half the population can speak it. It is the first (home) language for over 10 million people in South Africa. Only a few computational resources exist for isiZulu and its related Nguni languages, yet the imperative for tool development exists. We focus on natural language generation, and the grammar options and preferences in particular, which will inform verbalization of knowledge representation languages and could contribute to machine translation. The verbalization pattern specification shows that the grammar rules are elaborate and there are several options of which one may have preference. We devised verbalization patterns for subsumption, basic disjointness, existential and universal quantification, and conjunction. This was evaluated in a survey among linguists and non-linguists. Some differences between linguists and non-linguists can be observed, with the former much more in agreement, and preferences depend on the overall structure of the sentence, such as singular for subsumption and plural in other cases.

Grammar rules for the isiZulu complex verb

Southern African Linguistics and Applied Language Studies, Apr 3, 2017

The isiZulu verb is known for its morphological complexity, which is a subject for ongoing linguistics research, as well as for prospects of computational use, such as controlled natural language interfaces, machine translation, and spellcheckers. To this end, we seek to answer the question as to what the precise grammar rules for the isiZulu complex verb are (and, by extension, the Bantu verb morphology). We iteratively specify the grammar as a Context Free Grammar, and evaluate it computationally. The grammar presented in this paper covers the subject and object concords, negation, present tense, aspect, mood, and the causative, applicative, stative, and the reciprocal verbal extensions, politeness, the wh-question modifiers, and aspect doubling, ensuring their correct order as they appear in verbs. The grammar conforms to specification.

Toward a knowledge-to-text controlled natural language of isiZulu

Language Resources and Evaluation, Feb 4, 2016

The language isiZulu belongs to the Nguni group of languages, which also includes isiXhosa, isiNdebele and siSwati. Of the four Nguni languages, isiZulu is the most dominant in South Africa, spoken by 22.7% of the country's population of 51.8 million. However, isiZulu (and even more so the other Nguni languages) still remains an under-resourced language for software applications. In this article we focus on controlled natural languages for structured knowledge-to-text, viewed from a potential utility for verbalising business rules and OWL ontologies. IsiZulu grammar (and, by extension, that of all Bantu languages) shows that a template-based approach is infeasible. This is due mainly to the noun class system, the agglutination, and verb conjugation with concords for each noun class. We present verbalisation patterns for existential and universal quantification, taxonomic subsumption, axioms with simple properties, and basic cases of negation. Based on the preliminary user assessment of the patterns, selected ones are refined into algorithms for verbalisation to generate correct isiZulu sentences, which have been evaluated.

Pluralising Nouns in isiZulu and Related Languages

Lecture Notes in Computer Science, 2018

There are compelling reasons for a Controlled Natural Language of isiZulu in software applications, which requires pluralising nouns. Only 'canonical' singular/plural pairs exist, however, which are insufficient for computational use of isiZulu. Starting from these rules, we take an experimental approach as a virtuous spiral to refine the rules by repeatedly testing two test sets against successive versions of refined rules for pluralisation. This resulted in the elucidation of additional pluralisation rules not included in typical isiZulu textbooks and grammar resources, and motivated design choices for algorithm development. We assessed the potential for reuse of the approach and the type of deviations with Runyankore, which demonstrated encouraging results.

Making Open Scholarship More Equitable and Inclusive

Publications

Democratizing access to information is an enabler for our digital future. It can transform how knowledge is created, preserved, and shared, and strengthen the connection between academics and the communities they serve. Yet, open scholarship is influenced by history and politics. This article explores the foundations underlying open scholarship as a quest for more just, equitable, and inclusive societies. It analyzes the origins of the open scholarship movement and explores how systemic factors have impacted equality and equity of knowledge access and production according to location, nationality, race, age, gender, and socio-economic circumstances. It highlights how the privileges of the global North permeate academic and technical standards, norms, and infrastructures. It also reviews how the collective design of more open and collaborative networks can engage a richer diversity of communities, enabling greater social inclusion, and presents key examples. By fostering dialogue wit...

The Intellectualization of African Languages through Terminology and Lexicography: Methodological Reflections with Special Reference to Lexicographic Products of the University of KwaZulu-Natal

Lexikos

Terminology development and practical lexicography are crucial in language intellectualization. In South Africa, the Department of Sport, Arts and Culture, National Lexicography Units, universities, commercial publishers and other organizations have been developing terminology and publishing terminographical/lexicographical resources to facilitate the use of African languages alongside English and Afrikaans in prestigious domains. Theoretical literature in the field of lexicography (e.g., Bergenholtz and Nielsen (2006); Bergenholtz and Tarp (1995; 2010); Gouws 2020) has attempted to resolve traditional distinctions between lexicography and terminology while also addressing terminological imprecisions in the relevant scholarship. Taking the cue from such scholarship, this article reflects on the methodological approaches for developing lexicographical products for specific subject fields, i.e., resources that document and describe terminology from specialized academic and profess...

Disrupting Language Hegemony: Intellectualising African Languages

IsiNdebele

Digital Humanities Outlooks beyond the West

Bloomsbury Academic eBooks, 2022

The Passive and Stative Constructions in Ndebele: A Comparative Analysis

This paper presents a comparison between the passive and the stative derivations. The stative derivation, which is variously referred to in the literature as the neuter, neuter-passive, quasipassive, neuter-stative, metastatic-potential, or descriptive passive (Satyo 1985), is described by Doke (1947) as closely similar to the passive derivation. Doke (1947) refers to what we will call the stative derivation here as the 'Middle or Quasi-passive'. This closeness has motivated detailed comparisons of the two derivational forms. While there is no uniformity in the literature as to what the stative derivation is, our choice of the label 'stative' is well motivated. As stated in Mchombo (2004: 95), 'stative' is based on the observation that the verb denotes the result state of the base verb. It is also a label that is widely used. Mchombo (1993, 2004) looks at the passive and the stative constructions as two distinct types of verbal extensions, working within the lexicalist theory of syntax, the Lexical Functional Grammar (LFG) theoretical framework. He proposes that the passive morpheme suppresses the agent of the transitive predicate, while the stative morpheme deletes it. Dubinsky and Simango (1996) go further, arguing that the passive alters the mapping from arguments to grammatical functions, as currently assumed in the Lexical Mapping Theory (henceforth LMT), and that the stative performs a perfectly analogous operation on the Lexical Conceptual Structure (LCS), that is, the argument structure itself. They present several differences between the two derivations beyond those originally proposed by Mchombo (1993), which are later noted in Mchombo (2004). We use the LMT to analyse the passive and stative derivation in Ndebele. The paper demonstrates that Ndebele deviates from the assumptions arrived at by both Dubinsky and Simango (1996) and Mchombo (1993 & 2004). This paper also demonstrates that the stative derivation is more restricted in Chichewa than is the case in Ndebele.

On the Reciprocal in Ndebele

Nordic Journal of African Studies, 2014

This article presents an analysis of the reciprocal extension in the Ndebele language (S.44, ISO 639-3 nde; not to be confused with South African Ndebele, S.407, ISO 639-3 nbl) using the apparatus of the Lexical Functional Grammar’s Lexical Mapping Theory. The reciprocal in Ndebele, as in most Bantu languages, is clearly marked by the verbal suffix -an-. Its typical properties are that the subject NP must be plural, or alternatively must be a coordinate structure, and that it is an argument-changing verbal extension. This article will demonstrate that in Ndebele the reciprocal verb can take a direct object. It will further show that the reciprocal in Ndebele can co-occur with the passive, and finally the paper will show that the notion of transitivity is not so straightforward at both the syntactic and semantic levels when viewed in the context of certain reciprocal constructions.

Theories

The language isiZulu is the largest in South Africa by number of first-language speakers, yet it is still an under-resourced language. In this paper, we approach the grammar piecemeal from a natural language generation approach, viewed from a potential utility for verbalizing OWL ontologies as a tangible use case. The elaborate rules of the grammar show that a grammar engine and dictionary are essential even for basic verbalizations in OWL 2 EL. This is due mainly to the 17 noun classes with embedded semantics and the agglutinative nature of isiZulu. The verbalization of basic constructs requires merging a prefix with a noun and distinguishing an 'and' between a list and linking clauses.

Research paper thumbnail of Looking beyond Meaning in the Advanced Ndebele Dictionary 102-111

Lexikos, Oct 20, 2011

It is an established view in lexicography that the most important function of early dictionaries ... more It is an established view in lexicography that the most important function of early dictionaries was to provide information on the meaning of words of a particular language. Over the years, tendencies have emerged with modern dictionaries providing detailed linguistic information resulting in more informative dictionaries. This article discusses the presentation of grammatical information, pronunciation, tone marking and usage labels and the structure and content of the back matter in the prospective Advanced Ndebele Dictionary (henceforth AND), which will be a successor to Isichazamazwi SesiNdebele (2001), the first-ever monolingual dictionary in Ndebele. 1 It is therefore the inclusion of this additional information that is examined in this article. The AND is still restricted to the planning stages. The work that has been done on the dictionary has been confined to academic articles about the dictionary's structure and content. The current article is a third instalment on the AND following on Khumalo 2003 and 2007.

Research paper thumbnail of Defining Formats and Corpusbased Examples in the General Ndebele Dictionary, Isichazamazwi SesiNdebele *

Lexikos, Oct 20, 2011

In this article the writer evaluates the defining formats that were used in defining headwords in... more In this article the writer evaluates the defining formats that were used in defining headwords in the first monolingual General Ndebele Dictionary, Isichazamazwi SesiNdebele (ISN). The emphasis in the ISN was on the concept of user-friendliness. The article establishes that defining formats in the ISN are a judicious mixture mainly of the defining formats of the Collins Birmingham University International Language Database (COBUILD) and of what has been referred to as traditional formats. The first part of this article is an analysis of the decisions taken by the ISN editors in formulating their defining formats. It assesses the COBUILD defining principle vis-à-vis its application in defining headwords in the ISN and the impact of this principle on the userfriendliness of the dictionary. It further discusses other formats, including the decision to retain traditional defining formats for defining headwords. One of the traditional defining styles agreed upon was that the editors were to give the hypernym in the case of semantic sets, and then to identify the concept being defined by specifying aspects that distinguish it from others of its type. The second part of the article evaluates the importance and use of the corpus in providing both definitions and examples for the ISN. However, it is further argued that since a corpus has to be "representative" in terms of size in order to be appropriately used as basis for such corpus-based dictionaries, the ISN editors whose corpus was relatively small, could not avoid relying on intuitive knowledge in constructing some examples.

Research paper thumbnail of Editorial: The Role of Language in Human Existence, Education, Innovation and Research, and the Intellectualisation of African Languages

Alternation Interdisciplinary Journal for the Study of the Arts and Humanities in Southern Africa

Research paper thumbnail of RESEARCH ARTICLE Embracing the mobile phone technology: its social and linguistic impact with special reference to Zimbabwean Ndebele

Research paper thumbnail of A Corpus-based Critical Discourse Analysis of Gender Sensitivity in isiZulu: Towards an isiZulu Gender Dictionary

Alternation Interdisciplinary Journal for the Study of the Arts and Humanities in Southern Africa

There is paucity of specialized dictionaries for African languages. In this paper we argue for a ... more There is paucity of specialized dictionaries for African languages. In this paper we argue for a gender dictionary in isiZulu using a study that evaluates gender sensitivity in isiZulu. This study uses a judicious mixture of Corpus Linguistics (CL) and Critical Discourse Analysis (CDA) in the analysis of gender sensitivity in the Zulu language and culture. Using these two strands, the study critically analyses language and gender issues in isiZulu. The study is framed against the patriarchal background of the Zulu culture whose prejudice against the female gender is expressed through a unique linguistic custom called isiHlonipho (Dowling 1988; Rudwick & Shange 2006; Finlayson 1982). CDA developed from Critical Linguistics as a theory that focuses on discourse as a 'form of social practice' (Fairclough & Wodak, 1997:258). CDA explores the relationship between language, power, and society with a critical focus on the crucial role that context plays in discourse. While this framework has been criticised for its lack of objectivity on the part of the analyst, with regards to preferred choice of examples that suit the assumptions of the analyst, and its focus on the fragmented texts rather than full texts, in this study it is used to frame power relations in language use particularly when it refers to gender in isiZulu. To mitigate the evident limitations of CDA this study uses CL to provide sufficient data and context. From the evidence adduced from the corpora used in this study, we motivate for a gender dictionary in isiZulu.

Research paper thumbnail of The effects of a corpus on isiZulu spellcheckers based on N-grams

Correct spelling contributes to good content accessibility and readability for textual documents.... more Correct spelling contributes to good content accessibility and readability for textual documents. However, there are few spellcheckers for Bantu languages such as isiZulu, the major language in South Africa. The objective of this research is to investigate development of spellcheckers for isiZulu and, more generally, an approach that can be reused across Bantu languages. To fill this gap in an extensible way, we used data-driven statistical language models with trigrams and quadrigrams. The models were trained on three different isiZulu corpora, being Ukwabelana, a selection of the isiZulu National Corpus, and a small corpus of news items. The system performed better with trigrams than with quadrigrams, and performance depended on the training and testing corpora. When the system was trained with old text (bible in isiZulu), it did not perform well when tested with the two corpora that contain more recent texts, such as the constitution and news items. The highest accuracy obtained was 89%. Given that data-driven statistical language models constitute a language-independent approach, we conclude that data-driven spellcheckers for all Bantu languages are indeed feasible. They are, however, sensitive to the training and testing data. This is less resource-intensive compared to manual specification of rules, and therefore the potential impact on realising spellcheckers for Bantu languages is now practically within reach. The potential societal impact of spellchecker-supported tools and apps is incalculable.

Research paper thumbnail of Evaluation of the E ffects of a S pellchecker on the I ntellectuali s ation of I siZulu

Alternation, Dec 31, 2017

Through its bilingual language policy and plan that recognises English and isiZulu as official la... more Through its bilingual language policy and plan that recognises English and isiZulu as official languages of the University of KwaZulu-Natal (UKZN), UKZN has aggressively promoted the intellectualisation of isiZulu as an effective strategy in advancing indigenous, under-resourced African languages as vehicles for innovation, science, and technology research in Higher Education and Training institutions. UKZN recently launched human language technologies (HLTs) in isiZulu as enablers towards the intellectualisation of the language. One of these is an isiZulu spellchecker, which was trained on an organic isiZulu National Corpus. We evaluate the isiZulu spellchecker's effects on the intellectualisation of isiZulu. Two surveys were conducted with the target end-users, consisting of relevant questions and the System Usability Scale, and an analysis of words added to the spellchecker. It is evident that the spellchecker has had a positive impact on the work of target end-users, who also perceive it as an enabler in the intellectualisation of isiZulu. The survey responses show modest success for a first version of the tool. The analysis of the words added to the spellchecker indicates that new words are being added to the isiZulu lexicon.

Research paper thumbnail of On the verbalization patterns of part-whole relations in isiZulu

In the highly multilingual setting in South Africa, developing computational tools to support the... more In the highly multilingual setting in South Africa, developing computational tools to support the 11 official languages will facilitate effective communication. The exigency to develop these tools for healthcare applications and doctor-patient interaction is there. An important component in this setup is generating sentences in the language isiZulu, which involves part-whole relations to communicate, for instance, which part of one's body hurts. From a NLG viewpoint, the main challenge is the fluid use of terminology and the consequent complex agreement system inherent in the language, which is further complicated by phonological conditioning in the linguistic realisation stage. Through using a combined approach of examples and various literature, we devised verbalisation patterns for both meronymic and mereological relations, being structural/general parthood, involvement, containment, membership, subquantities, participation, and constitution. All patterns were then converted into algorithms and have been implemented as a proof-of-concept.

Research paper thumbnail of Toward Verbalizing Ontologies in isiZulu

Lecture Notes in Computer Science, 2014

IsiZulu is one of the eleven official languages of South Africa and roughly half the population c... more IsiZulu is one of the eleven official languages of South Africa and roughly half the population can speak it. It is the first (home) language for over 10 million people in South Africa. Only a few computational resources exist for isiZulu and its related Nguni languages, yet the imperative for tool development exists. We focus on natural language generation, and the grammar options and preferences in particular, which will inform verbalization of knowledge representation languages and could contribute to machine translation. The verbalization pattern specification shows that the grammar rules are elaborate and there are several options of which one may have preference. We devised verbalization patterns for subsumption, basic disjointness, existential and universal quantification, and conjunction. This was evaluated in a survey among linguists and non-linguists. Some differences between linguists and non-linguists can be observed, with the former much more in agreement, and preferences depend on the overall structure of the sentence, such as singular for subsumption and plural in other cases.

Grammar rules for the isiZulu complex verb

Southern African Linguistics and Applied Language Studies, Apr 3, 2017

The isiZulu verb is known for its morphological complexity, which is a subject for ongoing linguistic research, as well as for prospects of computational use, such as controlled natural language interfaces, machine translation, and spellcheckers. To this end, we seek to answer the question as to what the precise grammar rules for the isiZulu complex verb are (and, by extension, for Bantu verb morphology). We iteratively specify the grammar as a Context Free Grammar, and evaluate it computationally. The grammar presented in this paper covers the subject and object concords, negation, present tense, aspect, mood, and the causative, applicative, stative, and the reciprocal verbal extensions, politeness, the wh-question modifiers, and aspect doubling, ensuring their correct order as they appear in verbs. The grammar conforms to specification.

Toward a knowledge-to-text controlled natural language of isiZulu

Language Resources and Evaluation, Feb 4, 2016

The language isiZulu belongs to the Nguni group of languages, which also includes isiXhosa, isiNdebele and siSwati. Of the four Nguni languages, isiZulu is the most dominant language in South Africa, spoken by 22.7% of the country's population of 51.8 million. However, isiZulu (and even more so the other Nguni languages) still remains an under-resourced language for software applications. In this article we focus on controlled natural languages for structured knowledge-to-text viewed from a potential utility for verbalising business rules and OWL ontologies. IsiZulu grammar (and, by extension, that of all Bantu languages) shows that a template-based approach is infeasible. This is due mainly to the noun class system, the agglutination, and verb conjugation with concords for each noun class. We present verbalisation patterns for existential and universal quantification, taxonomic subsumption, axioms with simple properties, and basic cases of negation. Based on the preliminary user assessment of the patterns, selected ones are refined into algorithms for verbalisation to generate correct isiZulu sentences, which have been evaluated.

Pluralising Nouns in isiZulu and Related Languages

Lecture Notes in Computer Science, 2018

There are compelling reasons for a Controlled Natural Language of isiZulu in software applications, which requires pluralising nouns. Only 'canonical' singular/plural pairs exist, however, which are insufficient for computational use of isiZulu. Starting from these rules, we take an experimental approach as a virtuous spiral to refine the rules by repeatedly testing two test sets against successive versions of refined rules for pluralisation. This resulted in the elucidation of additional pluralisation rules not included in typical isiZulu textbooks and grammar resources and motivated design choices for algorithm development. We assessed the potential for reuse of the approach and the type of deviations with Runyankore, which demonstrated encouraging results.

Making Open Scholarship More Equitable and Inclusive

Publications

Democratizing access to information is an enabler for our digital future. It can transform how knowledge is created, preserved, and shared, and strengthen the connection between academics and the communities they serve. Yet, open scholarship is influenced by history and politics. This article explores the foundations underlying open scholarship as a quest for more just, equitable, and inclusive societies. It analyzes the origins of the open scholarship movement and explores how systemic factors have impacted equality and equity of knowledge access and production according to location, nationality, race, age, gender, and socio-economic circumstances. It highlights how the privileges of the global North permeate academic and technical standards, norms, and infrastructures. It also reviews how the collective design of more open and collaborative networks can engage a richer diversity of communities, enabling greater social inclusion, and presents key examples. By fostering dialogue wit...

The Intellectualization of African Languages through Terminology and Lexicography: Methodological Reflections with Special Reference to Lexicographic Products of the University of KwaZulu-Natal

Lexikos

Terminology development and practical lexicography are crucial in language intellectualization. In South Africa, the Department of Sport, Arts and Culture, National Lexicography Units, universities, commercial publishers and other organizations have been developing terminology and publishing terminographical/lexicographical resources to facilitate the use of African languages alongside English and Afrikaans in prestigious domains. Theoretical literature in the field of lexicography (e.g., Bergenholtz and Nielsen (2006); Bergenholtz and Tarp (1995; 2010); Gouws 2020) has attempted to resolve traditional distinctions between lexicography and terminology while also addressing terminological imprecisions in the relevant scholarship. Taking the cue from such scholarship, this article reflects on the methodological approaches for developing lexicographical products for specific subject fields, i.e., resources that document and describe terminology from specialized academic and profess...

Disrupting Language Hegemony: Intellectualising African Languages

IsiNdebele

Digital Humanities Outlooks beyond the West

Bloomsbury Academic eBooks, 2022

The Passive and Stative Constructions in Ndebele: A Comparative Analysis

This paper presents a comparison between the passive and the stative derivations. The stative derivation, which is variously referred to in the literature as the neuter, neuter-passive, quasi-passive, neuter-stative, metastatic-potential, descriptive passive (Satyo 1985), is described by Doke (1947) as closely similar to the passive derivation. Doke (1947) refers to what we will call the stative derivation here as the 'Middle or Quasi-passive'. This closeness has motivated detailed comparisons of the two derivational forms. While there is no uniformity in the literature as to what the stative derivation is, our choice of the label 'stative' is well motivated. As stated in Mchombo (2004: 95), 'stative' is based on the observation that the verb denotes the result state of the base verb. It is also a label that is widely used. Mchombo (1993, 2004) looks at the passive and the stative constructions, as two distinct types of verbal extensions, working within the lexicalist theory of syntax, the Lexical Functional Grammar (LFG) theoretical framework. He proposes that the passive morpheme suppresses the agent of the transitive predicate, while the stative morpheme deletes it. Dubinsky and Simango (1996) go further, arguing that the passive alters mapping from arguments to grammatical functions, as currently assumed in the Lexical Mapping Theory (henceforth LMT), and the stative performs a perfectly analogous operation on the Lexical Conceptual Structure (LCS), that is, argument structure itself. They present several differences between the two derivations beyond those originally proposed by Mchombo (1993), which are later noted in Mchombo (2004). We use the LMT to analyse the passive and stative derivation in Ndebele. The paper demonstrates that Ndebele deviates from the assumptions arrived at by both Dubinsky and Simango (1996) and Mchombo (1993, 2004).
This paper also demonstrates that the stative derivation is more restricted in Chichewa than is the case in Ndebele.

On the Reciprocal in Ndebele

Nordic Journal of African Studies, 2014

This article presents an analysis of the reciprocal extension in the Ndebele language (S.44, ISO 639-3 nde; not to be confused with South African Ndebele, S.407, ISO 639-3 nbl) using the apparatus of the Lexical Functional Grammar's Lexical Mapping Theory. The reciprocal in Ndebele, as in most Bantu languages, is clearly marked by the verbal suffix -an-. Its typical properties are that the subject NP must be plural or alternatively must be a coordinate structure and that it is an argument-changing verbal extension. This article will demonstrate that in Ndebele the reciprocal verb can take a direct object. It will further show that the reciprocal in Ndebele can co-occur with the passive, and finally the paper will show that the notion of transitivity is not so straightforward at both syntactic and semantic levels when viewed in the context of certain reciprocal constructions.

Theories

The language isiZulu is the largest in South Africa by number of first-language speakers, yet it is still an under-resourced language. In this paper, we approach the grammar piecemeal from a natural language generation perspective, viewed in terms of its potential utility for verbalizing OWL ontologies as a tangible use case. The elaborate rules of the grammar show that a grammar engine and dictionary are essential even for basic verbalizations in OWL 2 EL. This is due mainly to the 17 noun classes with embedded semantics and the agglutinative nature of isiZulu. The verbalization of basic constructs requires merging a prefix with a noun and distinguishing between an 'and' that joins items in a list and one that links clauses.