Decy N'dezz | Universitas Kuningan

Related Authors

Paul Rodrigues

Ahmed Magdy Ezzeldin

Dan Tufis

Dan Tufis

Romanian Academy Institute for Artificial Intelligence

Harald Hammarström

Jesus Vilares

Iadh Ounis

Marco Boella

Uploads

Papers by Decy N'dezz

Text classification and multilinguism: Getting at words via n-grams of characters

Proceedings of SCI-2002, 6th World …, Jan 1, 2002

Genuine numerical multilingual text classification is almost impossible if only words are treated as the privileged unit of information. Although text tokenization (in which words are considered as tokens) is relatively easy in English or French, it is much more difficult for other languages such as German or Arabic. Moreover, stemming, typically used to normalize and reduce the size of the lexicon, constitutes another challenge. The notion of N-grams of words (i.e. sequences of N words, with N typically equal to 2, 3 or 4), which over the last ten years seems to have produced good results in both language identification and speech analysis, has recently become a privileged research axis in several areas of knowledge extraction from text. In this paper, we present text classification software based on N-grams of characters (not words), evaluate its results on documents written in English and French, and compare these results with those obtained from a different classification tool based exclusively on the processing of words. An interesting feature of our software is that it does not need to perform any language-specific processing and is thus appropriate for multilingual text classification.
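The abstract does not describe how the character N-grams are turned into a classifier. As a minimal sketch only, assuming the rank-order "profile" method of Cavnar and Trenkle (not necessarily what this paper's software does), the code below shows why no tokenizer or stemmer is needed: overlapping character sequences are counted directly from the raw string. The function names, the N values (2 to 4) and the profile size `top_k=300` are hypothetical choices for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n_values=(2, 3, 4)) -> Counter:
    """Count overlapping character n-grams; no tokenization or stemming is applied."""
    # Lowercase, collapse whitespace, and pad with spaces so that
    # word boundaries become part of the n-grams themselves.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def profile(text: str, top_k: int = 300) -> list:
    """Rank-ordered list of the top_k most frequent n-grams (hypothetical size)."""
    return [gram for gram, _ in char_ngrams(text).most_common(top_k)]


def out_of_place(doc_profile: list, class_profile: list) -> int:
    """Sum of rank displacements; n-grams missing from the class get a maximum penalty."""
    ranks = {gram: rank for rank, gram in enumerate(class_profile)}
    max_penalty = len(class_profile)
    return sum(
        abs(rank - ranks[gram]) if gram in ranks else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def classify(text: str, class_profiles: dict) -> str:
    """Return the label whose profile is closest to the document's profile."""
    doc = profile(text)
    return min(class_profiles, key=lambda label: out_of_place(doc, class_profiles[label]))


if __name__ == "__main__":
    profiles = {
        "en": profile("the cat is on the table and the dog is in the garden"),
        "fr": profile("le chat est sur la table et le chien est dans le jardin"),
    }
    print(classify("the dog is on the table", profiles))
```

The same code handles English, French, German or Arabic text unchanged, which mirrors the language-independence argument in the abstract; real training profiles would of course be built from far more text than these toy sentences.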

Multilingualism and the education of minority children

Policy and practice in bilingual education: …, Jan 1, 1995

... speaks German/Dutch/Danish/English all day long at the day care centre/in school, much more than Turkish, she even uses German with siblings, so German/Dutch/Danish/English must be ... The child is labelled as a majority language speaker, or she is denied teaching in the ...

Multilingualism

By looking at the effect of language difference, rather than at theories of language, John Edwards examines the interaction of language with nationalism, politics, history, identity and education. He illustrates his arguments with a range of examples, from recent ...
