Deep end of the feature pool

SYNTACTIC DEPENDENCIES IN MANDARIN CHINESE

1997

This thesis investigates two kinds of syntactic dependencies in Mandarin Chinese: uninterpretable feature checking of clause-level functional heads and dou quantificational binding. Three issues are discussed with respect to checking: object shift, yes/no questions, and raising of the aspect particle le. Chinese object shift is argued to be triggered by a focus marker which adjoins to an object. This study also presents a unified treatment of various types of yes/no questions in Chinese. The uninterpretable [Q] of yes/no question C is checked either by the merged particle ma, by overt movement of bu/mei(you)-V from Neg to C, or by covert movement of [Q] of shi-bu-shi and A-not-A words. In both object shift and questions, optionality between overt and covert checking arises. The thesis argues that the strong value of the feature strength of a functional head can be triggered by a certain feature in its complement domain. Thus the choice of overt vs. covert checking can be determined in the computation.

AHP 1 A Response to Ways and the Syntax of Noun Phrases in Qinghai Chinese Dialects

In the course of offering a review of Zhang Chengcai's Ways, this paper describes the syntax of noun phrases in the Chinese dialect of Huangshui, in Qinghai Province. Unlike other Chinese dialects, this dialect employs several postpositions for indicating syntactic nominal relationships. The origin of this phenomenon in contact with non-Sinitic languages in the region and its significance are also explored.

Intertwined model of syntactic borrowing in the Gansu-Qinghai linguistic area

This paper studies two grammatical cases in the Gansu-Qinghai linguistic area. Accusative-dative, a syncretic case largely attested in Sinitic languages, is also found in Bao'an and Tu, even if in a very limited use. The Sinitic languages have acquired this syncretic case marking through pattern reduplication due to language contact, while Bao'an and Tu have this innovation owing to the internal mechanisms of their language. The second phenomenon concerns possessor constructions in which the subject-possessor must be marked by a dative case. This marking is seen in all non-Sinitic languages in the Gansu-Qinghai linguistic area and has begun to appear in Sinitic languages. Multiple paths for borrowing between and inside languages in this area present an intertwined model of language borrowing. Linxia City and its closest counties should be the spreading center of these new syntactic devices, and Muslim populations speaking different languages may form a spreading net.

Corpus-based Study and Identification of Mandarin Chinese Light Verb Variations

Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, 2014

When the PRC was founded on mainland China and the KMT retreated to Taiwan in 1949, the relation between mainland China and Taiwan became a classic Cold War instance. Neither travel, visits, nor correspondence was allowed between the two peoples until 1987, when the governments on both sides started to allow a small number of people from Taiwan with relatives in China to return to visit through a third location. Although the thawing eventually led to frequent exchanges, direct travel links, and close commercial ties between Taiwan and mainland China today, 38 years of total isolation from each other did allow language use to develop into different varieties, which have become a popular topic for mainly lexical studies (e.g., Xu, 1995; Zeng, 1995; Wang & Li, 1996). Grammatical differences between these two variants, however, have not been well studied beyond anecdotal observation, partly because of the near identity of their grammatical systems. This paper focuses on light verb variations in the Mainland and Taiwan variants and finds that the light verbs of these two variants indeed show distributional tendencies. Light verbs are chosen for two reasons. First, they are semantically bleached and hence more susceptible to change and variation. Second, the classification of light verbs is a challenging topic in NLP. We hope our study will contribute to the study of light verbs in Chinese in general. The data adopted for this study are a comparable corpus extracted from the Chinese Gigaword Corpus and manually annotated with contextual features that may contribute to light verb variations. A multivariate analysis was conducted to show that for each light verb there is at least one context where the two variants show differences in tendencies (usually the presence/absence of a tendency rather than contrasting tendencies) and can be differentiated. In addition, we carried out a K-Means clustering analysis for the variations, and the results are consistent with the multivariate analysis, i.e., the light verbs in Mainland and Taiwan indeed have variations and the variations can be successfully differentiated.

1 Introduction: Language Variations in the Chinese Context

The common dichotomy of language and dialect is not easily maintained in the context of Chinese language(s). Cantonese, Min, Hakka, and Wu are traditionally referred to as dialects of Chinese but are mutually unintelligible. However, they do share a common writing system and literary and textual tradition, which allows speakers to have a shared linguistic identity. To overcome the mutual unintelligibility problem, a variant of Northern Mandarin Chinese was designated as the common language about a hundred years ago (called 普通話 Putonghua 'common language' in Mainland China, and 國語 Guoyu 'national language' in Taiwan). Referred to as Mandarin or Mandarin Chinese, or simply Chinese nowadays, this is one of the most commonly learned first or second languages in the world. However, not unlike English, with the fast globalization of the Chinese language, both the term 'World Chineses' and the recognition that there are different variants of Chinese have emerged. In this paper, we study two of the most important variants of Chinese, Mainland Mandarin and Taiwan Mandarin.
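As a rough illustration of the K-Means clustering step mentioned in the abstract, the sketch below runs plain Lloyd's algorithm on a handful of binary feature vectors. Everything concrete here is an assumption made for illustration: the function names, the toy vectors, the three unnamed feature columns, and the choice to seed centroids with the first k points. The paper's actual annotation scheme and clustering setup are not given in this text.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Column-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Toy data, invented for illustration only: each row stands for one light verb
# token, each column for one hypothetical binary contextual feature.
tokens = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
labels, centroids = k_means(tokens, k=2)
```

In a study like the one described, the resulting cluster labels would then be compared against the known Mainland/Taiwan source of each token to see whether the clusters line up with the two variants.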

Detecting Syntactic Features of Translated Chinese

Proceedings of the Second Workshop on Stylistic Variation

We present a machine learning approach to distinguish texts translated into Chinese (by humans) from texts originally written in Chinese, with a focus on a wide range of syntactic features. Using Support Vector Machines (SVMs) as the classifier on a genre-balanced corpus in translation studies of Chinese, we find that constituent parse trees and dependency triples as features without lexical information perform very well on the task, with an F-measure above 90%, close to the results of lexical n-gram features, without the risk of learning topic information rather than translation features. Thus, we claim syntactic features alone can accurately distinguish translated from original Chinese. Translated Chinese exhibits an increased use of determiners, subject position pronouns, NP + "的" as NP modifiers, and multiple NPs or VPs conjoined by "、", among other structures. We also interpret the syntactic features with reference to previous translation studies in Chinese, particularly the usage of pronouns.
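To make "dependency triples as features without lexical information" concrete, here is a minimal sketch of how such features might be counted, together with an F-measure computation. The tuple layout, the POS and relation labels, the function names, and the toy sentences are all assumptions for illustration; the abstract does not specify the parser, the tagset, or which variant of F-measure was reported (the balanced F1 is assumed here). The resulting counts would be handed to an SVM, which is not shown.

```python
from collections import Counter


def delexicalized_triples(parse):
    """Drop word forms; keep only (head POS, relation, dependent POS)."""
    return [(head, rel, dep) for head, rel, dep, _word in parse]


def triple_counts(parses):
    """Bag-of-triples feature counts for one text (a list of parsed sentences)."""
    counts = Counter()
    for parse in parses:
        counts.update(delexicalized_triples(parse))
    return counts


def f_measure(gold, predicted, positive):
    """Balanced F-measure (F1) for the class `positive` (assumed variant)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy parsed text, invented for illustration: each arc is
# (head POS, relation, dependent POS, dependent word).
toy_text = [
    [("VV", "nsubj", "PN", "他"), ("VV", "dobj", "NN", "书"), ("NN", "det", "DT", "这")],
    [("VV", "nsubj", "PN", "她")],
]
features = triple_counts(toy_text)
```

Because the word forms are discarded before counting, two texts on entirely different topics can still yield identical feature vectors, which is the property the abstract relies on to avoid learning topic information.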

Syntax of Noun Phrases in Qinghai Chinese

In the course of offering a review of Zhāng Chéngcái's Ways, this paper describes the syntax of noun phrases in the Chinese dialect of Huángshuĭ, in Qīnghăi Province. Unlike other Chinese dialects, this dialect employs several postpositions for indicating syntactic nominal relationships. The origin of this phenomenon in contact with non-Sinitic languages in the region and its significance are also explored.

Paper for the 13th International Conference on Cantonese and Yue Dialects

Although investigation of the phonology of the Chinese dialects is by no means exhaustive, our knowledge of that level of the language is by far the more complete. Here, therefore, I attempt to discuss some lexical phenomena. As a result of historical background, geographical conditions, living environment, manners and customs, and other factors, Cantonese and Mandarin differ in much of their vocabulary. Lexical contrast between dialects forms a broad field, with many interesting phenomena that deserve discussion. This paper investigates one such phenomenon, called ‘using each morpheme of the paratactic compound’. One of the major characteristics of Chinese vocabulary is that paratactic compounds make up a comparatively large portion, since this word-formation model is productive. Comparing Mandarin with Cantonese, we find that for some paratactic compounds that exist in written Chinese, Cantonese always uses one morpheme of the compound to represent the meaning of the whole compound, while Mandarin uses the other morpheme. This paper first describes data for the phenomenon of ‘using each morpheme’; second, analyzes the characteristics of this phenomenon; and third, proposes possible reasons for it. A short conclusion is appended to the paper.