Keh-jiann Chen | Academia Sinica

Papers by Keh-jiann Chen

An augmented chart parsing algorithm integrating unification grammar and Markov language model for continuous speech recognition

International Conference on Acoustics, Speech, and Signal Processing

In this paper, an efficient algorithm is developed to handle the difficulties in parsing noisy word lattices (sets of word hypotheses obtained in continuous speech recognition), which include word boundary overlapping, homonyms, lexical ambiguities, and recognition uncertainty and errors. An augmented chart is first proposed, and the new algorithm is then derived on this chart. This algorithm integrates the global structural synthesis capabilities of the unification grammar with the local relation estimation capabilities of the Markov language model. The parsing algorithm is island-driven and best-first. In this way, not only can the features of the grammatical and statistical approaches be combined, but the effects of the two different approaches are reflected in a single algorithm such that the overall selectivity can be appropriately

A Mathematical Model for Chinese Input

A Semantic Analysis of Time Intervals: Core Senses and Relational Senses of a Time Interval

Temporal relations, including duration, aspect, frequency, time point, and sequence, describe the relationships between time elements and events in complex knowledge networks. Logical compatibility between temporal elements and event types strongly influences the semantic interpretation and grammaticality of sentences. It is one of the most complicated, frequently used, and poorly understood topics in linguistics. In this paper, we focus on duration only. We make fine-grained distinctions among time intervals and provide explanations for their common functionalities and idiosyncrasies. We point out that the types of collocated events and the semantics of time intervals are the main factors controlling the usage of time interval words. Furthermore, we prove that the morpho-syntactic structure of time interval words also reduces the flexibility of their usage. We list four types of morpho-syntactic structures for duration expressions and give the constraints on their usage.

Extended-HowNet: A Representational Framework for Concepts

Natural languages are means of denoting concepts. However, word sense ambiguities make natural language processing and conceptual processing almost impossible. To bridge the gap between natural language representations and conceptual representations, we propose a universal concept representational mechanism called Extended-HowNet, which evolved from HowNet. It extends the word sense definition mechanism of HowNet and uses WordNet synsets as the vocabulary for describing concepts. Each word sense (or concept) is defined by simpler concepts. The simple concepts used in the definitions can be further decomposed into even simpler concepts until primitive or basic concepts are reached. Therefore, the definition of a concept can be dynamically decomposed and unified into Extended-HowNet at different levels of representation. Extended-HowNet is language independent. Any word sense of any language can be defined and given a near-canonical representation. For any two concepts, not only their semantic distance but also their sense similarities and differences can be determined by checking their definitions. In addition to taxonomy links, concepts are also associated through their shared conceptual features. Fine-grained differences among near-synonyms can be captured by adding new features.

A model for Lexical Analysis and Parsing of Chinese Sentences

Knowledge Representation and Sense Disambiguation for Interrogatives in E-HowNet

In order to train machines to 'understand' natural language, we propose a meaning representation mechanism called E-HowNet to encode lexical senses. In this paper, we take interrogatives as examples to demonstrate the mechanisms of semantic representation and composition of interrogative constructions under the framework of E-HowNet. We classify interrogative words into five classes according to their query types, and represent each type of interrogative with fine-grained features and operators. The process of semantic composition and the difficulties of representation, such as word sense disambiguation, are addressed. Finally, machine understanding is tested by showing how machines derive the same deep semantic structure for synonymous sentences with different surface structures.

Semantic Representation and Composition for Unknown Compounds in E-HowNet

This paper describes a universal concept representational mechanism called E-HowNet, designed to handle difficulties caused by unknown words in natural language processing. Semantic structures and sense disambiguation of unknown words are discovered by analogy. Our aim is for any concept to be definable in E-HowNet with a near-canonical representation. The design for easy semantic composition and decomposition makes it possible to automate semantic processing for unknown words, phrases, and even sentences.

The Identification of Thematic Roles in Parsing Chinese

Journal of Information Science and Engineering

A Chinese Natural Language Processing System Based Upon the Theory of Empty Categories

In this paper, we present a device specially designed on the basis of the theory of empty categories. This device cooperates with a bottom-up parser and serves as an elegant and efficient approach to treating the troublesome problems of the transformations of passivization, relativization, topicalization, and ba-transformation, and the use of zero pronouns in Chinese natural language. With the aid of the device, the grammar rules for Chinese become much simpler and easier to design, and the processing capability can be significantly improved.

A Study on Word Similarity using Context Vector Models

A syllable-based very-large-vocabulary voice retrieval system for Chinese databases with textual attributes

Unconstrained speech retrieval for Chinese document databases with very large vocabulary and unlimited domains

Chinese language model adaptation based on document classification and multiple domain-specific language models

Mandarin Chinese Character Frequency List Based on National Phonetic Alphabets

《资讯处理用中文分词规范》设计理念及规范内容 Design Criteria and Content of ‘Segmentation Standard for Chinese Information Processing’

Intelligent retrieval of very large Chinese dictionaries with speech queries

Mandarin Chinese Word Frequency Dictionary

Sinica Corpus

中文句結構樹資料庫的構建 Project Report: Sinica Treebank

[Abstract text lost to an encoding error in extraction. The only recoverable terms are Sinica Treebank, Sinica Corpus, and Information-based Case Grammar (ICG).]

漢語動詞詞彙語意分析: 表達模式與研究方法 Analysis of Mandarin Lexical Semantics: Representational Model and Research Methodology
