Keh-jiann Chen - Profile on Academia.edu

Papers by Keh-jiann Chen

An augmented chart parsing algorithm integrating unification grammar and Markov language model for continuous speech recognition

International Conference on Acoustics, Speech, and Signal Processing

In this paper, an efficient algorithm is developed to handle the difficulties in parsing noisy word lattices (sets of word hypotheses obtained in continuous speech recognition), which include word boundary overlapping, homonyms, lexical ambiguities, and recognition uncertainty and errors. An augmented chart is first proposed, and the new algorithm is then derived on this chart. This algorithm integrates the global structural synthesis capabilities of the unification grammar and the local relation estimation capabilities of the Markov language model. The parsing algorithm is island-driven and best-first. In this way, not only can the features of the grammatical and statistical approaches be combined, but the effects of the two different approaches are reflected in a single algorithm such that the overall selectivity can be appropriately

A Mathematical Model for Chinese Input

A Semantic Analysis of Time Intervals — Core Senses and Relational Senses of a Time Interval

Temporal relations, including duration, aspect, frequency, time point, and sequence, describe the relationship between time elements and events in complex knowledge networks. Logical compatibility between temporal elements and event types strongly influences the semantic interpretation and grammaticality of sentences. Temporal relations are among the most complicated, frequently used, and poorly understood topics in linguistics. In this paper, we focus on duration only. We make fine-grained distinctions among time intervals and provide explanations for their common functionalities and idiosyncrasies. We point out that the types of collocated events and the semantics of time intervals are the main factors controlling the usage of time interval words. Furthermore, we demonstrate that the morpho-syntactic structure of time interval words also reduces the flexibility of their usage. We list four types of morpho-syntactic structures for duration expressions and give the constraints on their usage.

Extended-HowNet: A Representational Framework for Concepts

Natural languages are a means of denoting concepts. However, word sense ambiguities make natural language processing and conceptual processing almost impossible. To bridge the gap between natural language representations and conceptual representations, we propose a universal concept representation mechanism called Extended-HowNet, which evolved from HowNet. It extends the word sense definition mechanism of HowNet and uses WordNet synsets as the vocabulary for describing concepts. Each word sense (or concept) is defined by simpler concepts. The simple concepts used in the definitions can be further decomposed into even simpler concepts, until primitive or basic concepts are reached. Therefore, the definition of a concept can be dynamically decomposed and unified in Extended-HowNet at different levels of representation. Extended-HowNet is language independent: any word sense of any language can be defined and given a near-canonical representation. For any two concepts, not only their semantic distance but also their sense similarities and differences can be found by checking their definitions. In addition to taxonomy links, concepts are also associated through their shared conceptual features. Fine-grained differences among near-synonyms can be captured by adding new features.
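The decomposition idea in this abstract can be illustrated with a small sketch. This is not the Extended-HowNet formalism itself: the toy lexicon, the concept names, and the helper functions `expand` and `shared_and_distinct` are all hypothetical, and definitions are reduced to flat lists of simpler concepts with no relations or operators.

```python
# Toy lexicon: every entry below is invented for illustration only.
# A concept maps to the simpler concepts that define it; anything
# without an entry is treated as a primitive.
DEFINITIONS = {
    "puppy": ["dog", "young"],
    "dog": ["animal", "domestic"],
}


def expand(concept, definitions):
    """Recursively decompose a concept until only primitives remain.

    Assumes the definitions contain no cycles.
    """
    if concept not in definitions:
        return {concept}
    primitives = set()
    for part in definitions[concept]:
        primitives |= expand(part, definitions)
    return primitives


def shared_and_distinct(a, b, definitions):
    """Compare two concepts by the primitives in their expanded definitions."""
    pa, pb = expand(a, definitions), expand(b, definitions)
    return pa & pb, pa ^ pb


shared, distinct = shared_and_distinct("puppy", "dog", DEFINITIONS)
print(sorted(shared), sorted(distinct))
```

With this toy data, "puppy" and "dog" share the primitives "animal" and "domestic" and differ only in "young", which mirrors the abstract's point that near-synonyms can be told apart by a single added feature.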

A model for Lexical Analysis and Parsing of Chinese Sentences

Knowledge Representation and Sense Disambiguation for Interrogatives in E-HowNet

In order to train machines to 'understand' natural language, we propose a meaning representation mechanism called E-HowNet to encode lexical senses. In this paper, we take interrogatives as examples to demonstrate the mechanisms of semantic representation and composition of interrogative constructions under the framework of E-HowNet. We classify interrogative words into five classes according to their query types, and represent each type of interrogative with fine-grained features and operators. The process of semantic composition and the difficulties of representation, such as word sense disambiguation, are addressed. Finally, machine understanding is tested by showing how machines derive the same deep semantic structure for synonymous sentences with different surface structures.

Semantic Representation and Composition for Unknown Compounds in E-HowNet

This paper describes a universal concept representation mechanism called E-HowNet, designed to handle difficulties caused by unknown words in natural language processing. Semantic structures and sense disambiguation of unknown words are discovered by analogy. Our goal is that any concept can be defined in E-HowNet with a near-canonical representation. The design for easy semantic composition and decomposition makes it possible to automate semantic processing for unknown words, phrases, and even sentences.

The Identification of Thematic Roles in Parsing Chinese

Journal of Information Science and Engineering

A Study on Word Similarity using Context Vector Models

There is a need to measure word similarity when processing natural languages, especially when using generalization, classification, or example-based approaches. Usually, measures of similarity between two words are defined according to the distance between their semantic classes in a semantic taxonomy. Taxonomy approaches are largely semantics-based and do not consider syntactic similarities. However, in real applications, both semantic and syntactic similarities are required and are weighted differently. Word similarity based on context vectors is a mixture of syntactic and semantic similarities. In this paper, we propose using only syntactically related co-occurrences as context vectors and adopt information-theoretic models to solve the problems of data sparseness and characteristic precision. The probability distribution of co-occurrence context features is derived by parsing the contextual environment of each word, and all the context features are adjusted according to their IDF (inverse document frequency) values. An agglomerative clustering algorithm is applied to group similar words according to their similarity values. It turns out that words with similar syntactic categories and semantic classes are grouped together.
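The pipeline outlined in this abstract (context features per word, IDF adjustment, agglomerative clustering) can be sketched on toy data. Several parts are assumptions rather than the paper's method: the feature strings in `CONTEXTS` are invented, the features are hand-written instead of coming from a parser, cosine similarity stands in for the information-theoretic measure the abstract mentions, and average linkage with a stopping threshold is one of several possible agglomerative schemes.

```python
import math
from collections import Counter

# Hypothetical syntactic context features for each word (not from the paper).
CONTEXTS = {
    "eat": ["obj:apple", "obj:rice", "subj:child"],
    "drink": ["obj:water", "obj:tea", "subj:child"],
    "apple": ["head:eat", "mod:red"],
    "rice": ["head:eat", "mod:white"],
}


def idf_weighted_vectors(contexts):
    """Weight each feature count by log(N / df), treating each word as a 'document'."""
    n = len(contexts)
    df = Counter(f for feats in contexts.values() for f in set(feats))
    return {
        word: {f: c * math.log(n / df[f]) for f, c in Counter(feats).items()}
        for word, feats in contexts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(f, 0.0) for f, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerate(vectors, threshold):
    """Average-link agglomerative clustering; stops when no pair reaches threshold."""
    clusters = [[w] for w in sorted(vectors)]

    def link(a, b):
        total = sum(cosine(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        score, i, j = max(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


vectors = idf_weighted_vectors(CONTEXTS)
print(agglomerate(vectors, threshold=0.05))
```

On this toy data the two nouns end up in one cluster and the two verbs in another, because each pair shares exactly one context feature. The threshold of 0.05 is arbitrary and would need tuning on real data.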

A syllable-based very-large-vocabulary voice retrieval system for Chinese databases with textual attributes

Unconstrained speech retrieval for Chinese document databases with very large vocabulary and unlimited domains

Chinese language model adaptation based on document classification and multiple domain-specific language models

Mandarin Chinese Character Frequency List Based on National Phonetic Alphabets

《资讯处理用中文分词规范》设计理念及规范内容 Design Criteria and Content of ‘Segmentation Standard for Chinese Information Processing’

Intelligent retrieval of very large Chinese dictionaries with speech queries

Mandarin Chinese Word Frequency Dictionary

Sinica Corpus

中文句結構樹資料庫的構建 Project Report: Sinica Treebank

[Abstract text unrecoverable due to encoding damage; the legible fragments mention Sinica Treebank, Sinica Corpus, and Information-based Case Grammar (ICG).]

漢語動詞詞彙語意分析: 表達模式與研究方法 Analysis of Mandarin Lexical Semantics: Representational Model and Research Methodology

Academia Sinica balanced corpus (Version 3)
