Keh-jiann Chen | Academia Sinica
Papers by Keh-jiann Chen
International Conference on Acoustics, Speech, and Signal Processing
In this paper, an efficient algorithm is developed to handle the difficulties of parsing noisy word lattices (sets of word hypotheses obtained in continuous speech recognition). These difficulties include word-boundary overlap, homonyms, lexical ambiguity, and recognition uncertainty and errors. An augmented chart is first proposed, and the new algorithm is then derived on this chart. The algorithm integrates the global structural synthesis capability of unification grammar with the local relation estimation capability of a Markov language model. The parsing algorithm is island-driven and best-first. In this way, not only can the features of the grammatical and statistical approaches be combined, but the effects of the two approaches are also reflected in a single algorithm, so that the overall selectivity can be appropriately
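The abstract combines a Markov language model with best-first search over a word lattice. The following is a minimal sketch under simplifying assumptions. It uses a bigram model and a purely left-to-right search, and omits the paper's island-driven strategy, its augmented chart, and all unification-grammar constraints. The sketch finds the highest-scoring word sequence through a toy lattice. `Hyp`, `best_path`, the lattice, and every score are hypothetical illustrations, not taken from the paper.

```python
import heapq
from typing import NamedTuple, Optional


class Hyp(NamedTuple):
    """One word hypothesis in the lattice (hypothetical representation)."""
    word: str
    start: int        # position where the word begins
    end: int          # position where the word ends (exclusive)
    acoustic: float   # recognizer log score, assumed <= 0


def best_path(lattice: list[Hyp],
              bigram: dict[tuple[str, str], float],
              utterance_end: int,
              unk_logprob: float = -10.0) -> Optional[tuple[float, list[str]]]:
    """Best-first (uniform-cost) search for the best word sequence covering
    positions [0, utterance_end). All scores are log probabilities <= 0, so the
    first complete path popped from the heap has the highest total score."""
    by_start: dict[int, list[Hyp]] = {}
    for h in lattice:
        by_start.setdefault(h.start, []).append(h)

    # Heap entries: (negated score, position, previous word, words so far).
    heap = [(0.0, 0, "<s>", ())]
    expanded: set[tuple[int, str]] = set()
    while heap:
        neg_score, pos, prev, words = heapq.heappop(heap)
        if pos == utterance_end:
            return -neg_score, list(words)
        # The bigram state is (position, previous word); expand each state once.
        if (pos, prev) in expanded:
            continue
        expanded.add((pos, prev))
        for h in by_start.get(pos, []):
            lm = bigram.get((prev, h.word), unk_logprob)
            score = -neg_score + h.acoustic + lm
            heapq.heappush(heap, (-score, h.end, h.word, words + (h.word,)))
    return None  # no hypothesis sequence spans the whole utterance


# Toy lattice with competing segmentations of the same stretch of speech.
LATTICE = [
    Hyp("recognize", 0, 3, -1.0), Hyp("wreck", 0, 2, -0.8),
    Hyp("a", 2, 3, -0.5), Hyp("speech", 3, 5, -0.7),
    Hyp("nice", 3, 4, -0.6), Hyp("beach", 4, 5, -0.6),
]
BIGRAM = {
    ("<s>", "recognize"): -1.0, ("recognize", "speech"): -0.5,
    ("<s>", "wreck"): -2.0, ("wreck", "a"): -1.5, ("a", "nice"): -1.0,
    ("nice", "beach"): -1.0, ("recognize", "nice"): -3.0, ("a", "speech"): -2.0,
}

if __name__ == "__main__":
    print(best_path(LATTICE, BIGRAM, utterance_end=5))
```

The bigram score is what lets the search prefer "recognize speech" over the acoustically competitive "wreck a nice beach". In the paper, this statistical evidence is further combined with grammatical constraints, which this sketch does not model.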
Temporal relations, which include duration, aspect, frequency, time point, and sequence, describe the relationships between time elements and events in complex knowledge networks. Logical compatibility between temporal elements and event types strongly influences the semantic interpretation and grammaticality of sentences. Temporal relations are among the most complicated, most frequently used, and least understood topics in linguistics. In this paper, we focus on duration only. We made fine-grained distinctions among time intervals and explained both their common functions and their idiosyncrasies. We pointed out that the types of collocated events and the semantics of time intervals are the main factors controlling the usage of time-interval words. Furthermore, we demonstrated that the morpho-syntactic structure of time-interval words also restricts the flexibility of their usage. We listed four types of morpho-syntactic structures for duration expressions and stated the constraints on their usage.
Natural languages are a means of denoting concepts. However, word sense ambiguity makes natural language processing and conceptual processing almost impossible. To bridge the gap between natural language representations and conceptual representations, we propose a universal concept representation mechanism called Extended-HowNet, which evolved from HowNet. It extends HowNet's word sense definition mechanism and uses WordNet synsets as the vocabulary for describing concepts. Each word sense (or concept) is defined in terms of simpler concepts, which can in turn be decomposed into even simpler concepts until primitive or basic concepts are reached. Therefore, the definition of a concept can be dynamically decomposed and unified into Extended-HowNet at different levels of representation. Extended-HowNet is language-independent: any word sense in any language can be defined and given a near-canonical representation. For any two concepts, their semantic distance can be determined by comparing their definitions, and so can their semantic similarities and differences. In addition to taxonomic links, concepts are associated through their shared conceptual features. Fine-grained differences among near-synonyms can be captured by adding new features.
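The abstract describes two operations: recursively decomposing a concept's definition down to primitives, and comparing two concepts through their definitions. The following is a minimal sketch under simplifying assumptions. Each definition is modeled as a plain list of simpler concepts, whereas real Extended-HowNet definitions are structured expressions with relations and features. The Jaccard overlap of primitive sets is a stand-in similarity measure chosen for illustration, not the paper's metric. `DEFINITIONS`, `expand`, `compare`, and all concept names are hypothetical.

```python
# Hypothetical toy definitions: a concept maps to the simpler concepts that
# define it; any concept with no entry is treated as a primitive.
DEFINITIONS: dict[str, list[str]] = {
    "teacher": ["human", "teach"],
    "student": ["human", "learn"],
    "professor": ["teacher", "university"],
    "teach": ["transmit", "knowledge"],
    "learn": ["receive", "knowledge"],
}


def expand(concept: str, defs: dict[str, list[str]],
           path: frozenset = frozenset()) -> frozenset:
    """Recursively decompose a concept until only primitives remain."""
    if concept not in defs:
        return frozenset([concept])  # primitive: cannot be decomposed further
    if concept in path:
        raise ValueError(f"cyclic definition involving {concept!r}")
    result: set[str] = set()
    for part in defs[concept]:
        result |= expand(part, defs, path | {concept})
    return frozenset(result)


def compare(a: str, b: str, defs: dict[str, list[str]]):
    """Return (similarity, shared, only_in_a, only_in_b) over primitive sets."""
    fa, fb = expand(a, defs), expand(b, defs)
    shared = fa & fb
    similarity = len(shared) / len(fa | fb)
    return similarity, sorted(shared), sorted(fa - fb), sorted(fb - fa)


if __name__ == "__main__":
    print(sorted(expand("professor", DEFINITIONS)))
    print(compare("professor", "student", DEFINITIONS))
```

Because the comparison works on decomposed definitions, the sketch reports not only a similarity score but also which features the two concepts share and which distinguish them. In this toy data, "professor" has "transmit" and "university" where "student" has "receive". This mirrors the abstract's point that adding features can separate near-synonyms.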
In order to train machines to 'understand' natural language, we propose a meaning representation mechanism called E-HowNet to encode lexical senses. In this paper, we use interrogatives to demonstrate how E-HowNet handles the semantic representation and composition of interrogative constructions. We classify interrogative words into five classes according to their query types and represent each type of interrogative with fine-grained features and operators. We address the process of semantic composition and the difficulties of representation, such as word sense disambiguation. Finally, we test machine understanding by showing how machines derive the same deep semantic structure for synonymous sentences with different surface structures.
This paper describes a universal concept representation mechanism called E-HowNet, designed to handle difficulties caused by unknown words in natural language processing. The semantic structures of unknown words are discovered, and their senses disambiguated, by analogy. Our goal is that any concept can be defined in E-HowNet with a near-canonical representation. Because the design supports easy semantic composition and decomposition, it makes it possible to automate the semantic processing of unknown words, phrases, and even sentences.
Journal of Information Science and Engineering
In this paper, we present a device specially designed on the basis of the theory of empty categories. The device cooperates with a bottom-up parser. It provides an elegant and efficient approach to the troublesome transformations of passivization, relativization, topicalization, and ba-transformation in Chinese, as well as to the use of zero pronouns. With the aid of the device, grammar rules for Chinese become much simpler and easier to design, and processing capability can be significantly improved.
Sinica Treebank (Chinese-language abstract; the text is corrupted by a character-encoding error, and only these terms are recoverable: Sinica Treebank, Sinica Corpus, Information-based Case Grammar (ICG))