Hyungsuk Ji - Academia.edu
Papers by Hyungsuk Ji
http://www.theses.fr, 2004
In this thesis we present a theoretical approach to the concept and a linguistic-computational model. The theory, which is non-definitional, is founded on a Gaussian representation of the concept. We introduce the term "contexonym", a formalization of the context relation between words. This notion links the theory of the concept to the computational model. Building on these two notions, our computational model learns contexonyms automatically from large unannotated corpora. For each given word, the model proposes a list of its contexonyms and organizes them using a hierarchical clustering method. The contexonyms obtained in this way reflect encyclopedic knowledge as well as various linguistic characteristics, such as word usage and the fine semantic differences between synonyms. Test results show that the model can be used for NLP tasks and as a dynamic lexical resource.
Lecture Notes in Computer Science
For an advanced next-generation Human-Computer Interaction (HCI) interface, incorporating natural language processing (NLP) is an inevitable choice, as human language is the most common and sophisticated means of communication. Among the various topics in NLP, word sense is one of the most challenging. In this paper we present a method adopted from ACOM that automatically generates a bilingual lexicon from an aligned parallel corpus. Tests on a predefined test set for the English and French Bibles show that the method produces correct target words 70% of the time. In addition, the proposed method generates target words that reflect the contextual relationship between source and target words, such as garrison and Philistins.
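As a rough illustration of how a bilingual lexicon can be derived from an aligned parallel corpus, the sketch below counts co-occurrences across aligned sentence pairs and keeps, for each source word, the target word with the highest Dice coefficient. This is a generic co-occurrence baseline and not the ACOM-derived method of the paper, whose details the abstract does not give; the function name `build_lexicon` and the toy sentence pairs are hypothetical.

```python
from collections import Counter
from itertools import product


def build_lexicon(aligned_pairs):
    """Map each source word to the target word with the highest Dice score.

    aligned_pairs: iterable of (source_sentence, target_sentence) strings.
    Illustrative baseline only; not the method described in the paper.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in aligned_pairs:
        # Count each word once per sentence (presence, not frequency).
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    best = {}
    for (s, t), joint in pair_count.items():
        # Dice coefficient: 2 * |s and t together| / (|s| + |t|).
        dice = 2 * joint / (src_count[s] + tgt_count[t])
        if s not in best or dice > best[s][1]:
            best[s] = (t, dice)
    return {s: t for s, (t, _) in best.items()}


# Hypothetical toy corpus standing in for aligned Bible verses.
TOY_PAIRS = [
    ("the king spoke", "le roi parla"),
    ("the king slept", "le roi dormit"),
    ("the prophet spoke", "le prophète parla"),
]
lexicon = build_lexicon(TOY_PAIRS)
```

On real data, a score threshold and a frequency cutoff would normally be needed, since rare words produce spuriously high association scores.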
Lecture Notes in Computer Science, 2007
Studies on the effect of text width on readability have encouraged the use of fixed text-width web/electronic text design. The drawback of this type of design is the loss of users' interactivity with regard to text modification. In this paper, we investigate the web design of the world's top 100 websites and present an alternative interactive user interface for text display.
The Journal of Object Technology, 2007
Today, very popular Web portal sites, social networking sites, online media sites, commerce sites, etc., provide platforms on which millions of visitors express their opinions daily on a wide variety of subjects. The site operators strive to increase the number of visitors, and the visitors often make use of several means at their disposal to participate in the formation of collective opinions. In this article, we examine the various means becoming available to Web site visitors to express their opinions, and the challenges that both the site operators and the general public face in ensuring that the visitors' opinions are fairly and accurately reflected in the collective opinions. Recently, social networking sites (such as YouTube, Digg, Flickr, MySpace, Facebook, LinkedIn, Cyworld (in Korea), etc.), Web portal sites (such as Yahoo, Baidu (in China), Naver (in Korea), etc.), media sites (such as New York Times, ESPN, CNN, FoxNews, Chosun (in Korea), etc.), commerce sites (such as Amazon, Hotel, Gmarket (in Korea), etc.), and learning sites (such as Wikipedia, About, etc.) have been drawing anywhere from hundreds of thousands to tens of millions of visitors daily. The site operators provide content and/or platforms on which the visitors may upload and share user-generated content (UGC) and express their opinions by any of several means. These means include posting comments, participating in discussions or forums, responding to polling questions, voting "like/dislike" (e.g., 'digg it/bury it' on Digg) on other people's postings, voting "thumbs up/down" on other visitors' comments, sharing content with "friends", saving content for future viewing, copying content to their blogs, etc. It is desirable that the opinions expressed by the Web site visitors be fairly and accurately reflected in the formation of collective opinions.
The most important reason is that the collective opinions, although expressed by a minority of all Web site visitors, can influence the formation of general public opinion, and in turn government policies on a full range of momentous issues such as the election of national leaders, waging a war, national security, education and welfare reforms, immigration policies, etc. Further, the collective opinions conveyed by the very popular Web sites
The Journal of Object Technology, 2008
Internet search engines have become an indispensable part of everyday living and business today. Although the capabilities of Internet search engines are improving steadily, it may be time for us to explore a few new directions that can take the search engines to the next level. In this article, we summarize the current activities in advancing the state of Internet search engines, and explore a few directions of research and development.
Behavior Research Methods, 2008
The general aim of this study is to validate the cognitive relevance of the geometric model used in the semantic atlases (SA). With this goal in mind, we compare the results obtained by the automatic contexonym organizing model (ACOM), an SA-derived model for word sense representation based on contextual links, with human subjects' responses on a word association task. We begin by positioning the geometric paradigm with respect to the hierarchical paradigm (WordNet) and the vector paradigm (latent semantic analysis [LSA] and the hyperspace analogue to language model). Then we compare ACOM's responses with Hirsh and Tree's (2001) word association norms based on the responses of two groups of subjects. The results showed that words associated by 50% or more of the Hirsh and Tree subjects were also proposed by ACOM (e.g., 71% of the words in the norms were also given by ACOM). Finally, we compare ACOM and LSA on the basis of the same association norms. The results indicate better performance for the geometric model.
In this paper we describe two geometrical models of meaning representation, the Semantic Atlas (SA) and the Automatic Contexonym Organizing Model (ACOM). The SA provides maps of meaning generated through correspondence factor analysis. The models can handle different types of word relations: synonymy in the SA and co-occurrence in ACOM. Their originality lies in an artifact called 'cliques', a fine-grained infralinguistic subunit of meaning. The SA is composed of several dictionaries and thesauri enhanced with a process of symmetrisation. It is currently available for French and English in monolingual versions as well as in a bilingual translation version. Other languages are under development and testing. ACOM deals with unannotated corpora. The models are used by research teams worldwide that investigate synonymy, translation processes, genre comparison, psycholinguistics and polysemy modeling. Both models can be consulted online via a flexible interface allowing for interactive navigation at http://dico.isc.cnrs.fr. This site is the most consulted address in the domain of the French National Center for Scientific Research (CNRS), one of the major research bodies in France. The international interest it has triggered led us to initiate the process of going open source. In the meantime, all our databases are freely available on request.
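To make the notions of symmetrisation and cliques more concrete, here is a minimal sketch that treats synonymy as an undirected graph and enumerates its maximal cliques with the basic Bron-Kerbosch algorithm. The abstract does not say how the SA computes its cliques, so both the algorithm choice and the toy synonym lists are assumptions made for illustration; `symmetrise` and `maximal_cliques` are hypothetical names.

```python
def symmetrise(synonyms):
    """Make the synonym relation symmetric: if a lists b, then b lists a."""
    graph = {word: set() for word in synonyms}
    for word, syns in synonyms.items():
        for s in syns:
            if s == word:
                continue  # ignore self-references
            graph[word].add(s)
            graph.setdefault(s, set()).add(word)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical one-directional dictionary entries (not taken from the SA).
TOY_SYNONYMS = {
    "good": ["fine", "nice"],
    "fine": ["nice", "thin"],
    "kind": ["good"],
}
graph = symmetrise(TOY_SYNONYMS)
cliques = maximal_cliques(graph)
```

In this toy graph the word "fine" falls into two different cliques, one shared with "good" and "nice" and one shared with "thin", which is the sense in which a clique can act as a unit of meaning finer than the word itself.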
Computational Linguistics, 2003
This article describes a spatial model for matching semantic values between two languages, French and English. Based on semantic similarity links, the model constructs a map that represents a word in the source language. Then the algorithm projects the map values onto a space in the target language. The new space abides by the semantic similarity links specific to the second language. Then the two maps are projected onto the same plane in order to detect overlapping values. For instructional purposes, the different steps are presented here using a few examples. The entire set of results is available at the following address: http://dico.isc.cnrs.fr.
Inter-word associations like stagger - drunken, or intra-word sense divisions (e.g. write a diary vs.