Jong-hyeok Lee - Academia.edu

Papers by Jong-hyeok Lee

POSTECH's Statistical Machine Translation Systems for NTCIR-9 PatentMT Task (English-to-Japanese)

Subtopic mining using simple patterns and hierarchical structure of subtopic candidates from web documents

Information Processing and Management, Nov 1, 2015

The intention gap between users and queries results in ambiguous and broad queries. To solve these problems, subtopic mining has been studied, which returns a ranked list of possible subtopics according to their relevance, popularity, and diversity. This paper proposes a novel method to mine subtopics using simple patterns and a hierarchical structure of subtopic candidates. First, relevant and various phrases are extracted as subtopic candidates using simple patterns based on noun phrases and alternative partial-queries. Second, a hierarchical structure of the subtopic candidates is constructed using sets of relevant documents from a web document collection. Finally, the subtopic candidates are ranked considering a balance between popularity and diversity using this structure. In experiments, our proposed methods outperformed the baselines and even an external resource based method at high-ranked subtopics, which shows that our methods can be effective and useful in various search scenarios like result diversification.
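The abstract does not spell out the ranking function. As a rough illustration of trading popularity off against diversity, the sketch below uses a generic greedy re-ranking in the style of maximal marginal relevance (MMR). This is a swapped-in, well-known technique, not the paper's hierarchical-structure method; the popularity scores, the use of Jaccard overlap between relevant-document sets as the redundancy measure, the weight `lam`, and the function names are all assumptions.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two relevant-document sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank_subtopics(
    popularity: Dict[str, float],
    docs: Dict[str, Set[str]],
    lam: float = 0.5,
    k: int = 10,
) -> List[str]:
    """Greedy MMR-style selection: reward popularity, penalize redundancy.

    popularity: hypothetical popularity score per subtopic candidate
    docs: hypothetical set of relevant document IDs per candidate
    lam: trade-off weight (1.0 = popularity only, 0.0 = diversity only)
    """
    selected: List[str] = []
    remaining = set(popularity)
    while remaining and len(selected) < k:
        def score(c: str) -> float:
            # Redundancy = highest document overlap with any already-selected subtopic.
            redundancy = max((jaccard(docs[c], docs[s]) for s in selected), default=0.0)
            return lam * popularity[c] - (1 - lam) * redundancy

        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with popularity `{"apple fruit": 0.9, "apple pie": 0.8, "apple inc": 0.7}` and the first two candidates sharing the same relevant documents, `lam=0.5` yields `["apple fruit", "apple inc", "apple pie"]`: the less popular but non-redundant "apple inc" is promoted above "apple pie".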

Semantic Relation Extraction using Pattern Pairs Sharing a Term

Journal of KIISE:Computing Practices and Letters, 2009

Constructing an ontology from a large corpus begins with automatic semantic relation extraction. A common method treats the words appearing between two terms as patterns and uses these patterns to extract semantic relations. However, previous approaches extract a pattern from a single sentence only, so they cannot extract semantic relations between terms that appear in different sentences. This paper proposes a semantic relation extraction method using pairs of patterns that share a term, where each pattern is extracted using one term of a seed term pair that satisfies the target relation. In our experiments, we achieved an accuracy of 83.75% for the is-a relation, an improvement of 7.5% over previous methods, and an accuracy of 83.75% for the part-of relation, an improvement of 5%. We also show, using relative recall, that recall can potentially be improved.
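The abstract leaves the matching procedure open. The sketch below shows one simplified reading of "pattern pairs sharing a term": from a seed pair (x, y), it collects a pattern p1 linking x to some term t in one sentence and a pattern p2 linking t to y in another sentence, then reuses each (p1, p2) pair to propose new term pairs. Representing the corpus as pre-extracted (term, in-between words, term) triples is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (left term, words between the two terms, right term), one triple per sentence match
Triple = Tuple[str, str, str]


def index_by_left(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each left term to its (pattern, right term) continuations."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for left, pattern, right in triples:
        index[left].append((pattern, right))
    return index


def learn_pattern_pairs(triples: List[Triple], seeds: List[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Collect (p1, p2) such that x -p1-> t and t -p2-> y for some seed pair (x, y)."""
    index = index_by_left(triples)
    pairs: Set[Tuple[str, str]] = set()
    for x, y in seeds:
        for p1, shared in index.get(x, []):
            for p2, right in index.get(shared, []):
                if right == y:
                    pairs.add((p1, p2))
    return pairs


def apply_pattern_pairs(triples: List[Triple], pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Propose (a, b) whenever a -p1-> t and t -p2-> b for a learned pair (p1, p2)."""
    index = index_by_left(triples)
    found: Set[Tuple[str, str]] = set()
    for a, p1, shared in triples:
        for p2, b in index.get(shared, []):
            if (p1, p2) in pairs and a != b:
                found.add((a, b))
    return found
```

With the toy corpus `[("dog", "is a kind of", "mammal"), ("mammal", "is a type of", "animal"), ("cat", "is a kind of", "feline"), ("feline", "is a type of", "carnivore")]` and the seed `("dog", "animal")`, the learned pair is `("is a kind of", "is a type of")`, and applying it proposes `("cat", "carnivore")` even though "cat" and "carnivore" never occur in the same sentence.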

Subtopic Mining Based on Head-Modifier Relation and Co-occurrence of Intents Using Web Documents

Springer eBooks, 2013

This paper proposes a method that mines subtopics from Japanese web documents using the head-modifier relation and the co-occurrence of users' intents. We extracted subtopics using simple patterns based on the head-modifier relation between the query and its adjacent words, and ranked them with a proposed scoring equation. We then re-ranked the subtopics according to an intent co-occurrence measure. Our method outperformed the baseline methods and the suggested queries from a major web search engine. The results of our method could be useful in various search scenarios, such as query suggestion and result diversification.

A discriminative reordering parser for IWSLT 2013

We participated in the IWSLT 2013 Evaluation Campaign in the MT track for two official directions: German↔English. Our system consisted of a reordering module and a statistical machine translation (SMT) module under a pre-ordering SMT framework. We trained the reordering module using three scalable methods in order to utilize as many training instances as possible. The translation quality of our primary submissions was comparable to that of a hierarchical phrase-based SMT system, which usually requires more time to decode.

Forest-to-string translation using binarized dependency forest for IWSLT 2012 OLYMPICS task

We participated in the OLYMPICS task in IWSLT 2012 and submitted two formal runs using a forest-to-string translation system. Our primary run achieved better translation quality than our contrastive run, but worse than a phrase-based and a hierarchical system using Moses.

Transformer-based Screenplay Summarization Using Augmented Learning Representation with Dialogue Information

Proceedings of the Third Workshop on Narrative Understanding, 2021

Screenplay summarization is the task of extracting informative scenes from a screenplay. The screenplay contains turning point (TP) events that change the story direction and thus define the story structure decisively. Accordingly, this task can be defined as the TP identification task. We suggest using dialogue information, one attribute of screenplays, motivated by previous work that discovered that TPs have a relation with dialogues appearing in screenplays. To teach a model this characteristic, we add a dialogue feature to the input embedding. Moreover, in an attempt to improve the model architecture of previous studies, we replace LSTM with Transformer. We observed that the model can better identify TPs in a screenplay by using dialogue information and that a model adopting Transformer outperforms LSTM-based models.
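One common way to add a feature to an input embedding is to sum a learned type embedding into each input vector, as BERT-style models do with segment embeddings. The sketch below assumes that additive design and a binary dialogue/non-dialogue flag per sentence; the function name and the lookup-table values are hypothetical, and in a real model the table would be a trained parameter rather than a fixed list.

```python
from typing import List, Sequence


def add_dialogue_feature(
    sentence_embeddings: Sequence[Sequence[float]],
    is_dialogue: Sequence[bool],
    dialogue_table: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return sentence embeddings with a dialogue-type embedding added element-wise.

    dialogue_table[0] is added to non-dialogue sentences (e.g., scene descriptions)
    and dialogue_table[1] to dialogue sentences.
    """
    if len(sentence_embeddings) != len(is_dialogue):
        raise ValueError("need exactly one dialogue flag per sentence")
    result: List[List[float]] = []
    for emb, flag in zip(sentence_embeddings, is_dialogue):
        type_emb = dialogue_table[int(flag)]  # False -> row 0, True -> row 1
        if len(type_emb) != len(emb):
            raise ValueError("type embedding and sentence embedding sizes differ")
        result.append([x + d for x, d in zip(emb, type_emb)])
    return result
```

For instance, `add_dialogue_feature([[1.0, 2.0], [3.0, 4.0]], [False, True], [[0.0, 0.0], [0.5, 0.5]])` returns `[[1.0, 2.0], [3.5, 4.5]]`; the summed vectors would then be fed to the Transformer encoder. Summing keeps the model dimension unchanged, whereas concatenating the flag would require resizing the encoder input.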

Partially Supervised Phrase-Level Sentiment Classification

Lecture Notes in Computer Science, 2009

This paper presents a new partially supervised approach to phrase-level sentiment analysis that first automatically constructs a polarity-tagged corpus and then learns sequential sentiment tags from that corpus. The approach uses only sentiment sentences, which are readily available on the Internet, and does not require a manually built polarity-tagged corpus, which is hard to construct. With this approach, the system can automatically classify phrase-level sentiment. The results show that a system can learn sentiment expressions without a manually constructed polarity-tagged corpus.

Phoneme-level speech and natural language intergration for agglutinative languages

arXiv preprint cmp-lg/9411013, 1994

A new tightly coupled speech and natural language integration model is presented for a TDNN-based large-vocabulary continuous speech recognition system. Unlike the popular n-best techniques developed for integrating mainly HMM-based speech and natural language systems at the word level, which are inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on phoneme-level integration. The TDNN-CYK spoken language architecture is designed and implemented using a TDNN-based diphone recognition module integrated with table-driven phonological/morphological co-analysis. Our integration model provides a seamless integration of speech and natural language for connectionist speech recognition systems, especially for morphologically complex languages such as Korean. Our experimental results show that speaker-dependent continuous Eojeol (word) recognition can be integrated with morphological analysis, achieving an over 80% morphological analysis success rate directly from the speech input for middle-level vocabularies.

This research was supported in part by a grant from KOSEF (Korean Science and Engineering Foundation). We also thank WonIl Lee for coding the lexicon and the morphological parser, and Professor Hong Jeong for his valuable suggestions on an earlier draft of this paper. An extended version of this paper was submitted to the journal Natural Language Engineering for review.

Note 1: One notable exception is the research by Sawai [8, 9].

Hierarchical subtopic mining for topic annotation

Postech's System Description for Medical Text Translation Task

This short paper presents a system description for the intrinsic evaluation of the WMT14 medical text translation task. Our systems consist of a phrase-based statistical machine translation system and a query translation system for the German-English language pair. Our work focuses on the query translation task, and we achieved the highest BLEU score among all submitted systems for the English-German intrinsic query translation evaluation.

Method of Mining Subtopics Using Dependency Structure and Anchor Texts

Lecture Notes in Computer Science, 2012

Conveying Subjectivity of a Lexicon of One Language into Another Using a Bilingual Dictionary and a Link Analysis Algorithm

International Journal of Computer Processing of Languages, Jun 1, 2009

This paper proposes a method that automatically creates a subjectivity lexicon in a new language from a subjectivity lexicon in a resource-rich language, using only a bilingual dictionary. We resolve some of the difficulties in selecting appropriate senses when translating the lexicon, and present a framework that sequentially applies an iterative link analysis algorithm to enhance the quality of the lexicons in both the source and target languages. Experimental results show that the method improves the subjectivity lexicon in the source language and also creates a good-quality lexicon in the new language.
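The abstract does not name the link analysis algorithm. To illustrate the general idea of iterating over dictionary links, the sketch below propagates seed subjectivity scores with a personalized-PageRank-style update. The choice of algorithm, the undirected word graph, the damping factor, and the function name are all assumptions, not the paper's framework.

```python
from typing import Dict, List, Set


def propagate_subjectivity(
    edges: Dict[str, List[str]],
    prior: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Spread seed subjectivity scores over a bilingual word graph.

    edges: links between source- and target-language words, e.g. one link per
           bilingual dictionary entry (treated as undirected here)
    prior: seed scores from the source-language lexicon; unseeded words start at 0
    """
    nodes: Set[str] = set(edges) | set(prior)
    for neighbours in edges.values():
        nodes.update(neighbours)
    adjacency: Dict[str, Set[str]] = {n: set() for n in nodes}
    for u, neighbours in edges.items():
        for v in neighbours:
            adjacency[u].add(v)
            adjacency[v].add(u)

    total = sum(prior.values()) or 1.0
    restart = {n: prior.get(n, 0.0) / total for n in nodes}  # normalized seed distribution
    score = dict(restart)
    for _ in range(iterations):
        # Each word keeps (1 - damping) of its seed score and receives an equal
        # share of every neighbour's current score (built from the previous iteration).
        score = {
            n: (1 - damping) * restart[n]
            + damping * sum(score[m] / len(adjacency[m]) for m in adjacency[n])
            for n in nodes
        }
    return score
```

Given `edges = {"en:happy": ["ko:haengbokhan"], "en:table": ["ko:takja"]}` and a seed score only for `"en:happy"`, the Korean translation `"ko:haengbokhan"` ends up with a positive score while `"ko:takja"` stays at 0. A target-language lexicon could then be read off by thresholding the scores; because links run in both directions, source-language scores are also revised.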

Improving fluency by reordering target constituents using MST parser in English-to-Japanese phrase-based SMT

We propose a reordering method to improve the fluency of the output of the phrase-based SMT (PBSMT) system. We parse the translation results that follow the source language order into non-projective dependency trees, then reorder dependency trees to obtain fluent target sentences. Our method ensures that the translation results are grammatically correct and achieves major improvements over PB-SMT using dependency-based metrics.

Korean Speech Act Tagging using Previous Sentence Features and Following Candidate Speech Acts

Journal of KIISE:Software and Applications, 2008

Found in Translation: Conveying Subjectivity of a Lexicon of One Language into Another Using a Bilingual Dictionary and a Link Analysis Algorithm

Springer eBooks, 2009

This paper proposes a method that automatically creates a subjectivity lexicon in a new language from a subjectivity lexicon in a resource-rich language, using only a bilingual dictionary. We resolve some of the difficulties in selecting appropriate senses when translating the lexicon, and present a framework that sequentially applies an iterative link analysis algorithm to enhance the quality of the lexicons in both the source and target languages. Experimental results show that the method improves the subjectivity lexicon in the source language and also creates a good-quality lexicon in the new language.

Subtopic Mining Based on Three-Level Hierarchical Search Intentions

Lecture Notes in Computer Science, 2016

This paper proposes a subtopic mining method based on three-level hierarchical search intentions. Various subtopic candidates are extracted from web documents using a simple pattern, and higher-level and lower-level subtopics are selected from these candidates. The selected subtopics, treated as second-level subtopics, are ranked by a proposed measure and then expanded and re-ranked according to the characteristics of the resources. Using general terms in the higher-level subtopics, we group the second-level subtopics and generate first-level subtopics. Our method achieved better performance than a state-of-the-art method.

Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Meeting of the Association for Computational Linguistics, Jul 11, 2010

Subjectivity analysis is a rapidly growing field of study. Along with its applications to various NLP tasks, much work has focused on multilingual subjectivity learning from existing resources. Multilingual subjectivity analysis requires language-independent criteria for comparable outcomes across languages. This paper proposes a way to measure the multilanguage-comparability of subjectivity analysis tools and provides meaningful comparisons of multilingual subjectivity analysis from various points of view.

Noising Scheme for Data Augmentation in Automatic Post-Editing

This paper describes POSTECH’s submission to the WMT20 shared task on Automatic Post-Editing (APE). Our focus is on increasing the quantity of available APE data to overcome the shortage of human-crafted training data. In our experiment, we implemented a noising module that simulates four types of post-editing errors, and we introduced this module into a Transformer-based multi-source APE model. During training, our noising module implants errors into the target-side texts of parallel corpora to create synthetic MT outputs, increasing the total number of training samples. We also generated additional training data using the parallel corpora and NMT model that were released for the Quality Estimation task, and we used these data to train our APE model. Experimental results on the WMT20 English-German APE data set show improvements over the baseline in terms of both the TER and BLEU scores: our primary submission achieved an improvement of -3.15 TER and +4.01 BLEU, and ...
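The abstract does not list the four error types. Assuming they mirror the edit operations counted by TER (deletion, insertion, substitution, and shift), the sketch below corrupts a reference translation into a synthetic MT output, so that each (source, noisy output, reference) triple could serve as an extra APE training sample. The function name, the vocabulary used for insertions and substitutions, and the probabilities are hypothetical.

```python
import random
from typing import List, Optional


def add_noise(
    tokens: List[str],
    vocab: List[str],
    p: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Turn a reference translation into a synthetic, error-containing MT output.

    Each token is independently deleted, followed by an inserted random word, or
    substituted with a random word, each with probability p / 4; otherwise it is
    kept. Afterwards, with probability p, one token is moved to a random position.
    """
    rng = rng or random.Random()
    out: List[str] = []
    for tok in tokens:
        r = rng.random()
        if r < p / 4:  # deletion: drop the token
            continue
        if r < p / 2:  # insertion: keep the token and add a random word after it
            out.extend([tok, rng.choice(vocab)])
        elif r < 3 * p / 4:  # substitution: replace the token with a random word
            out.append(rng.choice(vocab))
        else:  # keep the token unchanged
            out.append(tok)
    if len(out) > 1 and rng.random() < p:  # shift: move one token elsewhere
        moved = out.pop(rng.randrange(len(out)))
        out.insert(rng.randrange(len(out) + 1), moved)
    return out
```

Passing a seeded `random.Random` makes the synthetic data reproducible, and `p=0.0` returns the reference unchanged, which is a convenient sanity check when tuning the noise level.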

An Employment Verification Method Using Social Network Analysis

Journal of KIISE:Databases, 2013
