Namgyu Kim - Academia.edu
Papers by Namgyu Kim
World Academy of Science, Engineering and Technology, International Journal of Economics and Management Engineering, May 28, 2015
Zenodo (CERN European Organization for Nuclear Research), Oct 2, 2015
Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we indicate several limitations of the traditional approaches to analyzing public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit with a given subject, we develop a new logical filter concept that consists of not only mere keywords but also a logical relationship among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology through its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.
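The abstract does not define the logical filter in detail, so the following is only a minimal sketch of the general idea of combining keywords with logical relationships. The nested-tuple rule format, the `matches` function, and the example keywords and documents are all hypothetical and are not taken from the paper.

```python
def matches(text, rule):
    """Evaluate a nested rule against one document.

    A rule is either a keyword string or a tuple whose first element is
    "AND", "OR", or "NOT" followed by sub-rules (hypothetical format).
    """
    if isinstance(rule, str):
        # Plain keyword: case-insensitive substring match.
        return rule.lower() in text.lower()
    op, *args = rule
    if op == "AND":
        return all(matches(text, a) for a in args)
    if op == "OR":
        return any(matches(text, a) for a in args)
    if op == "NOT":
        return not matches(text, args[0])
    raise ValueError(f"unknown operator: {op}")


# Illustrative rule: about drones, and about privacy or surveillance,
# but not about toys.
rule = ("AND", "drone", ("OR", "privacy", "surveillance"), ("NOT", "toy"))

docs = [
    "Drone surveillance raises new privacy concerns.",
    "Best toy drone deals; privacy not included.",
    "New drone delivery service launches.",
]

selected = [d for d in docs if matches(d, rule)]
print(selected)
```

A plain keyword list would keep all three example documents because each mentions "drone"; the logical relationship among the keywords is what narrows the selection to the first one.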
Zenodo (CERN European Organization for Nuclear Research), Sep 4, 2014
The need to extract R&D keywords from issues and use them to retrieve R&D information is increasing rapidly. However, it is difficult to identify related issues or distinguish them. Although the similarity between issues cannot be identified, issues that always share the same R&D keywords can be determined with an R&D lexicon. In detail, the R&D keywords that are associated with a particular issue imply the key technology elements that are needed to solve that issue. Furthermore, the relationship among issues that share the same R&D keywords can be shown in a more systematic way by clustering them according to keywords. Thus, sharing R&D results and reusing R&D technology can be facilitated. Indirectly, redundant investment in R&D can be reduced as the relevant R&D information can be shared among corresponding issues and the reusability of related R&D can be improved. Therefore, a methodology to cluster issues from the perspective of common R&D keywords is proposed to satisfy these demands.
Journal of Intelligence and Information Systems, Sep 30, 2015
Tremendous amounts of data are generated daily. Accordingly, unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data is increasing, attempts to gain profits by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then that content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to solve type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or piece of information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.
The e-Business Studies, 2018
Journal of the Korea society of IT services, Mar 31, 2015
The volume of unstructured text data generated by various social media has been increasing rapidly; therefore, the use of text mining to support decision making has also been increasing. In particular, issue clustering (determining new relations among various issues through clustering) has gained attention from many researchers. However, traditional issue clustering methods can only be performed based on the co-occurrence frequency of issue keywords in many documents. Therefore, an association between issues that have a low co-occurrence frequency cannot be discovered using traditional issue clustering methods, even if those issues are strongly related from other perspectives. Issue clustering therefore needs to be performed according to criteria that fit the perspective of analysis and the purpose of use. In this study, multi-dimensional issue clustering is proposed to overcome the limitation of traditional issue clustering. We assert, specifically in this study, that issue clustering should be performed for a particular purpose. We analyze the results of applying our methodology to two specific perspectives on issue clustering: (i) consumers' interests and (ii) related R&D terms.
Journal of the Korea society of IT services, 2014
Recently, the volume of unstructured text data generated by various social media has been increasing rapidly; consequently, the use of text mining to support decision-making has also been growing. In particular, academia and industry are paying significant attention to topic analysis in order to discover the main issues from a large volume of text documents. Topic analysis can be regarded as static analysis because it analyzes a snapshot of the distribution of various issues. In contrast, some recent studies have attempted to perform dynamic issue tracking, which analyzes and traces issue trends during a predefined period. However, most traditional issue tracking methods have a common limitation: when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. Additionally, traditional issue tracking methods do not concentrate on the transition of individuals' interests from certain issues to others, although the methods can illustrate macro-level issue trends. In this paper, we propose an individual interest tracking methodology to overcome the two limitations of traditional issue tracking methods. Our main goal is not to track macro-level issue trends but to analyze trends in the flow of individual interests. Further, our methodology is extensible because it analyzes only newly added documents when the period of analysis is extended. In this paper, we also analyze the results of applying our methodology to news articles and their access logs.
World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, 2016
The Journal of Korean Institute of Communications and Information Sciences, 2017
Demand for and interest in big data analytics are increasing rapidly. The concepts around big data include not only existing structured data, but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because text is the most representative means of describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be displayed through word cloud, word network, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics from the rapidly increasing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields since topic modeling is able to extract the core topics from a huge amount of unstructured text documents and provide the document groups for each different topic. In this paper, we review the major techniques and research trends of text analysis. Further, we also introduce some application cases that solve problems in various fields by using topic modeling.
Journal of Intelligence and Information Systems, 2016
The Journal of Information Systems, 2011
Journal of Intelligence and Information Systems, 2013
The KIPS Transactions PartD
Association rule mining techniques enable us to acquire knowledge concerning sales patterns among individual items from voluminous transactional data. Certainly, one of the major purposes of association rule mining is utilizing the acquired knowledge to provide marketing strategies such as catalogue design, cross-selling and shop allocation. However, extracting only the actionable and profitable knowledge from the tremendous number of discovered patterns requires too much time and cost. In currently available literature, a number of interest measures have been devised to accelerate and systematize the process of pattern evaluation. Unfortunately, most such measures, including support and confidence, are prone to yielding impractical results because they are calculated only from the sales frequencies of items. For instance, traditional measures cannot differentiate between the purchases in a small basket and those in a large shopping cart. Therefore, some adjustment should ...
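As a small illustration of why frequency-based measures ignore basket size, the sketch below computes the standard support and confidence measures on toy data. The transactions and item names are invented for illustration, and the adjustment the paper goes on to propose is not shown because the abstract is truncated.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Toy data (illustrative only): two small baskets and two large carts.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "eggs", "rice", "soap", "tea", "salt"},
    {"bread", "eggs", "rice", "soap", "tea", "salt", "oil"},
]

print(support(transactions, {"bread", "milk"}))       # 0.75
print(confidence(transactions, {"bread"}, {"milk"}))  # 0.75
```

Each transaction counts once regardless of how many other items it holds, so the two-item baskets and the seven-item cart contribute equally to both measures.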
Journal of Intelligence and Information Systems, 2014
Signal Processing: Image Communication, 2010
A digital object identifier refers to diverse technologies associated with assigning an identifier to a digital resource and managing the identification system. One type of implementation of a digital object identifier developed by the Korean Government is termed the Universal Content Identifier (UCI) system. It circulates and utilizes identifiable resources efficiently by connecting various online and offline identifying schemes. UCI
The Journal of the Korea Contents Association, 2021
Habruta is a question-based learning method in which learners talk, discuss, and argue in pairs. In particular, the famous painting Habruta is being implemented for the purpose of enhancing the ability to appreciate paintings and enriching expressive power through questions and answers about famous paintings. In this study, in order to support the famous painting Habruta for oriental paintings, we propose a method of automatically generating questions from the gender perspective of oriental painting characters using current deep learning technology. Specifically, in this study, based on the pre-trained model, VGG16, we propose a model that can effectively analyze the features of Asian paintings by performing fine-tuning. In addition, we classify the types of questions into three types: fact, imagination, and applied questions used in the famous painting Habruta, and subdivide each question according to the character to derive a total of 9 question patterns. In order to verify the feasibility of...