Yi-Reun Kim | Korea Advanced Institute of Science and Technology

Papers by Yi-Reun Kim

COSMOS/MT: Design and implementation of a multi-process/multi-thread model for the COSMOS object storage system

Odysseus/Parallel-OOSQL: A Parallel Search Engine using the Odysseus DBMS Tightly-Coupled with IR Capability

Journal of KIISE:Computing Practices and Letters, 2008

As the amount of electronic documents increases rapidly with the growth of the Internet, a parallel search engine capable of handling a large number of documents is becoming ever more important. To implement a parallel search engine, we need to partition the inverted index and search through the partitioned index in parallel. There are two methods of partitioning the inverted index: 1) document-identifier based partitioning and 2) keyword-identifier based partitioning. However, each method alone has drawbacks. The former makes inserting documents convenient and has high throughput, but performs poorly for top-k query processing. The latter performs well for top-k query processing, but makes inserting documents inconvenient and has low throughput. In this paper, we propose a hybrid partitioning method that compensates for the drawbacks of each method. We design and implement a parallel search engine that supports the hybrid partitioning method using the Odysseus DBM...
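To make the two partitioning schemes concrete, here is a minimal Python sketch over a toy in-memory corpus. The corpus and the function names (`build_postings`, `partition_by_document`, `partition_by_keyword`) are hypothetical, and assigning documents by `doc_id % n_parts` and terms by a CRC32 hash are assumptions for illustration. The hybrid method itself is not shown because the abstract is truncated before describing it.

```python
import zlib
from collections import defaultdict

# Hypothetical toy corpus: document id -> terms in the document.
DOCS = {
    0: ["flash", "memory", "index"],
    1: ["search", "engine", "index"],
    2: ["flash", "search"],
    3: ["parallel", "search", "engine"],
}


def build_postings(docs):
    """Global inverted index: term -> ascending list of document ids."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):
            postings[term].append(doc_id)
    return postings


def partition_by_document(docs, n_parts):
    """Document-identifier partitioning: each partition indexes a disjoint
    subset of documents (assumed rule: doc_id % n_parts)."""
    parts = [defaultdict(list) for _ in range(n_parts)]
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):
            parts[doc_id % n_parts][term].append(doc_id)
    return parts


def partition_by_keyword(docs, n_parts):
    """Keyword-identifier partitioning: each partition holds the complete
    posting lists of a subset of terms (assumed rule: CRC32 of the term)."""
    parts = [{} for _ in range(n_parts)]
    for term, plist in build_postings(docs).items():
        parts[zlib.crc32(term.encode()) % n_parts][term] = plist
    return parts
```

In this sketch, inserting one document under document-identifier partitioning touches a single partition, whereas under keyword-identifier partitioning it touches every partition that owns one of its terms. Conversely, a single-term query must visit every document partition but only one keyword partition.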

Efficient Buffer Coherency Management for a Shared-Disk based Multiple-Server DBMS

Journal of KIISE:Databases, 2009

In a multiple-server DBMS using the shared-disk model, when a server process updates data, the updates are not immediately reflected in the buffers of the other server processes. Thus, the other server processes may read invalid data. In this paper, we propose a novel method to solve this problem. In this method, when a transaction commits, the server process stores the identifiers and timestamps of the pages updated during the transaction in the coherency volume. Then, when a server process acquires a lock, it accesses the coherency volume, invalidates the buffered pages that other server processes have updated, and subsequently reads the up-to-date versions of those pages from disk. This method needs only a very small coherency volume and performs well because the amount of data that needs to be accessed is very small.
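The commit-time and lock-time steps described in this abstract can be sketched in Python as a single-machine simulation. Assumptions: `CoherencyVolume`, `ServerProcess`, and the global counter used as a timestamp source are hypothetical; locking itself is not modeled (`acquire_lock` performs only the coherency check); and updated pages are assumed to be written to the shared disk at commit.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical global source of commit timestamps (monotonically increasing).
_commit_clock = itertools.count(1)


@dataclass
class CoherencyVolume:
    """Shared record: page id -> timestamp of the last committed update."""
    last_update: dict = field(default_factory=dict)


@dataclass
class ServerProcess:
    disk: dict                    # shared disk: page id -> page contents
    volume: CoherencyVolume       # shared coherency volume
    buffer: dict = field(default_factory=dict)  # page id -> [contents, version timestamp]
    updated: set = field(default_factory=set)   # pages updated by the current transaction

    def read(self, page_id):
        if page_id not in self.buffer:  # buffer miss: fetch the page from the shared disk
            version = self.volume.last_update.get(page_id, 0)
            self.buffer[page_id] = [self.disk[page_id], version]
        return self.buffer[page_id][0]

    def write(self, page_id, contents):
        self.read(page_id)  # make sure the page is buffered
        self.buffer[page_id][0] = contents
        self.updated.add(page_id)

    def commit(self):
        ts = next(_commit_clock)
        for page_id in self.updated:
            self.disk[page_id] = self.buffer[page_id][0]  # assumed: write updated pages at commit
            self.volume.last_update[page_id] = ts         # record (page id, timestamp)
            self.buffer[page_id][1] = ts                  # our own copy is current
        self.updated.clear()

    def acquire_lock(self):
        """Coherency check on lock acquisition: drop buffered pages that were
        updated by another process after we buffered them."""
        stale = [pid for pid, (_, version) in self.buffer.items()
                 if self.volume.last_update.get(pid, 0) > version]
        for pid in stale:
            del self.buffer[pid]
```

If process `a` updates and commits a page that process `b` has buffered, `b` keeps returning its stale copy until it calls `acquire_lock`; after that call, only the page updated by `a` is dropped from `b`'s buffer and re-read from disk on the next access.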

Implementation of a Parallel Web Crawler for the Odysseus Large-Scale Search Engine

As the size of the web is growing explosively, search engines are becoming increasingly important as the primary means of retrieving information from the Internet. A search engine periodically downloads web pages and stores them in the database to provide users with up-to-date search results. The web crawler is a program that downloads and stores web pages for this purpose. A large-scale search engine uses a parallel web crawler to retrieve the collection of web pages while maximizing the download rate. However, the service architecture and experimental analysis of parallel web crawlers have not been fully discussed in the literature. In this paper, we propose an architecture for a parallel web crawler and discuss implementation issues in detail. The proposed parallel web crawler is based on the coordinator/agent model, which uses multiple machines to download web pages in parallel. The coordinator/agent model consists of multiple agent machines to collect web pages and a single coordinator mac...
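A minimal Python simulation may help picture the coordinator/agent model. Because the abstract is truncated before it describes the coordinator's duties, the division of labor here is an assumption: the coordinator removes duplicate URLs and routes each URL to an agent by hashing its host name, while agents download pages and report extracted links back. The in-memory `WEB` dictionary stands in for real HTTP fetches, and all class and function names are hypothetical.

```python
import zlib
from collections import deque
from urllib.parse import urlsplit

# Hypothetical in-memory "web": URL -> links found on that page.
WEB = {
    "http://a.example/": ["http://a.example/x", "http://b.example/"],
    "http://a.example/x": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/x"],
    "http://c.example/": [],
}


class Agent:
    """Collects web pages from its own frontier (simulated downloads)."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.frontier = deque()
        self.downloaded = []

    def crawl_one(self):
        url = self.frontier.popleft()
        self.downloaded.append(url)  # stands in for fetching and storing the page
        return WEB.get(url, [])      # links extracted from the page


class Coordinator:
    """Deduplicates URLs and routes each one to an agent by host hash (assumed policy)."""

    def __init__(self, n_agents):
        self.agents = [Agent(i) for i in range(n_agents)]
        self.seen = set()

    def assign(self, url):
        if url in self.seen:
            return
        self.seen.add(url)
        host = urlsplit(url).netloc
        self.agents[zlib.crc32(host.encode()) % len(self.agents)].frontier.append(url)

    def run(self, seeds):
        for url in seeds:
            self.assign(url)
        # Round-robin stands in for agent machines working concurrently.
        while any(agent.frontier for agent in self.agents):
            for agent in self.agents:
                if agent.frontier:
                    for link in agent.crawl_one():
                        self.assign(link)
        return {agent.agent_id: agent.downloaded for agent in self.agents}
```

Routing by host keeps all pages of one site on one agent, which makes per-site politeness easier to enforce; the round-robin loop only imitates concurrency and would be replaced by separate agent machines in a real deployment.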

Managing MEMS and flash storage devices in a DBMS

Query Expansion Using Augmented Terms in an Extended Boolean Model

Journal of Computing Science and Engineering, 2008

We propose a new query expansion method in the extended Boolean model that improves precision without degrading recall. To improve precision, our method promotes the ranks of documents containing more query terms, since users typically prefer such documents. The proposed method consists of three steps: (1) expanding the query by adding new terms related to each query term, (2) further expanding the query by adding augmented terms, which are conjunctions of the terms, and (3) assigning a weight to each term so that augmented terms have higher weights than the other terms. We conduct extensive experiments to show the effectiveness of the proposed method. The experimental results show that the proposed method improves precision by up to 102% on the TREC-6 data compared with the existing thesaurus-based query expansion method proposed by Kwon et al. [Kwon et al. 1994].
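The three steps can be illustrated with a small Python sketch built on the standard p-norm formulation of the extended Boolean model, used here as a stand-in because the abstract does not give the paper's exact formulas. All of the following are hypothetical: the thesaurus contents, the weights (1.0 for query terms, 0.5 for related terms, 2.0 for augmented terms), p = 2, and forming augmented terms as conjunctions of pairs of original query terms.

```python
from itertools import combinations

P = 2.0  # hypothetical p-norm exponent


def pnorm_or(pairs, p=P):
    """Weighted p-norm OR over (query weight, document value) pairs, values in [0, 1]."""
    num = sum((q ** p) * (d ** p) for q, d in pairs)
    den = sum(q ** p for q, _ in pairs)
    return (num / den) ** (1 / p)


def pnorm_and(values, p=P):
    """Unweighted p-norm AND over document values in [0, 1]."""
    return 1 - (sum((1 - d) ** p for d in values) / len(values)) ** (1 / p)


def expand_query(terms, thesaurus, w_term=1.0, w_related=0.5, w_aug=2.0):
    """Return (weight, component) pairs; a component is a tuple of terms ANDed together."""
    query = []
    for t in terms:  # step 1: each query term plus its related terms
        query.append((w_term, (t,)))
        query.extend((w_related, (r,)) for r in thesaurus.get(t, []))
    for pair in combinations(terms, 2):  # step 2: augmented terms (pairwise conjunctions)
        query.append((w_aug, pair))      # step 3: augmented terms get the largest weight
    return query


def score(doc, query):
    """doc maps term -> weight in [0, 1]; absent terms count as 0."""
    pairs = [(w, pnorm_and([doc.get(t, 0.0) for t in comp])) for w, comp in query]
    return pnorm_or(pairs)
```

For example, with the thesaurus `{"flash": ["ssd"], "index": ["btree"]}` and the query `["flash", "index"]`, a document with weights `{"flash": 0.5, "index": 0.5}` outranks one with `{"flash": 0.9}`; calling `expand_query` with `w_aug=0.0` removes the augmented term's influence and reverses that order.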

KSAnswer: Question-answering System of Kangwon National University and Sogang University in the 2016 BioASQ Challenge

Proceedings of the Fourth BioASQ Workshop, 2016

Shadow-Page Deferred-Update Recovery Technique Integrating Shadow Page and Deferred Update Techniques in a Storage System

Query Expansion Method Using Augmented Terms for Improving Precision Without Degrading Recall

The partitioned-layer index: Answering monotone top-k queries using the convex skyline and partitioning-merging technique

Information Sciences, 2009

A Logical Model and Data Placement Strategies for MEMS Storage Devices

IEICE Transactions on Information and Systems, 2009

Page-differential logging

Proceedings of the 2010 International Conference on Management of Data (SIGMOD '10), 2010

Flash memory is widely used as secondary storage in lightweight computing devices because of its outstanding advantages over magnetic disks. Flash memory has many access characteristics different from those of magnetic disks, and how to take advantage of them is becoming an important research issue. There are two existing approaches to storing data in flash memory: page-based and log-based. The former performs well for read operations but poorly for write operations. In contrast, the latter performs well for write operations when updates are light, but poorly for read operations. In this paper, we propose a new method of storing data, called page-differential logging, for flash-based storage systems that overcomes the drawbacks of both methods. The primary characteristics of our method are: (1) writing only the difference (which we...
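The core idea of writing only the difference can be sketched in Python as a byte-level comparison between the page as originally stored in flash and its updated in-memory copy. The abstract is truncated before it specifies the differential format or when differentials are written, so the run-based `(offset, bytes)` encoding, the 4-byte per-run header assumed in `diff_size`, and all function names are hypothetical.

```python
def page_differential(base: bytes, current: bytes):
    """Return runs (offset, new_bytes) where `current` differs from `base`.

    Both pages are assumed to have the same fixed size.
    """
    assert len(base) == len(current), "pages must be the same size"
    diff, i, n = [], 0, len(base)
    while i < n:
        if base[i] != current[i]:
            start = i
            while i < n and base[i] != current[i]:  # extend the run of changed bytes
                i += 1
            diff.append((start, current[start:i]))
        else:
            i += 1
    return diff


def apply_differential(base: bytes, diff) -> bytes:
    """Recreate the up-to-date page by overlaying the differential on the base page."""
    page = bytearray(base)
    for offset, data in diff:
        page[offset:offset + len(data)] = data
    return bytes(page)


def diff_size(diff, header: int = 4) -> int:
    """Bytes to write under the assumed encoding: a fixed header per run plus its data."""
    return sum(header + len(data) for _, data in diff)
```

For example, changing 4 bytes of a 64-byte page produces a single run, which costs 8 bytes under the assumed encoding instead of rewriting the whole page; the current page is recovered by overlaying that run on the base page.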