Yi-Reun Kim - Profile on Academia.edu

Papers by Yi-Reun Kim

COSMOS/MT: 객체 저장 시스템 COSMOS를 위한 멀티프로세스/멀티쓰레드 모델의 설계 및 구현 = Design and implementation of a multi-process/multi-thread model for the COSMOS object storage system

Odysseus/Parallel-OOSQL: A Parallel Search Engine using the Odysseus DBMS Tightly-Coupled with IR Capability

Journal of KIISE:Computing Practices and Letters, 2008

As the number of electronic documents increases rapidly with the growth of the Internet, parallel search engines capable of handling a large number of documents are becoming ever more important. To implement a parallel search engine, we need to partition the inverted index and search through the partitioned index in parallel. There are two methods of partitioning the inverted index: 1) document-identifier based partitioning and 2) keyword-identifier based partitioning. However, each method alone has drawbacks. The former is convenient for inserting documents and has high throughput, but performs poorly for top-k query processing. The latter performs well for top-k query processing, but is inconvenient for inserting documents and has low throughput. In this paper, we propose a hybrid partitioning method to compensate for the drawbacks of each method. We design and implement a parallel search engine that supports the hybrid partitioning method using the Odysseus DBM...
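The two base partitioning schemes named in this abstract can be illustrated with a minimal Python sketch. It is not the paper's hybrid method, which the truncated abstract does not describe. The function names, the modulo assignment of integer document IDs to nodes, and the CRC32 hash used to assign keywords to nodes are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def partition_by_document(docs, num_nodes):
    """Document-identifier based: each node indexes only its own documents.

    Assumes integer document IDs, assigned to nodes by modulo (hypothetical).
    """
    shards = [dict() for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text
    return [build_inverted_index(shard) for shard in shards]


def partition_by_keyword(docs, num_nodes):
    """Keyword-identifier based: each node holds whole posting lists.

    Keywords are assigned to nodes by a CRC32 hash (hypothetical choice).
    """
    shards = [dict() for _ in range(num_nodes)]
    for word, postings in build_inverted_index(docs).items():
        node = zlib.crc32(word.encode("utf-8")) % num_nodes
        shards[node][word] = postings
    return shards
```

In this sketch, inserting a document under document-based partitioning touches a single node, while a keyword lookup must merge partial posting lists from every node. Under keyword-based partitioning the reverse holds: one node answers a keyword lookup, but inserting a document may update posting lists on many nodes.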

Efficient Buffer Coherency Management for a Shared-Disk based Multiple-Server DBMS

Journal of KIISE:Databases, 2009

In a multiple-server DBMS using the shared-disk model, when a server process updates data, the updates are not immediately reflected in the buffers of the other server processes. Thus, the other server processes may read invalid data. In this paper, we propose a novel method to solve this problem. In this method, the server process stores the identifiers and timestamps of the pages that have been updated during a transaction into the coherency volume when the transaction commits. Then, when a lock is acquired, the server process accesses the coherency volume to invalidate its buffers for the pages updated by the other server processes and subsequently reads the up-to-date versions of those pages from disk. This method needs only a very small coherency volume and performs well because the amount of data that needs to be accessed is very small.
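The protocol described in this abstract can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the authors' implementation. The class and method names, the logical clock kept inside the coherency volume, and the assumption that updated pages are flushed to disk at commit are all hypothetical.

```python
class CoherencyVolume:
    """Shared record of page id -> timestamp of the last committed update."""

    def __init__(self):
        self.last_update = {}
        self._clock = 0  # hypothetical logical clock

    def next_timestamp(self):
        self._clock += 1
        return self._clock

    def record(self, page_ids, timestamp):
        for pid in page_ids:
            self.last_update[pid] = timestamp


class ServerProcess:
    def __init__(self, disk, volume):
        self.disk = disk        # shared dict: page id -> data
        self.volume = volume    # shared CoherencyVolume
        self.buffer = {}        # private: page id -> (data, timestamp)
        self.dirty = set()

    def acquire_lock(self):
        """Drop buffered pages that another server updated after we cached them."""
        stale = [pid for pid, (_, ts) in self.buffer.items()
                 if self.volume.last_update.get(pid, 0) > ts]
        for pid in stale:
            del self.buffer[pid]

    def read(self, pid):
        """Serve from the buffer, fetching from disk only on a miss."""
        if pid not in self.buffer:
            ts = self.volume.last_update.get(pid, 0)
            self.buffer[pid] = (self.disk[pid], ts)
        return self.buffer[pid][0]

    def write(self, pid, data):
        self.read(pid)  # make sure the page is buffered first
        self.buffer[pid] = (data, self.buffer[pid][1])
        self.dirty.add(pid)

    def commit(self):
        """Flush updated pages (assumed) and log their ids and timestamp."""
        ts = self.volume.next_timestamp()
        for pid in self.dirty:
            data, _ = self.buffer[pid]
            self.disk[pid] = data
            self.buffer[pid] = (data, ts)
        self.volume.record(self.dirty, ts)
        self.dirty = set()
```

Only page identifiers and timestamps pass through the shared volume in this sketch, which matches the abstract's point that the coherency volume can stay very small.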

Implementation of a Parallel Web Crawler for the Odysseus Large-Scale Search Engine

As the size of the web is growing explosively, search engines are becoming increasingly important as the primary means to retrieve information from the Internet. A search engine periodically downloads web pages and stores them in the database to provide readers with up-to-date search results. The web crawler is a program that downloads and stores web pages for this purpose. A large-scale search engine uses a parallel web crawler to retrieve the collection of web pages while maximizing the download rate. However, the service architecture or experimental analysis of parallel web crawlers has not been fully discussed in the literature. In this paper, we propose an architecture of the parallel web crawler and discuss implementation issues in detail. The proposed parallel web crawler is based on the coordinator/agent model using multiple machines to download web pages in parallel. The coordinator/agent model consists of multiple agent machines to collect web pages and a single coordinator mac...

Managing MEMS and flash storage devices in a DBMS = DBMS에서의 MEMS와 Flash 저장 장치들의 관리

Query Expansion Using Augmented Terms in an Extended Boolean Model

Journal of Computing Science and Engineering, 2008

We propose a new query expansion method in the extended Boolean model that improves precision without degrading recall. To improve precision, our method promotes the ranks of documents having more query terms since users typically prefer such documents. The proposed method consists of the following three steps: (1) expanding the query by adding new terms related to each term of the query, (2) further expanding the query by adding augmented terms, which are conjunctions of the terms, and (3) assigning a weight to each term so that augmented terms have higher weights than the other terms. We conduct extensive experiments to show the effectiveness of the proposed method. The experimental results show that the proposed method improves precision by up to 102% for the TREC-6 data compared with the existing query expansion method using a thesaurus proposed by Kwon et al. [Kwon et al. 1994].
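The three steps listed in this abstract can be illustrated with a small Python sketch. It is not the paper's method. The thesaurus contents, the three weight values, the choice to form augmented terms from pairs of the original query terms, and the plain weighted-sum scoring (instead of an extended Boolean model) are all assumptions made for illustration.

```python
from itertools import combinations


def expand_query(terms, thesaurus, base_w=1.0, related_w=0.5, augmented_w=2.0):
    """Return {conjunction (tuple of terms): weight} for the expanded query."""
    weighted = {(t,): base_w for t in terms}
    # Step 1: add terms related to each query term (hypothetical thesaurus).
    for t in terms:
        for related in thesaurus.get(t, []):
            weighted.setdefault((related,), related_w)
    # Step 2: add augmented terms, here conjunctions of pairs of query terms.
    # Step 3: give augmented terms a higher weight than any single term.
    for pair in combinations(terms, 2):
        weighted[pair] = augmented_w
    return weighted


def score(doc_words, weighted_query):
    """Sum the weights of every conjunction fully contained in the document."""
    words = set(doc_words)
    return sum(weight for conj, weight in weighted_query.items()
               if all(term in words for term in conj))
```

With the query `["flash", "storage"]` and a thesaurus relating `storage` to `disk`, a document containing both query terms also matches the augmented term and is ranked above one that matches only a single term or a related term. The document matching only the related term still receives a nonzero score, so it is not dropped from the result.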

KSAnswer: Question-answering System of Kangwon National University and Sogang University in the 2016 BioASQ Challenge

Proceedings of the Fourth BioASQ workshop, 2016

This paper describes a question-answering system that returns relevant documents and snippets (with particular emphasis on snippets) from a large medical document collection. The system is implemented as part of our participation in Phase A of Task 4b in the 2016 BioASQ Challenge. The proposed system retrieves candidate answer sentences using a cluster-based language model. Then, it re-ranks the retrieved top-n sentences using five independent similarity models based on shallow semantic analysis. The experimental results show that the proposed system is the first to find snippets in batches 2 (MAP 0.0604), 3 (MAP 0.0728), 4 (MAP 0.1182), and 5 (MAP 0.1187).

Shadow-Page Deferred-Update Recovery Technique Integrating Shadow Page and Deferred Update Techniques in a Storage System

Query Expansion Method Using Augmented Terms for Improving Precision Without Degrading Recall

The partitioned-layer index: Answering monotone top-k queries using the convex skyline and partitioning-merging technique

Information Sciences, 2009

A top-k query returns k tuples with the highest (or the lowest) scores from a relation. The score is computed by combining the values of one or more attributes. We focus on top-k queries having monotone linear score functions. Layer-based methods are well-known techniques for top-k query processing. These methods construct a database as a single list of layers. Here, the i-th layer has the tuples that can be the top-i tuple. Thus, these methods answer top-k queries by reading at most k layers. Query performance, however, is poor when the number of tuples in each layer (simply, the layer size) is large. In this paper, we propose a new layer-ordering method, called the Partitioned-Layer Index (simply, the PL Index), that significantly improves query performance by reducing the layer size. The PL Index uses the notion of partitioning, which constructs a database as multiple sublayer lists instead of a single layer list, thereby reducing the layer size. The PL Index also uses the convex skyline, which is a subset of the skyline, to construct a sublayer to further reduce the layer size. The PL Index has the following desired properties. The query performance of the PL Index is quite insensitive to the weights of attributes (called the preference vector) of the score function and is approximately linear in the value of k. The PL Index is capable of tuning query performance for the most frequently used value of k by controlling the number of sublayer lists. Experimental results using synthetic and real data sets show that the query performance of the PL Index significantly outperforms existing methods except for small values of k (say, k ≤ 9).
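The general layer-based idea summarized in this abstract (answering a top-k query by reading at most k layers) can be sketched in Python. This is not the PL Index: the sketch uses a single list of plain skyline layers as a stand-in, with no partitioning and no convex skyline. The function names are hypothetical, and higher attribute values with non-negative weights are assumed to be better.

```python
import heapq


def dominates(a, b):
    """a dominates b if a >= b on every attribute and a > b on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline_layers(tuples):
    """Peel off the set of non-dominated tuples repeatedly to form layers."""
    remaining = list(tuples)
    layers = []
    while remaining:
        layer = [t for t in remaining
                 if not any(dominates(other, t) for other in remaining)]
        layers.append(layer)
        remaining = [t for t in remaining if t not in layer]
    return layers


def top_k(layers, weights, k):
    """Answer a monotone linear top-k query by reading only the first k layers."""
    candidates = [t for layer in layers[:k] for t in layer]

    def score(t):
        return sum(w * x for w, x in zip(weights, t))

    return heapq.nlargest(k, candidates, key=score)
```

Every tuple in layer i+1 is dominated by some tuple in layer i, so under a monotone score function a tuple in layer j has at least j-1 tuples scoring at least as high. That is why the first k layers suffice in this sketch. The cost still grows with the layer size, which is the quantity the abstract says the PL Index reduces.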

A Logical Model and Data Placement Strategies for MEMS Storage Devices

IEICE Transactions on Information and Systems, 2009

MEMS storage devices are new non-volatile secondary storage devices that have outstanding advantages over magnetic disks. MEMS storage devices, however, differ considerably from magnetic disks in structure and access characteristics. They have thousands of heads called probe tips and provide the following two major access facilities: (1) flexibility: freely selecting a set of probe tips for accessing data, and (2) parallelism: simultaneously reading and writing data with the set of probe tips selected. Due to these characteristics, it is nontrivial to find data placements that fully utilize the capability of MEMS storage devices. In this paper, we propose a simple logical model called the Region-Sector (RS) model that abstracts major characteristics affecting data retrieval performance, such as flexibility and parallelism, from the physical MEMS storage model. We also suggest heuristic data placement strategies based on the RS model and derive new data placements for relational data and two-dimensional spatial data by using those strategies. Experimental results show that the proposed data placements improve the data retrieval performance by up to 4.0 times for relational data and by up to 4.8 times for two-dimensional spatial data of approximately 320 Mbytes compared with those of existing data placements. Further, these improvements are expected to be more marked as the database size grows.

Page-differential logging

Proceedings of the 2010 international conference on Management of data - SIGMOD '10, 2010

Flash memory is widely used as the secondary storage in lightweight computing devices due to its outstanding advantages over magnetic disks. Flash memory has many access characteristics different from those of magnetic disks, and how to take advantage of them is becoming an important research issue. There are two existing approaches to storing data into flash memory: page-based and log-based. The former has good performance for read operations, but poor performance for write operations. In contrast, the latter has good performance for write operations when updates are light, but poor performance for read operations. In this paper, we propose a new method of storing data, called page-differential logging, for flash-based storage systems that solves the drawbacks of the two methods. The primary characteristics of our method are: (1) writing only the difference (which we
