1-11hit |
Jihwan SONG Xing XIE Yoon-Joon LEE Ji-Rong WEN
Mobile devices such as cell phones and personal digital assistants (PDAs) are becoming increasingly popular tools to access the Internet. Unfortunately, the experience of users attempting to access web pages with these mobile devices has been less than satisfactory because of their small display areas, slow communications links and low computing power. In this paper, we propose a trained scorer to estimate the mobile-friendliness scores of web pages, providing an indication of their suitability for mobile devices. These scores help mobile-friendly pages receive higher ranks in search results when mobile users seek information on the web. Our experiments show that the search results re-ranked by our mobile-friendliness scores increase mobile user satisfaction.
Song-Hyon KIM Inchul SONG Kyong-Ha LEE Yoon-Joon LEE
Subgraph matching is a fundamental operation for querying graph-structured data. Due to potential errors and noises in real-world graph data, exact subgraph matching is sometimes inappropriate in practice. In this paper we consider an approximate subgraph matching model that allows missing edges. Based on this model, approximate subgraph matching finds all occurrences of a given query graph in a database graph, allowing missing edges. A straightforward approach is to first generate query subgraphs of a given query graph by deleting edges and then perform exact subgraph matching for each query subgraph. In this paper we propose a sharing-based approach to approximate subgraph matching, called SASUM. Our method is based on the fact that query subgraphs are highly overlapped. Due to this overlapping nature of query subgraphs, the matches of a query subgraph can be computed from the matches of a smaller query subgraph, which results in reducing the number of query subgraphs that require expensive exact subgraph matching. Our method uses a lattice framework to identify sharing opportunities between query subgraphs. To further reduce the number of graphs that need exact subgraph matching, SASUM generates small base graphs that are shared by query subgraphs and chooses the minimum number of base graphs whose matches are used to derive the matching results of all query subgraphs. A comprehensive set of experiments shows that our approach outperforms the state-of-the-art approach by orders of magnitude in terms of query execution time.
Woo-Lam KANG Hyeon-Gyu KIM Yoon-Joon LEE
This paper presents a method to reduce I/O cost in MapReduce when online analytical processing (OLAP) queries are used for data analysis. The proposed method consists of two basic ideas. First, to reduce network transmission cost, mappers are organized to receive only data necessary to perform a map task, not an entire set of input data. Second, to reduce storage consumption, only record IDs are stored for checkpointing, not the raw records. Experiments conducted with TPC-H benchmark show that the proposed method is about 40% faster than Hive, the well-known data warehouse solution for MapReduce, while reducing the size of data stored for checkpoining to about 80%.
Song-Hyon KIM Kyong-Ha LEE Inchul SONG Hyebong CHOI Yoon-Joon LEE
We address the problem of processing graph pattern matching queries over a massive set of data graphs in this letter. As the number of data graphs is growing rapidly, it is often hard to process such queries with serial algorithms in a timely manner. We propose a distributed graph querying algorithm, which employs feature-based comparison and a filter-and-verify scheme working on the MapReduce framework. Moreover, we devise an efficient scheme that adaptively tunes a proper feature size at runtime by sampling data graphs. With various experiments, we show that the proposed method outperforms conventional algorithms in terms of scalability and efficiency.
Chul-Woong YANG Ki YONG LEE Myoung HO KIM Yoon-Joon LEE
In this paper, we present an efficient index structure for NAND flash memory, called the Dynamic Forest (D-Forest). Since write operations incur high overhead on NAND flash memory, D-Forest is designed to minimize write operations for index updates. The experimental results show that D-Forest significantly reduces write operations compared to the conventional B+-tree.
Jihwan SONG Deokmin HAAM Yoon-Joon LEE Myoung-Ho KIM
In this paper, we introduce a new sequential pattern, the Interactive User Sequence Pattern (IUSP). This pattern is useful for grouping highly interrelated users in one-way communications such as e-mail, SMS, etc., especially when the communications include many spam users. Also, we propose an efficient algorithm for discovering IUSPs from massive one-way communication logs containing only the following information: senders, receivers, and dates and times. Even though there is a difficulty in that our new sequential pattern violates the Apriori property, the proposed algorithm shows excellent processing performance and low storage cost in experiments on a real dataset.
Chul-Woong YANG Ki Yong LEE Myoung Ho KIM Yoon-Joon LEE
We propose an efficient dynamic hash index structure suitable for a NAND flash memory environment. Since write operations incur significant overhead in NAND flash memory, our design of index structure focuses on minimizing the number of write operations for hash index updates. Through a set of extensive experiments, we show the effectiveness of the proposed hash index structure in a NAND flash memory environment.
Hyeon-Gyu KIM Woo-Lam KANG Yoon-Joon LEE Myoung-Ho KIM
In this paper, we propose a predicate indexing method which handles equality and inequality tests separately. Our method uses a hash table for the equality test and a balanced binary search tree for the inequality test. Such a separate structure reduces a height of the search tree and the number of comparisons per tree node, as well as the cost for tree rebalancing. We compared our method with the IBS-tree which is one of the popular indexing methods suitable for data stream processing. Our experimental results show that the proposed method provides better insertion and search performances than the IBS-tree.
In processing stream data, time is one of the most significant facts not only because the size of data is dramatically increased but because the characteristics of data is varying over time. To learn stream data evolving over time effectively, it is required to detect the drift of concept. We present a window adaptation function on domain value (WAV) to determine the size of windowed batch for learning algorithms of stream data and a method to detect the change of data characteristics with a criterion function utilizing correlation. When applying our adaptation function to a clustering task on a multi-stream data model, the result of learning synopsis of windowed batch determined by it shows its effectiveness. Our criterion function with correlation information of value distribution over time can be the reasonable threshold to detect the change between windowed batches.
Seunglak CHOI Jinwon LEE Su Myeon KIM Junehwa SONG Yoon-Joon LEE
Most commercial Web sites dynamically generate their contents through a three-tier server architecture composed of a Web server, an application server, and a database server. In such an architecture, the database server easily becomes a bottleneck to the overall performance. In this paper, we propose WDBAccel, a high-performance database server accelerator that significantly improves the throughput of database processing. WDBAccel eliminates costly, complex query processing needed to obtain query results by reusing the results from previous queries for subsequent queries. This differentiates WDBAccel from other database cache systems, which employ traditional query processing. WDBAccel further improves its performance by fully utilizing main memory as the primary storage. This paper presents the design and implementation of the WDBAccel as well as the results of performance evaluation with a prototype.
Chul-Woong YANG Ki Yong LEE Yon Dohn CHUNG Myoung Ho KIM Yoon-Joon LEE
In this paper, we propose an effective Web cache admission control algorithm. By selectively admitting objects into the cache, the proposed algorithm can significantly reduce the amount of disk I/O on a Web cache while maintaining a high hit ratio. The proposed algorithm adaptively adjusts its own admission control parameter, requiring no user-supplied parameters. Through extensive experiments, we show the effectiveness of the proposed algorithm.