The search functionality is under construction.
The search functionality is under construction.

Author Search Result

[Author] Woong-Kee LOH(8hit)

1-8hit
  • A Data Cleansing Method for Clustering Large-Scale Transaction Databases

    Woong-Kee LOH  Yang-Sae MOON  Jun-Gyu KANG  

     
    LETTER-Data Engineering, Web Information Systems

      Vol:
    E93-D No:11
      Page(s):
    3120-3123

    In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.

  • ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases

    Woong-Kee LOH  Yang-Sae MOON  Heejune AHN  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E94-D No:10
      Page(s):
    2048-2051

    We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering with a dramatic performance improvement.

  • A Fast Divide-and-Conquer Algorithm for Indexing Human Genome Sequences

    Woong-Kee LOH  Yang-Sae MOON  Wookey LEE  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E94-D No:7
      Page(s):
    1369-1377

    Since the release of human genome sequences, one of the most important research issues is about indexing the genome sequences, and the suffix tree is most widely adopted for that purpose. The traditional suffix tree construction algorithms suffer from severe performance degradation due to the memory bottleneck problem. The recent disk-based algorithms also provide limited performance improvement due to random disk accesses. Moreover, they do not fully utilize the recent CPUs with multiple cores. In this paper, we propose a fast algorithm based on `divide-and-conquer' strategy for indexing the human genome sequences. Our algorithm nearly eliminates random disk accesses by accessing the disk in the unit of contiguous chunks. In addition, our algorithm fully utilizes the multi-core CPUs by dividing the genome sequences into multiple partitions and then assigning each partition to a different core for parallel processing. Experimental results show that our algorithm outperforms the previous fastest DIGEST algorithm by up to 10.5 times.

  • An Object-Oriented Hypermedia System Based on the Dexter Reference Model and the MHEG Standard

    Byung-Kwon PARK  Woong-Kee LOH  Jeong-Joon LEE  Chong-Mok PARK  Kyu-Young WHANG  

     
    PAPER

      Vol:
    E79-D No:6
      Page(s):
    687-694

    In this paper, we describe the design and implementation of a hypermedia system that has the following characteristics. First, being designed according to the Dexter hypertext reference model, it has a layered architecture and thus maintains commonality with other hypermedia systems based on the Dextermodel. Second, being designed based on the MHEG standard, it has data structures that are inherently suitable for data interchange and synchronization. Third, adopting the MIME protocol, it provides multimedia mail services. Finally, being built on top of an object-oriented DBMS, it makes it easy to represent Dexter and MHEG models and also provides efficient storage and search capabilities. The contribution of this paper is combining these characteristics to build an integrated hypermedia system reflecting reference architectures and international standard efforts.

  • A Storage-Efficient Suffix Tree Construction Algorithm for Human Genome Sequences

    Woong-Kee LOH  Heejune AHN  

     
    LETTER-Biological Engineering

      Vol:
    E94-D No:12
      Page(s):
    2557-2560

    The suffix tree is one of most widely adopted indexes in the application of genome sequence alignment. Although it supports very fast alignment, it has a couple of shortcomings, such as a very long construction time and a very large volume size. Loh et al. [7] proposed a suffix tree construction algorithm with dramatically improved performance; however, the size still remains as a challenging problem. We propose an algorithm by extending the one by Loh et al. to reduce the suffix tree size. As a result of our experiments, our algorithm constructed a suffix tree of approximately 60% of the size within almost the same time period.

  • Index Interpolation: A Subsequence Matching Algorithm Supporting Moving Average Transform of Arbitrary Order in Time-Series Databases

    Woong-Kee LOH  Sang-Wook KIM  Kyu-Young WHANG  

     
    PAPER-Databases

      Vol:
    E84-D No:1
      Page(s):
    76-86

    In this paper we propose a subsequence matching algorithm that supports moving average transform of arbitrary order in time-series databases. Moving average transform reduces the effect of noise and has been used in many areas such as econometrics since it is useful in finding the overall trends. The proposed algorithm extends the existing subsequence matching algorithm proposed by Faloutsos et al. (SUB94 in short). If we applied the algorithm without any extension, we would have to generate an index for each moving average order and would have serious storage and CPU time overhead. In this paper we tackle the problem using the notion of index interpolation. Index interpolation is defined as a searching method that uses one or more indexes generated for a few selected cases and performs searching for all the cases satisfying some criteria. The proposed algorithm, which is based on index interpolation, can use only one index for a pre-selected moving average order k and performs subsequence matching for arbitrary order m ( k). We prove that the proposed algorithm causes no false dismissal. The proposed algorithm can also use more than one index to improve search performance. The algorithm works better with smaller selectivities. For selectivities less than 10-2, the degradation of search performance compared with the fully-indexed case--which is equivalent to SUB94--is no more than 33.0% when one index is used, and 17.2% when two indexes are used. Since the queries with smaller selectivities are much more frequent in general database applications, the proposed algorithm is suitable for practical situations.

  • A Fast Parallel Algorithm for Indexing Human Genome Sequences

    Woong-Kee LOH  Kyoung-Soo HAN  

     
    LETTER-Data Engineering, Web Information Systems

      Vol:
    E97-D No:5
      Page(s):
    1345-1348

    A suffix tree is widely adopted for indexing genome sequences. While supporting highly efficient search, the suffix tree has a few shortcomings such as very large size and very long construction time. In this paper, we propose a very fast parallel algorithm to construct a disk-based suffix tree for human genome sequences. Our algorithm constructs a suffix array for part of the suffixes in the human genome sequence and then converts it into a suffix tree very quickly. It outperformed the previous algorithms by Loh et al. and Barsky et al. by up to 2.09 and 3.04 times, respectively.

  • Fast Density-Based Clustering Using Graphics Processing Units

    Woong-Kee LOH  Yang-Sae MOON  Young-Ho PARK  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E97-D No:5
      Page(s):
    1349-1352

    Due to the recent technical advances, GPUs are used for general applications as well as screen display. Many research results have been proposed to the performance of previous CPU-based algorithms by a few hundred times using the GPUs. In this paper, we propose a density-based clustering algorithm called GSCAN, which reduces the number of unnecessary distance computations using a grid structure. As a result of our experiments, GSCAN outperformed CUDA-DClust [2] and DBSCAN [3] by up to 13.9 and 32.6 times, respectively.