
Author Search Result

[Author] Yang-Sae MOON(12hit)

  • Evaluation of Space Filling Curves for Lower-Dimensional Transformation of Image Histogram Sequences

    Jeonggon LEE  Bum-Soo KIM  Mi-Jung CHOI  Yang-Sae MOON  

     
    LETTER-Data Engineering, Web Information Systems

      Vol:
    E96-D No:10
      Page(s):
    2277-2281

    Histogram sequences represent high-dimensional time-series converted from images by space filling curves (SFCs). To overcome the high dimensionality of histogram sequences (e.g., 10^6 dimensions for a 1024×1024 image), we often use lower-dimensional transformations, but the tightness of their lower bounds is highly affected by the type of SFC. In this paper we attack the challenging problem of evaluating which SFC shows the better performance when we apply the lower-dimensional transformation to histogram sequences. For this, we first present a concept of spatial locality and propose a spatial locality preservation metric (SLPM in short). We then evaluate five well-known SFCs from the perspective of SLPM and verify that the evaluation result concurs with the actual transformation performance. Finally, we empirically validate the accuracy of SLPM by showing that the Hilbert-order with the highest SLPM also shows the best performance in k-NN (k-nearest neighbors) search.
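As a rough illustration of why the choice of SFC matters for spatial locality, the sketch below generates the Hilbert order with the standard index-to-coordinate conversion and compares it with a plain row-major scan. It does not implement SLPM, which the abstract does not define; the function names and the `max_step` measure are assumptions made only for this example.

```python
def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. This is the widely used iterative
    conversion, not code taken from the paper.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate (and possibly flip) the current quadrant.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def row_major_d2xy(n, d):
    """Map position d of a row-by-row scan to (x, y)."""
    return d % n, d // n


def max_step(points):
    """Largest Manhattan distance between consecutive cells (illustrative only)."""
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


n = 8
hilbert = [hilbert_d2xy(n, d) for d in range(n * n)]
row_major = [row_major_d2xy(n, d) for d in range(n * n)]
```

On this grid every consecutive pair of Hilbert cells is adjacent, whereas the row-major scan jumps across the image at the end of each row, which is one informal sense in which the Hilbert order keeps neighboring pixels close in the sequence.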

  • Fast Density-Based Clustering Using Graphics Processing Units

    Woong-Kee LOH  Yang-Sae MOON  Young-Ho PARK  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E97-D No:5
      Page(s):
    1349-1352

    Due to recent technical advances, GPUs are used for general applications as well as screen display. Many studies have proposed ways to improve the performance of previous CPU-based algorithms by a few hundred times using GPUs. In this paper, we propose a density-based clustering algorithm called GSCAN, which reduces the number of unnecessary distance computations using a grid structure. In our experiments, GSCAN outperformed CUDA-DClust [2] and DBSCAN [3] by up to 13.9 and 32.6 times, respectively.

  • Effective Reference Probability Incorporating the Effect of Expiration Time in Web Cache

    Jeong-Joon LEE  Kyu-Young WHANG  Yang-Sae MOON  Eui-Kyung HONG  

     
    PAPER-Databases

      Vol:
    E84-D No:9
      Page(s):
    1184-1197

    Web caching has become an important problem when addressing the performance issues in Web applications. The expiration time of a Web data item is a useful piece of information for performance enhancement in Web caching. In this paper, we introduce the notion of the effective reference probability that incorporates the effect of expiration time for Web caching. For a formal approach, we propose the continuous independent reference model extending the existing independent reference model. Based on this model, we formally define the effective reference probability and derive it theoretically. By simply replacing the reference probability in the existing cache replacement algorithms with the effective reference probability, we can take the effect of expiration time into account. The results of performance experiments show that the replacement algorithms using the effective reference probability always outperform existing ones. In particular, when the cache fraction is 0.05 and data update is comparatively frequent (i.e., the update frequency is more than 1/10 of the reference frequency), the performance is enhanced by more than 30% in LRU-2 and 13% in Aggarwal's method. The results show that the effective reference probability significantly enhances the performance of Web caching when the expiration time is given.

  • Fast Normalization-Transformed Subsequence Matching in Time-Series Databases

    Yang-Sae MOON  Jinho KIM  

     
    PAPER-Data Mining

      Vol:
    E90-D No:12
      Page(s):
    2007-2018

    Normalization transform is known to be very useful for finding the overall trend of time-series data since it enables finding sequences with similar fluctuation patterns. Previous subsequence matching methods with normalization transform, however, incur index overhead both in storage space and in update maintenance since they must build multiple indexes to support query sequences of arbitrary length. To solve this problem, we adopt a single-index approach in the normalization-transformed subsequence matching that supports query sequences of arbitrary length. For the single-index approach, we first provide the notion of inclusion-normalization transform by generalizing the original definition of normalization transform. To normalize a window, the inclusion-normalization transform uses the mean and the standard deviation of a subsequence that includes the window while the original transform uses those of the window itself. Next, we formally prove the correctness of the proposed normalization-transformed subsequence matching method that uses the inclusion-normalization transform. We then propose subsequence matching and index-building algorithms to implement the proposed method. Experimental results for real stock data show that our method improves performance by up to 2.5~2.8 times compared with the previous method.
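The difference between the two transforms described above can be sketched as follows. This is a minimal reading of the abstract, not the paper's algorithm: the function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math


def mean_std(values):
    """Mean and population standard deviation (an assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


def normalize(window):
    """Original transform: normalize a window by its own mean and std.

    Assumes the window is not constant (std > 0).
    """
    mean, std = mean_std(window)
    return [(v - mean) / std for v in window]


def inclusion_normalize(subsequence, start, length):
    """Normalize subsequence[start:start + length] using the mean and std
    of the whole enclosing subsequence instead of the window itself.

    Assumes the enclosing subsequence is not constant (std > 0).
    """
    mean, std = mean_std(subsequence)
    window = subsequence[start:start + length]
    return [(v - mean) / std for v in window]


seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
own = normalize(seq[0:2])                  # statistics of the window only
included = inclusion_normalize(seq, 0, 2)  # statistics of the whole subsequence
```

When the window covers the whole subsequence the two functions agree, which is consistent with the abstract's description of the inclusion-normalization transform as a generalization of the original one.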

  • Linear Detrending Subsequence Matching in Time-Series Databases

    Myeong-Seon GIL  Yang-Sae MOON  Bum-Soo KIM  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E94-D No:4
      Page(s):
    917-920

    Every time-series has its own linear trend, the directionality of a time-series, and removing the linear trend is crucial to get more intuitive matching results. Supporting the linear detrending in subsequence matching is a challenging problem due to the huge number of all possible subsequences. In this paper we define this problem as the linear detrending subsequence matching and propose its efficient index-based solution. To this end, we first present a notion of LD-windows (LD means linear detrending). Using the LD-windows we then present a lower bounding theorem for the index-based matching solution and show its correctness. We next propose the index building and subsequence matching algorithms. We finally show the superiority of the index-based solution.
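For a single sequence, removing the linear trend can be sketched as subtracting a fitted straight line. The example below assumes an ordinary least-squares fit against the positions 0, 1, ..., n-1; the abstract does not state how the trend is fitted, and the function name is hypothetical. It does not cover LD-windows or the index-based matching.

```python
def linear_detrend(seq):
    """Subtract the least-squares line fitted to the points (i, seq[i])."""
    n = len(seq)
    if n < 2:
        # A single point (or empty input) has no trend to fit.
        return [0.0] * n
    x_mean = (n - 1) / 2
    y_mean = sum(seq) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(seq))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (slope * i + intercept) for i, y in enumerate(seq)]


rising = [1.0, 3.0, 5.0, 7.0]        # a pure linear trend
noisy = [2.0, 1.0, 4.0, 3.0, 6.0]    # a trend plus fluctuation
flat = linear_detrend(rising)
residual = linear_detrend(noisy)
```

A purely linear sequence detrends to zeros, and two sequences with the same fluctuation but different slopes become identical after detrending, which is the sense in which detrending makes matching results more intuitive.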

  • A Data Cleansing Method for Clustering Large-Scale Transaction Databases

    Woong-Kee LOH  Yang-Sae MOON  Jun-Gyu KANG  

     
    LETTER-Data Engineering, Web Information Systems

      Vol:
    E93-D No:11
      Page(s):
    3120-3123

    In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.

  • ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases

    Woong-Kee LOH  Yang-Sae MOON  Heejune AHN  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E94-D No:10
      Page(s):
    2048-2051

    We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering with a dramatic performance improvement.

  • Navigation Stability: A New Isolation Level in ORDBMSs

    Hong-Suk SEO  Kyu-Young WHANG  Yang-Sae MOON  Ji-Woong CHANG  Eui-Kyung HONG  

     
    PAPER-Databases

      Vol:
    E84-D No:9
      Page(s):
    1171-1183

    To enhance performance, many database management systems (DBMSs) execute transactions at isolation level 2 rather than at isolation level 3, the strict two-phase locking, even if it sacrifices consistency to a certain degree. Cursor stability, a variant of isolation level 2 in relational DBMSs (RDBMSs), has been widely used as a useful technique for obtaining the concurrency achievable at level 2 without sacrificing much consistency. However, cursor stability is much less usable in object-relational DBMSs (ORDBMSs) because navigational applications in ORDBMSs can suffer from critical inconsistency problems such as dangling pointers, lost updates, and reading inconsistent complex objects. In this paper, we propose a new isolation level, navigation stability, that prevents the inconsistency problems of cursor stability for navigational applications, while avoiding significant degradation of the concurrency of level 3. First, we analyze the inconsistency problems of cursor stability for navigational applications. Second, we define navigation stability as an extension of cursor stability and show that it solves those inconsistency problems of cursor stability in ORDBMSs. Third, through extensive simulation, we show that navigation stability significantly enhances the performance compared with level 3. For workloads consisting of transactions of long duration, compared with level 3, the throughput of navigation stability is enhanced by up to 200%; the average response time is reduced by as much as 55%; and the abort ratio is reduced by as much as 77%. From these results, we conclude that navigation stability is a useful isolation level in ORDBMSs that can be used in place of isolation level 3 to improve the performance and concurrency without significant sacrifice of consistency.

  • A Fast Divide-and-Conquer Algorithm for Indexing Human Genome Sequences

    Woong-Kee LOH  Yang-Sae MOON  Wookey LEE  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E94-D No:7
      Page(s):
    1369-1377

    Since the release of human genome sequences, one of the most important research issues has been indexing the genome sequences, and the suffix tree is most widely adopted for that purpose. The traditional suffix tree construction algorithms suffer from severe performance degradation due to the memory bottleneck problem. The recent disk-based algorithms also provide limited performance improvement due to random disk accesses. Moreover, they do not fully utilize the recent CPUs with multiple cores. In this paper, we propose a fast algorithm based on a 'divide-and-conquer' strategy for indexing the human genome sequences. Our algorithm nearly eliminates random disk accesses by accessing the disk in the unit of contiguous chunks. In addition, our algorithm fully utilizes the multi-core CPUs by dividing the genome sequences into multiple partitions and then assigning each partition to a different core for parallel processing. Experimental results show that our algorithm outperforms the previous fastest DIGEST algorithm by up to 10.5 times.

  • Fourier Magnitude-Based Privacy-Preserving Clustering on Time-Series Data

    Hea-Suk KIM  Yang-Sae MOON  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E93-D No:6
      Page(s):
    1648-1651

    Privacy-preserving clustering (PPC in short) is important in publishing sensitive time-series data. Previous PPC solutions, however, have a problem of not preserving distance orders or incurring privacy breach. To solve this problem, we propose a new PPC approach that exploits Fourier magnitudes of time-series. Our magnitude-based method does not cause privacy breach even though its techniques or related parameters are publicly revealed. Using magnitudes only, however, incurs the distance order problem, and we thus present magnitude selection strategies to preserve as many Euclidean distance orders as possible. Through extensive experiments, we showcase the superiority of our magnitude-based approach.
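A small sketch of what "using magnitudes only" means: the discrete Fourier transform of a series is computed and the phase of each coefficient is discarded. The example uses a direct O(n^2) DFT for clarity and a hypothetical function name; the paper's magnitude selection strategies are not reproduced here.

```python
import cmath


def dft_magnitudes(seq):
    """Return |X_k| for every DFT coefficient X_k of seq (phase is dropped)."""
    n = len(seq)
    magnitudes = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(seq))
        magnitudes.append(abs(coeff))
    return magnitudes


series = [1.0, 4.0, 2.0, 7.0]
rotated = series[1:] + series[:1]   # the same values, circularly shifted

mags_series = dft_magnitudes(series)
mags_rotated = dft_magnitudes(rotated)
```

The two series differ, yet their magnitudes coincide, so magnitudes alone do not pin down the original values. The same loss of phase is why distances computed on magnitudes can order pairs differently from Euclidean distances on the raw series, which is the distance order problem the abstract mentions.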

  • Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs

    Sung-Hyun SHIN  Yang-Sae MOON  Jinho KIM  Sang-Wook KIM  

     
    PAPER-Database

      Vol:
    E91-D No:6
      Page(s):
    1719-1729

    In recent years, a horizontal table with a large number of attributes is widely used in OLAP or e-business applications to analyze multidimensional data efficiently. For efficient storing and querying of horizontal tables, recent works have tried to transform a horizontal table to a traditional vertical table. Existing works, however, have the drawback of not considering an optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying of horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table to an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables to an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.
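The storage scheme and the PIVOT step can be pictured with plain Python data structures. This is only an analogy for the relational operations: the triple layout `(id, attribute, value)`, the decision to skip null values when storing, and both function names are assumptions for illustration, not the paper's definitions or any DBMS syntax.

```python
def to_vertical(horizontal_rows, key="id"):
    """Store each non-null attribute of a horizontal row as one
    (id, attribute, value) triple. Skipping nulls is an assumption."""
    triples = []
    for row in horizontal_rows:
        for attr, value in row.items():
            if attr != key and value is not None:
                triples.append((row[key], attr, value))
    return triples


def pivot(triples, attributes, key="id"):
    """Rebuild a horizontal view: one row per id, one column per attribute.

    Attributes with no stored triple come back as None. A row whose
    attributes were all null has no triples and is therefore not rebuilt.
    """
    rows = {}
    for oid, attr, value in triples:
        if oid not in rows:
            rows[oid] = {key: oid, **{a: None for a in attributes}}
        if attr in rows[oid]:
            rows[oid][attr] = value
    return list(rows.values())


horizontal = [
    {"id": 1, "a": 10, "b": None},
    {"id": 2, "a": None, "b": 20},
]
vertical = to_vertical(horizontal)
view = pivot(vertical, ["a", "b"])
```

In this toy layout a sparse horizontal table with many mostly-null attributes shrinks to a few triples, and the pivot step restores the wide view when a query needs it.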

  • Hybrid Lower-Dimensional Transformation for Similar Sequence Matching

    Yang-Sae MOON  Jinho KIM  

     
    LETTER-Data Mining

      Vol:
    E92-D No:3
      Page(s):
    541-544

    Lower-dimensional transformations in similar sequence matching show different performance characteristics depending on the type of time-series data. In this paper we propose a hybrid approach that exploits multiple transformations at a time in a single hybrid index. This hybrid approach has advantages of exploiting the similar effect of using multiple transformations and reducing the index maintenance overhead. For this, we first propose a new notion of hybrid lower-dimensional transformation that extracts various features using different transformations. We next define the hybrid distance to compute the distance between the hybrid transformed points. We then formally prove that the hybrid approach performs similar sequence matching correctly. We also present the index building and similar sequence matching algorithms based on the hybrid transformation and distance. Experimental results show that our hybrid approach outperforms the single transformation-based approach.