IEICE global.ieice.org Site

Author Search Result

[Author] Yang-Sae MOON(12hit)


Evaluation of Space Filling Curves for Lower-Dimensional Transformation of Image Histogram Sequences
Jeonggon LEE Bum-Soo KIM Mi-Jung CHOI Yang-Sae MOON

LETTER-Data Engineering, Web Information Systems

Vol: E96-D, No: 10, Page(s): 2277-2281
Histogram sequences represent high-dimensional time-series converted from images by space filling curves (SFCs). To overcome the high dimensionality of histogram sequences (e.g., 10^6 dimensions for a 1024×1024 image), we often use lower-dimensional transformations, but the tightness of their lower bounds is highly affected by the type of SFC. In this paper we attack the challenging problem of evaluating which SFC shows the better performance when we apply the lower-dimensional transformation to histogram sequences. For this, we first present a concept of spatial locality and propose a spatial locality preservation metric (SLPM for short). We then evaluate five well-known SFCs from the perspective of SLPM and verify that the evaluation result concurs with the actual transformation performance. Finally, we empirically validate the accuracy of SLPM by showing that the Hilbert-order with the highest SLPM also shows the best performance in k-NN (k-nearest neighbors) search.
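As a rough illustration of how a space filling curve turns a 2-D image into a 1-D sequence, the sketch below walks a square grid in Hilbert order using the standard distance-to-coordinate conversion. The function names `hilbert_d2xy` and `image_to_sequence` are hypothetical, the image is assumed to be a square list of rows whose side is a power of two, and the paper's SLPM metric is not reproduced here.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def image_to_sequence(image, order):
    """Read pixel values of a square image (list of rows) in Hilbert order."""
    side = 1 << order
    return [image[y][x] for x, y in (hilbert_d2xy(order, d) for d in range(side * side))]
```

Consecutive positions on the curve are always adjacent grid cells, which is the kind of locality the abstract is concerned with.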
Fast Density-Based Clustering Using Graphics Processing Units
Woong-Kee LOH Yang-Sae MOON Young-Ho PARK

LETTER-Artificial Intelligence, Data Mining

Vol: E97-D, No: 5, Page(s): 1349-1352
- Errata [Uploaded on July 1, 2014]
Due to recent technical advances, GPUs are used for general applications as well as screen display. Many research results have been proposed to improve the performance of previous CPU-based algorithms by a few hundred times using GPUs. In this paper, we propose a density-based clustering algorithm called GSCAN, which reduces the number of unnecessary distance computations using a grid structure. As a result of our experiments, GSCAN outperformed CUDA-DClust [2] and DBSCAN [3] by up to 13.9 and 32.6 times, respectively.
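The general idea of using a grid to avoid unnecessary distance computations can be sketched on the CPU as follows. This is not GSCAN itself and involves no GPU code; it is a minimal sketch assuming 2-D points, a cell side equal to the neighborhood radius `eps`, and the hypothetical helper names `build_grid` and `eps_neighbors`.

```python
from collections import defaultdict
from math import dist, floor


def build_grid(points, eps):
    """Bucket point indices by grid cell; the cell side equals eps (an assumption)."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(i)
    return grid


def eps_neighbors(points, grid, eps, i):
    """Return indices within eps of point i, checking only the 3x3 block of cells."""
    x, y = points[i]
    cx, cy = floor(x / eps), floor(y / eps)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if dist(points[i], points[j]) <= eps:
                    result.append(j)
    return sorted(result)
```

Because every point within `eps` must lie in the same or an adjacent cell, points in distant cells are never compared.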
Effective Reference Probability Incorporating the Effect of Expiration Time in Web Cache
Jeong-Joon LEE Kyu-Young WHANG Yang-Sae MOON Eui-Kyung HONG

PAPER-Databases

Vol: E84-D, No: 9, Page(s): 1184-1197
Web caching has become an important problem when addressing the performance issues in Web applications. The expiration time of a Web data item is a useful piece of information for performance enhancement in Web caching. In this paper, we introduce the notion of the effective reference probability that incorporates the effect of expiration time for Web caching. For a formal approach, we propose the continuous independent reference model extending the existing independent reference model. Based on this model, we formally define the effective reference probability and derive it theoretically. By simply replacing the reference probability in the existing cache replacement algorithms with the effective reference probability, we can take the effect of expiration time into account. The results of performance experiments show that the replacement algorithms using the effective reference probability always outperform existing ones. In particular, when the cache fraction is 0.05 and data update is comparatively frequent (i.e., the update frequency is more than 1/10 of the reference frequency), the performance is enhanced by more than 30% in LRU-2 and 13% in Aggarwal's method. The results show that the effective reference probability significantly enhances the performance of Web caching when the expiration time is given.
Fast Normalization-Transformed Subsequence Matching in Time-Series Databases
Yang-Sae MOON Jinho KIM

PAPER-Data Mining

Vol: E90-D, No: 12, Page(s): 2007-2018
Normalization transform is known to be very useful for finding the overall trend of time-series data since it enables finding sequences with similar fluctuation patterns. Previous subsequence matching methods with normalization transform, however, incur index overhead both in storage space and in update maintenance since they must build multiple indexes to support query sequences of arbitrary length. To solve this problem, we adopt a single-index approach in the normalization-transformed subsequence matching that supports query sequences of arbitrary length. For the single-index approach, we first provide the notion of inclusion-normalization transform by generalizing the original definition of normalization transform. To normalize a window, the inclusion-normalization transform uses the mean and the standard deviation of a subsequence that includes the window while the original transform uses those of the window itself. Next, we formally prove the correctness of the proposed normalization-transformed subsequence matching method that uses the inclusion-normalization transform. We then propose subsequence matching and index-building algorithms to implement the proposed method. Experimental results for real stock data show that our method improves performance by up to 2.5-2.8 times compared with the previous method.
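The difference between the two transforms described above can be sketched as follows. This is a minimal illustration, assuming the population standard deviation and a non-constant subsequence; the function names are hypothetical and the paper's indexing and matching algorithms are not shown.

```python
from statistics import mean, pstdev


def normalize(window):
    """Original transform: use the mean and standard deviation of the window itself."""
    m, s = mean(window), pstdev(window)
    return [(v - m) / s for v in window]


def inclusion_normalize(subsequence, start, length):
    """Inclusion-normalization: normalize a window with the mean and standard
    deviation of the enclosing subsequence (population std is an assumption)."""
    m, s = mean(subsequence), pstdev(subsequence)
    return [(v - m) / s for v in subsequence[start:start + length]]
```

With this definition, inclusion-normalizing the disjoint windows of a subsequence and concatenating them gives the same values as normalizing the whole subsequence at once.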
Linear Detrending Subsequence Matching in Time-Series Databases
Myeong-Seon GIL Yang-Sae MOON Bum-Soo KIM

LETTER-Artificial Intelligence, Data Mining

Vol: E94-D, No: 4, Page(s): 917-920
Every time-series has its own linear trend, the directionality of a time-series, and removing the linear trend is crucial to get more intuitive matching results. Supporting the linear detrending in subsequence matching is a challenging problem due to the huge number of all possible subsequences. In this paper we define this problem as the linear detrending subsequence matching and propose its efficient index-based solution. To this end, we first present a notion of LD-windows (LD means linear detrending). Using the LD-windows we then present a lower bounding theorem for the index-based matching solution and show its correctness. We next propose the index building and subsequence matching algorithms. We finally show the superiority of the index-based solution.
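Removing a linear trend from a sequence can be illustrated with an ordinary least-squares line fit. The sketch below assumes that the trend is the least-squares line over time indices 0, 1, 2, ... and that the sequence has at least two points; the abstract does not state the exact definition, and `linear_detrend` is a hypothetical name. LD-windows and the index-based matching are not shown.

```python
def linear_detrend(seq):
    """Subtract the least-squares line fitted over indices 0..n-1 (n >= 2)."""
    n = len(seq)
    t_mean = (n - 1) / 2
    v_mean = sum(seq) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(seq))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return [v - (slope * t + intercept) for t, v in enumerate(seq)]
```

Two sequences that differ only by a straight line have the same detrended form, so they match after detrending even when their raw values are far apart.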
A Data Cleansing Method for Clustering Large-Scale Transaction Databases
Woong-Kee LOH Yang-Sae MOON Jun-Gyu KANG

LETTER-Data Engineering, Web Information Systems

Vol: E93-D, No: 11, Page(s): 3120-3123
In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.
ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases
Woong-Kee LOH Yang-Sae MOON Heejune AHN

LETTER-Artificial Intelligence, Data Mining

Vol: E94-D, No: 10, Page(s): 2048-2051
We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering with a dramatic performance improvement.
Navigation Stability: A New Isolation Level in ORDBMSs
Hong-Suk SEO Kyu-Young WHANG Yang-Sae MOON Ji-Woong CHANG Eui-Kyung HONG

PAPER-Databases

Vol: E84-D, No: 9, Page(s): 1171-1183
In order to enhance performance, many database management systems (DBMSs) execute transactions at isolation level 2 rather than at isolation level 3, strict two-phase locking, even if this sacrifices consistency to a certain degree. Cursor stability, a variant of isolation level 2 in relational DBMSs (RDBMSs), has been widely used as a useful technique for obtaining the concurrency achievable at level 2 without sacrificing much consistency. However, cursor stability is much less usable in object-relational DBMSs (ORDBMSs) because navigational applications in ORDBMSs can suffer from critical inconsistency problems such as dangling pointers, lost updates, and reading inconsistent complex objects. In this paper, we propose a new isolation level, navigation stability, that prevents the inconsistency problems of cursor stability for navigational applications, while avoiding significant degradation of the concurrency of level 3. First, we analyze the inconsistency problems of cursor stability for navigational applications. Second, we define navigation stability as an extension of cursor stability and show that it solves those inconsistency problems of cursor stability in ORDBMSs. Third, through extensive simulation, we show that navigation stability significantly enhances the performance compared with level 3. For workloads consisting of transactions of long duration, compared with level 3, the throughput of navigation stability is enhanced by up to 200%; the average response time is reduced by as much as 55%; and the abort ratio is reduced by as much as 77%. From these results, we conclude that navigation stability is a useful isolation level in ORDBMSs that can be used in place of isolation level 3 to improve the performance and concurrency without significant sacrifice of consistency.
A Fast Divide-and-Conquer Algorithm for Indexing Human Genome Sequences
Woong-Kee LOH Yang-Sae MOON Wookey LEE

PAPER-Fundamentals of Information Systems

Vol: E94-D, No: 7, Page(s): 1369-1377
Since the release of human genome sequences, one of the most important research issues has been indexing the genome sequences, and the suffix tree is most widely adopted for that purpose. The traditional suffix tree construction algorithms suffer from severe performance degradation due to the memory bottleneck problem. The recent disk-based algorithms also provide limited performance improvement due to random disk accesses. Moreover, they do not fully utilize recent CPUs with multiple cores. In this paper, we propose a fast algorithm based on a 'divide-and-conquer' strategy for indexing the human genome sequences. Our algorithm nearly eliminates random disk accesses by accessing the disk in the unit of contiguous chunks. In addition, our algorithm fully utilizes multi-core CPUs by dividing the genome sequences into multiple partitions and then assigning each partition to a different core for parallel processing. Experimental results show that our algorithm outperforms the previous fastest DIGEST algorithm by up to 10.5 times.
Fourier Magnitude-Based Privacy-Preserving Clustering on Time-Series Data
Hea-Suk KIM Yang-Sae MOON

LETTER-Artificial Intelligence, Data Mining

Vol: E93-D, No: 6, Page(s): 1648-1651
Privacy-preserving clustering (PPC in short) is important in publishing sensitive time-series data. Previous PPC solutions, however, have a problem of not preserving distance orders or incurring privacy breach. To solve this problem, we propose a new PPC approach that exploits Fourier magnitudes of time-series. Our magnitude-based method does not cause privacy breach even though its techniques or related parameters are publicly revealed. Using magnitudes only, however, incurs the distance order problem, and we thus present magnitude selection strategies to preserve as many Euclidean distance orders as possible. Through extensive experiments, we showcase the superiority of our magnitude-based approach.
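To make the notion of Fourier magnitudes concrete, the sketch below computes the magnitude of each discrete Fourier transform coefficient of a sequence using only the standard library. The function name `dft_magnitudes` is hypothetical, the direct O(n^2) sum is used for clarity rather than speed, and the paper's magnitude selection strategies are not reproduced.

```python
import cmath


def dft_magnitudes(seq):
    """Return |X_k| for every DFT coefficient X_k of seq (direct O(n^2) sum)."""
    n = len(seq)
    mags = []
    for k in range(n):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(seq))
        mags.append(abs(coeff))
    return mags
```

Magnitudes discard phase, so different sequences can share the same magnitudes; for example, a circularly shifted copy of a sequence has identical magnitudes, which is one way to see that magnitudes alone do not determine the original values.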
Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs
Sung-Hyun SHIN Yang-Sae MOON Jinho KIM Sang-Wook KIM

PAPER-Database

Vol: E91-D, No: 6, Page(s): 1719-1729
In recent years, a horizontal table with a large number of attributes is widely used in OLAP or e-business applications to analyze multidimensional data efficiently. For efficient storing and querying of horizontal tables, recent works have tried to transform a horizontal table to a traditional vertical table. Existing works, however, have the drawback of not considering an optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying of horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table to an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables to an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.
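The relationship between a horizontal table and its vertical counterpart can be illustrated outside any DBMS. The sketch below stores each non-null cell as an (id, attribute, value) triple and then pivots the triples back into rows. Skipping null cells is an assumption made here for illustration, and `to_vertical` and `pivot` are hypothetical helper names, not the paper's operators or any RDBMS syntax.

```python
def to_vertical(rows, key):
    """Store a horizontal table as (id, attribute, value) triples, skipping nulls."""
    triples = []
    for row in rows:
        for attr, value in row.items():
            if attr != key and value is not None:
                triples.append((row[key], attr, value))
    return triples


def pivot(triples, key, attributes):
    """Rebuild horizontal rows from triples; attributes without a triple stay None."""
    by_id = {}
    for oid, attr, value in triples:
        if oid not in by_id:
            by_id[oid] = {key: oid, **{a: None for a in attributes}}
        by_id[oid][attr] = value
    return list(by_id.values())
```

For a sparse table with many attributes, the vertical form holds only the cells that carry a value, while the pivot step restores the wide view that queries expect.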
Hybrid Lower-Dimensional Transformation for Similar Sequence Matching
Yang-Sae MOON Jinho KIM

LETTER-Data Mining

Vol: E92-D, No: 3, Page(s): 541-544
Lower-dimensional transformations in similar sequence matching show different performance characteristics depending on the type of time-series data. In this paper we propose a hybrid approach that exploits multiple transformations at a time in a single hybrid index. This hybrid approach has advantages of exploiting the similar effect of using multiple transformations and reducing the index maintenance overhead. For this, we first propose a new notion of hybrid lower-dimensional transformation that extracts various features using different transformations. We next define the hybrid distance to compute the distance between the hybrid transformed points. We then formally prove that the hybrid approach performs similar sequence matching correctly. We also present the index building and similar sequence matching algorithms based on the hybrid transformation and distance. Experimental results show that our hybrid approach outperforms the single transformation-based approach.
