Ig-hoon LEE Junho SHIM Sang-goo LEE
Rebuilding an index is an essential step in database recovery, so fast index recovery is a necessary condition for fast database recovery. The B+-Tree is the most popular index structure in database systems. In this paper, we present a fast B+-Tree rebuilding algorithm called Max-PL. The main idea of Max-PL is to first construct the B+-Tree index structure using the pre-stored max keys of the leaf nodes, and then insert the keys and data pointers into the index. The algorithm employs a pipelining mechanism for reading the data records and inserting the keys into the index. It also exploits parallelism in several phases to boost overall performance. We analyze the time complexity and space requirement of the algorithm, and perform an experimental study to compare its performance with other B+-Tree rebuilding algorithms: Sequential Insertion and Batch-Construction. The results show that, on average, our algorithm runs at least 670% faster than Sequential Insertion and 200% faster than Batch-Construction.
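The two-phase idea described above can be illustrated with a minimal single-threaded sketch. It is not the authors' implementation: the pipelining and parallel phases are omitted, and the function names (`build_skeleton`, `find_leaf`, `rebuild`), the `fanout` parameter, and the list-based node layout are all assumptions made for illustration.

```python
from bisect import bisect_left


def build_skeleton(leaf_max_keys, fanout):
    """Phase 1: build the internal levels bottom-up from the sorted max keys
    of the leaves. Returns a list of levels (lowest internal level first,
    root level last); each node is the list of max keys of its children."""
    levels = []
    current = list(leaf_max_keys)  # max key of every node on the level below
    while len(current) > 1:
        nodes = [current[i:i + fanout] for i in range(0, len(current), fanout)]
        levels.append(nodes)
        current = [node[-1] for node in nodes]  # max key of each new node
    return levels


def find_leaf(levels, key, fanout):
    """Descend from the root and return the index of the leaf for `key`."""
    idx = 0  # index of the current node within its level
    for nodes in reversed(levels):
        node = nodes[idx]
        # First child whose max key is >= key; clamp keys above the maximum.
        slot = min(bisect_left(node, key), len(node) - 1)
        idx = idx * fanout + slot
    return idx


def rebuild(records, leaf_max_keys, fanout=4):
    """Phase 2: route every (key, pointer) record into the prebuilt skeleton."""
    levels = build_skeleton(leaf_max_keys, fanout)
    leaves = [[] for _ in leaf_max_keys]
    for key, pointer in records:
        leaves[find_leaf(levels, key, fanout)].append((key, pointer))
    for leaf in leaves:
        leaf.sort()  # records may arrive in any order
    return levels, leaves
```

Because the skeleton is fixed before any key is inserted, this sketch never splits a node during the second phase; each record only needs to be routed to its leaf.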
We propose a new, effective method of managing flash memory space for flash memory-specific file systems based on a log-structured file system. Flash memory has attractive features such as non-volatility and fast I/O speed, but it also suffers from the inability to update in place and from a limited number of usage (erase) cycles. These drawbacks necessitate a number of changes to conventional storage (file) management techniques. Our focus is on lowering cleaning cost and evenly utilizing flash memory cells while maintaining a balance between these two often-conflicting goals. The proposed cleaning method performs especially well when storage utilization and the degree of locality are high. Cleaning efficiency is enhanced by dynamically separating cold data from non-cold data, an operation we call the 'collection operation.' The second goal, cycle-leveling, is achieved to the degree that the maximum difference between erase cycles is below the error range of the hardware. Experimental results show that the proposed technique provides sufficient performance for reliable flash storage systems.
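The two goals named above, separating cold from non-cold data and keeping erase cycles level, can be sketched as follows. The abstract does not say how cold data is identified, so the age threshold `cold_age`, the `(block_id, last_write_time)` tuples, the `max_gap` bound, and both function names are hypothetical choices for illustration only.

```python
def split_valid_blocks(blocks, now, cold_age):
    """Partition the valid blocks of a segment being cleaned into
    (cold, non_cold) lists of block ids.

    ASSUMPTION: a block counts as cold when it has not been written for at
    least `cold_age` time units; the source does not specify the criterion.
    """
    cold, non_cold = [], []
    for block_id, last_write_time in blocks:
        if now - last_write_time >= cold_age:
            cold.append(block_id)
        else:
            non_cold.append(block_id)
    return cold, non_cold


def needs_leveling(erase_counts, max_gap):
    """Return True when the spread between the most-erased and least-erased
    segments exceeds `max_gap` (a hypothetical stand-in for the hardware
    error range mentioned in the abstract)."""
    return max(erase_counts) - min(erase_counts) > max_gap
```

Writing cold and non-cold blocks to different destination segments is a common way to reduce later cleaning work, since segments holding only cold data rarely accumulate invalid blocks.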
This paper discusses a new type of semi-supervised document clustering that uses partial supervision to partition a large set of documents. Most clustering methods organize documents into groups based only on similarity measures. In this paper, we attempt to isolate more semantically coherent clusters by employing the domain-specific knowledge provided by a document analyst. By using external human knowledge to guide the clustering mechanism, with some flexibility when creating the clusters, clustering efficiency can be considerably enhanced. Experimental results show that even a small amount of external knowledge can considerably enhance the quality of clustering results that satisfy users' constraints.