Masahiro ISHIKAWA Kazutaka FURUSE Hanxiong CHEN Nobuo OHBO
Clustering is one of the most important topics in knowledge discovery from databases. Hierarchical clustering is especially useful because it gives a hierarchical view of an entire database and can guide users in browsing a huge database. In many cases, clustering can be modeled as a graph partitioning problem. Given an appropriate distance function between database objects, a database can be viewed as an edge-weighted complete graph whose vertices are the database objects and whose edge weights are the distances between them. Constructing an MST (Minimum Spanning Tree) of this graph can then be viewed as a single-linkage agglomerative clustering process over the database objects. In this paper, we propose an efficient MST construction method for a large complete metric graph derived from a database on which a metric distance function is defined. Our method uses a metric index to reduce the number of distance calculations. The basic idea is to use the metric postulates to exclude edges that are unlikely to be part of an MST. For this purpose, we introduce a new metric index named MetricMatrix. Experimental results show that our method drastically reduces the number of distance calculations needed for MST construction compared with the classical method.
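The abstract does not describe MetricMatrix itself, so the following is only a hedged sketch of the general idea it builds on: skipping exact distance calculations when a metric lower bound shows they cannot matter. It uses plain Prim's algorithm with a generic pivot-based triangle-inequality bound (the names prim_mst_pivot_pruned, pivot_ids and single_linkage_clusters are hypothetical, not the paper's), and then cuts the heaviest MST edges to obtain single-linkage clusters. The result is still an exact MST; how many calls are saved depends on the data and the pivots chosen, and pivot precomputation itself costs calls.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")
Edge = Tuple[int, int, float]


def prim_mst_pivot_pruned(
    points: Sequence[P],
    dist: Callable[[P, P], float],
    pivot_ids: Sequence[int],
) -> Tuple[List[Edge], int]:
    """Exact MST of the complete metric graph over `points` (Prim's algorithm).

    Hypothetical pivot-based pruning, not MetricMatrix.
    Returns (mst_edges, number_of_dist_calls).
    """
    n = len(points)
    calls = 0

    def d(i: int, j: int) -> float:
        nonlocal calls
        calls += 1
        return dist(points[i], points[j])

    # Precompute object-to-pivot distances (these also cost distance calls).
    piv = [[d(i, p) for p in pivot_ids] for i in range(n)]

    def lower_bound(i: int, j: int) -> float:
        # Triangle inequality: d(i, j) >= |d(i, p) - d(j, p)| for every pivot p.
        return max((abs(a - b) for a, b in zip(piv[i], piv[j])), default=0.0)

    in_tree = [False] * n
    key = [math.inf] * n  # cheapest known edge from each vertex into the tree
    parent = [-1] * n
    mst: List[Edge] = []
    if n:
        key[0] = 0.0
    for _ in range(n):
        # O(n) scan for the closest non-tree vertex; adequate for a sketch.
        u = min((v for v in range(n) if not in_tree[v]), key=key.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            mst.append((parent[u], u, key[u]))
        for v in range(n):
            # If even the lower bound cannot beat key[v], the exact distance
            # cannot either, so the dist() call is skipped safely.
            if in_tree[v] or lower_bound(u, v) >= key[v]:
                continue
            duv = d(u, v)
            if duv < key[v]:
                key[v], parent[v] = duv, u
    return mst, calls


def single_linkage_clusters(n: int, mst_edges: List[Edge], k: int) -> List[int]:
    """Removing the k-1 heaviest MST edges leaves k components, which are the
    single-linkage clusters. Returns one representative label per object."""
    keep = sorted(mst_edges, key=lambda e: e[2])[: max(n - k, 0)]
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in keep:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]
```

For example, with the one-dimensional points 0, 1, 2, 10, 11, the distance abs(a - b) and pivot_ids=[0], the MST has the edges 0-1, 1-2, 2-3 and 3-4 (total weight 11), and asking for k=2 clusters groups objects {0, 1, 2} apart from {3, 4}.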
Yaxin LI Hiroyuki KITAGAWA Nobuo OHBO
Nested relational models were proposed as natural extensions of the relational model to support newly emerging database applications. Several research groups have built prototype implementations of nested relational database systems (NRDBSs). However, many research issues on nested relations remain open. One important issue is query processing, in particular query optimization: NRDBSs must efficiently execute queries that involve the hierarchical data structures inherent in nested relations. In this paper, we focus on two join-type operations on nested relations, nested join and embed, and propose an algorithm that derives a cost-optimal execution sequence of nested joins and embeds for a given query graph. The cost optimality of the derived sequence is formally proved. The complexity of the algorithm is proved to be O(N^2) when the query graph contains N nested relations.
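The abstract does not define nested join or embed. As a hedged illustration of what a join-type operation on nested relations produces, the sketch below implements one common formulation, often called a nest join, in which each outer tuple receives a relation-valued attribute holding its matching inner tuples. This is an assumption and may differ from the paper's operators; nest_join and its parameters are hypothetical names.

```python
from typing import Any, Dict, Hashable, List

Row = Dict[str, Any]


def nest_join(outer: List[Row], inner: List[Row], outer_key: str,
              inner_key: str, label: str) -> List[Row]:
    """Hypothetical nest join: each outer tuple gains a relation-valued
    attribute `label` holding the inner tuples whose `inner_key` equals its
    `outer_key`. Outer tuples without matches get an empty sub-relation,
    so no outer tuple is lost (unlike a flat inner join)."""
    # Group inner tuples by join value once (hash-based, one pass over inner).
    groups: Dict[Hashable, List[Row]] = {}
    for t in inner:
        groups.setdefault(t[inner_key], []).append(t)
    # Copy each outer tuple and attach its (possibly empty) sub-relation.
    return [{**o, label: list(groups.get(o[outer_key], []))} for o in outer]
```

Nesting employees under departments this way yields one department tuple per department, each carrying a set of employee tuples. A query over several nested relations applies a sequence of such operations, and choosing a cost-optimal order for that sequence is the problem the paper's algorithm addresses.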
ADTs (Abstract Data Types) are known as a promising feature for extending database applications to CAD/CAM and other engineering areas. This extension has brought a new dimension to query optimization. Conventional query optimization methods, which treat joins as the dominant cost factor, assume that selections and projections take essentially no time to execute. In databases that support ADTs, however, this may not hold, because executing a selection that involves ADT functions may be very time-consuming. Selections with ADT functions should therefore not be treated as inexpensive operations, and conventional optimization heuristics should be extended to handle such queries. In this paper, we show that semijoins can be used as an effective means to reduce the number of evaluations of an ADT function and, consequently, to optimize queries containing expensive ADT selections. We propose extending a conventional optimization heuristic with a semijoin pre-stage, an additional component that handles expensive ADT selections. In this way, the applicable range of the conventional heuristic is extended to queries with ADT functions. Several optimization algorithms are given, and simulation results show the effectiveness of our methods.
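A minimal sketch of why a semijoin pre-stage can cut ADT function evaluations, assuming a query that joins R and S on a key and applies an expensive predicate to R. The predicate here is a cheap stand-in for an ADT function, and all names are hypothetical; this is not one of the paper's algorithms. The conventional plan pushes the selection down and evaluates the predicate on every R tuple, while the semijoin-first plan first discards R tuples with no join partner.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Pred = Callable[[Row], bool]


def _hash_join(r: List[Row], s: List[Row], key: str) -> List[Row]:
    # Equi-join on `key`; merges rows (assumes no other shared column names).
    idx: Dict[Any, List[Row]] = {}
    for u in s:
        idx.setdefault(u[key], []).append(u)
    return [{**t, **u} for t in r for u in idx.get(t[key], [])]


def _select(rows: List[Row], pred: Pred) -> Tuple[List[Row], int]:
    # Apply the (expensive) predicate once per row; return kept rows and
    # the number of predicate evaluations.
    kept = [t for t in rows if pred(t)]
    return kept, len(rows)


def plan_selection_first(r: List[Row], s: List[Row], key: str,
                         pred: Pred) -> Tuple[List[Row], int]:
    """Conventional heuristic: push the selection below the join."""
    kept, evals = _select(r, pred)
    return _hash_join(kept, s, key), evals


def plan_semijoin_first(r: List[Row], s: List[Row], key: str,
                        pred: Pred) -> Tuple[List[Row], int]:
    """Semijoin pre-stage: reduce R to tuples with a join partner in S
    (R semijoin S) before the expensive predicate runs."""
    s_keys = {u[key] for u in s}
    reduced = [t for t in r if t[key] in s_keys]  # no predicate calls here
    kept, evals = _select(reduced, pred)
    return _hash_join(kept, s, key), evals
```

With ten R tuples of which only two have join partners in S, both plans return the same result, but the semijoin-first plan evaluates the predicate 2 times instead of 10. Whether the pre-stage pays off depends on how selective the semijoin is relative to the cost of the ADT function, since the semijoin itself adds a pass over both relations.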