Complexity and a Method of Extracting a Database Schema over Semistructured Documents

Nobutaka SUZUKI; Yoichirou SATO; Michiyoshi HAYASE

Summary:

Semistructured data has an irregular structure and no a priori database schema, which leads to problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, several schema extraction algorithms over semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, so we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem of extracting a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents the similarity between the type of the class and the types of the objects in the class. We first prove that the corresponding decision problem is strongly NP-complete, and show that another version of the problem is strongly NP-hard and belongs to Δ₂ᴾ. Then we show that for any r < 3/2, there is no polynomial-time r-approximation algorithm for the optimization problem unless P = NP. Finally, we propose a kind of class, called a bounded class, that can be constructed efficiently, and then show a polynomial-time algorithm for constructing a database schema using bounded classes.

Publication: IEICE TRANSACTIONS on Information Vol.E85-D No.6 pp.940-949

Publication Date: 2002/06/01

Type of Manuscript: PAPER

Category: Databases

Nobutaka SUZUKI, Yoichirou SATO, Michiyoshi HAYASE, "Complexity and a Method of Extracting a Database Schema over Semistructured Documents" in IEICE TRANSACTIONS on Information, vol. E85-D, no. 6, pp. 940-949, June 2002.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_6_940/_p

@ARTICLE{e85-d_6_940,
author={Nobutaka SUZUKI and Yoichirou SATO and Michiyoshi HAYASE},
journal={IEICE TRANSACTIONS on Information},
title={Complexity and a Method of Extracting a Database Schema over Semistructured Documents},
year={2002},
volume={E85-D},
number={6},
pages={940-949},
abstract={Semistructured data has an irregular structure and no a priori database schema, which leads to problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, several schema extraction algorithms over semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, so we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem of extracting a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents the similarity between the type of the class and the types of the objects in the class. We first prove that the corresponding decision problem is strongly NP-complete, and show that another version of the problem is strongly NP-hard and belongs to $\Delta_2^P$. Then we show that for any $r < 3/2$, there is no polynomial-time $r$-approximation algorithm for the optimization problem unless P = NP. Finally, we propose a kind of class, called a bounded class, that can be constructed efficiently, and then show a polynomial-time algorithm for constructing a database schema using bounded classes.},
month={June},
}

TY - JOUR
TI - Complexity and a Method of Extracting a Database Schema over Semistructured Documents
T2 - IEICE TRANSACTIONS on Information
SP - 940
EP - 949
AU - Nobutaka SUZUKI
AU - Yoichirou SATO
AU - Michiyoshi HAYASE
PY - 2002
JO - IEICE TRANSACTIONS on Information
VL - E85-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2002
AB - Semistructured data has an irregular structure and no a priori database schema, which leads to problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, several schema extraction algorithms over semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, so we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem of extracting a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents the similarity between the type of the class and the types of the objects in the class. We first prove that the corresponding decision problem is strongly NP-complete, and show that another version of the problem is strongly NP-hard and belongs to Δ₂ᴾ. Then we show that for any r < 3/2, there is no polynomial-time r-approximation algorithm for the optimization problem unless P = NP. Finally, we propose a kind of class, called a bounded class, that can be constructed efficiently, and then show a polynomial-time algorithm for constructing a database schema using bounded classes.
ER -