Document merging is essential for synchronizing several versions of a document edited concurrently by two or more users. A few methods for merging structured documents have been proposed so far, yet these methods do not always merge the given documents appropriately. As an aid for finding an appropriate merge, we propose a polynomial-time algorithm for merging structured documents that takes a different approach. In this approach, we merge two given documents (treated as ordered trees) by optimally transforming them into isomorphic ones, using operations such as add (add a new node), del (delete an existing node), and upd (make two nodes have the same label).
Nobutaka SUZUKI Yuji FUKUSHIMA Kosetsu IKEDA
In this paper, we consider the XPath satisfiability problem under restricted DTDs called "duplicate free". For an XPath expression q and a DTD D, q is satisfiable under D if there exists an XML document t such that t is valid against D and the answer of q on t is nonempty. Evaluating an unsatisfiable XPath expression is meaningless, since such an expression can always be replaced by an empty set without evaluating it. However, it has been shown that the XPath satisfiability problem is intractable for a large number of XPath fragments. We consider simple XPath fragments under two restrictions: (i) only a label can be specified as a node test and (ii) operators such as qualifier ([]) and path union (∪) are not allowed. We first show that, for some small XPath fragments under the above restrictions, the satisfiability problem is NP-complete under DTDs without any restriction. Then we show that there exist XPath fragments, containing the above small fragments, for which the satisfiability problem is in PTIME under duplicate-free DTDs.
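To make the definition of satisfiability concrete, the following is a minimal sketch for a much simpler setting than the one studied in the abstract: paths that use only the child and descendant axes with label node tests. It is not the authors' algorithm. The DTD is abstracted, as an assumption, into a hypothetical mapping `children` from each element name to the labels occurring in its content model, and every element type is assumed to be usable in some valid document. The names `is_satisfiable`, `reachable_descendants`, and the example DTD are illustrative only.

```python
from collections import deque


def reachable_descendants(children, label):
    """Labels reachable from `label` by one or more child steps in the DTD graph."""
    seen = set()
    queue = deque(children.get(label, ()))
    while queue:
        cur = queue.popleft()
        if cur in seen:
            continue
        seen.add(cur)
        queue.extend(children.get(cur, ()))
    return seen


def is_satisfiable(root, children, steps):
    """Check a downward path given as a list of (axis, label) pairs.

    axis is "child" or "descendant". Assumption: every label listed in
    children[x] can actually occur below x in some valid document.
    """
    current = None  # None stands for the document node above the root element
    for axis, label in steps:
        if current is None:
            if axis == "child":
                ok = label == root
            else:
                ok = label == root or label in reachable_descendants(children, root)
        elif axis == "child":
            ok = label in children.get(current, ())
        else:
            ok = label in reachable_descendants(children, current)
        if not ok:
            return False
        current = label  # a label node test fixes the label of the selected node
    return True


# Hypothetical DTD: book -> title, chapter; chapter -> title, section; section -> para
example_dtd = {
    "book": {"title", "chapter"},
    "chapter": {"title", "section"},
    "section": {"para"},
}
sat = is_satisfiable("book", example_dtd, [("child", "book"), ("descendant", "para")])
unsat = is_satisfiable("book", example_dtd, [("child", "book"), ("child", "para")])
```

Here `/book//para` is satisfiable because `para` is reachable below `book`, while `/book/para` is not because `para` never appears directly under `book`. The hardness results in the abstract concern richer fragments and unrestricted DTDs, where such a simple reachability check is not enough.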
Nobutaka SUZUKI Yoichirou SATO Michiyoshi HAYASE
Semistructured data has irregular structure and no a priori database schema, so we encounter problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, several schema extraction algorithms for semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, so we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem to extract a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents a similarity between the type of the class and those of the objects in the class. We first prove that the corresponding decision problem is strongly NP-complete, and show that another version of the problem is strongly NP-hard and belongs to Δ₂^P. Then we show that for any r < 3/2, there is no polynomial-time r-approximation algorithm that solves the optimization problem unless P = NP. Finally, we propose a kind of class called a bounded class that can be constructed efficiently, and then show a polynomial-time algorithm for constructing a database schema by using bounded classes.
Nobutaka SUZUKI Yoichirou SATO Michiyoshi HAYASE
Semistructured data has no a-priori schema information, which causes some problems such as inefficient storage and query execution. To cope with such problems, extracting schema information from semistructured data has been an important issue. However, in most cases optimal schema information cannot be extracted efficiently, and few efficient approximation algorithms have been proposed. In this paper, we consider an approximation algorithm for extracting "typical" classes from semistructured data. Intuitively, a class C is said to be typical if the structure of C is "similar" to those of "many" objects. We present the following results. First, we prove that the problem of deciding if a typical class can be extracted from given semistructured data is NP-complete. Second, we present an approximation algorithm for extracting typical classes from given semistructured data, and show a sufficient condition for the approximation algorithm to run in polynomial time. Finally, by using extracted classes obtained by the approximation algorithm, we propose a polynomial-time algorithm for constructing a set R of classes such that R covers all the objects to form a database schema.
Nobutaka SUZUKI Yuji FUKUSHIMA
Finding an appropriate data transformation between two schemas has been an important problem. In this paper, assuming that an update script between original and updated DTDs is available, we consider inferring a transformation algorithm from the original DTD and the update script such that the algorithm transforms each document valid against the original DTD into a document valid against the updated DTD. We first show a transformation algorithm inferred from a DTD and an update script. We next show a sufficient condition under which the transformation algorithm inferred from a DTD d and an update script is unambiguous, i.e., for any document t valid against d, elements to be deleted/inserted can unambiguously be determined. Finally, we show a polynomial-time algorithm for testing the sufficient condition.
DTDs are continuously updated according to changes in the real world. Let t be an XML document valid against a DTD D, and suppose that D is updated by an update script s. In general, we cannot uniquely "infer" a transformation of t from s, i.e., we cannot uniquely determine the elements in t that should be deleted and/or the positions in t at which new elements should be inserted. In this paper, we consider inferring K optimum transformations of t from s so that a user can find the most desirable transformation more easily. We first show that the problem of inferring K optimum transformations of an XML document from an update script is NP-hard even if K = 1. Then, assuming that an update script is of length one, we show an algorithm for solving the problem, which runs in time polynomial in |D|, |t|, and K.
Nobutaka SUZUKI Kosetsu IKEDA Yeondae KWON
In this paper, we consider solving the all-pairs regular path problem on large graphs efficiently. Let G be a graph and r be a regular path query, and consider finding the answers of r on G. If G is small enough to fit in main memory, it suffices to load the entire graph into main memory and traverse it to find paths matching r. However, if G is too large to fit in main memory, we need another approach. In this paper, we propose a novel approach based on an external memory algorithm. Our algorithm finds the answers matching r by scanning the node list of G sequentially. We conducted a small experiment, which suggests that our algorithm can solve the problem efficiently.
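The in-memory baseline mentioned in the abstract (load G and traverse it to find paths matching r) can be sketched as follows. This is not the external memory algorithm the abstract proposes. It assumes, for illustration only, an edge-labeled graph given as (source, label, target) triples and a query already compiled into a deterministic automaton; the function name `all_pairs_rpq` and its parameters are hypothetical.

```python
from collections import deque


def all_pairs_rpq(edges, dfa_start, dfa_accept, dfa_delta):
    """Return all (u, v) such that some path from u to v spells a word accepted by the DFA.

    edges: iterable of (src, label, dst) triples.
    dfa_delta: dict mapping (state, label) -> next state (partial function).
    """
    adj = {}
    nodes = set()
    for src, label, dst in edges:
        adj.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for start in nodes:
        # Breadth-first search over the product of the graph and the automaton.
        seen = {(start, dfa_start)}
        queue = deque([(start, dfa_start)])
        while queue:
            node, state = queue.popleft()
            if state in dfa_accept:
                answers.add((start, node))
            for label, nxt in adj.get(node, ()):
                nstate = dfa_delta.get((state, label))
                if nstate is None:
                    continue  # the automaton has no transition on this label
                pair = (nxt, nstate)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Query "a b*" as a DFA: state 0 --a--> 1, state 1 --b--> 1, accepting state 1.
example_edges = [(1, "a", 2), (2, "b", 3), (3, "b", 4), (1, "c", 4)]
example_delta = {(0, "a"): 1, (1, "b"): 1}
result = all_pairs_rpq(example_edges, 0, {1}, example_delta)
```

On this small graph the answers are the pairs (1, 2), (1, 3), and (1, 4). The sketch keeps the whole adjacency structure in memory, which is exactly the assumption that fails for the large graphs the abstract targets.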
MathML is a standard markup language for describing math expressions. MathML consists of two sets of elements: Presentation Markup and Content Markup. The former is widely used to display math expressions in Web pages, while the latter is more suited to the calculation of math expressions. In this letter, we focus on the former and consider classifying Presentation MathML expressions. Identifying the classes of given Presentation MathML expressions is helpful for several applications, e.g., Presentation to Content MathML conversion and text-to-speech. We propose a method for classifying Presentation MathML expressions using a multilayer perceptron. Experimental results show that our method classifies MathML expressions with high accuracy.
For a complex object model, a form of range restriction called a specialization constraint (SC) has been studied. On the other hand, very few models have been proposed that support selective inheritance. In this paper, we study SCs for a complex object model supporting selective inheritance and present the following results. A polynomial-time algorithm is given for deciding if a given database schema is well-formed. A sound and complete axiomatization for SCs is presented. A polynomial-time algorithm is given that decides if an SC is a logical consequence of a set of SCs. Finally, another polynomial-time algorithm is given, which decides if there exists a database that contains a given path from a given class.
Nobutaka SUZUKI Takuya OKADA Yeondae KWON
Cascading Style Sheets (CSS) is a popular language for describing the styles of XML documents as well as HTML documents. To resolve conflicts among CSS rules, CSS has a mechanism called specificity. For a DTD D and a CSS code R, due to specificity R may contain "unsatisfiable" rules under D, i.e., rules that are not applied to any element of any document valid against D. In this paper, we consider the problem of detecting unsatisfiable CSS rules under DTDs. We focus on CSS fragments in which descendant, child, adjacent sibling, and general sibling combinators are allowed. We show that the problem is coNP-hard in most cases, even if only one of the four combinators is allowed and under very restricted DTDs. We also show that the problem is in coNP or PSPACE depending on restrictions on DTDs and CSS. Finally, we present four conditions under which the problem can be solved in polynomial time.