Yusuke KOZAWA Toshiyuki AMAGASA Hiroyuki KITAGAWA
Probabilistic frequent itemset mining, which discovers frequent itemsets from uncertain data, has attracted much attention due to inherent uncertainty in the real world. Many algorithms have been proposed to tackle this problem, but their performance is not satisfactory because handling uncertainty incurs high processing cost. To accelerate such computation, we utilize GPUs (Graphics Processing Units). Our previous work accelerated an existing algorithm with a single GPU. In this paper, we extend the work to employ multiple GPUs. Proposed methods minimize the amount of data that need to be communicated among GPUs, and achieve load balancing as well. Based on the methods, we also present algorithms on a GPU cluster. Experiments show that the single-node methods realize near-linear speedups, and the methods on a GPU cluster of eight nodes achieve up to a 7.1 times speedup.
Tingting DONG Chuan XIAO Yoshiharu ISHIKAWA
Probabilistic range query is an important type of query in the area of uncertain data management. A probabilistic range query returns all the data objects within a specific range from the query object with a probability no less than a given threshold. In this paper, we assume that each uncertain object stored in the database is associated with a multi-dimensional Gaussian distribution, which describes the probability distribution that the object appears in the multi-dimensional space. A query object is either a certain object or an uncertain object modeled by a Gaussian distribution. We propose several filtering techniques and an R-tree-based index to efficiently support probabilistic range queries over Gaussian objects. Extensive experiments on real data demonstrate the efficiency of our proposed approach.
As data volumes explode, data storage costs become a large fraction of total IT costs. We can reduce the costs substantially by using compression. However, it is generally known that database compression is not suitable for write-intensive workloads. In this paper, we provide a comprehensive solution to improve the performance of compressed databases for write-intensive OLTP workloads. We find that storing data too densely in compressed pages incurs many future page splits, which require exclusive locks. In order to avoid lock contention, we reduce page splits by sacrificing a couple of percent of space savings. We reserve enough space in each compressed page for future updates of records and prevent page merges that are prone to incur page splits in the near future. The experimental results using TPC-C benchmark and MySQL/InnoDB show that our method gives 1.5 times higher throughput with 33% space savings compared with the uncompressed counterpart and 1.8 times higher throughput with only 1% more space compared with the state-of-the-art compression method developed by Facebook.
This paper summarizes the current status of regulations, standardization efforts and trials around the world regarding white space (WS) communications, especially television band WS (TVWS). After defining WS communication systems configurations and function and the categories of white space database, the TVWS regulations in United States, United Kingdom, and Japan are summarized. Then regarding status of standardization for TVWS devices, IEEE 802 and IEEE 1900 standards are summarized. Finally ongoing pilot projects and trials of WS communications in the world are summarized, and trends and future direction of research on WS communication systems are summarized.
Boseon YU Hyunduk KIM Wonik CHOI Dongseop KWON
Recently, various research efforts have been conducted to develop strategies for accelerating multi-dimensional query processing using the graphics processing units (GPUs). However, well-known multi-dimensional access methods such as the R-tree, B-tree, and their variants are hardly applicable to GPUs in practice, mainly due to the characteristics of a hierarchical index structure. More specifically, the hierarchical structure not only causes frequent transfers of small volumes of data but also provides limited opportunity to exploit the advanced data parallelism of GPUs. To address these problems, we propose an approach that uses GPUs as a buffer. The main idea is that object entries in recently visited leaf nodes are buffered in the global memory of GPUs and processed by massive parallel threads of the GPUs. Through extensive performance studies, we observed that the proposed approach achieved query performance up to five times higher than that of the original R-tree.
Mi-Young CHOI Chang-Joo MOON Doo-Kwon BAIK
The Semantic Web uses RDF/RDFS, which can enable a machine to understand web data without human interference. But most web data is not available in RDF/RDFS documents because most web data is still stored in databases. It is much more favorable to use stored data in a database to build the Semantic Web. This paper proposes an enhanced relational RDF/RDFS interoperable data model (ER2iDM) and a transformation procedure from relational data model (RDM) to RDF/RDFS based on ER2iDM. The ER2iDM is a data model that plays the role of an inter-mediator between RDM and RDF/RDFS during a transformation procedure. The data and schema information in the database are migrated to the ER2iDM according to the proposed translation procedures without incurring loss of meaning of the entities, relationships, and data. The RDF/RDFS generation tool makes a RDF/RDFS XML document automatically from the ER2iDM. The proposed ER2iDM and transformation procedure provides detailed guidelines for transformation from RDM to RDF/RDFS unlike existing studies; therefore, we can more efficiently build up the Semantic Web using database stored data.
Chittaphone PHONHARATH Kenji HASHIMOTO Hiroyuki SEKI
We study a static analysis problem on k-secrecy, which is a metric for the security against inference attacks on XML databases. Intuitively, k-secrecy means that the number of candidates of sensitive data of a given database instance or the result of unauthorized query cannot be narrowed down to k-1 by using available information such as authorized queries and their results. In this paper, we investigate the decidability of the schema k-secrecy problem defined as follows: for a given XML database schema, an authorized query and an unauthorized query, decide whether every database instance conforming to the given schema is k-secret. We first show that the schema k-secrecy problem is undecidable for any finite k>1 even when queries are represented by a simple subclass of linear deterministic top-down tree transducers (LDTT). We next show that the schema ∞-secrecy problem is decidable for queries represented by LDTT. We give an algorithm for deciding the schema ∞-secrecy problem and analyze its time complexity. We show the schema ∞-secrecy problem is EXPTIME-complete for LDTT. Moreover, we show similar results LDTT with regular look-ahead.
The performance of a mobile database management system (DBMS) in which most queries are made up of random data accesses if the NAND flash memory is used as storage media of the DBMS is degraded. The reason for this is that the performance of NAND flash memory is good for writing sequentially but poor when writing randomly. Thus, a new storage structure and querying policies are needed in mobile DBMS when flash memory is used as the storage media. In this letter, we propose a new policy of database page management to enhance the frequent random update performance, and then evaluate the performance experimentally.
Xiangxu MENG Xiaodong WANG Xinye LIN
The GPS trajectory databases serve as bases for many intelligent applications that need to extract some trajectories for future processing or mining. When doing such tasks, spatio-temporal range queries based methods, which find all sub-trajectories within the given spatial extent and time interval, are commonly used. However, the history trajectory indexes of such methods suffer from two problems. First, temporal and spatial factors are not considered simutaneously, resulting in low performance when processing spatio-temporal queries. Second, the efficiency of indexes is sensitive to query size. The query performance changes dramatically as the query size changed. This paper proposes workload-aware Adaptive OcTree based Trajectory clustering Index (ATTI) aiming at optimizing trajectory storage and index performance. The contributions are three-folds. First, the distribution and time delay of the trajectory storage are introduced into the cost model of spatio-temporal range query; Second, the distribution of spatial division is dynamically adjusted based on GPS update workload; Third, the query workload adaptive mechanism is proposed based on virtual OcTree forest. A wide range of experiments are carried out over Microsoft GeoLife project dataset, and the results show that query delay of ATTI could be about 50% shorter than that of the nested index.
Song-Hyon KIM Inchul SONG Kyong-Ha LEE Yoon-Joon LEE
Subgraph matching is a fundamental operation for querying graph-structured data. Due to potential errors and noises in real-world graph data, exact subgraph matching is sometimes inappropriate in practice. In this paper we consider an approximate subgraph matching model that allows missing edges. Based on this model, approximate subgraph matching finds all occurrences of a given query graph in a database graph, allowing missing edges. A straightforward approach is to first generate query subgraphs of a given query graph by deleting edges and then perform exact subgraph matching for each query subgraph. In this paper we propose a sharing-based approach to approximate subgraph matching, called SASUM. Our method is based on the fact that query subgraphs are highly overlapped. Due to this overlapping nature of query subgraphs, the matches of a query subgraph can be computed from the matches of a smaller query subgraph, which results in reducing the number of query subgraphs that require expensive exact subgraph matching. Our method uses a lattice framework to identify sharing opportunities between query subgraphs. To further reduce the number of graphs that need exact subgraph matching, SASUM generates small base graphs that are shared by query subgraphs and chooses the minimum number of base graphs whose matches are used to derive the matching results of all query subgraphs. A comprehensive set of experiments shows that our approach outperforms the state-of-the-art approach by orders of magnitude in terms of query execution time.
Yukihiko SHIGESADA Shinsuke KOBAYASHI Noboru KOSHIZUKA Ken SAKAMURA
Context awareness is one of the ultimate goals of ubiquitous computing, and spatial information plays an important role in building context awareness. In this paper, we propose a new interoperable spatial information model, which is based on ucode relation (ucR) and Place Identifier (PI), for realizing ubiquitous spatial infrastructure. In addition, we propose a design environment for spatial information database using our model. Our model is based on ucode and its relation. ucode is 128 bits number and the number itself has no meaning. Hence, it is difficult to manage the relation between ucodes without using a tool. Our design environment provides to describe connection between each ucode visually and is able to manipulate data using the target space map interactively. To evaluate the proposed model and environment, we designed three spaces using our tool. In addition, we developed a web application using our spatial model. From evaluation, we have been showed that our model is effective and our design environment is useful to develop our spatial information model.
Naoto KIRIBUCHI Ryo KATO Tsukasa ENDO Takashi NISHIDE Hiroshi YOSHIURA
It is becoming more and more important to make use of personal or classified information while keeping it confidential. A promising tool for meeting this challenge is secure multi-party computation (MPC). It enables multiple parties, each given a snippet of a secret s, to compute a function f(s) by communicating with each other without revealing s. However, one of the biggest problems with MPC is that it requires a vast amount of communication. Much research has gone into making each protocol (equality testing, interval testing, etc.) more efficient. In this work, we make a set of multiple protocols more efficient by transforming them into their equivalent batch processing form and propose two protocols: “Batch Logical OR” and “Batch Logical AND.” Using proposed protocols recursively, we also propose “Batch Logical OR-AND” and “Batch Logical AND-OR,” and show arbitrary formula consisting of Boolean protocols, OR gates, and AND gates can be batched. Existing logical OR and logical AND protocols consisting of t equality testing invocations have a communication complexity of O(
Doo Hwa HONG June Sig SUNG Kyung Hwan OH Nam Soo KIM
Decision tree-based clustering and parameter estimation are essential steps in the training part of an HMM-based speech synthesis system. These two steps are usually performed based on the maximum likelihood (ML) criterion. However, one of the drawbacks of the ML criterion is that it is sensitive to outliers which usually result in quality degradation of the synthesized speech. In this letter, we propose an approach to detect and remove outliers for HMM-based speech synthesis. Experimental results show that the proposed approach can improve the synthetic speech, particularly when the available training speech database is insufficient.
Amril SYALIM Takashi NISHIDE Kouichi SAKURAI
Recently, there is much concern about the provenance of distributed processes, that is about the documentation of the origin and the processes to produce an object in a distributed system. The provenance has many applications in the forms of medical records, documentation of processes in the computer systems, recording the origin of data in the cloud, and also documentation of human-executed processes. The provenance of distributed processes can be modeled by a directed acyclic graph (DAG) where each node represents an entity, and an edge represents the origin and causal relationship between entities. Without sufficient security mechanisms, the provenance graph suffers from integrity and confidentiality problems, for example changes or deletions of the correct nodes, additions of fake nodes and edges, and unauthorized accesses to the sensitive nodes and edges. In this paper, we propose an integrity mechanism for provenance graph using the digital signature involving three parties: the process executors who are responsible in the nodes' creation, a provenance owner that records the nodes to the provenance store, and a trusted party that we call the Trusted Counter Server (TCS) that records the number of nodes stored by the provenance owner. We show that the mechanism can detect the integrity problem in the provenance graph, namely unauthorized and malicious “authorized” updates even if all the parties, except the TCS, collude to update the provenance. In this scheme, the TCS only needs a very minimal storage (linear with the number of the provenance owners). To protect the confidentiality and for an efficient access control administration, we propose a method to encrypt the provenance graph that allows access by paths and compartments in the provenance graph. We argue that encryption is important as a mechanism to protect the provenance data stored in an untrusted environment. We analyze the security of the integrity mechanism, and perform experiments to measure the performance of both mechanisms.
Kenji HASHIMOTO Hiroto KAWAI Yasunori ISHIHARA Toru FUJIWARA
This paper discusses verification of the security against inference attacks on XML databases in the presence of a functional dependency. So far, we have provided the verification method for k-secrecy, which is a metric for the security against inference attacks on databases. Intuitively, k-secrecy means that the number of candidates of sensitive data (i.e., the result of unauthorized query) of a given database instance cannot be narrowed down to k-1 by using available information such as authorized queries and their results. In this paper, we consider a functional dependency on database instances as one of the available information. Functional dependencies help attackers to reduce the number of the candidates for the sensitive information. The verification method we have provided cannot be naively extended to the k-secrecy problem with a functional dependency. The method requires that the candidate set can be captured by a tree automaton, but the candidate set when a functional dependency is considered cannot be always captured by any tree automaton. We show that the ∞-secrecy problem in the presence of a functional dependency is decidable when a given unauthorized query is represented by a deterministic topdown tree transducer, without explicitly computing the candidate set.
Shelly SACHDEVA Daigo YAGINUMA Wanming CHU Subhash BHALLA
Large-scale adoption of electronic healthcare applications requires semantic interoperability. The new proposals propose an advanced (multi-level) DBMS architecture for repository services for health records of patients. These also require query interfaces at multiple levels and at the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.
Ha-Nguyen TRAN Chen SUN Yohannes D. ALEMSEGED Hiroshi HARADA
This paper presents the efficiency of a sensing database and caching (SDB) for cognitive radio systems. The proposed SDB stores regulatory information from regulatory databases, and contains sensing information by distributed sensing schemes. Preliminary information processing for instance indexing, sorting, or applying some models or algorithms, etc. can be performed for the stored data. Available information and the results of the information processing are provided to cognitive radios in order to determine available spectrum and to facilitate dynamic spectrum access at lower sensing cost but higher sensing quality. The SDB is implemented in local networks, therefore information exchange between SDB and the cognitive radios can be realized at low latency and the amount of signaling traffic to global network can be reduced. This paper analyzes the effect of SDB and the performance evaluation was done in a certain condition. As a result, by deploying SDB a system can achieve up to 20% of reduction of sensing activities and maximum 1.3 times higher sensing quality.
Shayma ALKOBAISI Wan D. BAE Sada NARAYANAPPA
The increase in the advanced location based services such as traffic coordination and management necessitates the need for advanced models tracking the positions of Moving Objects (MOs) like vehicles. Due to computer processing limitations, it is impossible for MOs to continuously update their locations. This results in the uncertainty nature of a MO's location between any two reported positions. Efficiently managing and quantifying the uncertainty regions of MOs are needed in order to support different types of queries and to improve query response time. This challenging problem of modeling uncertainty regions associated with MO was recently addressed by researchers and resulted in models that ranged from linear which require few properties of MOs as input to the models, to non-linear that are able to more accurately represent uncertainty regions by considering higher degree input. This paper summarizes and discusses approaches in modeling uncertainty regions associated with MOs. It further illustrates the need for appropriate approximations especially in the case of non-linear models as the uncertainty regions become rather irregularly shaped and difficult to manage. Finally, we demonstrate through several experimental sets the advantage of non-linear models over linear models when the uncertainty regions of MOs are approximated by two different approximations; the Minimum Bounding Box (MBB) and the Tilted Minimum Bounding Box (TMBB).
Woong-Kee LOH Yang-Sae MOON Heejune AHN
We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering with a dramatic performance improvement.
Yoo Rhee OH Yong Guk KIM Mina KIM Hong Kook KIM Mi Suk LEE Hyun Joo BAE
In this paper, we propose a text corpus design method for a Korean stereo super-wideband speech database. Since a small-sized text corpus for speech coding is generally required for speech coding, the corpus should be designed to comply with the pronunciation behavior of natural conversation in order to ensure efficient speech quality tests. To this end, the proposed design method utilizes a similarity measure between the phoneme distribution occurring from natural conversation and that from the designed text corpus. In order to achieve this goal, we first collect and refine text data from textbooks and websites. Next, a corpus is designed from the refined text data based on the similarity measure to compare phoneme distributions. We then construct a Korean stereo super-wideband speech (K-SW) database using the designed text corpus, where the recording environment is set to meet the conditions defined by ITU-T. Finally, the subjective quality of the K-SW database is evaluated using an ITU-T super-wideband codec in order to demonstrate that the K-SW database is useful for developing and evaluating super-wideband codecs.