Junya SHIMIZU Yixin DIAO Maheswaran SURENDRA
One of the system parameters that most affects the performance of a database server is how memory is divided among buffer pools. This letter proposes an adaptive method for controlling the buffer pool sizes. The method obtains a nearly optimal division using only response times observed over a comparatively short period.
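The abstract does not describe the control law itself. As a rough illustration only, the following sketch assumes two buffer pools sharing a fixed page budget and adjusts the split by simple hill climbing on measured response time. `tune_split` and the `measure_response_time` callback are hypothetical names; in practice the callback would resize the pools and return the mean response time observed over a short measurement window.

```python
from typing import Callable


def tune_split(
    total_pages: int,
    measure_response_time: Callable[[int], float],
    initial_pages: int,
    step: int = 64,
    min_step: int = 1,
) -> int:
    """Return the page count for pool A; pool B gets total_pages minus that.

    Hill climbing: try moving `step` pages toward either pool and keep a move
    if the observed response time drops; otherwise halve the step.
    """
    if not 0 <= initial_pages <= total_pages:
        raise ValueError("initial_pages out of range")
    pages_a = initial_pages
    best = measure_response_time(pages_a)
    while step >= min_step:
        improved = False
        for candidate in (pages_a + step, pages_a - step):
            if 0 <= candidate <= total_pages:
                observed = measure_response_time(candidate)  # resize, then measure
                if observed < best:
                    pages_a, best = candidate, observed
                    improved = True
                    break
        if not improved:
            step //= 2  # refine the search around the current split
    return pages_a


# Toy stand-in for measurements: response time is lowest at 700 pages.
print(tune_split(1024, lambda pages: (pages - 700) ** 2 / 1000 + 5, 512))  # prints 700
```

Hill climbing finds only a local optimum and assumes the measurements are not too noisy; a real controller would need to average or filter the observed response times.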
Satoshi NAKAYAMA Maki YOSHIDA Shingo OKAMURA Toru FUJIWARA
Data retrieval is used to obtain a particular data item from a database. A user requests an item from a database server by sending a query and obtains the item from the answer to the query. The security requirements of data retrieval include protecting the privacy of the user, the secrecy of the database, and the consistency of answers. In this paper, a data retrieval scheme that satisfies all of these security requirements is defined, and an efficient construction is proposed. In the proposed construction, the sizes of a query and an answer are O((log N)^2), and the size of the data published by the database server when the database is updated is only O(1). The construction uses a Merkle tree, a commitment scheme, and oblivious transfer. Security is proved under the assumption that the underlying cryptographic schemes are secure.
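A Merkle tree lets a client check that a data item belongs to the database using only a small published root hash and an authentication path whose length is logarithmic in the number of items. The sketch below is a generic SHA-256 Merkle tree built under our own assumptions (one-byte leaf/node prefixes, odd levels padded by duplicating the last hash). It covers only this one building block, not the commitment scheme, oblivious transfer, or the paper's actual construction.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(items: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, leaves first and the root level last."""
    if not items:
        raise ValueError("need at least one item")
    level = [_h(b"\x00" + item) for item in items]  # 0x00 prefix marks leaves
    levels = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad an odd level with its last hash
        levels.append(level)
        level = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks inner nodes
                 for i in range(0, len(level), 2)]
    levels.append(level)
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Authentication path: (sibling hash, sibling is on the right) per level."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling > index))
        index //= 2
    return path


def verify(root: bytes, item: bytes, path: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the root from the item and its path, then compare."""
    node = _h(b"\x00" + item)
    for sibling, on_right in path:
        node = _h(b"\x01" + node + sibling) if on_right else _h(b"\x01" + sibling + node)
    return node == root
```

For five items the path has three hashes, so only the root (fixed size) needs to be republished when the database changes, while each proof grows logarithmically with the number of items.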
Masakiyo FUJIMOTO Kazuya TAKEDA Satoshi NAKAMURA
This paper introduces CENSREC-3, a common database, evaluation framework, and set of baseline recognition results for in-car speech recognition, produced by the IPSJ-SIG SLP Noisy Speech Recognition Evaluation Working Group. CENSREC-3, a sequel to AURORA-2J, is designed as an evaluation framework for isolated word recognition in real car-driving environments. Speech data were collected with two microphones, a close-talking microphone and a hands-free microphone, under 16 carefully controlled driving conditions drawn from combinations of three car speeds and six car conditions. CENSREC-3 provides six evaluation environments built from the speech data collected under these conditions.
Toshiyuki MIYAMOTO Yasuhiro MORITA Sadatoshi KUMAGAI
Secret sharing is a method for distributing a secret among a group of participants. Each participant is allocated a share of the secret, and the secret can be reconstructed only when the shares are combined. We have previously proposed a secret sharing distributed database system (SSDDB) that uses a secret sharing scheme to improve the confidentiality and robustness of distributed database systems. This paper proposes a vertical partitioning algorithm for the SSDDB and evaluates it through computational experiments.
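The abstract does not say which secret sharing scheme SSDDB uses. Assuming Shamir's (k, n) threshold scheme as a representative example, any k of the n shares reconstruct the secret, while fewer than k reveal nothing about it. A minimal sketch over a prime field (the modulus 2^127 - 1 is our own choice):

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime; secrets must be smaller than this


def split(secret: int, k: int, n: int) -> List[Tuple[int, int]]:
    """Split `secret` into n shares, any k of which reconstruct it."""
    if not 0 <= secret < PRIME or not 1 <= k <= n:
        raise ValueError("invalid parameters")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

The modular inverse `pow(den, -1, PRIME)` requires Python 3.8 or later. How SSDDB maps attribute values and partitions onto shares is not described in the abstract.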
Seunglak CHOI Jinwon LEE Su Myeon KIM Junehwa SONG Yoon-Joon LEE
Most commercial Web sites generate their content dynamically through a three-tier server architecture composed of a Web server, an application server, and a database server. In such an architecture, the database server easily becomes a bottleneck for overall performance. In this paper, we propose WDBAccel, a high-performance database server accelerator that significantly improves the throughput of database processing. WDBAccel avoids costly, complex query processing by reusing the results of previous queries for subsequent queries. This distinguishes WDBAccel from other database cache systems, which employ traditional query processing. WDBAccel further improves performance by using main memory as its primary storage. This paper presents the design and implementation of WDBAccel and the results of a performance evaluation with a prototype.
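The core idea, answering a repeated query from a stored result instead of running it again, can be sketched as a result cache keyed by the query text. This is an illustration under simplifying assumptions, not WDBAccel's design: `QueryResultCache` is a hypothetical name, queries are matched only after whitespace normalization, and the caller supplies the set of tables each query reads so that updates can invalidate dependent results.

```python
from typing import Callable, Dict, FrozenSet, List


class QueryResultCache:
    """Reuse results of earlier queries instead of re-executing them.

    Each entry records the tables its query read, so an update to a table
    invalidates exactly the cached results that depend on it.
    """

    def __init__(self, execute: Callable[[str], List[tuple]]):
        self._execute = execute  # fallback: run the query on the database
        self._results: Dict[str, List[tuple]] = {}
        self._tables: Dict[str, FrozenSet[str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql: str) -> str:
        return " ".join(sql.split())  # normalize whitespace only

    def query(self, sql: str, tables: FrozenSet[str]) -> List[tuple]:
        key = self._key(sql)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        rows = self._execute(sql)
        self._results[key] = rows
        self._tables[key] = tables
        return rows

    def on_update(self, table: str) -> None:
        """Drop every cached result that read `table`."""
        stale = [key for key, read in self._tables.items() if table in read]
        for key in stale:
            del self._results[key]
            del self._tables[key]
```

A production system would also need to derive the dependent tables by parsing the query and to bound memory use; both are omitted here.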
Yoshitaka FUJIWARA Yoshiaki OHNISHI Hideki YOSHIDA
This paper presents a method for tuning the structure of a causal network (CN) used to evaluate a learner's profile in a learning assistance system that employs hierarchically structured learning material. As the initial CN structure, the method uses the causally related inter-node paths that explicitly define the structure of the learning material. Starting from this initial structure, other inter-node paths (sideway paths) that are not present in it are inferred from the learner database generated through use of the learning assistance system. A simulation-based evaluation indicates that the method achieves an inference probability of about 63% and an inference accuracy of about 30%.
Kiyoshi HOSHINO Takanobu TANIMOTO
Estimating hand posture by searching a vast database for a similar image, as in our previous research, can increase processing time and prevent real-time control of a robot. In this study, we propose a new method for estimating human hand posture that rearranges a large-scale database with a Self-Organizing Map that includes self-reproduction and self-annihilation. This enables a two-step search for similar images with short processing time, small errors, and no variation in search time. Experimental results showed that the system estimated hand posture with high accuracy at more than 50 fps per input image on a PC with a 2.8 GHz CPU.
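In a two-step search, the query is first compared with a small set of cluster prototypes and then only with the members of the winning cluster. The sketch below shows just that search step; it assumes the database has already been clustered and each cluster is represented by a prototype vector, and it does not implement the Self-Organizing Map training, self-reproduction, or self-annihilation described above. `two_step_search` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def two_step_search(
    query: Vector,
    prototypes: List[Vector],
    members: Dict[int, List[Tuple[str, Vector]]],
) -> str:
    """Return the label of the most similar stored image.

    Step 1 compares the query with each cluster prototype; step 2 scans only
    the images assigned to the nearest prototype.
    """
    nearest = min(range(len(prototypes)), key=lambda i: math.dist(query, prototypes[i]))
    label, _ = min(members[nearest], key=lambda entry: math.dist(query, entry[1]))
    return label
```

Because only one cluster is scanned, the result is approximate: the true nearest image can occasionally lie in a neighboring cluster. Keeping cluster sizes balanced keeps the per-query cost close to constant.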
Fitri ARNIA Ikue IIZUKA Masaaki FUJIYOSHI Hitoshi KIYA
This paper proposes two schemes for fast identification of JPEG-coded images. The aim is to identify JPEG images that are generated from the same original image, whether with equal or different compression ratios. Identification is fast because the schemes work in the quantized discrete cosine transform (DCT) domain, so neither dequantization nor the inverse DCT is required. Moreover, only a few coefficients are usually needed for identification. The first scheme avoids identification leakage, i.e., false negatives (FN), but may produce a few false positives (FP). The second scheme avoids both FN and FP at a slightly higher processing time. Combining the two schemes yields faster identification in which both FN and FP are avoided.
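One way to see why a quantized-domain check can avoid false negatives: if a coefficient c was quantized as q = round(c / step), then c lies in [(q - 0.5) * step, (q + 0.5) * step]. Two images derived from the same original (assuming identical DCT coefficients before quantization) must therefore have overlapping intervals at every compared position, so disjoint intervals prove the images differ, while overlapping ones only suggest a match. The sketch illustrates this reasoning; it is not claimed to be either of the paper's schemes, and `may_share_origin` is a hypothetical name.

```python
from typing import Sequence


def may_share_origin(
    q1: Sequence[int], step1: Sequence[float],
    q2: Sequence[int], step2: Sequence[float],
) -> bool:
    """Return False only when the coefficients cannot share one original.

    Assumes q = round(c / step), so c lies in [(q - 0.5) * step, (q + 0.5) * step].
    """
    for a, s1, b, s2 in zip(q1, step1, q2, step2):
        lo1, hi1 = (a - 0.5) * s1, (a + 0.5) * s1
        lo2, hi2 = (b - 0.5) * s2, (b + 0.5) * s2
        if hi1 < lo2 or hi2 < lo1:
            return False  # disjoint intervals: definitely different originals
    return True  # consistent at every position: possibly the same (may be an FP)
```

Only a handful of coefficient positions need to be compared before a mismatch is usually found, which is consistent with the small number of coefficients mentioned above.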
Count suffix trees (CS-trees) have been used as a kind of alphanumeric histogram to estimate the number of substring matches against string data. Although these trees are useful for substring count estimation in short data strings (e.g., names or titles), they have several drawbacks when the target is extremely long strings. First, building CS-trees becomes very slow or even infeasible, because the structure they derive from, the suffix tree, has a memory bottleneck with long strings. Second, some CS-tree node counts are incorrect because nodes are pruned frequently. We therefore propose the count q-gram tree (CQ-tree) as an alphanumeric histogram for long strings. Because CQ-trees are built from q-grams (length-q substrings), they can be created quickly and with correct counts within a small amount of memory. Furthermore, we mathematically derive lower and upper bounds on the count estimate. To the best of our knowledge, this work is the first to present such bounds in research on estimating alphanumeric selectivity. Our experimental study shows that the CQ-tree outperforms the CS-tree in both building time and accuracy.
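As a rough illustration of why q-grams keep construction simple, the sketch below counts every substring of length 1 to q (roughly the information a count q-gram tree holds, stored here in a flat `Counter` rather than a tree) and estimates longer patterns by chaining overlapping q-grams under a Markov assumption. This maximal-overlap style estimator is a common technique chosen for illustration; it is not necessarily the estimator used with CQ-trees, and it does not compute the paper's bounds.

```python
from collections import Counter


def build_counts(text: str, q: int) -> Counter:
    """Count every substring of length 1..q in a single pass over the text."""
    counts: Counter = Counter()
    for i in range(len(text)):
        for length in range(1, min(q, len(text) - i) + 1):
            counts[text[i:i + length]] += 1
    return counts


def estimate(counts: Counter, pattern: str, q: int) -> float:
    """Exact for len(pattern) <= q; otherwise chain overlapping q-grams:
    C(p) ~ C(p[0:q]) * prod_i C(p[i:i+q]) / C(p[i:i+q-1])."""
    if q < 2:
        raise ValueError("q must be at least 2")
    if len(pattern) <= q:
        return float(counts[pattern])
    est = float(counts[pattern[:q]])
    for i in range(1, len(pattern) - q + 1):
        overlap = counts[pattern[i:i + q - 1]]
        if overlap == 0:
            return 0.0
        est *= counts[pattern[i:i + q]] / overlap
    return est
```

On "abababab" with q = 3, the estimate for "ababa" is 2.25 against a true count of 2, which shows why bounds on the estimate are useful.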
Krishna KANT Amit SAHOO Nrupal JANI
Given the availability of high-speed Ethernet and hardware-based protocol offload, clustered systems using a commodity network fabric (e.g., TCP/IP over Ethernet) are expected to become more attractive for a range of e-business and data center applications. In this paper, we describe a comprehensive simulation model for studying the performance of clustered database systems that use such a fabric. The model currently supports both TCP and SCTP as the transport protocol and models an Oracle 9i-like clustered DBMS running a TPC-C-like workload. It can be used to study a wide variety of issues in clustered DBMS performance, including the impact of enhancements to the network layers (transport, IP, MAC), QoS mechanisms or latency improvements, and cluster-wide power control.
Yasunori ISHIHARA Shuichiro AKO Toru FUJIWARA
In an inference attack, a user derives information on the execution results of unauthorized queries from the execution results of authorized queries. Most studies on inference attacks so far have focused only on the inference of positive information (i.e., which value is the execution result of a given unauthorized query). However, negative information (i.e., which value is never the execution result of a given unauthorized query) is also sensitive in many cases. This paper presents the following results on security against inference attacks on negative information in object-oriented databases. First, inference of negative information is formalized under a model of object-oriented databases called method schemas. Then, two security problems are defined: (1) Is a given database instance secure against inference attacks on given negative information? (2) Are all database instances of a given database schema secure against inference attacks on given negative information? The first problem is shown to be decidable in time polynomial in the description size of the database instance, while the second is shown to be undecidable. A decidable sufficient condition for every database instance of a given schema to be secure is also proposed. Finally, for monadic schemas (i.e., schemas in which every method has exactly one parameter), this sufficient condition is shown to be necessary as well.
Seokjin HONG Bongki MOON Sukho LEE
A range top-k query returns the top k records, ordered by a measure attribute, within a specified region of multi-dimensional data. Range top-k queries are a powerful analysis tool in spatial databases and data warehouse environments. In this paper, we propose an algorithm that answers such queries by selectively traversing an aggregate R-tree whose aggregate values are maxima (MAX). Because the algorithm accesses only a small part of the leaf nodes within the query region, its query performance is good regardless of the size of that region. We also propose an efficient pruning technique that reduces the cost of handling the priority queue, and an efficient leaf node organization that reduces the number of node accesses needed to execute range top-k queries.
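The key property is that a node's MAX value is an upper bound on the measure of every record beneath it, so a best-first traversal with a priority queue can stop as soon as k records have been popped. The sketch below uses a toy bulk-loaded tree (records sorted by x and grouped by a fixed fanout) in place of a real aggregate R-tree, and it omits the paper's priority-queue pruning and leaf organization techniques.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    mbr: Rect
    max_value: float  # MAX of the measure over the whole subtree
    children: List["Node"] = field(default_factory=list)
    point: Optional[Tuple[float, float]] = None  # set only for data records


def _mbr(nodes: List[Node]) -> Rect:
    return (min(n.mbr[0] for n in nodes), min(n.mbr[1] for n in nodes),
            max(n.mbr[2] for n in nodes), max(n.mbr[3] for n in nodes))


def build(records: List[Tuple[float, float, float]], fanout: int = 4) -> Node:
    """Toy bulk load: sort (x, y, value) records by x, group by `fanout`."""
    if not records or fanout < 2:
        raise ValueError("need records and fanout >= 2")
    level = [Node((x, y, x, y), v, point=(x, y)) for x, y, v in sorted(records)]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(_mbr(g), max(n.max_value for n in g), g) for g in groups]
    return level[0]


def _intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_top_k(root: Node, region: Rect, k: int) -> List[Tuple[Tuple[float, float], float]]:
    """Best-first search ordered by MAX; stops after k records are popped."""
    heap = [(-root.max_value, 0, root)] if _intersects(root.mbr, region) else []
    counter = 1  # unique tie-breaker so Node objects are never compared
    result = []
    while heap and len(result) < k:
        neg_max, _, node = heapq.heappop(heap)
        if node.point is not None:
            result.append((node.point, -neg_max))  # no unseen record can beat it
            continue
        for child in node.children:
            if _intersects(child.mbr, region):
                heapq.heappush(heap, (-child.max_value, counter, child))
                counter += 1
    return result
```

A subtree whose MAX comes from a record outside the query region is still explored, because its bound is only an overestimate; correctness is unaffected, but such nodes cost extra accesses, which is the kind of overhead the proposed techniques aim to reduce.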
Kiyoshi HOSHINO Takanobu TANIMOTO
The authors propose a system that estimates the shape of human hands and fingers in real time and with high accuracy, without special peripheral equipment such as range sensors or PC clusters, by quickly and accurately retrieving similar images from a large image database that contains complicated shapes and self-occlusions. In designing the system, we constructed the database so that it accommodates differences among individuals, and we searched for CG hand images similar to an unknown hand image by extracting features with higher-order local autocorrelation patterns, reducing the number of features mainly by principal component analysis, and rearranging the data in advance according to the feature values. In experiments, the system estimated human hand shape with a mean error of 7 degrees in finger joint angles at a processing speed of 30 fps or more.
We propose a system, called Image Collector II, that gathers hundreds of images related to a set of keywords given by a user from the World Wide Web. Image Collector, the system we proposed previously, can gather only one or two hundred images. We propose the following two improvements to the previous system with respect to the number of gathered images and their precision: (1) From all HTML files in which the images output by an initial image gathering are embedded, we extract words that appear with high frequency and use them as keywords for a second image gathering. This process yields hundreds of images for one set of keywords. (2) The more images we gather, the lower the precision of the gathered images becomes. To improve precision, we introduce word vectors of the HTML files embedding the images into the image selection process, in addition to image feature vectors.
Kazuki ADACHI Tomoki TODA Hiromichi KAWANAMI Hiroshi SARUWATARI Kiyohiro SHIKANO
This research aims to construct a high-quality Japanese TTS (text-to-speech) system with high flexibility in handling prosody. Many TTS systems implement prosody control, but such systems are fundamentally designed to output speech with a standard pitch and speech rate. In this study, we employ a unit selection and concatenation method and also introduce an analysis-synthesis process to provide precisely controlled prosody in the output speech. Because speech quality degrades in proportion to the amount of prosody modification, such a unit selection system sets a target cost for prosody that evaluates the prosodic difference between the target prosody and the speech candidates. However, the conventional cost ignores the original prosody of the speech segments, even though the tendency of quality deterioration is assumed to vary with the pitch or speech rate of the original speech. In this paper, we propose a novel cost function design based on the prosody of speech segments. First, we recorded nine Japanese speech databases with different prosodic characteristics. Then, for these databases, we investigated the relationship between the amount of prosody modification and perceptual degradation. The results indicate that the tendency of perceptual degradation differs according to the prosodic features of the original speech. On the basis of these results, we propose a new cost function design that changes the cost function according to the prosody of the speech database. Preference tests on synthetic speech show that the proposed cost functions generate higher-quality speech than the conventional method.
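In a unit selection system, the sequence of speech segments is typically chosen by dynamic programming that minimizes the sum of target costs and concatenation costs. The generic Viterbi sketch below illustrates where a prosody target cost plugs in. `prosody_target_cost` is a hypothetical placeholder whose per-database `weight` merely stands in for the database-dependent cost design proposed here; the actual cost functions were derived from perceptual experiments and are not reproduced.

```python
import math
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    concat_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Viterbi search for the unit sequence with the lowest total cost.

    candidates[t] lists the database units usable at target position t.
    """
    # best[t][j] = (lowest cumulative cost ending in candidate j, backpointer)
    best = [[(target_cost(0, u), -1) for u in candidates[0]]]
    for t in range(1, len(candidates)):
        row = []
        for u in candidates[t]:
            cost, back = min(
                (best[t - 1][i][0] + concat_cost(prev, u), i)
                for i, prev in enumerate(candidates[t - 1])
            )
            row.append((cost + target_cost(t, u), back))
        best.append(row)
    j = min(range(len(best[-1])), key=lambda i: best[-1][i][0])
    total = best[-1][j][0]
    path = []
    for t in range(len(candidates) - 1, -1, -1):  # follow backpointers
        path.append(candidates[t][j])
        j = best[t][j][1]
    path.reverse()
    return path, total


def prosody_target_cost(target_f0: float, unit_f0: float, weight: float) -> float:
    """Hypothetical prosody cost: grows with the size of the log-F0 change.

    `weight` is a placeholder for a database-dependent factor.
    """
    return weight * abs(math.log(target_f0 / unit_f0))
```

Making the target cost depend on the source database, rather than on the prosodic difference alone, changes which candidates win the search without changing the search itself.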
Carlos PEREZ LEGUIZAMO Dake WANG Kinji MORI
With the advent of IT and the widespread use of the Internet, new user-oriented production and logistics systems, such as supply chain management systems, have been required to cope with drastic and continuous changes in markets and user preferences. Heterogeneous database systems therefore need to be integrated into a common environment that can meet each company's heterogeneous requirements under ever-changing conditions. This capability is called assurance. The Autonomous Decentralized Database System (ADDS) is proposed as a system architecture for realizing assurance in distributed database systems. In this architecture, a loose-consistency management technology is proposed to maintain the consistency of the system while allowing each database to update autonomously, which provides the real-time property. A background coordination technology, performed by an autonomous mobile agent, is devised to adapt the system to evolving situations. The system achieves real-time operation by allocating information in advance among sites that have different time constraints for updating. Moreover, because a failure in the background coordination mechanism may lead to loss of data and unavailability of the system, an assurance information allocation technology is proposed. In this mechanism, which is designed around real-time and system availability considerations, the mobile agent autonomously regulates its own capacity for allocating information. The effectiveness of the proposed architecture and technologies is evaluated by simulation.
Motion capture technology is now widely used to create realistic motion. Different motion capture devices use different motion capture data formats. Because these formats are incompatible, animators cannot reuse motion sequences that have already been captured. In addition, it is difficult to integrate, store, and retrieve motion capture data in different formats. In this paper, we propose a standard format for integrating different motion capture data formats, together with a framework for a system that manages motion capture data using this format. The standard format, called MCML (Motion Capture Markup Language), is a markup language for motion capture data based on XML (Extensible Markup Language). The system we designed to manage motion capture data consists of several components: Mocap Syntax Analyzer, MCML Converter, MCML Editor, Motion Viewer, and MCML Storage Wrapper.
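The MCML schema itself is not given here, so the element and attribute names below (`motion`, `skeleton`, `joint`, `frame`, `rotation`) are hypothetical. The sketch only illustrates the general idea of converting device-specific motion data into a common XML representation and reading it back, using Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_markup(joints: List[str], frames: List[Dict[str, List[float]]], fps: float) -> str:
    """Serialize per-frame joint rotations to XML (hypothetical schema)."""
    root = ET.Element("motion", {"fps": str(fps)})
    skeleton = ET.SubElement(root, "skeleton")
    for name in joints:
        ET.SubElement(skeleton, "joint", {"name": name})
    data = ET.SubElement(root, "frames")
    for index, frame in enumerate(frames):
        f = ET.SubElement(data, "frame", {"index": str(index)})
        for name in joints:
            rx, ry, rz = frame[name]
            ET.SubElement(f, "rotation",
                          {"joint": name, "x": str(rx), "y": str(ry), "z": str(rz)})
    return ET.tostring(root, encoding="unicode")


def from_markup(xml_text: str) -> List[Dict[str, List[float]]]:
    """Parse the same hypothetical schema back into per-frame dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {r.get("joint"): [float(r.get(axis)) for axis in ("x", "y", "z")]
         for r in f.iter("rotation")}
        for f in root.iter("frame")
    ]
```

In a system like the one described, a converter per device format would produce this common representation, and the editor, viewer, and storage components would work only with it.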
Distributed database systems offer many advantages, such as lower database management cost, efficiency, and high system integrity, by allocating fragments obtained through horizontal or vertical fragmentation of the global database schema to many distributed sites. To minimize costs, algorithms must be applied so that database fragments are allocated to optimal sites. Replicating fragments, i.e., allocating copies to multiple sites, is also useful, for example for load balancing. However, the number of possible combinations of sites and fragments is too large to find a solution in real time; the allocation problem is NP-complete. This paper proposes near-optimal heuristic algorithms that minimize cost under a cost model based on the read and update queries requested at many sites. The proposed algorithms take into account various factors for sizing the network resources needed to process database transactions, such as remote queries and the update requests that keep replicated database systems consistent. For network load balancing, an incoming network traffic table is defined at each site; using this table, a request transaction from a site that holds no copy can be served properly by any other site holding a replica. Finally, experimental results comparing actual cases of database allocation verify the proposed algorithms.
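A common greedy heuristic for this kind of problem starts from the cheapest single site for a fragment and keeps adding replicas while the total cost decreases. The sketch below assumes a simple cost model of our own: each read goes to the nearest copy, each update must reach every copy, and each copy has a fixed storage cost. It handles one fragment at a time and is not the paper's algorithm or cost model.

```python
from typing import Sequence, Set


def allocation_cost(
    replicas: Set[int],
    reads: Sequence[float],           # reads[s]: read frequency at site s
    updates: Sequence[float],         # updates[s]: update frequency at site s
    dist: Sequence[Sequence[float]],  # dist[s][t]: transfer cost from s to t
    storage: float,
) -> float:
    cost = storage * len(replicas)
    for s in range(len(reads)):
        cost += reads[s] * min(dist[s][r] for r in replicas)    # nearest copy
        cost += updates[s] * sum(dist[s][r] for r in replicas)  # reach every copy
    return cost


def greedy_allocate(reads, updates, dist, storage: float = 0.0) -> Set[int]:
    """Start from the best single site, then add replicas while cost drops."""
    sites = range(len(reads))

    def cost(rs: Set[int]) -> float:
        return allocation_cost(rs, reads, updates, dist, storage)

    chosen = {min(sites, key=lambda s: cost({s}))}
    current = cost(chosen)
    while True:
        options = [s for s in sites if s not in chosen]
        if not options:
            break
        best_site = min(options, key=lambda s: cost(chosen | {s}))
        best_cost = cost(chosen | {best_site})
        if best_cost >= current:
            break  # another replica would no longer pay for itself
        chosen.add(best_site)
        current = best_cost
    return chosen
```

Read-heavy fragments end up replicated widely, while update-heavy fragments stay at one or a few sites, which is the trade-off any cost model of this kind has to balance.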
Inseon LEE Heon Y. YEOM Taesoon PARK
Distributed database systems require a commit process to preserve the ACID properties of transactions executed across a number of system sites. With the advent of main memory database systems, database processing time has been reduced by an order of magnitude, since database access incurs no disk access at all. In distributed main memory database systems, however, the distributed commit process is still very slow, because disk logging at several sites must precede the transaction commit. In this paper, we re-evaluate various distributed commit protocols and propose a causal commit protocol suitable for distributed main memory database systems. An extensive simulation study was performed to evaluate the performance of the proposed protocol. The simulation results confirm that the new protocol greatly reduces the time needed to commit distributed transactions without causing any consistency problems.
Yasunori ISHIHARA Kengo MORI Toru FUJIWARA
Detecting the possibility of inference attacks is necessary to keep a database secure. In an inference attack, a user tries to infer the result of a query that the user is not authorized to execute. For method schemas, a formal model of object-oriented databases, the security problem against inference attacks is known to be decidable in time polynomial in the size of a given database instance. However, when the database instance or the authorization has been only slightly updated, rechecking the entire database is undesirable for efficiency reasons. In this paper, we propose several sufficient conditions under which update operations preserve security. Furthermore, we show that some of these conditions can be decided much more efficiently than a full security check. The sufficient conditions are therefore useful for incremental security checking.