1-6hit |
Yu WANG Liangyong YANG Jilian ZHANG Xuelian DENG
Cloud computing has become the mainstream computing paradigm nowadays. More and more data owners (DO) choose to outsource their data to a cloud service provider (CSP), who is responsible for data management and query processing on behalf of DO, so as to cut down operational costs for the DO. However, in real-world applications, CSP may be untrusted, hence it is necessary to authenticate the query result returned from the CSP. In this paper, we consider the problem of approximate string query result authentication in the context of database outsourcing. Based on Merkle Hash Tree (MHT) and Trie, we propose an authenticated tree structure named MTrie for authenticating approximate string query results. We design efficient algorithms for query processing and query result authentication. To verify effectiveness of our method, we have conducted extensive experiments on real datasets and the results show that our proposed method can effectively authenticate approximate string query results.
Lucas Saad Nogueira NUNES Jacir Luiz BORDIM Yasuaki ITO Koji NAKANO
The closeness of a match is an important measure with a number of practical applications, including computational biology, signal processing and text retrieval. The approximate string matching (ASM) problem asks to find a substring of string Y of length n that is most similar to string X of length m. It is well-know that the ASM can be solved by dynamic programming technique by computing a table of size m×n. The main contribution of this work is to present a memory-access-efficient implementation for computing the ASM on a GPU. The proposed GPU implementation relies on warp shuffle instructions which are used to accelerate the communication between threads without resorting to shared memory access. Despite the fact that O(mn) memory access operations are necessary to access all elements of a table with size n×m, the proposed implementation performs only $O(rac{mn}{w})$ memory access operations, where w is the warp size. Experimental results carried out on a GeForce GTX 980 GPU show that the proposed implementation, called w-SCAN, provides speed-up of over two fold in computing the ASM as compared to another prominent alternative.
Duhu MAN Koji NAKANO Yasuaki ITO
The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The approximate string matching (ASM) for two strings X and Y of length m and n is a task to find a substring of Y most similar to X. The main contribution of this paper is to show an optimal parallel algorithm for the approximate string matching on the HMM and implement it on GeForce GTX 580 GPU. Our algorithm runs in $O({nover w}+{mnover dw}+{nLover p}+{mnlover p})$ time units on the HMM with p threads, d streaming processors, memory band width w, global memory access latency L, and shared memory access latency l. We also show that the lower bound of the computing time is $Omega({nover w}+{mnover dw}+{nLover p}+{mnlover p})$ time units. Thus, our algorithm for the approximate string matching is time optimal. Further, we implemented our algorithm on GeForce GTX 580 GPU and evaluated the performance. The experimental results show that the ASM of two strings of 1024 and 4M (=222) characters can be done in 419.6ms, while the sequential algorithm can compute it in 27720ms. Thus, our implementation on the GPU attains a speedup factor of 66.1 over the single CPU implementation.
This paper shows a way to derive positive definite kernels from edit distances. It is well-known that, if a distance d is negative definite, e-λd is positive definite for any λ > 0. This property provides us the opportunity to apply useful techniques of kernel multivariate analysis to the features of data captured by means of the distance. However, the known instances of edit distance are not always negative definite. Even worse, it is usually not easy to examine whether a given instance of edit distance is negative definite. This paper introduces alignment kernels to present an alternative means to derive kernels from edit distance. The most important advantage of the alignment kernel consists in its easy-to-check sufficient condition for the positive definiteness. In fact, when we surveyed edit distances for strings, trees and graphs, all but one are instantly verified to meet the condition and therefore proven to be positive definite.
Tree structured data often appear in bioinformatics. For example, glycans, RNA secondary structures and phylogenetic trees usually have tree structures. Comparison of trees is one of fundamental tasks in analysis of these data. Various distance measures have been proposed and utilized for comparison of trees, among which extensive studies have been done on tree edit distance. In this paper, we review key results and our recent results on the tree edit distance problem and related problems. In particular, we review polynomial time exact algorithms and more efficient approximation algorithms for the edit distance problem for ordered trees, and approximation algorithms for the largest common sub-tree problem for unordered trees. We also review applications of tree edit distance and its variants to bioinformatics with focusing on comparison of glycan structures.
Shinnosuke SEKI Satoshi KOBAYASHI
In this paper, we are concerned with a structural ambiguity problem of tree adjoining grammars (TAGs), which is an essential problem when we try to model consensus structures of given set of ribonucleic acid (RNA) secondary structures by TAGs. RNA secondary structures can be represented as strings with structural information, and TAGs have a descriptive capability of this kind of strings, what we call structure-annotated strings. Thus, we can model RNA secondary structures by TAGs. It is sufficient to use existing alignment methods for just computing the optimal alignment between RNA secondary structures. However, when we also want to model the resulting alignment by grammars, if we adopt these existing methods, then we may fail in modeling the alignment result by grammars. Therefore, it is important to introduce a new alignment method whose alignment results can be appropriately modeled by grammars. In this paper, we will propose an alignment method based on TAG's derivations each corresponding to a given RNA secondary structure. For an RNA secondary structure, there exist a number of derivations of TAGs which correspond to the structure. From the grammatical point of view, the property of TAGs drives us to the question how we should choose a derivation from these candidates in order to obtain an optimal alignment. This is the structural ambiguity problem of TAGs, which will be mainly discussed in this paper. For dealing with this problem appropriately, we will propose an edit distance between two structure-annotated strings, and then present an algorithm which computes an optimal alignment based on the edit distance.