The search functionality is under construction.

Author Search Result

[Author] Yoshiharu ISHIKAWA(12hit)

1-12hit
  • Grouping Methods for Pattern Matching over Probabilistic Data Streams

    Kento SUGIURA  Yoshiharu ISHIKAWA  Yuya SASAKI  

     
    PAPER

      Pubricized:
    2017/01/17
      Vol:
    E100-D No:4
      Page(s):
    718-729

    As the development of sensor and machine learning technologies has progressed, it has become increasingly important to detect patterns from probabilistic data streams. In this paper, we focus on complex event processing based on pattern matching. When we apply pattern matching to probabilistic data streams, numerous matches may be detected at the same time interval because of the uncertainty of data. Although existing methods distinguish between such matches, they may derive inappropriate results when some of the matches correspond to the real-world event that has occurred during the time interval. Thus, we propose two grouping methods for matches. Our methods output groups that indicate the occurrence of complex events during the given time intervals. In this paper, first we describe the definition of groups based on temporal overlap, and propose two grouping algorithms, introducing the notions of complete overlap and single overlap. Then, we propose an efficient approach for calculating the occurrence probabilities of groups by using deterministic finite automata that are generated from the query patterns. Finally, we empirically evaluate the effectiveness of our methods by applying them to real and synthetic datasets.

  • Content-Based Element Search for Presentation Slide Reuse

    Jie ZHANG  Chuan XIAO  Toyohide WATANABE  Yoshiharu ISHIKAWA  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E97-D No:10
      Page(s):
    2685-2696

    Presentation slide composition is an important job for knowledge workers. Instead of starting from scratch, users tend to make new presentation slides by reusing existing ones. A primary challenge in slide reuse is to select desired materials from a collection of existing slides. The state-of-the-art solution utilizes texts and images in slides as well as file names to help users to retrieve the materials they want. However, it only allows users to choose an entire slide as a query but does not support the search for a single element such as a few keywords, a sentence, an image, or a diagram. In this paper, we investigate content-based search for a variety of elements in presentation slides. Users may freely choose a slide element as a query. We propose different query processing methods to deal with various types of queries and improve the search efficiency. A system with a user-friendly interface is designed, based on which experiments are performed to evaluate the effectiveness and the efficiency of the proposed methods.

  • Probabilistic Range Querying over Gaussian Objects Open Access

    Tingting DONG  Chuan XIAO  Yoshiharu ISHIKAWA  

     
    PAPER

      Vol:
    E97-D No:4
      Page(s):
    694-704

    Probabilistic range query is an important type of query in the area of uncertain data management. A probabilistic range query returns all the data objects within a specific range from the query object with a probability no less than a given threshold. In this paper, we assume that each uncertain object stored in the database is associated with a multi-dimensional Gaussian distribution, which describes the probability distribution that the object appears in the multi-dimensional space. A query object is either a certain object or an uncertain object modeled by a Gaussian distribution. We propose several filtering techniques and an R-tree-based index to efficiently support probabilistic range queries over Gaussian objects. Extensive experiments on real data demonstrate the efficiency of our proposed approach.

  • Building Hierarchical Spatial Histograms for Exploratory Analysis in Array DBMS

    Jing ZHAO  Yoshiharu ISHIKAWA  Lei CHEN  Chuan XIAO  Kento SUGIURA  

     
    PAPER

      Pubricized:
    2019/01/18
      Vol:
    E102-D No:4
      Page(s):
    788-799

    As big data attracts attention in a variety of fields, research on data exploration for analyzing large-scale scientific data has gained popularity. To support exploratory analysis of scientific data, effective summarization and visualization of the target data as well as seamless cooperation with modern data management systems are in demand. In this paper, we focus on the exploration-based analysis of scientific array data, and define a spatial V-Optimal histogram to summarize it based on the notion of histograms in the database research area. We propose histogram construction approaches based on a general hierarchical partitioning as well as a more specific one, the l-grid partitioning, for effective and efficient data visualization in scientific data analysis. In addition, we implement the proposed algorithms on the state-of-the-art array DBMS, which is appropriate to process and manage scientific data. Experiments are conducted using massive evacuation simulation data in tsunami disasters, real taxi data as well as synthetic data, to verify the effectiveness and efficiency of our methods.

  • Query Processing in a Traceable P2P Record Exchange Framework

    Fengrong LI  Yoshiharu ISHIKAWA  

     
    PAPER-Parallel and Distributed Databases

      Vol:
    E93-D No:6
      Page(s):
    1433-1446

    As the spread of high-speed networks and the development of network technologies, P2P technologies are actively used today for information exchange in the network. While information exchange in a P2P network is quite flexible, there is an important problem--lack of reliability. Since we cannot know the details of how the data was obtained, it is hard to fully rely on it. To ensure the reliability of exchanged data, we have proposed the framework of a traceable P2P record exchange based on database technologies. In this framework, records are exchanged among autonomous peers, and each peer stores its exchange and modification histories in it. The framework supports the function of tracing queries to query the details of the obtained data. A tracing query is described in datalog and executed as a recursive query in the P2P network. In this paper, we focus on the query processing strategies for the framework. We consider two types of queries, ad hoc queries and continual queries, and present the query processing strategies for their executions.

  • False Drop Analysis of Set Retrieval with Signature Files

    Hiroyuki KITAGAWA  Yoshiharu ISHIKAWA  

     
    PAPER-Databases

      Vol:
    E80-D No:6
      Page(s):
    653-664

    Modern database systems have to support complex data objects, which appear in advanced data models such as object-oriented data models and nested relational data models. Set-valued objects are basic constructs to build complex structures in those models. Therefore, efficient processing of set-valued object retrieval (simply, set retrieval) is an important feature required of advanced database systems. Our previous work proposed a basic scheme to apply superimposed coded signature files to set retrieval and showed its potential advantages over the B-tree index based approach using a performance analysis model. Retrieval with signature files is always accompanied by mismatches called false drops, and proper control of the false drops is indispensable in the signature file design. This study intensively analyzes the false drops in set retrieval with signature files. First, schemes to use signature files are presented to process set retrieval involving "has-subset," "is-subset," "has-intersection," and "is-equal" predicates, and generic formulas estimating the false drops are derived. Then, three sets of concrete formulas are derived in three ways to estimate the false drops in the four types of set retrieval. Finally, their estimates are validated with computer simulations, and advantages and disadvantages of each set of the false drop estimation formulas are discussed. The analysis shows that proper choice of estimation formulas gives quite accurate estimates of the false drops in set retrieval with signature files.

  • An Efficient Algorithm for Location-Aware Query Autocompletion Open Access

    Sheng HU  Chuan XIAO  Yoshiharu ISHIKAWA  

     
    PAPER-Data Engineering, Web Information Systems

      Pubricized:
    2017/10/05
      Vol:
    E101-D No:1
      Page(s):
    181-192

    Query autocompletion is an important and practical technique when users want to search for desirable information. As mobile devices become more and more popular, one of the main applications is location-aware service, such as Web mapping. In this paper, we propose a new solution to location-aware query autocompletion. We devise a trie-based index structure and integrate spatial information into trie nodes. Our method is able to answer both range and top-k queries. In addition, we discuss the extension of our method to support the error tolerant feature in case user's queries contain typographical errors. Experiments on real datasets show that the proposed method outperforms existing methods in terms of query processing performance.

  • Requirement Specification and Derivation of ECA Rules for Integrating Multiple Dissemination-Based Information Sources

    Tomoyuki KAJINO  Hiroyuki KITAGAWA  Yoshiharu ISHIKAWA  

     
    PAPER

      Vol:
    E87-D No:1
      Page(s):
    3-14

    The recent development of network technology has enabled us to access various information sources easily, and their integration has been studied intensively by the data engineering research community. Although technological advancement has made it possible to integrate existing heterogeneous information sources, we still have to deal with information sources of a new kind--dissemination-based information sources. They actively and autonomously deliver information from server sites to users. Integration of dissemination-based information sources is one of the popular research topics. We have been developing an information integration system in which we employ ECA rules to enable users to define new information delivery services integrating multiple existing dissemination-based information sources. However, it is not easy for users to directly specify ECA rules and to verify them. In this paper, we propose a scheme to specify new dissemination-based information delivery services using the framework of relational algebra. We discuss some important properties of the specification, and show how we can derive ECA rules to implement the services.

  • Learning Local Similarity with Spatial Interrelations on Content-Based Image Retrieval

    Longjiao ZHAO  Yu WANG  Jien KATO  Yoshiharu ISHIKAWA  

     
    PAPER-Image Processing and Video Processing

      Pubricized:
    2023/02/14
      Vol:
    E106-D No:5
      Page(s):
    1069-1080

    Convolutional Neural Networks (CNNs) have recently demonstrated outstanding performance in image retrieval tasks. Local convolutional features extracted by CNNs, in particular, show exceptional capability in discrimination. Recent research in this field has concentrated on pooling methods that incorporate local features into global features and assess the global similarity of two images. However, the pooling methods sacrifice the image's local region information and spatial relationships, which are precisely known as the keys to the robustness against occlusion and viewpoint changes. In this paper, instead of pooling methods, we propose an alternative method based on local similarity, determined by directly using local convolutional features. Specifically, we first define three forms of local similarity tensors (LSTs), which take into account information about local regions as well as spatial relationships between them. We then construct a similarity CNN model (SCNN) based on LSTs to assess the similarity between the query and gallery images. The ideal configuration of our method is sought through thorough experiments from three perspectives: local region size, local region content, and spatial relationships between local regions. The experimental results on a modified open dataset (where query images are limited to occluded ones) confirm that the proposed method outperforms the pooling methods because of robustness enhancement. Furthermore, testing on three public retrieval datasets shows that combining LSTs with conventional pooling methods achieves the best results.

  • Multiple Regular Expression Pattern Monitoring over Probabilistic Event Streams

    Kento SUGIURA  Yoshiharu ISHIKAWA  

     
    PAPER

      Pubricized:
    2020/02/03
      Vol:
    E103-D No:5
      Page(s):
    982-991

    As smartphones and IoT devices become widespread, probabilistic event streams, which are continuous analysis results of sensing data, have received a lot of attention. One of the applications of probabilistic event streams is monitoring of time series events based on regular expressions. That is, we describe a monitoring query such as “Has the tracked object moved from RoomA to RoomB in the past 30 minutes?” by using a regular expression, and then check whether corresponding events occur in a probabilistic event stream with a sliding window. Although we proposed the fundamental monitoring method of time series events in our previous work, three problems remain: 1) it is based on an unusual assumption about slide size of a sliding window, 2) the grammar of pattern queries did not include “negation”, and 3) it was not optimized for multiple monitoring queries. In this paper, we propose several techniques to solve the above problems. First, we remove the assumption about slide size, and propose adaptive slicing of sliding windows for efficient probability calculation. Second, we calculate the occurrence probability of a negation pattern by using an inverted DFA. Finally, we propose the merge of multiple DFAs based on disjunction to process multiple queries efficiently. Experimental results using real and synthetic datasets demonstrate effectiveness of our approach.

  • Implementation of a Multi-Word Compare-and-Swap Operation without Garbage Collection

    Kento SUGIURA  Yoshiharu ISHIKAWA  

     
    PAPER

      Pubricized:
    2022/02/03
      Vol:
    E105-D No:5
      Page(s):
    946-954

    With the rapid increase in the number of CPU cores, software that can utilize these many cores is required. A lock-free algorithm based on compare-and-swap (CAS) operations is one of the concurrency control methods to implement such multi-threading software. A multi-word CAS (MwCAS) operation is an extension of a CAS operation to swap multiple words atomically. However, we noticed that the performance of the existing MwCAS implementation is limited because of garbage collection even if in a low-contention environment. To achieve high performance in low-contention workloads, we propose a new MwCAS algorithm without garbage collection. Experimental results show that our approach is three to five times faster than implementation with garbage collection in low-contention workloads. Moreover, the performance of the proposed method is also superior in a high-contention environment.

  • Design and Performance Analysis of Indexing Schemes for Set Retrieval of Nested Objects

    Yoshiharu ISHIKAWA  Hiroyuki KITAGAWA  

     
    PAPER-Implementation

      Vol:
    E78-D No:11
      Page(s):
    1424-1432

    Efficient retrieval of nested objects is an important issue in advanced database systems. So far, a number of indexing methods for nested objects have been proposed. However, they do not consider retrieval of nested objects based on the set comparison operators such as and . Previouly, we proposed four set access facilities for nested objects and compared their performance in terms of retrieval cost, storage cost, and update cost. In this paper, we extend the study and present refined algorithms and cost formulas applicable to more generalized situations. Our cost models and analysis not only contribute to the study of set-valued retrieval but also to cost estimation of various indexing methods for nested objects in general.