Author Search Result

[Author] Hiroki ARIMURA (11 hits)

  • Inductive Logic Programming: From Logic of Discovery to Machine Learning

    Hiroki ARIMURA  Akihiro YAMAMOTO  


    E83-D No:1

    Inductive Logic Programming (ILP) is the study of machine learning systems that use clausal theories in first-order logic as a representation language. In this paper, we survey the theoretical foundations of ILP from the viewpoints of Logic of Discovery and Machine Learning, and try to unify these two views with the support of the modern theory of Logic Programming. First, we define several hypothesis construction methods in ILP and give their proof-theoretic foundations by treating them as procedures that complete incomplete proofs. Next, we discuss the design of individual learning algorithms using these hypothesis construction methods. We review known results on learning logic programs in computational learning theory, and show that these algorithms are instances of a generic learning strategy with proof completion methods.

  • The Complexity of Induced Tree Reconfiguration Problems

    Kunihiro WASA  Katsuhisa YAMANAKA  Hiroki ARIMURA  


    E102-D No:3

    Given two feasible solutions A and B, a reconfiguration problem asks whether there exists a reconfiguration sequence (A_0 = A, A_1, ..., A_ℓ = B) such that (i) A_0, ..., A_ℓ are feasible solutions and (ii) we can obtain A_i from A_{i-1} under the prescribed rule (the reconfiguration rule) for each i ∈ {1,...,ℓ}. In this paper, we address the reconfiguration problem for induced trees, where an induced tree is a connected and acyclic induced subgraph of an input graph. We consider the following two rules as the prescribed rules: Token Jumping, which removes u from an induced tree and adds v to the tree, and Token Sliding, which removes u from an induced tree and adds a vertex v adjacent to u to the tree, where u and v are vertices of the input graph. As the main results, we show that (I) the reconfiguration problem is PSPACE-complete even if the input graph is of bounded maximum degree, (II) the reconfiguration problem is W[1]-hard when parameterized by both the size of induced trees and the length of the reconfiguration sequence, and (III) there exists an FPT algorithm when the problem is parameterized by both the size of induced trees and the maximum degree of the input graph under Token Jumping and Token Sliding.
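
    The two rules in this abstract can be illustrated with a small feasibility check. The following is only a sketch under assumptions: the adjacency-dictionary representation and the names is_induced_tree and valid_step are hypothetical and do not come from the paper, and the empty vertex set is treated as infeasible by convention.

```python
from collections import deque

def is_induced_tree(adj, vertices):
    """Return True if `vertices` induces a connected, acyclic subgraph of `adj`."""
    vs = set(vertices)
    if not vs:
        return False  # convention chosen for this sketch
    # Each induced edge is counted once from each endpoint, hence the division by 2.
    edge_count = sum(1 for u in vs for w in adj[u] if w in vs) // 2
    if edge_count != len(vs) - 1:
        return False
    # A graph with |V| - 1 edges is a tree exactly when it is connected.
    start = next(iter(vs))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in vs and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vs

def valid_step(adj, tree, u, v, rule="jumping"):
    """Check one reconfiguration step: remove u from `tree` and add v."""
    tree = set(tree)
    if u not in tree or v in tree:
        return False
    # Token Sliding additionally requires v to be adjacent to u.
    if rule == "sliding" and v not in adj[u]:
        return False
    return is_induced_tree(adj, (tree - {u}) | {v})

# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(valid_step(path, {0, 1}, 0, 2, rule="jumping"))
print(valid_step(path, {0, 1}, 0, 2, rule="sliding"))
```

    This checks a single step only; deciding whether a whole reconfiguration sequence exists is the hard problem the paper studies.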

  • Efficient Approximate 3-Dimensional Point Set Matching Using Root-Mean-Square Deviation Score

    Yoichi SASAKI  Tetsuo SHIBUYA  Kimihito ITO  Hiroki ARIMURA  


    E102-A No:9

    In this paper, we study the approximate point set matching (APSM) problem with minimum RMSD score under translation, rotation, and one-to-one correspondence in d dimensions. Since most previous work on APSM problems uses similarity scores that do not specifically take one-to-one correspondence between points into account, such as Hausdorff distance, we cannot easily apply previously proposed methods to our APSM problem. We therefore focus on speeding up exhaustive search algorithms that can find all approximate matches. First, we present an efficient branch-and-bound algorithm using a novel lower bound function of the minimum RMSD score for the enumeration version of the APSM problem. Then, we modify this algorithm for the optimization version. Next, we present another algorithm that runs fast with high probability when a set of parameters is fixed. Experimental results on both synthetic datasets and real 3-D molecular datasets showed that our branch-and-bound algorithm achieved significant speed-up over the naive algorithm while still keeping the advantage of generating all answers.

  • Efficient Substructure Discovery from Large Semi-Structured Data

    Tatsuya ASAI  Kenji ABE  Shinji KAWASOE  Hiroshi SAKAMOTO  Hiroki ARIMURA  Setsuo ARIKAWA  

    PAPER-Data Mining

    E87-D No:12

    In this paper, we consider a data mining problem for semi-structured data. Modeling semi-structured data as labeled ordered trees, we present an efficient algorithm for discovering frequent substructures from a large collection of semi-structured data. By extending the enumeration technique developed by Bayardo (SIGMOD'98) for discovering long itemsets, our algorithm scales almost linearly in the total size of maximal tree patterns contained in an input collection, depending only mildly on the size of the longest pattern. We also developed several pruning techniques that significantly speed up the search. Experiments on Web data show that our algorithm, combined with the proposed pruning techniques, runs efficiently on real-life datasets over a wide range of parameters.

  • Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

    Takuya TAKAGI  Shunsuke INENAGA  Kunihiko SADAKANE  Hiroki ARIMURA  


    E100-A No:9

    We present a new data structure called the packed compact trie (packed c-trie) which stores a set S of k strings of total length n in $n \log \sigma + O(k \log n)$ bits of space and supports fast pattern matching queries and updates, where $\sigma$ is the alphabet size. Assume that $\alpha = \log_\sigma n$ letters are packed in a single machine word on the standard word RAM model, and let f(k,n) denote the query and update times of the dynamic predecessor/successor data structure of our choice which stores k integers from universe [1,n] in $O(k \log n)$ bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in $O(\frac{m}{\alpha} f(k,n))$ worst-case time and in $O(\frac{m}{\alpha} + f(k,n))$ expected time. Our experiments show that our packed c-tries are faster than the standard compact tries (a.k.a. Patricia trees) on real data sets. We also discuss applications of our packed c-tries.

  • A Dynamically Reconfigurable FPGA-Based Pattern Matching Hardware for Subclasses of Regular Expressions

    Yusaku KANETA  Shingo YOSHIZAWA  Shin-ichi MINATO  Hiroki ARIMURA  Yoshikazu MIYANAGA  

    PAPER-Computer System

    E95-D No:7

    In this paper, we propose a novel architecture for large-scale regular expression matching, called the dynamically reconfigurable bit-parallel NFA architecture (Dynamic BP-NFA), which allows dynamic loading of regular expressions on the fly as well as efficient pattern matching for fast data streams. This is the first dynamically reconfigurable hardware with guaranteed performance for the class of extended patterns, which is a subclass of regular expressions consisting of unions of characters and their repeats. This class allows operators such as character classes, gaps, optional characters, and bounded and unbounded repeats of character classes. The key to our architecture is the use of the bit-parallel pattern matching approach, in which the information of an input non-deterministic finite automaton (NFA) is first compactly encoded in bit-masks stored in a collection of registers and block RAMs. Then, the NFA is efficiently simulated by fixed circuitry using bitwise Boolean and arithmetic operations, consuming one input character per clock regardless of the actual contents of the input text. Experimental results showed that our hardware implementations for both string and extended patterns were comparable in performance to previous dynamically reconfigurable hardware.
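
    The bit-parallel idea mentioned in this abstract can be sketched in software with the classic Shift-And technique. This is an illustration under assumptions, not the Dynamic BP-NFA hardware design: it handles only a sequence of character classes (no gaps, optional characters, or repeats), and the names build_masks and shift_and_search are hypothetical.

```python
def build_masks(pattern):
    """Encode the pattern as one bit-mask per character.

    `pattern` is a non-empty list of sets of characters, one character
    class per position. Bit i of masks[ch] is set when ch is allowed
    at position i.
    """
    masks = {}
    for i, char_class in enumerate(pattern):
        for ch in char_class:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks

def shift_and_search(pattern, text):
    """Return the end positions of all occurrences of `pattern` in `text`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    masks = build_masks(pattern)
    accept = 1 << (len(pattern) - 1)
    state = 0  # bit i set: positions 0..i match a suffix of the text read so far
    ends = []
    for pos, ch in enumerate(text):
        # One NFA step per input character: advance every active state,
        # start a new match attempt, then keep states consistent with ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            ends.append(pos)
    return ends

# Pattern a[bc]a searched in a short text.
print(shift_and_search([{"a"}, {"b", "c"}, {"a"}], "abacaxaba"))  # [2, 4, 8]
```

    Each input character costs a constant number of bitwise operations independent of the text contents, which is the property the abstract relies on.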

  • Efficient Enumeration of Induced Matchings in a Graph without Cycles with Length Four

    Kazuhiro KURITA  Kunihiro WASA  Takeaki UNO  Hiroki ARIMURA  


    E101-A No:9

    In this study, we address the problem of enumerating induced matchings. An edge set M is an induced matching of a graph G=(V,E) if M is a matching and no two edges of M are joined by an edge of G. The enumeration of matchings has been widely studied in the literature; however, there are few studies on induced matchings. A straightforward algorithm takes O(Δ²) time for each solution, which comes from the time needed to generate a subproblem, where Δ is the maximum degree of the input graph. To generate a subproblem, the algorithm picks an edge e and generates two graphs: one is obtained by removing e from G, and the other is obtained by removing e, the edges adjacent to e, and the edges adjacent to those edges. Since this operation needs O(Δ²) time, the straightforward algorithm enumerates all induced matchings in O(Δ²) time per solution. We investigated local structures that enable us to generate subproblems quickly and proved that the time complexity becomes O(1) if the input graph is C₄-free. A graph is C₄-free if and only if none of its subgraphs is a cycle of length four.
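
    The straightforward branching described in this abstract can be sketched as follows. This is a plain recursive illustration under assumptions, not the paper's O(1) algorithm for graphs without cycles of length four, and it makes no attempt to meet the stated time bounds; the function name enumerate_induced_matchings and the edge-list input are hypothetical.

```python
def enumerate_induced_matchings(edges):
    """Yield every induced matching of a simple graph as a frozenset of edges."""
    edges = [tuple(e) for e in edges]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def rec(candidates, chosen):
        if not candidates:
            yield frozenset(chosen)
            return
        e, rest = candidates[0], candidates[1:]
        # Branch 1: e is excluded, so only e leaves the candidate set.
        yield from rec(rest, chosen)
        # Branch 2: e is included. Drop every candidate that touches an
        # endpoint of e or a neighbor of an endpoint, i.e. the edges
        # adjacent to e and the edges adjacent to those edges.
        u, v = e
        blocked = adj[u] | adj[v] | {u, v}
        compatible = [f for f in rest
                      if f[0] not in blocked and f[1] not in blocked]
        yield from rec(compatible, chosen + [e])

    yield from rec(edges, [])

# Path 0 - 1 - 2 - 3 - 4.
for matching in enumerate_induced_matchings([(0, 1), (1, 2), (2, 3), (3, 4)]):
    print(sorted(matching))
```

    The empty matching is included in the output. Adjacency is always tested against the original graph, so excluding an edge from the candidates never makes two conflicting edges look compatible.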

  • Discovering Co-Cluster Structure from Relationships between Biased Objects

    Iku OHAMA  Takuya KIDA  Hiroki ARIMURA  

    PAPER-Artificial Intelligence, Data Mining

    E101-D No:12

    Latent variable models for relational data enable us to extract the co-cluster structure underlying observed relational data. The Infinite Relational Model (IRM) is a well-known relational model for discovering co-cluster structures with an unknown number of clusters. The IRM and several related models commonly assume that the link probability between two objects depends only on their cluster assignment. However, relational models based on this assumption often lead us to extract many non-informative and unexpected clusters. This is because the cluster structures underlying real-world relationships are often blurred by biases of individual objects. To overcome this problem, we propose a multi-layered framework, which extracts a clear de-blurred co-cluster structure in the presence of object biases. Then, we propose the Multi-Layered Infinite Relational Model (MLIRM) which is a special instance of the proposed framework incorporating the IRM as a co-clustering model. Furthermore, we reveal that some relational models can be regarded as special cases of the MLIRM. We derive an efficient collapsed Gibbs sampler to perform posterior inference for the MLIRM. Experiments conducted using real-world datasets have confirmed that the proposed model successfully extracts clear and interpretable cluster structures from real-world relational data.

  • The Relevance Dependent Infinite Relational Model for Discovering Co-Cluster Structure from Relationships with Structured Noise

    Iku OHAMA  Hiromi IIDA  Takuya KIDA  Hiroki ARIMURA  

    PAPER-Artificial Intelligence, Data Mining

    E99-D No:4

    Latent variable models for relational data enable us to extract the co-cluster structures underlying observed relational data. The Infinite Relational Model (IRM) is a well-known relational model for discovering co-cluster structures with an unknown number of clusters. The IRM assumes that the link probability between two objects (e.g., a customer and an item) depends only on their cluster assignment. However, relational models based on this assumption often lead us to extract many non-informative and unexpected clusters. This is because the underlying co-cluster structures in real-world relationships are often destroyed by structured noise that blurs the cluster structure stochastically depending on the pair of related objects. To overcome this problem, in this paper, we propose an extended IRM that simultaneously estimates denoised clear co-cluster structure and a structured noise component. In other words, our proposed model jointly estimates cluster assignment and noise level for each object. We also present posterior probabilities for running collapsed Gibbs sampling to infer the model. Experiments on real-world datasets show that our model extracts a clear co-cluster structure. Moreover, we confirm that the estimated noise levels enable us to extract representative objects for each cluster.

  • Polynomial Time Inference of Unions of Two Tree Pattern Languages

    Hiroki ARIMURA  Takeshi SHINOHARA  Setsuko OTSUKI  


    E75-D No:4

    In this paper, we consider the polynomial time inferability from positive data for unions of two tree pattern languages. A tree pattern is a structured pattern known as a term in logic programming, and a tree pattern language is the set of all ground instances of a tree pattern. We present a polynomial time algorithm to find a minimal union of two tree pattern languages containing given examples. Our algorithm can be considered as a natural extension of Plotkin's least generalization algorithm, which finds a minimal single tree pattern language. By using this algorithm, we can realize a consistent and conservative polynomial time inference machine that identifies unions of two tree pattern languages from positive data in the limit.
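
    For reference, the single-pattern case that this abstract builds on, Plotkin's least generalization (anti-unification), can be sketched as below. This is not the authors' algorithm for unions of two tree pattern languages. The term encoding (a tuple of a function symbol followed by its arguments, with constants as 1-tuples) and the names Var, lgg, and lgg_all are assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Var:
    """A pattern variable."""
    name: str

def lgg(s, t, table=None):
    """Least generalization of two terms."""
    if table is None:
        table = {}
    both_terms = isinstance(s, tuple) and isinstance(t, tuple)
    if both_terms and s[0] == t[0] and len(s) == len(t):
        # Same symbol and arity: generalize argument by argument.
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if s == t:
        return s
    # Differing subterms become a variable; the same pair of subterms
    # always maps to the same variable.
    if (s, t) not in table:
        table[(s, t)] = Var(f"X{len(table)}")
    return table[(s, t)]

def lgg_all(examples):
    """Fold lgg over a non-empty list of ground terms."""
    return reduce(lgg, examples)

a, b = ("a",), ("b",)
print(lgg(("f", a, ("g", a)), ("f", b, ("g", b))))
```

    For f(a, g(a)) and f(b, g(b)) the result is f(X0, g(X0)), because the pair (a, b) is mapped to the same variable at both positions.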

  • Constant Time Enumeration of Subtrees with Exactly k Nodes in a Tree

    Kunihiro WASA  Yusaku KANETA  Takeaki UNO  Hiroki ARIMURA  

    PAPER-Graph Algorithms, Knowledge Discovery

    E97-D No:3

    Motivated by the need to discover patterns in massive structured data in the form of graphs and trees, we study a special case of the k-subtree enumeration problem with a tree of n nodes as the input graph. The problem was originally introduced for general graphs by Ferreira, Grossi, and Rizzi (ESA'11, 275-286, 2011). Based on the reverse search technique (Avis and Fukuda, Discrete Appl. Math., vol.65, pp.21-46, 1996), we present the first constant delay enumeration algorithm that lists all k-subtrees of an input rooted tree in O(1) worst-case time per subtree. This result improves on the straightforward application of Ferreira et al.'s algorithm, which takes O(k) amortized time per subtree when the input is restricted to a tree. Finally, we discuss an application of our algorithm to a modification of the graph motif problem for trees.