Satoshi MATSUMOTO Tomoyuki UCHIDA Takayoshi SHOUDAI Yusuke SUZUKI Tetsuhiro MIYAHARA
A regular pattern is a string consisting of constant symbols and distinct variable symbols. The language of a regular pattern is the set of all constant strings obtained by replacing all variable symbols in the regular pattern with non-empty strings. The present paper deals with the learning problem of languages of regular patterns within Angluin's query learning model, which is an established mathematical model of learning via queries in computational learning theory. The class of languages of regular patterns was known to be identifiable from one positive example using a polynomial number of membership queries in the query learning model. In the present paper, we show that the class of languages of regular patterns is identifiable from one positive example using a number of membership queries that is linear in the length of the positive example.
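To make the language definition above concrete, the following minimal sketch tests whether a constant string belongs to the language of a regular pattern. The token-list encoding (with `None` standing for a variable) and the name `matches_regular_pattern` are assumptions made for this illustration. It shows only the membership test that a membership-query oracle would answer, not the learning algorithm of the paper.

```python
import re


def matches_regular_pattern(pattern, word):
    """Return True if `word` is in the language of the regular pattern.

    `pattern` is a list whose items are either a constant string or None
    (a variable). Each variable occurs only once in a regular pattern, so
    every variable may be replaced independently by any non-empty string.
    """
    # Translate the pattern into a regular expression: constants are
    # matched literally, each variable matches one or more characters.
    regex = "".join(".+" if tok is None else re.escape(tok) for tok in pattern)
    return re.fullmatch(regex, word, flags=re.DOTALL) is not None


# The pattern a x b y, with variables x and y.
example_pattern = ["a", None, "b", None]
in_language = matches_regular_pattern(example_pattern, "axbyy")
not_in_language = matches_regular_pattern(example_pattern, "axb")
```

Here `"axb"` is rejected because the second variable would have to be replaced by the empty string, which the definition does not allow.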
Takayoshi SHOUDAI Yuta YOSHIMURA Yusuke SUZUKI Tomoyuki UCHIDA Tetsuhiro MIYAHARA
A cograph (complement reducible graph) is a graph which can be generated by disjoint union and complement operations on graphs, starting with a single vertex graph. Cographs arise in many areas of computer science and are studied extensively. With the goal of developing an effective data mining method for graph structured data, in this paper we introduce a graph pattern expression, called a cograph pattern, which is a special type of cograph having structured variables. Firstly, we show that the problem of deciding whether a given cograph pattern g matches a given cograph G is NP-complete. In light of this result, we consider the polynomial time learnability of cograph pattern languages defined by cograph patterns having variables labeled with mutually different labels, called linear cograph patterns. Secondly, we present a polynomial time matching algorithm for linear cograph patterns. Next, we give a polynomial time algorithm for obtaining a minimally generalized linear cograph pattern which explains given positive data. Finally, we show that the class of linear cograph pattern languages is polynomial time inductively inferable from positive data.
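The generation rule for cographs stated above can be sketched in a few lines. The graph encoding (a pair of a vertex set and a set of two-element frozensets) and the function names are assumptions made for this illustration. Cograph patterns and their variables are not modelled here.

```python
from itertools import combinations


def single_vertex(v):
    """The single-vertex graph: one vertex, no edges."""
    return ({v}, set())


def disjoint_union(g, h):
    """Disjoint union of two graphs whose vertex sets do not overlap."""
    (vg, eg), (vh, eh) = g, h
    if vg & vh:
        raise ValueError("vertex sets must be disjoint")
    return (vg | vh, eg | eh)


def complement(g):
    """Complement graph: exactly the vertex pairs that are not edges of g."""
    vertices, edges = g
    all_pairs = {frozenset(p) for p in combinations(sorted(vertices), 2)}
    return (set(vertices), all_pairs - edges)


# Build the path a - b - c using only the two operations:
# take the edge a-c, add b as an isolated vertex, then complement.
a, b, c = single_vertex("a"), single_vertex("b"), single_vertex("c")
edge_ac = complement(disjoint_union(a, c))
path_abc = complement(disjoint_union(b, edge_ac))
```

The resulting `path_abc` has the edges a-b and b-c, which shows how a "join" of two graphs arises from union and complement alone.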
Takayoshi SHOUDAI Kazuhide AIKOH Yusuke SUZUKI Satoshi MATSUMOTO Tetsuhiro MIYAHARA Tomoyuki UCHIDA
An efficient means of learning tree-structural features from tree-structured data would enable us to construct effective mining methods for tree-structured data. Here, a pattern representing rich tree-structural features common to tree-structured data and a polynomial time algorithm for learning important tree patterns are necessary for mining knowledge from tree-structured data. As such a tree pattern, we introduce a term tree pattern t such that any edge label of t belongs to a finite alphabet Λ, any internal vertex of t has ordered children and t has a new kind of structured variable, called a height-constrained variable. A height-constrained variable has a pair of integers (i, j) as constraints, and it can be replaced with a tree whose trunk length is at least i and whose height is at most j. This replacement is called height-constrained replacement. A sequence of consecutive height-constrained variables is called a variable-chain. In this paper, we present polynomial time algorithms for solving the membership problem and the minimal language (MINL) problem for term tree patterns having no variable-chain. The membership problem for term tree patterns is to decide whether or not a given tree can be obtained from a given term tree pattern by applying height-constrained replacements to all height-constrained variables in the term tree pattern. The MINL problem for term tree patterns is to find a term tree pattern t such that the language generated by t is minimal among languages, generated by term tree patterns, which contain all given tree-structured data. Finally, we show that the class, i.e., the set of all term tree patterns having no variable-chain, is polynomial time inductively inferable from positive data if |Λ| ≥ 2.
Many learning machines such as normal mixtures and layered neural networks are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. The conventional statistical asymptotic theory cannot be applied to such learning machines because the likelihood function cannot be approximated by any normal distribution. Recently, a new statistical theory has been established based on algebraic geometry, and it was clarified that the generalization and training errors are determined by two birational invariants, the real log canonical threshold and the singular fluctuation. However, their concrete values are left unknown. In the present paper, we propose a new concept, a quasi-regular case in statistical learning theory. A quasi-regular case is not a regular case but a singular case; however, it has the same property as a regular case. In fact, we prove that, in a quasi-regular case, the two birational invariants are equal to each other, so that the symmetry of the generalization and training errors holds. Moreover, the concrete values of the two birational invariants are explicitly obtained; hence the quasi-regular case is useful for studying statistical learning theory.
Hirotaka KATO Satoshi MATSUMOTO Tetsuhiro MIYAHARA
An elementary formal system, EFS for short, is a kind of logic program over strings, and is regarded as a set of rules to generate a language. For an EFS Γ, the language L(Γ) denotes the set of all strings generated by Γ. We consider a new form of EFS, called a restricted two-clause EFS, and denote by rEFS the set of all restricted two-clause EFSs. Then we study the learnability of rEFS in the exact learning model. The class rEFS contains the class of regular patterns, which is extensively studied in Learning Theory. Let Γ* be the target EFS in rEFS to be learned. In the exact learning model, an oracle for superset queries answers "yes" for an input EFS Γ in rEFS if L(Γ) is a superset of L(Γ*), and otherwise outputs a string in L(Γ*) − L(Γ). An oracle for membership queries answers "yes" for an input string w if w is included in L(Γ*), and answers "no" otherwise. We show that any EFS in rEFS is exactly identifiable in polynomial time using membership and superset queries. Moreover, for other types of queries, we show that there exists no polynomial time learning algorithm for rEFS using those queries. This result indicates the hardness of learning the class rEFS in the exact learning model, in general.
Hitoshi YAMASAKI Takayoshi SHOUDAI
A graph is an interval graph if and only if each vertex in the graph can be associated with an interval on the real line such that any two vertices are adjacent in the graph exactly when the corresponding intervals have a nonempty intersection. A number of interesting applications for interval graphs have been found in the literature. In order to find structural features common to structural data which can be represented by intervals, this paper proposes new interval graph structured patterns, called linear interval graph patterns, and a polynomial time algorithm for finding a minimally generalized linear interval graph pattern explaining a given finite set of interval graphs.
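The definition of an interval graph given above translates directly into code. The sketch below builds the edge set from a family of closed intervals. The dictionary encoding and the name `interval_graph` are assumptions made for this illustration, and it does not cover the linear interval graph patterns proposed in the paper.

```python
def interval_graph(intervals):
    """Return the edge set of the interval graph of `intervals`.

    `intervals` maps a vertex name to a closed interval (left, right).
    Two vertices are adjacent exactly when their intervals intersect.
    """
    names = sorted(intervals)
    edges = set()
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            (a, b), (c, d) = intervals[u], intervals[v]
            # Closed intervals [a, b] and [c, d] share a point
            # iff the larger left end is not beyond the smaller right end.
            if max(a, c) <= min(b, d):
                edges.add((u, v))
    return edges


example_edges = interval_graph({"x": (0, 2), "y": (1, 3), "z": (4, 5)})
```

In this example only `x` and `y` overlap, so the graph has the single edge `("x", "y")` and `z` is isolated.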
Ryoji TAKAMI Yusuke SUZUKI Tomoyuki UCHIDA Takayoshi SHOUDAI
Two-Terminal Series Parallel (TTSP, for short) graphs are used as data models in applications for electric networks and scheduling problems. We propose a TTSP term graph which is a TTSP graph having structured variables, that is, a graph pattern over a TTSP graph. Let TGTTSP be the set of all TTSP term graphs whose variable labels are mutually distinct. For a TTSP term graph g in TGTTSP, the TTSP graph language of g, denoted by L(g), is the set of all TTSP graphs obtained from g by substituting arbitrary TTSP graphs for all variables in g. Firstly, when a TTSP graph G and a TTSP term graph g are given as inputs, we present a polynomial time matching algorithm which decides whether or not L(g) contains G. The minimal language problem for the class LTTSP={L(g) | g ∈ TGTTSP} is, given a set S of TTSP graphs, to find a TTSP term graph g in TGTTSP such that L(g) is minimal among all TTSP graph languages which contain all TTSP graphs in S. Secondly, we give a polynomial time algorithm for solving the minimal language problem for LTTSP. Finally, we show that LTTSP is polynomial time inductively inferable from positive data.
Satoshi MATSUMOTO Takayoshi SHOUDAI Tomoyuki UCHIDA Tetsuhiro MIYAHARA Yusuke SUZUKI
A linear term tree is defined as an edge-labeled rooted tree pattern with ordered children and internal structured variables whose labels are mutually distinct. A variable can be replaced with arbitrary edge-labeled rooted ordered trees. We consider the polynomial time learnability of finite unions of linear term trees in the exact learning model formalized by Angluin. The language L(t) of a linear term tree t is the set of all trees obtained from t by substituting arbitrary edge-labeled rooted ordered trees for all variables in t. Moreover, for a finite set S of linear term trees, we define L(S) = ⋃_{t∈S} L(t). A target of learning, denoted by T*, is a finite set of linear term trees, where the number of edge labels is infinite. In this paper, for any set T* of m linear term trees (m ≥ 0), we present a query learning algorithm which exactly identifies T* in polynomial time using at most 2mn² Restricted Subset queries and at most m+1 Equivalence queries, where n is the maximum size of counterexamples. Finally, we note that finite sets of linear term trees are not learnable in polynomial time using Restricted Equivalence, Membership and Subset queries.
Kernel methods such as support vector machines map input vectors into a high-dimensional feature space and linearly separate them there. The dimensionality of the feature space depends on the kernel function and is sometimes infinite. The Gauss kernel is such an example. We discuss the effective dimension of the feature space with the Gauss kernel and show, by considering the Taylor expansion of the kernel Gram matrix, that the kernel can be approximated by a sum of polynomial kernels and that its dimensionality is determined by the boundedness of the input space.
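The approximation by a sum of polynomial kernels can be illustrated numerically. The sketch below assumes the parameterization k(x, y) = exp(−‖x − y‖² / (2σ²)) and uses the standard identity k(x, y) = exp(−‖x‖²/(2σ²)) · exp(−‖y‖²/(2σ²)) · Σ_k (x·y/σ²)^k / k!, truncated at a chosen degree. The function names and the truncation scheme are assumptions made for this example and are not necessarily the derivation used in the paper.

```python
import math


def gauss_kernel(x, y, sigma=1.0):
    """Gauss kernel exp(-|x - y|^2 / (2 sigma^2)) for two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def truncated_gauss_kernel(x, y, degree, sigma=1.0):
    """Approximate the Gauss kernel by polynomial kernels up to `degree`.

    Each series term (x . y / sigma^2)^k / k! is a homogeneous polynomial
    kernel of degree k; the prefactor depends only on the norms of x and y.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sum(a * a for a in x)
    norm_y = sum(b * b for b in y)
    scale = math.exp(-(norm_x + norm_y) / (2 * sigma ** 2))
    series = sum((dot / sigma ** 2) ** k / math.factorial(k)
                 for k in range(degree + 1))
    return scale * series


x, y = (0.3, -0.2), (0.5, 0.4)
error_low = abs(gauss_kernel(x, y) - truncated_gauss_kernel(x, y, degree=0))
error_high = abs(gauss_kernel(x, y) - truncated_gauss_kernel(x, y, degree=6))
```

For inputs confined to a bounded region the inner product x·y is bounded, so a small degree already gives a close approximation. This is consistent with the role that boundedness of the input space plays in the abstract.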
Eiju HIROWATARI Kouichi HIRATA Tetsuhiro MIYAHARA Setsuo ARIKAWA
This paper investigates the interaction of mind changes and anomalies for inductive inference of recursive real-valued functions. We show that the criteria for inductive inference of recursive real-valued functions by bounding the number of mind changes and anomalies preserve the same hierarchy as that of recursive functions, if the length of each anomaly as an interval is bounded. However, we also show that, without bounding it, the hierarchy of some criteria collapses. More precisely, while the class of recursive real-valued functions inferable in the limit allowing no more than one anomaly is properly contained in the class allowing just two anomalies, the latter class coincides with the class allowing arbitrary and bounded number of anomalies.
Although consistent learning is sufficient for PAC-learning, it has not been found what strategy makes learning more efficient, especially with respect to the sample complexity, i.e., the number of examples required. As a first step towards this problem, classes that have consistent learning algorithms with one-sided error are considered. A combinatorial quantity called maximal particle sets is introduced, and an upper bound on the sample complexity of consistent learning with one-sided error is obtained in terms of maximal particle sets. For the class of n-dimensional axis-parallel rectangles, one of those classes that are consistently learnable with one-sided error, the cardinality of the maximal particle set is estimated and an O(d/ε + 1/ε log 1/δ) upper bound for the learning algorithm for the class is obtained. This bound improves the bounds due to Blumer et al. and meets the lower bound within a constant factor.
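For axis-parallel rectangles, a consistent learner with one-sided error can be sketched as the textbook "tightest fit" rule: output the smallest box containing all positive examples. This is offered as an illustration of the learning setting only, under the assumptions that the target is an axis-parallel rectangle and that at least one positive example is available. The function names are hypothetical and the sketch is not claimed to be the algorithm analysed in the paper.

```python
def tightest_rectangle(positives):
    """Smallest axis-parallel box containing all positive examples.

    `positives` is a non-empty list of equal-length tuples. The result is
    a list of (low, high) bounds, one pair per coordinate.
    """
    dims = range(len(positives[0]))
    return [(min(p[i] for p in positives), max(p[i] for p in positives))
            for i in dims]


def predict(rectangle, point):
    """Label `point` positive iff it lies inside the hypothesis box."""
    return all(low <= x <= high for (low, high), x in zip(rectangle, point))


hypothesis = tightest_rectangle([(1, 2), (3, 1), (2, 4)])
inside = predict(hypothesis, (2, 2))
outside = predict(hypothesis, (0, 2))
```

Because the hypothesis box is contained in any target rectangle that contains the positive examples, it can only err by rejecting points that are actually positive, which is what "one-sided error" refers to here.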
In this paper we investigate the learnability of relations in Inductive Logic Programming, by using equality theories as background knowledge. We assume that a hypothesis and an observation are respectively a definite program and a set of ground literals. The targets of our learning algorithm are relations. By using equality theories as background knowledge we introduce tree structure into definite programs. The structure enables us to narrow the search space of hypotheses. We give pairs of a hypothesis language and a knowledge language in order to discuss the learnability of relations from the viewpoint of inductive inference and PAC learning.
Hiroyuki OHNISHI Hiroyuki SEKI Tadao KASAMI
A recognizable series is a model of a sequential machine. A recognizable series S is represented by a triple (λ,µ,γ), called a linear representation of S, where λ is a row vector of dimension n specifying the initial state, γ is a column vector of dimension n specifying the output at a state, and µ is a morphism from input words to n × n matrices specifying the state transition. The output for an input word w is defined as λ(µw)γ, called the coefficient of w in S, and written as (S,w). We present an algorithm which constructs a reduced linear representation of an unknown recognizable series S, with coefficients in a commutative field, using coefficient queries and equivalence queries. The answer to a coefficient query, with a word w, is the coefficient (S, w) of w in S. When one asks an equivalence query with a linear representation (λ,µ,γ), if (λ,µ,γ) is a linear representation of S, yes is returned, and otherwise a word c such that λ (µc) γ
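The coefficient λ(µw)γ defined in this abstract can be computed by multiplying the row vector λ through the matrices of the letters of w. The sketch below assumes that µ is given by one matrix per input letter (stored in a dictionary), which determines the morphism on all words. The function name and the example series are assumptions made for this illustration, and the learning algorithm itself is not shown.

```python
def coefficient(lam, mu, gamma, word):
    """Compute lam * mu(word) * gamma for a linear representation.

    `lam` is a row vector (list), `gamma` a column vector (list), and
    `mu` maps each letter to a square matrix (list of rows). Since mu is
    a morphism, mu(word) is the product of the matrices of its letters.
    """
    row = list(lam)
    for letter in word:
        matrix = mu[letter]
        # Row vector times matrix: one step of the state transition.
        row = [sum(row[i] * matrix[i][j] for i in range(len(row)))
               for j in range(len(matrix[0]))]
    return sum(r * g for r, g in zip(row, gamma))


# Example series over {a, b}: the coefficient of w is the number of a's in w.
lam = [1, 0]
gamma = [0, 1]
mu = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}
count = coefficient(lam, mu, gamma, "abaab")
```

The example uses integer entries for readability. Entries from a field such as the rationals could be represented with `fractions.Fraction` without changing the function.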
Ryuichi NAKANISHI Hiroyuki SEKI Tadao KASAMI
"Learning correctly from queries" is a formal learning model proposed by Angluin. In this model, for a class Γ of language representations, a learner asks queries to a teacher of an unknown language Lq which can be represented by some Gq ∈ Γ, and eventually outputs a language representation G ∈ Γ which represents Lq and halts. An algorithm (learner) A is said to learn a class of languages represented by Γ in the weak definition if the time complexity of A is some polynomial of n and m, where n is the minimum size of the language representations in Γ which represent Lq, and m is the maximum length of the counterexamples returned in an execution. On the other hand, A is said to learn the class represented by Γ in the strong definition if at any point τ of the execution, the time consumed up to τ is some polynomial of n and m, where n is the same as above, and m is the maximum length of the counterexamples returned up to τ. In this paper, the adequacy of the model is examined, and it is shown that both in the weak and strong definitions, there exist learners which extract a long counterexample and identify Lq by using equivalence queries exhaustively. For example, there exists a learner which learns the class CFL of context-free languages represented by the class CFG of context-free grammars in the weak definition using only equivalence queries. Next, two restrictions concerning learnability criteria are introduced. The proper termination condition is that when a teacher replies with "yes" to an equivalence query, the learner must halt immediately. The other condition, called the LBC-condition, is that in the weak/strong definition, the time complexity must be some polynomial of n and log m. In this paper, it is shown that under these conditions, there still exist learners which execute exhaustive search.
For instance, there exists a learner which learns CFL represented by CFG in the weak definition using membership queries and equivalence queries under the proper termination condition, and there also exists a learner that learns CFL represented by CFG in the strong definition using subset queries and superset queries under the LBC-condition. These results suggest that the weak definition is not an adequate learning model even if the proper termination condition is assumed. Also, the model becomes inadequate in the strong definition if some combination of queries, such as subset queries and superset queries, is used instead of equivalence queries. Many classes of languages become learnable by our "extracting long counterexample" technique. However, it is still open whether or not CFL represented by CFG is learnable in the strong definition from membership queries and equivalence queries, although the answer is known to be negative if at least one of (1) quadratic residues modulo a composite, (2) inverting RSA encryption, or (3) factoring Blum integers, is intractable.
A new regularization cost function for generalization in real-valued function learning is proposed. This cost function is derived from the maximum likelihood method using a modified sample distribution, and consists of a sum of square errors and a stabilizer which is a function of integrated square derivatives. Each of the regularization parameters which gives the minimum estimation error can be obtained uniquely and non-empirically. The parameters are not constants and change in value during learning. Numerical simulation shows that this cost function predicts the true error accurately and is effective in neural network learning.
We investigate the relationship between two different notions of reducibility among prediction (learning) problems within the distribution-free learning model of Valiant (PAC learning model). The notions of reducibility we consider are the analogues for prediction problems of the many-one reducibility and of the Turing reducibility. The former is the notion of prediction preserving reducibility developed by Pitt and Warmuth, and its generalization. Concerning these two notions of reducibility, we show that there exists a pair of prediction problems A and B, whose membership problems are polynomial time solvable, such that A is reducible to B with respect to the Turing reducibility, but not with respect to the prediction preserving reducibility. We show this result by making use of the notion of a class of polynomially sparse variants of a concept representation class. We first show that any class A of polynomially sparse variants of another class B is reducible to B with respect to the Turing reducibility. We then prove the existence of a prediction problem R and a class R′ of polynomially sparse variants of R, such that R′ does not reduce to R with respect to the prediction preserving reducibility.
Setsuo ARIKAWA Satoru MIYANO Ayumi SHINOHARA Takeshi SHINOHARA Akihiro YAMAMOTO
The elementary formal system (EFS, for short) is a kind of logic program which directly manipulates character strings. This paper briefly outlines the authors' studies on algorithmic learning theory developed in the framework of EFS's. We define two important classes of EFS's and a new hierarchy of various language classes. Then we discuss EFS's as logic programs. We show that EFS's form a good framework for inductive inference of languages by presenting a model inference system for EFS's in Shapiro's sense. Using the framework we also show that inductive inference from positive data and PAC-learning are both much more powerful than has been believed. We illustrate an application of our theoretical results to Molecular Biology.