Lan ZHANG Tokio OKAZAKI Katsushi INOUE Akira ITO Yue WANG
This paper introduces a probabilistic rebound automaton (pra), and investigates its accepting power and closure property. We show that (1) the class of languages recognized by pra's with error probability less than 1/2, PRA, is incomparable with the class of context-free languages, (2) there is a language accepted by a two-way nondeterministic one counter automaton, but not in PRA, and (3) there is a language accepted by a deterministic one-marker rebound automaton, but not in PRA. We also show that PRA is not closed under concatenation and Kleene + .
Kiyotaka ATSUMI Shigeru MASUYAMA
This paper studies the ambiguity reduction ability of a probabilistic context-free grammar. We theoretically analyze a common behavior of any probabilistic context-free grammar. Moreover, we confirm by experiments that a probabilistic context-free grammar learnt from Japanese corpus has the ambiguity reduction ability as expected by the theoretical analysis.
We introduce a subclass of context-free languages, called pure context-free (PCF) languages, which is generated by context-free grammars with only one type of symbol (i. e. , terminals and nonterminals are not distinguished), and consider the problem of identifying paralleled even monogenic pure context-free (pem-PCF) languages, PCF languages with restricted and enhanced features, from positive data only. In this paper we show that the ploblem of identifying the class of pem-PCF languages is reduced to the ploblem of identifying the class of monogenic PCF (mono-PCF), by decomposing each string of pem-PCF languages. Then, with its result, we show that the class of pem-PCF languages is polynomial time identifiable in the limit from positive data. Further, we refer to properties of its identification algorithm.
In this paper, we introduce a parallel algorithm for parsing context-free languages. Our algorithm can handle arbitrary context-free grammars since it is based on Earley's algorithm. Our algorithm can operate on any loosely coupled multiprocessor which can provide a topology of a one-way ring. Our algorithm uses p processors to parse an input string of length n where 1 p n. It is shown that our algorithm requires O(n3/p) time. The algorithm uses a simple job allocation strategy. However, it achieves high load balancing and uses the processors efficiently.
Noriyuki TANIDA Takashi YOKOMORI
A subclass of context-free languages, called pure context-free languages, which is generated by context-free grammar with only one type of symbol (i.e., terminals and nonterminals are not distinguished), is introduced and the problem of identifying from positive data a restricted class of monogenic pure context-free languages (mono-PCF languages, in short) is investigated. The class of mono-PCF languages is incomparable to the class of regular languages. In this paper we show that the class of mono-PCF languages is polynomial time identifiable from positive data. That is, there is an algorithm that, given a mono-PCF language L, identifies from positive data, a grammar generating L, called a monogenic pure context-free grammar (mono-PCF grammar, in short) satisfying the property that the time for updating a conjecture is bounded by O(N3), where N is the sum of lengths of all positive data provided. This is in contrast with another result in this paper that the class of PCF languages is not identifiable in the limit from positive data.
String grammars (languages) have been extensively studied from 60's. On the other hand, the transformational grammar, proposed by N. Chomsky, contains the transformation from the set of derivation trees of context-free language to the surface set. And the grammar regarded a tree as an input sentence to some transducer. After that from latter half of 60's, the studies of acceptors, transducers, and so on, whose input is a tree, have been studied extensively. And recently some pushdown tree automata were introduced, and their fundamental properties and some other various properties were investigated [11]-[17]. Furthermore, a top-down pushdown tree transducer (t-PDTT for short), which is an extension of a top-down pushdown automaton (t-PDTA for short), was introduced and its fundamental properties were investigated [19]. In this paper, we define the various subclasses of context-free tree grammar (CFTG for short) by the combination of variables contained in the rules. Furthermore, we consider a monadic case of CFTG which is a special case of CFTG. Based on these definitions, we classify the subclasses of CFTG, and we investigate some inclusion properties of subclasses of CFTL (where CFTL indicates the class of context-free tree languages).
For the purpose of automatic speech recognition, language models (LMs) are used to predict possible succeeding words for a given partial word sequence and thereby to reduce the search space. In this paper several kinds of stochastic language models (SLMs) are evaluated-bigram, trigram, hidden Markov model (HMM), bigram-HMM, stochastic context-free grammar (SCFG) and hand-written Bunsetsu Grammar. To compare the predictive power of these SLMs, the evaluation was conducted from two points of views: (1) relationship between the number of model parameters and entropy, (2) predictive rate of succeeding part of speech (POS) and succeeding word. We propose a new type of bigram-HMM and compare it with the other models. Two kinds of approximations are tried and examined through experiments. Results based on both of English Brown-Corpus and Japanese ATR dialog database showed that the extended bigram-HMM had better performance than the others and was more suitable to be a language model.
Kiyotaka ATSUMI Shigeru MASUYAMA
We propose a parallel parsing algorithm based on Earley's method, which works in O(log2n) time using O(n4.752) processors on CREW PRAM. This algorithm runs with less number of precessors compared with previously proposed W. Rytter's algorithm.
We introduce a generalized cln grammar (gclng), a generalization of a completely linearly nested context-free grammar (clncfg), of which variables are partitioned linearly and each rule satisfies similar conditions as those of clncfg related to its partition. We show that the class of languages generated by gclng's coincides with the class of quasi-rational languages, and consider the inclusion relations between languages generated by gclng's and those generated by clncfg's.
As a generalization of the tree automaton, tree automata with various types of memory are introduced and their relation to context-free grammars with memory is studied. Relations between computation trees of tree automata with memory and derivation trees of context-free grammars with memory are established, and as a consequence, the languages generated by context-free grammars with memory are characterized in terms of the sets of trees recognizable by tree automata with memory. Also various types of traversal of labeled trees recognizable by tree automata with memory are considered.
Yuichi KAJI Hiroyuki SEKI Tadao KASAMI
Finite state translation systems (fsts') are a widely studied computational model in the area of tree automata theory. In this paper, the string generating capacities of fsts' and their subclasses are studied. First, it is shown that the class of string languages generated by deterministic fsts' equals to that of parallel multiple context-free grammars, which are an extension of context-free grammars. As a corollary, it can be concluded that the recognition problem for a deterministic fsts is solvable in O(ne1)-time, where n is the length of an input word and e is a constant called the degree of the deterministic fsts'. In contrast to the latter fact, it is also shown that nondeterministic monadic fsts' with state-bound 2 can generate an NP-complete language.
This paper describes an error-correcting parser (ec-parser) for context-free languages that is an extension of the Leiss's parser. Since the ec-parser uses precomputed informations and a pruning technique by lookahead, the ec-parser is always faster than the Lyon's parser. Several examples are shown.
Some problems in formal language theory are considered and are shown to be deterministic exponential time complete. They include the problems for a given context-free grammar G, a nondeterministic finite automaton M, a deterministic pushdown automaton MD, of determining whether L(G)L(M), and whether L(MD)L(M). Polynomial time reductions are presented from the pebble game problem, known to be deterministic exponential time complete, to each of these problems.
CFGs (context-free grammars) with various types of memory are introduced and their generative capacities are investigated. For an automata-theoretic characterization, a new type of automaton called partitioning automaton is introduced and it is shown that the class of languages generated by CFGs with memory type X is equal to the class of languages accepted by partitioning automata of type X.
Yuichi KAJI Ryuichi NAKANISHI Hiroyuki SEKI Tadao KASAMI
Parallel multiple context-free grammars (pmcfg's) and multiple context-free grammars (mcfg's) were introduced as extensions of context-free grammars to describe the syntax of natural languages. Pmcfg's and mcfg's deal with tuples of strings, and it has been shown that the universal recognition problem for mcfg's is EXP-POLY time-complete where the universal recognition problem is the problem to decide whether G generates w for a given grammar G and string w. In this paper, the universal recognition problems for the class of pmcfg's and for the subclass of pmcfg's with the information-lossless condition are shown to be EXP-POLY time-complete and PSPACE-complete, respectively. It is also shown that the problems for pmcfg's and for mcfg's with a bounded dimension are both -complete and those for pmcfg's and for mcfg's with a bounded degree are both -complete. As a corollary, the problem for modified head grammars introduced by Vijay-Shanker, et al. to define the syntax of natural languages is shown to be in deterministic polynomial time.
Yuichi KAJI Ryuichi NAKANISI Hiroyuki SEKI Tadao KASAMI
Multiple context-free grammars (mcfg's) are a subclass of generalized context-free grammars introduced by Pollard in order to describe the syntax of natural languages. First, this paper shows that the universal recognition problem for mcfg's is EXP-POLY time-complete, where the universal recognition problem is the one to decide whether G generates w for a given grammar G and string w. Next, it is shown that the problem for linear context-free rewriting systems introduced by Vijay-Shanker et al., which is a proper subclass of mcfg's, is PSPACE-complete.