Eiji OHIRA Hirohiko SAGAWA Tomoko SAKIYAMA Masaru OHKI
This paper discusses sign word segmentation methods and extraction of motion features for sign language recognition. Because Japanese sign language grammar has not yet been systematized and because sign language does not have prepositions, it is more difficult to use grammar and meaning information in sign language recognition than in speech recognition. Segmentation significantly improves recognition efficiency, so we propose a method of dividing sign language based on rests and on the envelope and minimum of motion speed. The sign unit corresponding to a sign word is detected based on the divided position using such features as the change of hand shape. Experiments confirmed the validity of word segmentation of sign language based on the temporal structure of motion.
We introduce a generalized cln grammar (gclng), a generalization of a completely linearly nested context-free grammar (clncfg), of which variables are partitioned linearly and each rule satisfies similar conditions as those of clncfg related to its partition. We show that the class of languages generated by gclng's coincides with the class of quasi-rational languages, and consider the inclusion relations between languages generated by gclng's and those generated by clncfg's.
Jiro NAGANUMA Takeshi OGURA Tamio HOSHINO
This paper proposes a new environment for high-level VLSI design specification validation using "Algorithmic Debugging" and evaluates its benefits on three significant examples (a protocol processor, an 8-bit CPU, and a Prolog processor). A design is specified at a high-level using the structured analysis (SA) method, which is useful for analyzing and understanding the functionality to be realized. The specification written in SA is transformed into a logic programming language and is simulated in it. The errors (which terminate with an incorrect output in the simulation) included in the three large examples are efficiently located by answering junt a few queries from the algorithmic debugger. The number of interactions between the designer and the debugger is reduced by a factor of ten to a hundred compared to conventional simulation based validation methodologies. The correct SA specification can be automatically translated into a Register Transfer Level (RTL) specification suitable for logic synthesis. In this environment, a designer is freed from the tedious task of debugging a RTL specification, and can concentrate on the design itself. This environment promises to be an important step towards efficient high-level VLSI design specification validation.
Sachiko ANDO Ryuichi NAKANISHI Hiroyuki SEKI Tadao KASAMI
Lexical-functional grammars (lfg's) were introduced to define the syntax of natural languages. In lfg's, a finite set of attribute-value pairs called an f-structure is associated with each internal node in a derivation tree. For efficient parsing, some subclasses of lfg's were proposed. However, these subclasses have been shown to generate at least one -complete language. In this paper, we introduce a subclass of lfg's called pd-lfg's. In pd-lfg's, an f-structure forms a pushdown stack. For a node v in a derivation tree and at most one specified child vi of v, the f-structure of vi is obtained by performing a specified pushdown stack operation on the f-structure of v. We prove the equivalence of the generative capacity of modified head grammars (mhg's) and that of pd-lfg's. Since the languages generated by mhg's are known to be recognizable in O(n6) time, the languages generated by pd-lfg's can be recognized in O(n6) time.
Hiroyuki OHNISHI Hiroyuki SEKI Tadao KASAMI
Recognizable series is a model of a sequential machine. A recognizable series S is represented by a triple (λ,µ,γ), called a linear representation of S, where λ is a row vector of dimension n specifying the initial state, γ is a column vector of dimension n specifying the output at a state, and µ is a morphism from input words to nn matrices specifying the state transition. The output for an input word w is defined as λ(µw) γ, called the coefficient of w in S, and written as (S,w). We present an algorithm which constructs a reduced linear representation of an unknown recognizable series S, with coefficients in a commutative field, using coefficient queries and equivalence queries. The answer to a coefficient query, with a word w, is the coefficient (S, w) of w in S. When one asks an equivalence query with a linear representation (λ,µ,γ), if (λ,µ,γ) is a linear representation of S, yes is returned, and otherwise a word c such that λ (µc) γ
ACk is the class of problems solvable by an alternating Turing machine in space O(log n) and alternation depth O(logk n) [S. A. Cook, A taxonomy of problems with fast parallel algorithms, Inform. Contr. vol. 64]. We consider a game played by two persons: each player alternately moves a marker along an edge of a given digraph, and the first palyer who cannot move loses the game. It is shown that the problem to determine whether the first player can win the game on a digraph with n nodes exactly after logk n moves is complete for ACk nuder NC1 reducibility.
Ryuichi NAKANISHI Hiroyuki SEKI Tadao KASAMI
Learning correctly from queries" is a formal learning model proposed by Angluin. In this model, for a class Γ of language representations, a learner asks queries to a teacher of an unknown language Lq which can be represented by some GqΓ, and eventually outputs a language representation GΓ which represents Lq and halts. An algorithm (leaner) A is said to learn a class of languages represented by Γ in the weak definition if the time complexity of A is some polynomial of n and m, where n is the minimum size of the lagunage representations in Γ which represent Lq, and m is the maximum length of the counterexamples returned in an execution. On the other band, A is said to learn represented by Γ in the strong definition if at any point τ of the execution, the time consumed up to τ is some polynomial of n and m, where n is the same as above, and m is the maximum length of the counterexamples returned up to τ. In this paper, adequacy of the model is examined, and it is shown that both in the weak and strong definitions, there exist learners which extract a long counterexample, and identify Lq by using equivalence queries exhaustively. For example, there exists a learner which learns the class CFL of context-free languages represented by the class CFG of context-free grammars in the weak definition using only equivalence queries. Next, two restrictions concerning with learnability criteria are introduced. Proper termination condition is that when a teacher replies with yes" to an equivalence query, then the learner must halt immediately. The other condition, called LBC-condition, is that in the weak/strong definition, the time complexity must be some polynomial of n and log m. In this paper, it is shown that under these conditions, there still exist learners which execute exhaustive search. For instance, there exists a learner which learns CFL represented by CFG in the weak definition using membership queries and equivalence queries under the proper termination condition, and there also exists a learner that learns CFL represented by CFG in the strong definition using subset queries and superset queries under LBC-condition. These results suggest that the weak definition is not an adequate learning model even if the proper termination condition is assumed. Also, the model becomes inadequate in the strong definition if some combination of queries, such as subset queries and superset queries, is used instead of equivalence queries. Many classes of languages become learnable by our extracting long counterexample" technique. However, it is still open whether or not CFL represented by CFG is learnable in the strong definition from membership queries and equivalence queries, although the answer is known to be negative if at least one of (1) quadratic residues modulo a composite, (2) inverting RSA encryption, or (3) factoring Blum integers, is intractable.
Hiroshi SAWADA Yasuhiko TAKENAGA Shuzo YAJIMA
Binary decision diagrams (BDD's) are graph representations of Boolean functions, and at the same time they can be regarded as a computational model. In this paper, we discuss relations between BDD's and other computational models and clarify the computational power of BDD's. BDD's have the property that each variable is examined only once according to a total order of the variables. We characterize families of BDD's by on-line deterministic Turing machines and families of permutations. To clarify the computational power of BDD's, we discuss the difference of the computational power with respect to the way of reading inputs. We also show that the language TADGAP (Topologically Arranged Deterministic Graph Accessibility Problem) is simultaneously complete for both of the class U-PolyBDD of languages accepted by uniform families of polynomial-size BDD's and the clas DL of languages accepted by log-space bounded deterministic Turing machines. From the results, we can see that the problem whether U-PolyBDD U-NC1 is equivalent to a famous open problem whether DL U-NC1, where U-NC1 is the class of languages accepted by uniform families of log-depth constant fan-in logic circuits.
Let S(n) be a space constructible function such that S(n) log n. In this paper, we show that AuxSpTu (S(n),T(n)) NSPACE (S(n)log T(n)), where AuxSpTu (S(n),T(n)) is the class of languages accepted by nondeterministic auxiliary pushdown automata operating simultaneously in O(S(n)) space and O(T(n)) turns of the auxiliary tape head.
Yuichi KAJI Hiroyuki SEKI Tadao KASAMI
Finite state translation systems (fsts') are a widely studied computational model in the area of tree automata theory. In this paper, the string generating capacities of fsts' and their subclasses are studied. First, it is shown that the class of string languages generated by deterministic fsts' equals to that of parallel multiple context-free grammars, which are an extension of context-free grammars. As a corollary, it can be concluded that the recognition problem for a deterministic fsts is solvable in O(ne1)-time, where n is the length of an input word and e is a constant called the degree of the deterministic fsts'. In contrast to the latter fact, it is also shown that nondeterministic monadic fsts' with state-bound 2 can generate an NP-complete language.
J. Bresnan and R. M. Kaplan introduced lexical-functional grammars (LFGs, for short) as a new formalism for human language syntax. It is important to show formal properties of this kind of grammars in order to characterize the formal complexity of human languages. In this paper, we will show that the emptiness problem for LFGs is undecidable.
This paper introduces a constraint logic programming (CLP) language cu-Prolog as an implementation framework for constraint-based natural language processing. Compared to other CLP languages, cu-Prolog has several unique features. Most CLP languages take algebraic equations or inequations as constraints. cu-Prolog, on the other hand, takes Prolog atomic formulas in terms of user-defined predicates. cu-Prolog, thus, can describe symbolic and combinatorial constraints occurring in the constraint-based grammar formalisms. As a constraint solver, cu-Prolog uses the unfold/fold transformation, which is well known as a program transformation technique, dynamically with some heuristics. To treat the information partiality described with feature structures, cu-Prolog uses PST (Partially Specified Term) as its data structure. Sections 1 and 2 give an introduction to the constraint-based grammar formalisms on which this paper is based and the outline of cu-Prolog is explained in Sect. 3 with implementation issues described in Sect. 4. Section 5 illustrates its linguistic application to disjunctive feature structure (DFS) and parsing constraint-based grammar formalisms such as Japanese Phrase Structure Grammar (JPSG). In either application, a disambiguation process is realized by transforming constraints, which gives a picture of constraint-based NLP.
A case structure expression is one of the most important forms to represent the meaning of the sentence. Case structure analysis is usually performed by consulting case frame information in a verb dictionary. However, this analysis is very difficult because of several problems, such as word sense ambiguity and structural ambiguity. A conventional method for solving these problems is to use the method of selectional restriction, but this method has a drawback in the semantic marker (SM) method --the trade-off between descriptive power and construction cost. In this paper, we propose a method of case structure analysis based on examples in case frame dictionary This method uses the case frame dictionary which has some typical example sentences for each case frame, and it selects a proper case frame for an input sentence by matching the input sentence with the examples in the case frame dictionary. The best matching score, which is utilized for selecting a proper case frame for a predicate, can be considered as the score for the case structure of the predicate. Therefore, when there are two or more readings for a sentence because of structural ambiguity, the best reading of a sentence can be selected by evaluating the sum of the scores for the case structures of all predicates in a sentence. We report on experiments which shows that this method is superior to the conventional, coarse-grained SM method, and also describe the superiority of the example-based method over the SM method.
Kenji KITA Tsuyoshi MORIMOTO Kazumi OHKURA Shigeki SAGAYAMA Yaneo YANO
This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.
This paper presents a new method for resolving lexical (word sense) ambiguities inherent in natural language sentences. The Sentence Analyzer (SENA) was developed to resolve such ambiguities by using constraints and example-based preferences. The ambiguities are packed into a single dependency structure, and grammatical and lexical constraints are applied to it in order to reduce the degree of ambiguity. The application of constraints is realized by a very effective constraint-satisfaction technique. Remaining ambiguities are resolved by the use of preferences calculated from an example-base, which is a set of fully parsed word-to-word dependencies acquired semi-automatically from on-line dictionaries.
Yasunori ISHIHARA Hiroyuki SEKI Tadao KASAMI Jun SHIMABUKURO Kazuhiko OKAWA
This paper presents a method of translating natural language specifications of communication protocols into algebraic specifications. Such a natural language specification specifies action sequences performed by the protocol machine (program). Usually, a sentence implicitly specifies the state of the protocol machine at which the described actions must be performed. The authors propose a method of analyzing the implicitly specified states of the protocol machine taking the OSI session protocol specification (265 sentences) as an example. The method uses the following properties: (a) syntactic properties of a natural language (English in this paper); (b) syntactic properties introduced by the target algebraic specifications, e.g., type constraints; (c) properties specific to the target domain, e.g., properties of data types. This paper also shows the result of applying this method to the main part of the OSI session protocol specification (29 paragraphs, 98 sentences). For 95 sentences, the translation system uniquely determines the states specified implicitly by these sentences, using only (a) and (b) described above. By using (c) in addition, each implicitly specified state in the remaining three sentences is uniquely determined.
This paper describes an error-correcting parser (ec-parser) for context-free languages that is an extension of the Leiss's parser. Since the ec-parser uses precomputed informations and a pruning technique by lookahead, the ec-parser is always faster than the Lyon's parser. Several examples are shown.
Knowledge-based Database Assistant is an expert system designed to help novice users formulate correct and complete database queries. This paper describes a knowledge-based database assistant with advanced facilities such as (1) a menu-based querymaking guidance, (2) a menu-based natural-language user-interface, and (3) database-commands generator which formulates formal database queries with SQL language. The system works as an intelligent front-end to an SQL database system or a computer-aided SQL tutorial-system. In this paper, we also discuss a semantic-network model, named S-Net, which is used to represent the knowledge for formal database-query formulating processes. The menu-based English user-interface allows end-users to make a query by filling a certain query pattern with appropriate words. The query-pattern filling process is guided by pop-up menus provided by the system. The query-pattern instances thus obtained are then translated into formal database queries. The translation is carried out by evaluating operations on S-Net knowledge-base which conveys knowledge about application domain, and the underlying database schema.
Mitsuo WAKATSUKI Etsuji TOMITA
We are concerned with a subclass of deterministic pushdown automata (dpda) called very simple dpda's, and present a new direct branching algorithm for checking the inclusion for a pair of languages accepted by these dpda's. As usual, we take the maximal thickness (i.e., the length of the shortest input strings that make each stack symbol go to empry) of all stack symbols into account as one parameter of the given dpda's. Then the worst-case time complexity of our algorithm is polynomial with respect to these parameters. Without considering the thickness, the complexity is single exponential in the description length of the given dpda's. As far as we are concerned with very simple dpda's, our algorithm is very simple and direct, and is faster and much better than the previously given algorithms for the inclusion problem of dpda's.
P. N. SANKARSHANAN Hideaki KOBAYASHI Pankaj KUKKAL Hiroyuki KANBARA
This paper presents a description and an analysis of three standard" hardware description languages (HDLs): Very High Speed Integrated Circuit HDL (VHDL), Verilog-HDL, and Unified Design Language for Integrated Circuits (UDL/I), Kyoto University Education Chip (KUE-Chip) is used as a design benchmark to compare the features and syntax of VHDL, Verilog-HDL, and UDL/I.