1-6hit |
Alfredo M. MAEDA Hideto TOMABECHI Jun-ichi AOE
Graph unification is doubtlessly the most expensive process in unification-based grammar parsing since it takes the vast majority of the total parsing time of natural language sentences. A parsing time overload in unification consists in that, in general, no less than 60% of the graph unifications performed actually fail. Thus one way to achieve unification time speed-up is focusing on an efficient, fast way to deal with such unification failures. In this paper, a process, prior to unification itself, capable of filtering or stopping a considerably high percentage of graphs that would fail unification is proposed. This unification-filtering process consists of comparison of signatures that correspond to each one of the graphs to be unified. Unification-filter (hereafter UF) is capable of stopping around 87% of the non-unifiable graphs before unification itself takes place. UF takes significantly less time to detect graphs that do not unify and discard them than it would take to unification to fail the attempt to unify the same graphs. As a result of using UF, unification is performed in an around 71% of the time for the fastest known unification algorithm.
Masami SHISHIBORI Makoto OKADA Tooru SUMITOMO Jun-ichi AOE
In many applications, information retrieval is a very important research field. In several key strategies, the binary trie is famous as a fast access method able to retrieve keys in order. Especially, a Patricia trie gives the shallowest trie by eliminating all nodes which have only one arc, and it requires the smallest storage among the other trie structures. If trie structures are implemented, however, the greater the number of the registered keys, the larger storage is required. In order to solve this problem, Jonge et al. proposed a method to change the normal binary trie into a compact bit stream. This paper proposes the improved trie representation for the Patricia trie, as well as the methods for searching and inserting the key on it. The theoretical and experimental results, using 50,000 Japanese nouns and 50,000 English words, show that this method generates 25-39 percent shorter bit streams than the traditional method. This method, thus, enables us to provide more compact storage and faster access than the traditional method.
Masami SHISHIBORI Takeshi ARITA Hisatoshi MOCHIZUKI Jun-ichi AOE
In accordance with the diffusion of applications, such as the Desk Top Publishing system, the Document Formatting system and the Document Editing system, it is easy to make a document by using a computer. However, as for allocating the diagrams (figures and tables), there are few document processing systems able to allocate diagrams on the appropriate places automatically. In a document processing system it is a very important issue to allocate diagrams on the most suitable places. This paper defines the criteria for allocating diagrams on the suitable positions by investigating published papers. These criteria concern 1) the order of diagrams to be allocated, 2) the stability of the diagram allocations, 3) the distance between the diagram and the location of the corresponding first reference in the text, 4) the allocation balance of diagrams in a text, 5) the restricted areas where diagrams shouldn't be allocated, 6) the allocation priorities between diagrams of different width. Moreover, this paper proposes a method for deciding the diagram allocations satisfying the above criteria automatically and fast on document formatting systems. In this case we have limited its application to one type of ducuments, which is papers. Especially, this method can skillfully allocate diagrams of different width on the page by reallocating the diagrams and texts within it, and can allocate diagrams over the document uniformly.
Masami SHISHIBORI Kazuaki ANDO Yuuichirou KASHIWAGI Jun-ichi AOE
Natural language interface systems can accept more unrestricted queries from users than other systems, however it is impossible to understand erroneous sentences which include the syntax errors, unknown words and misspelling. In order to realize the superior natural language interface, the automatic error correction for erroneous sentences is one of problems to be solved. The method to apply the LR parsing strategies is one of the famous approaches as the robust error recovery scheme. This method is able to obtain a high correction accuracy, however it takes a great deal of time to parse the sentence, such that it becomes a very important task to improve the time-cost. In this paper, we propose the method to improve the time efficiency, keeping the correction accuracy of the traditional method. This method makes use of a new parsing table that denotes the states to be transited after accepting each symbol. By using this table, the symbol which is allocated just after the error position can be utilized for selecting correction symbols, as a result, the number of candidates produced on the correction process is reduced, and fast system can be realized. The experiment results, using 1,050 sentences including error characters, show that this method can correct error points 69 times faster than the traditional method, also keep the same correction accuracy as the traditional method.
Yong-Seok LEE Hideto TOMABECHI Jun-ichi AOE
Tomita's parsing method (GLR) is a practical and successful parsing method for natural language. However, one difficulty in the GLR is that interleaved constraint processing of syntax and semantics in parallel is not trivial during parsing, because it uses the precompiled table for a fast real-time parsing. In this paper, we present a method which makes the GLR adaptable to interleaved parsing while making some limitation on its generality. For interleaved parsing, the conflicts of the LR parsing table must be resolved at the parse time. The shift-reduce conflict among the above conflicts is the most serious one for interleaved parsing because of the lack of knowledge for the conflict resolution at the parse time. Therefore, we concentrate on resolving a shift-reduce conflict by introducing a grammar which is called a shift-first LR (k) grammar. Our method for this is that the conflict resolution is delayed by the shift-first strategy which makes an unconditional choice of shift actions in the case of a shift-reduce conflict. Then, a delayed resolution that resolves the conflict, is made. Depending on the decision of the resolution, the pseudo parsing, which parses symbols in the LR parser stack, proceeds. Our experiments showed that our parser is efficient while attaining the interleaved parsing at real time.
Ingrid KIRSCHNING Jun-Ichi AOE
The Time-Slicing paradigm is a newly developed method for the training of neural networks for speech recognition. The neural net is trained to spot the syllables in a continuous stream of speech. It generates a transcription of the utterance, be it a word, a phrase, etc. Combined with a simple error recovery method the desired units (words or phrases) can be retrieved. This paradigm uses a recurrent neural network trained in a modular fashion with natural connectionist glue. It processes the input signal sequentially regardless of the input's length and immediately extracts the syllables spotted in the speech stream. As an example, this character string is then compared to a set of possible words, picking out the five closest candidates. In this paper we describe the time-slicing paradigm and the training of the recurrent neural network together with details about the training samples. It also introduces the concept of natural connectionist glue and the recurrent neural network's architecture used for this purpose. Additionally we explain the errors found in the output and the process to reduce them and recover the correct words. The recognition rates of the network and the recovery rates for the words are also shown. The presented examples and recognition rates demonstrate the potential of the time-slicing method for continuous speech recognition.