Kazuhiro TAKADA Yuichi KAJI Tadao KASAMI
Some practical problems, such as the security verification of cryptographic protocols, can be described as problems of accomplishing a given purpose using only limited operations and limited materials. To model such problems in a natural way, unification problems under constrained substitutions have been proposed. This paper is a collection of results on the decidability and the computational complexity of the syntactic unification problem under constrained substitutions. A number of decidability, undecidability, tractability and intractability results for the problem are presented. Since a unification problem under constrained substitutions can be regarded as an order-sorted unification problem with term declarations in which the number of sorts is one, the results presented in this paper also indicate how the intractability of order-sorted unification problems is reduced by restricting the number of sorts to one.
Martin BOUDA Yoshiaki NAKANO Kunio TADA
Extremely compact multi-mode interference (MMI) devices using central wide-angle coupling of input and output waveguides are proposed. It is shown that MMI can be used to change the propagation direction of light without the need for corner mirrors or bent waveguides. The concept can also be used for very compact power splitters which are even smaller than conventional MMI power splitters. Coupling between waveguides at wide angles is discussed and a number of regularities are found. The results can be useful for the design of more compact integrated circuits by a reduction of the number of bent waveguides which usually take up the largest part of the area of a photonic integrated circuit.
Satoshi UEHARA Kyoki IMAMURA Takayasu KAIDA
First, we show a useful property of the fast algorithm for computing the linear complexity of p-ary periodic sequences with period p^n (p: a prime). Second, this property is successfully applied to obtain the value distribution of the linear complexity of p-ary periodic sequences with period p^n.
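For the binary case (p = 2), the fast algorithm in question is the classical Games-Chan algorithm, which computes the linear complexity of a sequence of period 2^n in linear time by repeatedly halving the period. The sketch below is an illustration of that special case, not the p-ary generalization studied in the paper.

```python
def games_chan(s):
    """Linear complexity of a binary sequence with period 2^n,
    given one full period as a list of 0/1 values."""
    s = list(s)
    c = 0
    while len(s) > 1:
        half = len(s) // 2
        left, right = s[:half], s[half:]
        b = [x ^ y for x, y in zip(left, right)]
        if any(b):
            # Halves differ: complexity gains len/2, recurse on the difference.
            c += half
            s = b
        else:
            # Halves agree: recurse on the left half only.
            s = left
    return c + s[0]

# A single 1 per period of length 2^n gives linear complexity 2^n.
print(games_chan([1, 0, 0, 0]))  # 4
# The all-ones sequence has linear complexity 1.
print(games_chan([1, 1, 1, 1]))  # 1
```

The p-ary version replaces the XOR of two halves with linear combinations of p subblocks modulo p, but follows the same recursive pattern.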
Chang Su LEE Chong-Ho CHOI Young CHOI Se Ho CHOI
Defects in cold-rolled strips have textural characteristics, which are nonuniform owing to irregularities and deformities in their geometrical appearance. To handle the textural characteristics of images with defects, this paper proposes a surface inspection method based on textural feature extraction using the wavelet transform. The wavelet transform is employed to extract local features from textural images with defects in both the frequency and the spatial domain. To extract features effectively, an adaptive wavelet packet scheme is developed, in which the optimum number of features is determined automatically through the subband coding gain. The energies of all subbands of the optimal quadtree of the adaptive wavelet packet algorithm, together with four entropy features in the level-one LL subband, which correspond to local features in the spatial domain, are extracted. A neural network classifies the defects from these features. Experiments with real image data show good training and generalization performance of the proposed method.
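As a minimal illustration of subband energy features, the sketch below performs one level of a 2-D Haar decomposition and computes the energy of each subband. The paper's adaptive wavelet packet scheme chooses a deeper, data-dependent quadtree of subbands; this fixed one-level Haar split is only an assumed simplification.

```python
def haar2d_level1(img):
    """One level of the 2-D Haar transform: returns the four subbands
    LL, LH, HL, HH of an image with even dimensions."""
    n, m = len(img), len(img[0])
    # Transform rows: averages in the left half, differences in the right.
    tmp = []
    for row in img:
        avg = [(row[2*j] + row[2*j+1]) / 2 for j in range(m // 2)]
        dif = [(row[2*j] - row[2*j+1]) / 2 for j in range(m // 2)]
        tmp.append(avg + dif)
    # Transform columns the same way.
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [tmp[i][j] for i in range(n)]
        for i in range(n // 2):
            out[i][j] = (col[2*i] + col[2*i+1]) / 2
            out[i + n // 2][j] = (col[2*i] - col[2*i+1]) / 2
    LL = [r[:m//2] for r in out[:n//2]]
    LH = [r[m//2:] for r in out[:n//2]]
    HL = [r[:m//2] for r in out[n//2:]]
    HH = [r[m//2:] for r in out[n//2:]]
    return LL, LH, HL, HH

def energy(band):
    """Subband energy: the sum of squared coefficients."""
    return sum(x * x for row in band for x in row)

# A uniform (texture-free) patch concentrates all energy in LL.
patch = [[2.0] * 4 for _ in range(4)]
LL, LH, HL, HH = haar2d_level1(patch)
print(energy(LL), energy(LH), energy(HL), energy(HH))  # 16.0 0.0 0.0 0.0
```

A textured or defective patch would redistribute energy into the detail subbands (LH, HL, HH), which is what makes such energies usable as classification features.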
Kisa MATSUSHIMA Susumu TAKANASHI
Compressible viscous flows past a space plane have been elucidated by parallel computation on the NWT. The NWT is a vector-parallel computer system that achieves remarkably high performance in processing speed and memory storage. We have examined the advantages of the NWT for simulating realistic flow problems in engineering, such as the investigation of the global and local aerodynamic characteristics of a space plane. The accuracy of the computational results has been verified by comparison with experimental data. The simplified domain-decomposition technique introduced here is easy to apply in a parallel implementation and significantly improves the acceleration rate of the computations. The large available memory storage enables us to conduct a grid refinement study, through which several insights concerning CFD simulation of a space plane are obtained.
Masahisa SHIMIZU Yasuhiro OUE Kazumasa OHNISHI Toru KITAMURA
Because a massively parallel computer processes vast amounts of data and generates many access requests from multiple processors simultaneously, parallel secondary storage requires large capacity and high concurrency. One effective way to implement such secondary storage is to use disk arrays, which have multiple disks connected in parallel. In this paper, we propose a parallel file access method named DECODE (dynamic express changing of data entry), in which the load on each disk is balanced by dynamically determining the write data position. To resolve the problem of data fragmentation caused by the relocation of data during a write process, the concept of an "Equivalent Area" is introduced. We performed a preliminary performance evaluation using software simulation under various access conditions, varying the access pattern, access size, and stripe size, and confirmed the effectiveness of load balancing with this method.
The first optimizing compiler was developed at IBM in order to prove that high-level language programming could be as efficient as hand-coded machine language. Computer architecture and compiler optimization interacted through a feedback loop, from the high-level language computer architectures of the 1970s to the RISC machines of the 1980s. In the supercomputing community, the availability of effective vectorizing compilers has delivered easy-to-use performance from the 1980s to the present. These compilers were successful at least in part because they could predict poor performance spots in the program and report them to users. This fostered a feedback loop between programmers and compilers for developing high-performance programs. Future optimizing compilers for high-performance computers and supercomputers will have to take advantage of both feedback loops.
Naohisa TAKAHASHI Takeshi MIEI
We present a general framework with which we can evaluate the flexibility and efficiency of various replay systems for parallel programs. In our approach, program monitoring is modeled by building a virtual dataflow program graph, referred to as a VDG, that includes all the instructions executed by the program. Program replay is modeled as the parallel interpretation of a VDG based on two basic parallel execution models for dataflow program graphs: a data-driven model and a demand-driven model. Previous approaches to replaying parallel programs, known as Instant Replay and P-Sequence, are also modeled as variations of data-driven replay, i.e., the data-driven interpretation of a VDG. We show that demand-driven replay, i.e., the demand-driven interpretation of a VDG, is more flexible than data-driven replay, since it allows better control of parallelism and more selective replay. We also show that we can implement a demand-driven replay that requires almost the same amount of data to be saved during program monitoring as data-driven replay, and that eliminates any centralized bottleneck during program monitoring by optimizing demand propagation and using an effective data structure.
Yoshimasa OHNISHI Yoshinari SUGIMOTO Toshinori SUEYOSHI
We conducted research and development of the Distributed Supercomputing Environment (DSE), based on a distributed shared memory model, to serve as a cluster computing environment providing parallel processing facilities. The shared memory model and the message passing model are the two well-known models of parallel processing. A hybrid programming environment is desirable because it can make the best use of the prominent features of both models. Consequently, we added a new message passing mechanism to the present DSE and created a prototype called Hybrid DSE, a hybrid-model-based cluster computing environment. In this paper, we describe the implementation of the message passing mechanism on DSE and a performance evaluation of Hybrid DSE.
Kazuyoshi TAKAGI Koyo NITTA Hironori BOUNO Yasuhiko TAKENAGA Shuzo YAJIMA
Ordered Binary Decision Diagrams (OBDDs) are graph-based representations of Boolean functions which are widely used because of their good properties. In this paper, we introduce nondeterministic OBDDs (NOBDDs) and their restricted forms, and evaluate their expressive power. In some applications of OBDDs, canonicity, one of the good properties of OBDDs, is not necessary. In such cases, we can reduce the required amount of storage by using OBDDs in some non-canonical form. A class of NOBDDs can serve as such a non-canonical form. In this paper, we focus on two particular methods that can be regarded as using restricted forms of NOBDDs. Our aim is to show, from a theoretical point of view, how the size of OBDDs can be reduced in such forms. First, we consider a method for solving the satisfiability problem of combinational circuits that uses the structure of the circuit to reduce the NOBDD size. We show that the NOBDD size is related to the cutwidth of the circuit. Second, we analyze methods that use OBDDs to represent Boolean functions as sets of product terms. We show that the class of functions treated feasibly in this representation strictly contains that of OBDDs and is contained in that of NOBDDs.
Barbara M. CHAPMAN Piyush MEHROTRA Hans P. ZIMA
Highly parallel scalable multiprocessing systems (HMPs) are powerful tools for solving large-scale scientific and engineering problems. However, these machines are difficult to program, since algorithms must exploit locality in order to achieve high performance. Vienna Fortran was the first fully specified data-parallel language for HMPs that provided features for specifying data distribution and alignment at a high level of abstraction. In this paper we outline the major elements of Vienna Fortran and compare it to High Performance Fortran (HPF), the de-facto standard in this area. A significant weakness of HPF is its lack of support for many advanced applications, which require irregular data distributions and dynamic load balancing. We introduce HPF+, an extension of HPF based on Vienna Fortran, that provides the required functionality.
Kazuhito SHIDA Kaoru OHNO Masayuki KIMURA Yoshiyuki KAWAZOE
A large-scale simulation of polymer chains in a good solvent is performed. Implementation techniques for efficient parallel execution, optimization, and load balancing are discussed for this practical application. Finally, a simple performance model is proposed.
In this paper, we introduce a computational model of a tree transducer called a bi-stage transducer and study its properties. We consider mappings on trees realized by the composition of arbitrary sequences of top-down transducers and bottom-up transducers, and call such a mapping a multi-phase tree transformation. We regard multi-phase tree transformations as sufficiently powerful. It is shown that, in the case of rank-preserving transducers, a multi-phase tree transformation can be realized by a bi-stage transducer.
This paper proposes a unified approach, by means of the binary decision diagram (BDD for short), to #P-hard problems of counting the number of paths between two terminals in undirected and directed graphs. Our approach provides algorithms running in O(2^O(√n)) time for typical planar graphs such as grid graphs. In fact, for any class of graphs having a good elimination ordering, this paradigm provides efficient solutions.
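For contrast with the BDD-based approach, the baseline the paper improves on can be illustrated by exhaustive enumeration, which is exponential in the number of vertices rather than in O(√n). The sketch below counts simple s-t paths in small grid graphs by depth-first search; it is an illustrative assumption, not the paper's algorithm.

```python
def count_simple_paths(adj, s, t):
    """Count simple s-t paths by exhaustive DFS. Exponential time in
    general (the problem is #P-hard), which motivates the structured
    BDD-based approach."""
    def dfs(v, visited):
        if v == t:
            return 1
        return sum(dfs(w, visited | {w}) for w in adj[v] if w not in visited)
    return dfs(s, {s})

def grid_graph(k):
    """Adjacency lists of a k x k grid graph; vertices are (row, col)."""
    adj = {(i, j): [] for i in range(k) for j in range(k)}
    for i in range(k):
        for j in range(k):
            if i + 1 < k:
                adj[(i, j)].append((i + 1, j)); adj[(i + 1, j)].append((i, j))
            if j + 1 < k:
                adj[(i, j)].append((i, j + 1)); adj[(i, j + 1)].append((i, j))
    return adj

# Corner-to-corner simple paths in small grids.
print(count_simple_paths(grid_graph(2), (0, 0), (1, 1)))  # 2
print(count_simple_paths(grid_graph(3), (0, 0), (2, 2)))  # 12
```

The BDD approach instead processes vertices in an elimination ordering and shares counting states across partial paths, which is what yields the subexponential bound for grid graphs.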
This paper discusses the largest common similar substructure (LCSS for short) problem for trees. The problem is, over all pairs consisting of a substructure A' of A and a substructure B' of B, to find a pair such that A' is most similar to B' and the sum of the number of vertices of A' and that of B' is largest. Algorithms for the LCSS problem for unrooted and unordered trees (trees for short) and for trees embedded in a plane (co-trees for short) are proposed. The time complexity of the algorithm for trees is O(max(ma, mb)^2 NaNb) and that for co-trees is O(mambNaNb), where ma (mb) and Na (Nb) are the largest degree of a vertex of tree Ta (Tb) and the number of vertices of Ta (Tb), respectively. It is easy to modify the algorithms to enumerate all of the LCSSs for trees and co-trees. The algorithms can be applied to structure-activity studies in chemistry and to various structure comparison problems.
Kaoru WATANABE Hiroshi TAMURA Keisuke NAKANO Masakazu SENGOKU
In this paper we extend the p-collection problem to flow networks with lower bounds, and call the extended problem the lower-bounded p-collection problem. First we discuss the complexity of this problem, showing NP-hardness for networks with path structure. Next we present a linear-time algorithm for the lower-bounded 1-collection problem in networks with tree structure, and a dynamic-programming-based pseudo-polynomial time algorithm for the lower-bounded p-collection problem in networks with tree structure. Using the pseudo-polynomial time algorithm, we derive an exponential-time algorithm, efficient for connected networks with few cycles, for the lower-bounded p-collection problem.
Takeshi FUKUDA Yasuhiko MORIMOTO Shinichi MORISHITA Takeshi TOKUYAMA
In this paper, we investigate inverse problems of the interval query problem with application to data mining. Let I be the set of all intervals on U = {1, 2, ..., n}. Consider an objective function f(I) and conditional functions u_i(I) on I, and define the optimization problem of finding the interval I maximizing f(I) subject to u_i(I) > K_i for given real numbers K_i (i = 1, 2, ..., h). We propose efficient algorithms to solve this optimization problem when the objective function is either additive or quotient and the conditional functions are additive, where a function f is additive if f(I) = Σ_{i∈I} f^(i), extending a function f^ on U, and quotient if it can be represented as a quotient of two additive functions. We use computational-geometric methods such as convex hulls, range searching, and multidimensional divide-and-conquer.
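A minimal sketch of the problem setting, under the assumption of one additive objective and one additive condition: with prefix sums, every interval value is a difference of two prefixes, so brute-force enumeration takes O(n^2) time. The paper's computational-geometric algorithms are asymptotically faster; this is only the naive baseline.

```python
def best_interval(f_hat, u_hat, K):
    """Find the interval [l, r] maximizing the additive objective
    sum(f_hat[l..r]) subject to the additive condition
    sum(u_hat[l..r]) > K, by O(n^2) enumeration with prefix sums."""
    n = len(f_hat)
    pf = [0.0] * (n + 1)   # prefix sums of the objective
    pu = [0.0] * (n + 1)   # prefix sums of the condition
    for i in range(n):
        pf[i + 1] = pf[i] + f_hat[i]
        pu[i + 1] = pu[i] + u_hat[i]
    best, arg = None, None
    for l in range(n):
        for r in range(l, n):
            if pu[r + 1] - pu[l] > K:          # feasibility check
                val = pf[r + 1] - pf[l]
                if best is None or val > best:
                    best, arg = val, (l, r)
    return arg, best

# Example: maximize the sum of f over intervals with more than 2 elements
# (u_hat all ones makes the condition a minimum-length constraint).
print(best_interval([3, -1, 4, -2, 5], [1, 1, 1, 1, 1], 2))
# ((0, 4), 9.0)
```

A quotient objective (e.g. average value over the interval) replaces `val` with a ratio of two such prefix differences, which is where the convex-hull machinery of the paper comes in.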
We frequently use a polynomial to approximate a complicated function. This study presents a method that determines the optimum coefficients and the number of terms of the polynomial, and estimates the error of the approximation.
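One common way to realize this idea, sketched below as an assumption rather than the paper's method, is least-squares fitting via the normal equations, increasing the degree until the maximum fitting error falls within a tolerance; the final error serves as the error estimate.

```python
def fit_poly(xs, ys, deg):
    """Least-squares polynomial coefficients (lowest degree first)
    via the normal equations and Gaussian elimination."""
    n = deg + 1
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    for i in range(n):                      # forward elimination with pivoting
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for r in range(i + 1, n):
            fac = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= fac * A[i][c]
            b[r] -= fac * b[i]
    coef = [0.0] * n
    for i in reversed(range(n)):            # back substitution
        coef[i] = (b[i] - sum(A[i][k] * coef[k]
                              for k in range(i + 1, n))) / A[i][i]
    return coef

def choose_degree(xs, ys, tol, max_deg=10):
    """Smallest degree whose maximum fitting error is within tol."""
    for d in range(max_deg + 1):
        coef = fit_poly(xs, ys, d)
        err = max(abs(sum(c * x ** i for i, c in enumerate(coef)) - y)
                  for x, y in zip(xs, ys))
        if err <= tol:
            return d, coef, err
    return max_deg, coef, err

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]                    # samples of f(x) = x^2
d, coef, err = choose_degree(xs, ys, tol=1e-6)
print(d)  # 2
```

Here the search correctly stops at degree 2, since the sampled function is itself quadratic; for a genuinely complex function, `err` quantifies the residual approximation error at the chosen number of terms.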
Akimasa YOSHIDA Ken'ichi KOSHIZUKA Wataru OGATA Hironori KASAHARA
This paper proposes a data-localization scheduling scheme inside a processor cluster for multigrain parallel processing, which hierarchically exploits parallelism among coarse-grain tasks such as loops, medium-grain tasks such as loop iterations, and near-fine-grain tasks such as statements. The proposed scheme assigns near-fine-grain or medium-grain tasks inside coarse-grain tasks to processors inside a processor cluster so that maximum parallelism is exploited and inter-processor data transfer is minimized after data localization for coarse-grain tasks across processor clusters. Performance evaluation on the multiprocessor system OSCAR shows that multigrain parallel processing with the proposed data-localization scheduling reduces the execution time of application programs by 10% compared with multigrain parallel processing without data localization.
Sam APPLETON Shannon MORTON Michael LIEBELT
In this paper we describe the implementation of complex architectures using a general design approach for two-phase asynchronous systems. This fundamental approach, called Event Controlled Systems (ECS), can be used to widely extend the utility of two-phase systems. We describe solutions we have developed that dramatically improve the performance of static- and dynamic-logic asynchronous pipelines, and briefly describe a complex microprocessor designed using ECS.