Byungho KIM Boseob KWON Hyunsoo YOON Jung Wan CHO
Multipath interconnection networks can support higher bandwidth than those of nonblocking networks by passing multiple packets to the same output simultaneously and these packets are buffered in the output buffer. The delay-throughput performance of the output buffer in multipath networks is closely related to output traffic distribution, packet arrival process at each output link connected to a given output buffer. The output traffic distributions are different according to the various input traffic patterns. Focusing on nonuniform output traffic distributions, this paper develops a new, general analytic model of the output buffer in multipath networks, which enables us to investigate the delay-throughput performance of the output buffer under various input traffic patterns. This paper also introduces Multipath Crossbar network as a representative multipath network which is the base architecture of our analysis. It is shown that the output buffer performances such as packet loss probability and delay improve as nonuniformity of the output traffic distribution becomes larger.
Hideharu YAHATA Yoji NISHIO Kunihiro KOMIYAJI Hiroshi TOYOSHIMA Atsushi HIRAISHI Yoshitaka KINOSHITA
A 167-MHz 1-Mbit CMOS synchronous cache SRAM was developed using 0.40-µm process technology. The floor plan was designed so that the address registers are located in the center of the chip, and high-speed circuits were developed such as the quasi latch (QL) sense amplifier and the one-shot control (OSC) output register. To maintain suitable setup and hold time margins, an equivalent margin (EM) design method was developed. 167-MHz operation was measured at a supply voltage of 2.5 V and an ambient temperature of 75. The same margins 1.1 ns of the setup time and hold time were measured for the specifications of a setup time of 2.0 ns and a hold time of 0.5 ns.
Takeshi FUKUDA Yasuhiko MORIMOTO Shinichi MORISHITA Takeshi TOKUYAMA
In this paper, we investigate inverse problems of the interval query problem in application to data mining. Let I be the set of all intervals on U = {1, 2, , n}. Consider an objective function f(I), conditional functions ui(I) on I, and define an optimization problem of finding the interval I maximizing f(I) subject to ui(I) > Ki for given real numbers Ki (i = 1, 2, , h). We propose efficient alogorithms to solve the above optimization problem if the objective function is either additive or quotient, and the conditional functions are additive, where a function f is additive if f(I) = ΣiIf^(i) extending a function f^ on U, and quotient if it is represented as a quotient of two additive functions. We use computational-geometric methods such as convex hull, range searching, and multidimensional divide-and-conquer.
Masahisa SHIMIZU Yasuhiro OUE Kazumasa OHNISHI Toru KITAMURA
Because a massively parallel computer processes vast amounts of data and generates many access requests from multiple processors simultaneously, parallel secondary storage requires large capacity and high concurrency. One effective method of implementation of such secondary storage is to use disk arrays which have multiple disks connected in parallel. In this paper, we propose a parallel file access method named DECODE (dynamic express changing of data entry) in which load balancing of each disk is achieved by dynamic determination of the write data position. For resolution of the problem of data fragmentation which is caused by the relocation of data during a write process, the concept of "Equivalent Area" is introduced. We have performed a preliminary performance evaluation using software simulation under various access statuses by changing the access pattern, access size and stripe size and confirmed the effectiveness of load balancing with this method.
Kisa MATSUSHIMA Susumu TAKANASHI
Compressible viscous flows past a space plane have been elucidated by parallel computation on the NWT. The NWT is a vector-parallel architecture computer system which achieves remarkably high performance in processing speed and memory storage. We have examined the advantages of the NWT in order to simulate realistic flow problems in engineering, such as the investigation of global and local aerodynamic characteristics of a space plane. The accuracy of the computational results has been verified by comparison with experimental data. The simplified domain-decomposition technique introduced here is easy to apply for parallel implementation to significantly improve the acceleration rate of computations. The larger available memory storage enables us to conduct a grid refinement study through which several points concerning CFD simulation of a space plane are obtained.
Barbara M. CHAPMAN Piyush MEHROTRA Hans P. ZIMA
Highly parallel scalable multiprocessing systems (HMPs) are powerful tools for solving large-scale scientific and engineering problems. However, these machines are difficult to program since algorithms must exploit locality in order to achieve high performance. Vienna Fortran was the first fully specified data-parallel language for HMPs that provided features for the specification of data distribution and alignment at a high level of abstraction. In this paper we outline the major elements of Vienna Fortran and compare it to High Performance Fortran (HPF), a de-facto standard in this area. A significant weakness of HPF is its lack of support for many advanced applications, which require irregular data distributions and dynamic load balancing. We introduce HPF +, an extension of HPF based on Vienna Fortran, that provides the required functionality.
In this paper, we introduce a computational mode of a tree transducer called a bi-stage transducer and study its properties. We consider a mapping on trees realized by composition of any sequence of top-down transducers and bottom-up transducers, and call such a mapping a multi-phase tree transformation. We think a multi-phase tree transformation is sufficiently powerful. It is shown that in the case of rank-preserving transducers, a multi-phase tree transformation is realized by a bi-stage transducer.
Kazuyoshi TAKAGI Koyo NITTA Hironori BOUNO Yasuhiko TAKENAGA Shuzo YAJIMA
Ordered Binary Decision Diagrams (OBDDs) are graph-based representations of Boolean functions which are widely used because of their good properties. In this paper, we introduce nondeterministic OBDDs (NOBDDs) and their restricted forms, and evaluate their expressive power. In some applications of OBDDs, canonicity, which is one of the good properties of OBDDs, is not necessary. In such cases, we can reduce the required amount of storage by using OBDDs in some non-canonical form. A class of NOBDDs can be used as a non-canonical form of OBDDs. In this paper, we focus on two particular methods which can be regarded as using restricted forms of NOBDDs. Our aim is to show how the size of OBDDs can be reduced in such forms from theoretical point of view. Firstly, we consider a method to solve satisfiability problem of combinational circuits using the structure of circuits as a key to reduce the NOBDD size. We show that the NOBDD size is related to the cutwidth of circuits. Secondly, we analyze methods that use OBDDs to represent Boolean functions as sets of product terms. We show that the class of functions treated feasibly in this representation strictly contains that in OBDDs and contained by that in NOBDDs.
Kazuhito SHIDA Kaoru OHNO Masayuki KIMURA Yoshiyuki KAWAZOE
A large scale simulation for polymer chains in good solvent is performed. The implementation technique for efficient parallel execution, optimization, and load-balancing are discussed on this practical application. Finally, a simple performance model is proposed.
The first optimizing compiler was developed at IBM in order to prove that high level language programming could be as efficient as hand-coded machine language. Computer architecture and compiler optimization interacted through a feedback loop, from the high-level language computer architectures of the 1970s to the RISC machines of the 1980s. In the supercomputing community, the availability of effective vectorizing compilers delivered easy-to-use performance in the 1980s to the present. These compilers were successful at least in part because they could predict poor performance spots in the program and report these to users. This fostered a feedback loop between programmers and compilers to develop high performance programs. Future optimizing compilers for high performance computers and supercomputers will have to take advantage of both feedback loops.
Naohisa TAKAHASHI Takeshi MIEI
We present a general framework with which we can evaluate the flexibility and efficiency of various replay systems for parallel programs. In our approach, program monitoring is modeled by making a virtual dataflow program graph, referred to as a VDG, that includes all the instructions executed by the program. The behavior of the program replay is modeled on the parallel interpretation of a VDG based on two basic parallel execution models for dataflow program graphs: a data-driven model and a demand-driven model. Previous attempts to replay parallel programs, known as Instant Replay and P-Sequence, are also modeled as variations of the data-driven replay, i.e. the datadriven interpretation of a VDG. We show that the demand-driven replay, i.e. the demand-driven interpretation of a VDG, is more flexible in program replay than the data-driven replay since it allows better control of parallelism and a more selective replay. We also show that we can implement a demand-driven replay that requires almost the same amount of data to be saved during program monitoring as does the data-driven replay, and which eliminates any centralized bottleneck during program monitoring by optimizing the demand propagation and using an effective data structure.
Yoshimasa OHNISHI Yoshinari SUGIMOTO Toshinori SUEYOSHI
We conducted research and development of Distributed Supercomputing Environment (DSE) based on distributed shared memory model to serve as a cluster computing environment to provide parallel processing facilities. Shared memory model and message passing model are well-known typical models of parallel processing. It is desired that hybrid programming environment will make the best use of the prominent features of both models. Consequently, we add a new message passing mechanism to present DSE, and create a prototype called Hybrid DSE as a hybrid model based cluster computing environment. In this paper, we describe the implementation of a message passing mechanism on DSE and performance evaluation of Hybrid DSE.
It is known that in a nonblocking packet switch with input queueing, head-of-line (HOL) blocking affects significantly the maximum throughput of the packet switch. To alleviate the HOL blocking effect, a window policy has been proposed in that each input queue can scan up to a fixed number of packets (called the window size) to select for transmission on the outputs. However, the performance of the window policy has never been precisely characterized. In this paper, we use a closed queueing network model to characterize the performance of the packet switch with window policy. We obtain explicit closed-form formulae for the maximum throughput of the packet switch as a function of the window size. Both balanced and imbalanced traffic patterns are discussed. The formulae can easily determine the effectiveness of the window policy.
Sam APPLETON Shannon MORTON Michael LIEBELT
In this paper we describe the implementation of complex architectures using a general design approach for two-phase asynchronous systems. This fundamental approach, called Event Controlled Systems, can be used to widely extend the utility of two phase systems. We describe solutions that we have developed that dramatically improve the performance of static and dynamic-logic asynchronous pipelines, and briefly describe a complex microprocessor designed using ECS.
Norio TAGAWA Toshio SUZUKI Tadashi MORIYA
The present paper clarifies that the variance of the maximum likelihood estimator (MLE) of a parameter does not reach the Cramer-Rao lower bound (CRLB) when fitting a straight-line to observed two-dimensional data. In addition, the variance of the MLE can be shown to be equal to the CRLB only if observed noise reduces to a one-dimensional Gaussian variable. For most practical applications, it can be assumed that noise is added only to the range direction. In this case, the MLE is clearly an asymptotically effective estimator. However, even if we assume such a noise model, ML line-fitting to the data from many points of view has a high computational cost. The present paper proposes an alternative fitting method in order to provide a cost-effective unbiased estimator. The reliability of this new method is analyzed statistically and by computer simulation.
Akira TAKEUCHI Satoshi OHTSU Seiichi MUROYAMA
The designed converter has a two-input-winding transformer powered by single-phase AC voltage and an energy storage capacitor. Small size and enhanced conversion efficiency are achieved, because more than half of the energy is supplied to the load via a single conversion stage, and fast output-voltage regulation is achieved by controlling the charging and discharging of the storage capacitor. The design and control methods for the converter take into account the reset conditions of the transformer and stability in the output voltage control. An almost unity power factor and a low output voltage ripple were achieved with this converter fabricated as a breadboard circuit using small capacitors.
Kuniaki NAOI Naohisa TAKAHASHI
We show that the truth of a prenex normal form Presburger sentence bounded only by existential quantifiers (or an EPP-sentence) involving two variables can be decided in deterministic polynomial time. Specifically, an upper bound of the computation for the decision is O(n3u), where n is the number of atoms of the EPP-sentence, and u is the largest absolute value of all coefficients in the EPP-sentence. In the analysis for the upper bound, the random access machine is assumed for the machine model. Additionally, a uniform cost criterion is assumed. Deciding the truth of an EPP-sentence is an NP-complete problem, when the number of variables is not fixed. Furthermore, whether the truth of an EPP-sentence involving two or more variables can be decided in deterministic polynomial time, when the number of variables is fixed, or not has remained an open problem. We previously proposed a procedure for quickly deciding the truth of an EPP-sentence on the basis of a suggestion by D.C.Cooper. We found the upper bound by analyzing the decision procedure. The procedure can be applied to both automated correctness proof of specification in various design fields and detection of infeasible paths in a program. In the procedure, a matrix denoting coefficients of the variables in the EPP-sentence is triangulated.
This paper presents algorithms for computing the Minkowski sum of two polygons P and Q for a family of problems. For P being convex and Q being monotone, an algorithm is given with O (nm) time and space complexity. For both P and Q being monotone polygons, an O (nm log nm) time algorithm is presented and it is shown that the complexity of the sum is Θ (nmα(min(n,m))) in the worst case, where α() is the inverse of Ackermann's function. Finally, an O ((nm+k)log nm) time complexity algorithm is given when P is monotone and Q is simple, where k in the worst case could be Θ(n2m). The complexity of P Q is shown to be Θ(n2m) in the worst case. Here, m and n denote the number of edges of P and Q, respectively.
Hiroyuki ATARASHI Masao NAKAGAWA
Partial capture effect for multi-carrier radio packet communication network is evaluated in frequency selective fading channel. In multi-carrier modulation (MCM) network where each terminal uses several sub-carriers for transmission,the terminals have different instantaneous frequency responses because of its location, fading pattern, and other various factors. This generates the difference of received power in frequency domain, then partial capture effect can be considered at each sub-carrier. Moreover these partially captured packets are not damaged by inter symbol interference (ISI) caused by frequency selective fading, which seriously degrades single-carrier modulation (SCM) network. From this point of view we present the partial capture effect for the MCM network in the frequency selective fading environment. The results show that the MCM network with partial capture has more advantages than the MCM network without partial capture in terms of the throughput and the average number of transmissions.
Takayuki NAGAYASU Hiroshi KUBO Keishi MURAKAMI Tadashi FUJINO
This paper presents a novel approach to a soft-output equalizer, which makes a symbol-by-symbol soft-decision based on a posteriori probabilities (APP's) criterion in the presence of intersymbol interference. The authors propose a soft-output Viterbi equalizer (SOVE) employing expanded memory length in a trellis of the Viterbi algorithm with small arithmetic complexity. The proposed equalizer gives suboptimum soft-decision closer to that of a equalizer with the maximum a posteriori probabilities (MAP) algorithm than the conventional SOVE.