The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] PU(3318hit)

2881-2900hit(3318hit)

  • Nonuniform Output Traffic Distributions in the Multipath Crossbar Network

    Byungho KIM  Boseob KWON  Hyunsoo YOON  Jung Wan CHO  

     
    PAPER

      Vol:
    E80-D No:4
      Page(s):
    417-424

    Multipath interconnection networks can support higher bandwidth than those of nonblocking networks by passing multiple packets to the same output simultaneously and these packets are buffered in the output buffer. The delay-throughput performance of the output buffer in multipath networks is closely related to output traffic distribution, packet arrival process at each output link connected to a given output buffer. The output traffic distributions are different according to the various input traffic patterns. Focusing on nonuniform output traffic distributions, this paper develops a new, general analytic model of the output buffer in multipath networks, which enables us to investigate the delay-throughput performance of the output buffer under various input traffic patterns. This paper also introduces Multipath Crossbar network as a representative multipath network which is the base architecture of our analysis. It is shown that the output buffer performances such as packet loss probability and delay improve as nonuniformity of the output traffic distribution becomes larger.

  • A 167-MHz 1-Mbit CMOS Synchronous Cache SRAM

    Hideharu YAHATA  Yoji NISHIO  Kunihiro KOMIYAJI  Hiroshi TOYOSHIMA  Atsushi HIRAISHI  Yoshitaka KINOSHITA  

     
    PAPER

      Vol:
    E80-C No:4
      Page(s):
    557-565

    A 167-MHz 1-Mbit CMOS synchronous cache SRAM was developed using 0.40-µm process technology. The floor plan was designed so that the address registers are located in the center of the chip, and high-speed circuits were developed such as the quasi latch (QL) sense amplifier and the one-shot control (OSC) output register. To maintain suitable setup and hold time margins, an equivalent margin (EM) design method was developed. 167-MHz operation was measured at a supply voltage of 2.5 V and an ambient temperature of 75. The same margins 1.1 ns of the setup time and hold time were measured for the specifications of a setup time of 2.0 ns and a hold time of 0.5 ns.

  • Interval Finding and Its Application to Data Mining

    Takeshi FUKUDA  Yasuhiko MORIMOTO  Shinichi MORISHITA  Takeshi TOKUYAMA  

     
    PAPER

      Vol:
    E80-A No:4
      Page(s):
    620-626

    In this paper, we investigate inverse problems of the interval query problem in application to data mining. Let I be the set of all intervals on U = {1, 2, , n}. Consider an objective function f(I), conditional functions ui(I) on I, and define an optimization problem of finding the interval I maximizing f(I) subject to ui(I) > Ki for given real numbers Ki (i = 1, 2, , h). We propose efficient alogorithms to solve the above optimization problem if the objective function is either additive or quotient, and the conditional functions are additive, where a function f is additive if f(I) = ΣiIf^(i) extending a function f^ on U, and quotient if it is represented as a quotient of two additive functions. We use computational-geometric methods such as convex hull, range searching, and multidimensional divide-and-conquer.

  • Parallel File Access for Implementing Dynamic Load Balancing on a Massively Parallel Computer

    Masahisa SHIMIZU  Yasuhiro OUE  Kazumasa OHNISHI  Toru KITAMURA  

     
    PAPER

      Vol:
    E80-D No:4
      Page(s):
    466-472

    Because a massively parallel computer processes vast amounts of data and generates many access requests from multiple processors simultaneously, parallel secondary storage requires large capacity and high concurrency. One effective method of implementation of such secondary storage is to use disk arrays which have multiple disks connected in parallel. In this paper, we propose a parallel file access method named DECODE (dynamic express changing of data entry) in which load balancing of each disk is achieved by dynamic determination of the write data position. For resolution of the problem of data fragmentation which is caused by the relocation of data during a write process, the concept of "Equivalent Area" is introduced. We have performed a preliminary performance evaluation using software simulation under various access statuses by changing the access pattern, access size and stripe size and confirmed the effectiveness of load balancing with this method.

  • High-Performance Parallel Computation of Flows Past a Space Plane Using NWT

    Kisa MATSUSHIMA  Susumu TAKANASHI  

     
    PAPER

      Vol:
    E80-D No:4
      Page(s):
    524-530

    Compressible viscous flows past a space plane have been elucidated by parallel computation on the NWT. The NWT is a vector-parallel architecture computer system which achieves remarkably high performance in processing speed and memory storage. We have examined the advantages of the NWT in order to simulate realistic flow problems in engineering, such as the investigation of global and local aerodynamic characteristics of a space plane. The accuracy of the computational results has been verified by comparison with experimental data. The simplified domain-decomposition technique introduced here is easy to apply for parallel implementation to significantly improve the acceleration rate of computations. The larger available memory storage enables us to conduct a grid refinement study through which several points concerning CFD simulation of a space plane are obtained.

  • Vienna Fortran and the Path Towards a Standard Parallel Language

    Barbara M. CHAPMAN  Piyush MEHROTRA  Hans P. ZIMA  

     
    INVITED PAPER

      Vol:
    E80-D No:4
      Page(s):
    409-416

    Highly parallel scalable multiprocessing systems (HMPs) are powerful tools for solving large-scale scientific and engineering problems. However, these machines are difficult to program since algorithms must exploit locality in order to achieve high performance. Vienna Fortran was the first fully specified data-parallel language for HMPs that provided features for the specification of data distribution and alignment at a high level of abstraction. In this paper we outline the major elements of Vienna Fortran and compare it to High Performance Fortran (HPF), a de-facto standard in this area. A significant weakness of HPF is its lack of support for many advanced applications, which require irregular data distributions and dynamic load balancing. We introduce HPF +, an extension of HPF based on Vienna Fortran, that provides the required functionality.

  • Multi-Phase Tree Transformations

    Akio FUJIYOSHI  Takumi KASAI  

     
    LETTER-Thought and Language

      Vol:
    E80-A No:4
      Page(s):
    761-768

    In this paper, we introduce a computational mode of a tree transducer called a bi-stage transducer and study its properties. We consider a mapping on trees realized by composition of any sequence of top-down transducers and bottom-up transducers, and call such a mapping a multi-phase tree transformation. We think a multi-phase tree transformation is sufficiently powerful. It is shown that in the case of rank-preserving transducers, a multi-phase tree transformation is realized by a bi-stage transducer.

  • Computational Power of Nondeterministic Ordered Binary Decision Diagrams and Their Subclasses

    Kazuyoshi TAKAGI  Koyo NITTA  Hironori BOUNO  Yasuhiko TAKENAGA  Shuzo YAJIMA  

     
    PAPER

      Vol:
    E80-A No:4
      Page(s):
    663-669

    Ordered Binary Decision Diagrams (OBDDs) are graph-based representations of Boolean functions which are widely used because of their good properties. In this paper, we introduce nondeterministic OBDDs (NOBDDs) and their restricted forms, and evaluate their expressive power. In some applications of OBDDs, canonicity, which is one of the good properties of OBDDs, is not necessary. In such cases, we can reduce the required amount of storage by using OBDDs in some non-canonical form. A class of NOBDDs can be used as a non-canonical form of OBDDs. In this paper, we focus on two particular methods which can be regarded as using restricted forms of NOBDDs. Our aim is to show how the size of OBDDs can be reduced in such forms from theoretical point of view. Firstly, we consider a method to solve satisfiability problem of combinational circuits using the structure of circuits as a key to reduce the NOBDD size. We show that the NOBDD size is related to the cutwidth of circuits. Secondly, we analyze methods that use OBDDs to represent Boolean functions as sets of product terms. We show that the class of functions treated feasibly in this representation strictly contains that in OBDDs and contained by that in NOBDDs.

  • Parallelized Simulation of Complicated Polymer Structures and lts Efficiency

    Kazuhito SHIDA  Kaoru OHNO  Masayuki KIMURA  Yoshiyuki KAWAZOE  

     
    PAPER

      Vol:
    E80-D No:4
      Page(s):
    531-537

    A large scale simulation for polymer chains in good solvent is performed. The implementation technique for efficient parallel execution, optimization, and load-balancing are discussed on this practical application. Finally, a simple performance model is proposed.

  • The Effect of Optimizing Compilers on Architecture and Programs

    Michael WOLFE  

     
    INVITED PAPER

      Vol:
    E80-D No:4
      Page(s):
    403-408

    The first optimizing compiler was developed at IBM in order to prove that high level language programming could be as efficient as hand-coded machine language. Computer architecture and compiler optimization interacted through a feedback loop, from the high-level language computer architectures of the 1970s to the RISC machines of the 1980s. In the supercomputing community, the availability of effective vectorizing compilers delivered easy-to-use performance in the 1980s to the present. These compilers were successful at least in part because they could predict poor performance spots in the program and report these to users. This fostered a feedback loop between programmers and compilers to develop high performance programs. Future optimizing compilers for high performance computers and supercomputers will have to take advantage of both feedback loops.

  • Reproducing the Behavior of a Parallel Program by Using Dataflow Execution Models

    Naohisa TAKAHASHI  Takeshi MIEI  

     
    PAPER

      Vol:
    E80-D No:4
      Page(s):
    495-503

    We present a general framework with which we can evaluate the flexibility and efficiency of various replay systems for parallel programs. In our approach, program monitoring is modeled by making a virtual dataflow program graph, referred to as a VDG, that includes all the instructions executed by the program. The behavior of the program replay is modeled on the parallel interpretation of a VDG based on two basic parallel execution models for dataflow program graphs: a data-driven model and a demand-driven model. Previous attempts to replay parallel programs, known as Instant Replay and P-Sequence, are also modeled as variations of the data-driven replay, i.e. the datadriven interpretation of a VDG. We show that the demand-driven replay, i.e. the demand-driven interpretation of a VDG, is more flexible in program replay than the data-driven replay since it allows better control of parallelism and a more selective replay. We also show that we can implement a demand-driven replay that requires almost the same amount of data to be saved during program monitoring as does the data-driven replay, and which eliminates any centralized bottleneck during program monitoring by optimizing the demand propagation and using an effective data structure.

  • A High-Performance Cluster Computing Environment Based on Hybrid Shared Memory/Message Passing Model

    Yoshimasa OHNISHI  Yoshinari SUGIMOTO  Toshinori SUEYOSHI  

     
    PAPER

      Vol:
    E80-D No:4
      Page(s):
    448-454

    We conducted research and development of Distributed Supercomputing Environment (DSE) based on distributed shared memory model to serve as a cluster computing environment to provide parallel processing facilities. Shared memory model and message passing model are well-known typical models of parallel processing. It is desired that hybrid programming environment will make the best use of the prominent features of both models. Consequently, we add a new message passing mechanism to present DSE, and create a prototype called Hybrid DSE as a hybrid model based cluster computing environment. In this paper, we describe the implementation of a message passing mechanism on DSE and performance evaluation of Hybrid DSE.

  • The Maximum Throughput of a Nonblocking Packet Switch with Window Policy

    Dye-Jyun MA  

     
    PAPER-Switching and Communication Processing

      Vol:
    E80-B No:4
      Page(s):
    573-580

    It is known that in a nonblocking packet switch with input queueing, head-of-line (HOL) blocking affects significantly the maximum throughput of the packet switch. To alleviate the HOL blocking effect, a window policy has been proposed in that each input queue can scan up to a fixed number of packets (called the window size) to select for transmission on the outputs. However, the performance of the window policy has never been precisely characterized. In this paper, we use a closed queueing network model to characterize the performance of the packet switch with window policy. We obtain explicit closed-form formulae for the maximum throughput of the packet switch as a function of the window size. Both balanced and imbalanced traffic patterns are discussed. The formulae can easily determine the effectiveness of the window policy.

  • High Performance Two-Phase Asynchronous Pipelines

    Sam APPLETON  Shannon MORTON  Michael LIEBELT  

     
    PAPER-Design

      Vol:
    E80-D No:3
      Page(s):
    287-295

    In this paper we describe the implementation of complex architectures using a general design approach for two-phase asynchronous systems. This fundamental approach, called Event Controlled Systems, can be used to widely extend the utility of two phase systems. We describe solutions that we have developed that dramatically improve the performance of static and dynamic-logic asynchronous pipelines, and briefly describe a complex microprocessor designed using ECS.

  • Cost-Effective Unbiased Straight-Line Fitting to Multi-Viewpoint Range Data

    Norio TAGAWA  Toshio SUZUKI  Tadashi MORIYA  

     
    PAPER

      Vol:
    E80-A No:3
      Page(s):
    472-479

    The present paper clarifies that the variance of the maximum likelihood estimator (MLE) of a parameter does not reach the Cramer-Rao lower bound (CRLB) when fitting a straight-line to observed two-dimensional data. In addition, the variance of the MLE can be shown to be equal to the CRLB only if observed noise reduces to a one-dimensional Gaussian variable. For most practical applications, it can be assumed that noise is added only to the range direction. In this case, the MLE is clearly an asymptotically effective estimator. However, even if we assume such a noise model, ML line-fitting to the data from many points of view has a high computational cost. The present paper proposes an alternative fitting method in order to provide a cost-effective unbiased estimator. The reliability of this new method is analyzed statistically and by computer simulation.

  • A Single-Ended Boost-Type High-Power-Factor Converter Using a Two-Input-Winding Transformer

    Akira TAKEUCHI  Satoshi OHTSU  Seiichi MUROYAMA  

     
    PAPER-Power Supply

      Vol:
    E80-B No:3
      Page(s):
    483-490

    The designed converter has a two-input-winding transformer powered by single-phase AC voltage and an energy storage capacitor. Small size and enhanced conversion efficiency are achieved, because more than half of the energy is supplied to the load via a single conversion stage, and fast output-voltage regulation is achieved by controlling the charging and discharging of the storage capacitor. The design and control methods for the converter take into account the reset conditions of the transformer and stability in the output voltage control. An almost unity power factor and a low output voltage ripple were achieved with this converter fabricated as a breadboard circuit using small capacitors.

  • An n3u Upper Bound on the Complexity for Deciding the Truth of a Presburger Sentence Involving Two Variables Bounded Only by Existential Quantifiers

    Kuniaki NAOI  Naohisa TAKAHASHI  

     
    PAPER-Algorithm and Computational Complexity

      Vol:
    E80-D No:2
      Page(s):
    223-231

    We show that the truth of a prenex normal form Presburger sentence bounded only by existential quantifiers (or an EPP-sentence) involving two variables can be decided in deterministic polynomial time. Specifically, an upper bound of the computation for the decision is O(n3u), where n is the number of atoms of the EPP-sentence, and u is the largest absolute value of all coefficients in the EPP-sentence. In the analysis for the upper bound, the random access machine is assumed for the machine model. Additionally, a uniform cost criterion is assumed. Deciding the truth of an EPP-sentence is an NP-complete problem, when the number of variables is not fixed. Furthermore, whether the truth of an EPP-sentence involving two or more variables can be decided in deterministic polynomial time, when the number of variables is fixed, or not has remained an open problem. We previously proposed a procedure for quickly deciding the truth of an EPP-sentence on the basis of a suggestion by D.C.Cooper. We found the upper bound by analyzing the decision procedure. The procedure can be applied to both automated correctness proof of specification in various design fields and detection of infeasible paths in a program. In the procedure, a matrix denoting coefficients of the variables in the EPP-sentence is triangulated.

  • Computing the Minkowski Sum of Monotone Polygons

    Antonio HERNAN'DEZ-BARRERA  

     
    PAPER-Algorithm and Computational Complexity

      Vol:
    E80-D No:2
      Page(s):
    218-222

    This paper presents algorithms for computing the Minkowski sum of two polygons P and Q for a family of problems. For P being convex and Q being monotone, an algorithm is given with O (nm) time and space complexity. For both P and Q being monotone polygons, an O (nm log nm) time algorithm is presented and it is shown that the complexity of the sum is Θ (nmα(min(n,m))) in the worst case, where α() is the inverse of Ackermann's function. Finally, an O ((nm+k)log nm) time complexity algorithm is given when P is monotone and Q is simple, where k in the worst case could be Θ(n2m). The complexity of P Q is shown to be Θ(n2m) in the worst case. Here, m and n denote the number of edges of P and Q, respectively.

  • Partial Capture Effect for Multi-Carrier Radio Packet Communication Network

    Hiroyuki ATARASHI  Masao NAKAGAWA  

     
    PAPER-Radio Communication

      Vol:
    E80-B No:2
      Page(s):
    372-378

    Partial capture effect for multi-carrier radio packet communication network is evaluated in frequency selective fading channel. In multi-carrier modulation (MCM) network where each terminal uses several sub-carriers for transmission,the terminals have different instantaneous frequency responses because of its location, fading pattern, and other various factors. This generates the difference of received power in frequency domain, then partial capture effect can be considered at each sub-carrier. Moreover these partially captured packets are not damaged by inter symbol interference (ISI) caused by frequency selective fading, which seriously degrades single-carrier modulation (SCM) network. From this point of view we present the partial capture effect for the MCM network in the frequency selective fading environment. The results show that the MCM network with partial capture has more advantages than the MCM network without partial capture in terms of the throughput and the average number of transmissions.

  • A Soft-Output Viterbi Equalizer Employing Expanded Memory Length in a Trellis

    Takayuki NAGAYASU  Hiroshi KUBO  Keishi MURAKAMI  Tadashi FUJINO  

     
    LETTER-Radio Communication

      Vol:
    E80-B No:2
      Page(s):
    381-385

    This paper presents a novel approach to a soft-output equalizer, which makes a symbol-by-symbol soft-decision based on a posteriori probabilities (APP's) criterion in the presence of intersymbol interference. The authors propose a soft-output Viterbi equalizer (SOVE) employing expanded memory length in a trellis of the Viterbi algorithm with small arithmetic complexity. The proposed equalizer gives suboptimum soft-decision closer to that of a equalizer with the maximum a posteriori probabilities (MAP) algorithm than the conventional SOVE.

2881-2900hit(3318hit)