
Keyword Search Result

[Keyword] PAR (2741 hits)

2341-2360 hits (2741 hits)

  • Device Parameter Estimation of SOI MOSFET Using One-Dimensional Numerical Simulation Considering Quantum Mechanical Effects

    Rimon IKENO  Hiroshi ITO  Kunihiro ASADA  

     
    PAPER-Electronic Circuits

    Vol: E80-C No:6  Page(s): 806-811

    We have been studying the subthreshold characteristics of SOI (Silicon-On-Insulator) MOSFETs in terms of substrate bias dependence, using a one-dimensional subthreshold device simulator based on the Poisson equation in an SOI multilayer structure to estimate structural parameters of real devices. Here, we consider the quantum mechanical effects in the electron inversion layer of thin SOI MOSFETs, such as the two-dimensionally quantized electron states and transport, with a self-consistent solver of the Poisson and Schrödinger equations and a mobility model based on the relaxation-time approximation. From the simulation results, we found a significant difference between this model and the classical model and concluded that quantum mechanical effects need to be considered in analyzing thin-film SOI devices.

  • Jamming Avoidance Responses in Weakly Electric Fishes: A Biological View of Signal Processing

    Masashi KAWASAKI  

     
    INVITED PAPER

    Vol: E80-A No:6  Page(s): 943-950

    Electric fishes generate an AC electric field around themselves with an electric organ in the tail. Spatial distortion of the field by nearby objects is detected by an electroreceptor array located all over the body surface, allowing the fish to localize objects electrically when other senses such as vision and mechanosensation are useless. Each fish has its own 'frequency band' for its electric organ discharges, and jamming of the electrolocation system occurs when two fish with similar discharge frequencies encounter each other. To avoid jamming, the fish shift their discharge frequencies in appropriate directions. A computational algorithm for this electrical behavior and its neuronal implementation in the brain have been discovered. The design features of the system, however, are rather complex for this simple behavior and cannot be readily explained by functional optimization processes during evolution. To gain insight into the origin of the design features, two independently evolved electric fish species which perform the same behavior are compared. Complex features of the neuronal computation may be explained by the evolutionary history of neuronal elements.

  • Balanced State Feedback Controllers for Discrete Event Systems Described by the Golaszewski-Ramadge Model

    Shigemasa TAKAI  Toshimitsu USHIO  Shinzo KODAMA  

     
    LETTER-Concurrent Systems

    Vol: E80-A No:5  Page(s): 928-931

    We study state feedback control of discrete event systems described by the Golaszewski-Ramadge model. We derive a necessary and sufficient condition for the existence of a balanced state feedback controller under partial observations.

  • Parallel Universal Simulation and Self-Reproduction in Cellular Spaces

    Katsuhiko NAKAMURA  

     
    PAPER-Automata, Languages and Theory of Computing

    Vol: E80-D No:5  Page(s): 547-552

    This paper describes cellular spaces (or cellular automata) with capabilities of parallel self-reproduction and of parallel universal simulation of other cellular spaces. It is shown that there is a 1-dimensional cellular space U, called a parallel universal simulator, that can simulate any given 1-dimensional cellular space S in the sense that if an initial configuration of U encodes both the local function and an initial configuration of S, then U produces the same computation result as S and the computation time of U is proportional to that of S. Two models of nontrivial parallel self-reproduction are also shown. One model is based on a "state-exchange" method, and the other is based on a fixed point program of the parallel universal simulator.
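
    Below is a brief, hedged sketch in Python (not the paper's construction; the rule and configuration are illustrative) of how a 1-dimensional cellular space evolves under a given local function. The universal simulator U of the paper would itself be one such space whose initial configuration encodes the local function and initial configuration of another space S.

    ```python
    # Minimal 1-D cellular space stepper: apply a radius-1 local rule
    # synchronously to every cell (circular boundary for simplicity).
    def step(cells, local_rule):
        n = len(cells)
        return [local_rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
                for i in range(n)]

    # Illustrative local function: elementary rule 110 over binary states.
    def rule110(left, center, right):
        return (110 >> (left * 4 + center * 2 + right)) & 1

    config = [0] * 20 + [1] + [0] * 20
    for _ in range(5):
        config = step(config, rule110)
        print("".join(".#"[c] for c in config))
    ```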

  • A Lookahead Heuristic for Heterogeneous Multiprocessor Scheduling with Communication Costs

    Dingchao LI  Akira MIZUNO  Yuji IWAHORI  Naohiro ISHII  

     
    PAPER

    Vol: E80-D No:4  Page(s): 489-494

    This paper describes a new approach to the scheduling problem of assigning the tasks of a parallel program, described as a task graph, onto parallel machines. The approach handles interprocessor communication and heterogeneity by combining theoretical results developed so far with a lookahead scheduling strategy. Experimental results on randomly generated task graphs demonstrate the effectiveness of this scheduling heuristic.
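
    As a rough illustration of list scheduling with communication costs on heterogeneous processors (a plain greedy earliest-finish-time sketch, not the paper's lookahead heuristic; all task names and costs below are made up), consider:

    ```python
    # exec_cost[task][proc] = run time of task on proc (heterogeneity);
    # comm[(u, v)] = transfer time, charged only when u and v run on different processors.
    def schedule(tasks, preds, exec_cost, comm, n_procs):
        proc_ready = [0.0] * n_procs             # when each processor becomes free
        start, finish, place = {}, {}, {}
        for t in tasks:                          # tasks given in topological order
            best = None
            for p in range(n_procs):
                data_ready = max((finish[u] + (0.0 if place[u] == p else comm.get((u, t), 0.0))
                                  for u in preds.get(t, [])), default=0.0)
                est = max(proc_ready[p], data_ready)
                eft = est + exec_cost[t][p]
                if best is None or eft < best[0]:
                    best = (eft, est, p)
            finish[t], start[t], place[t] = best
            proc_ready[place[t]] = finish[t]
        return place, finish

    # Toy task graph: t0 -> {t1, t2} -> t3, on two heterogeneous processors.
    tasks = ["t0", "t1", "t2", "t3"]
    preds = {"t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
    exec_cost = {"t0": [2, 3], "t1": [4, 2], "t2": [3, 3], "t3": [2, 1]}
    comm = {("t0", "t1"): 1, ("t0", "t2"): 1, ("t1", "t3"): 2, ("t2", "t3"): 2}
    print(schedule(tasks, preds, exec_cost, comm, n_procs=2))
    ```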

  • Modular Array Structures for Design and Multiplierless Realization of Two-Dimensional Linear Phase FIR Digital Filters

    Saed SAMADI  Akinori NISHIHARA  Nobuo FUJII  

     
    PAPER-Digital Signal Processing

    Vol: E80-A No:4  Page(s): 722-736

    It is shown that two-dimensional linear phase FIR digital filters with various shapes of frequency response can be designed and realized as modular array structures free of multiplier coefficients. The design is performed by judicious selection of two low-order linear phase transfer functions to be used at each module as kernel filters. Regular interconnection of the modules in L rows and K columns, conditioned with boundary coefficients 1, 0 and 1/2, results in higher order digital filters. The kernels should be chosen appropriately to, first, generate the desired shape of frequency response characteristic and, second, lend themselves to multiplierless realization. When these two requirements are satisfied, the frequency response can be refined to possess narrower transition bands by adding rows and columns. General properties of the frequency response of the array are investigated, resulting in theorems that serve as valuable tools for the appropriate selection of the kernels. Several design examples are given. The array structures enjoy several favorable features; specifically, their regularity and lack of multiplier coefficients make them suitable for high-speed systolic VLSI implementation. The computational complexity of the structure is also studied.

  • Factoring Hard Integers on a Parallel Machine

    Rene PERALTA  Masahiro MAMBO  Eiji OKAMOTO  

     
    PAPER

    Vol: E80-A No:4  Page(s): 658-662

    We describe our implementation of the Hypercube variation of the Multiple Polynomial Quadratic Sieve (HMPQS) integer factorization algorithm on a Parsytec GC computer with 128 processors. HMPQS is a variation on the Quadratic Sieve (QS) algorithm which inspects many quadratic polynomials looking for quadratic residues with small prime factors. The polynomials are organized as the nodes of an n-dimensional cube. We report on the performance of our implementation in factoring several large numbers for the Cunningham Project.
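
    To give a flavor of the relation-collection step that quadratic-sieve methods are built on, here is a toy single-polynomial sketch in Python (trial division instead of sieving, and nothing like the hypercube polynomial organization of HMPQS; the number and factor base are arbitrary):

    ```python
    from math import isqrt

    def smooth_exponents(value, factor_base):
        """Return the exponent vector if value factors completely over the base, else None."""
        exps = []
        for p in factor_base:
            e = 0
            while value % p == 0:
                value //= p
                e += 1
            exps.append(e)
        return exps if value == 1 else None

    def collect_relations(n, factor_base, wanted, span=10000):
        m = isqrt(n) + 1
        relations = []
        for x in range(span):
            q = (x + m) ** 2 - n                     # quadratic residue mod n
            exps = smooth_exponents(q, factor_base)
            if exps is not None:
                relations.append((x + m, exps))      # (x+m)^2 = prod p_i^e_i (mod n)
                if len(relations) >= wanted:
                    break
        return relations

    # A real run would choose the factor base from primes p for which n is a quadratic
    # residue and finish with Gaussian elimination over GF(2) to combine relations.
    print(collect_relations(87463, [2, 3, 5, 7, 11, 13, 17, 19, 23, 29], wanted=5))
    ```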

  • Design of Array Processors for 2-D Discrete Fourier Transform

    Shietung PENG  Igor SEDUKHIN  Stanislav SEDUKHIN  

     
    PAPER

    Vol: E80-D No:4  Page(s): 455-465

    In this paper the design of systolic array processors for computing the 2-dimensional Discrete Fourier Transform (2-D DFT) is considered. We investigated three different computational schemes for designing systolic array processors using a systematic approach. The systematic approach guarantees that optimal systolic array processors are found from a large solution space in terms of the number of processing elements and I/O channels, the processing time, topology, pipeline period, etc. The optimal systolic array processors are scalable, modular and suitable for VLSI implementation. An application of the designed systolic array processors to the prime-factor DFT is also presented.
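
    The row-column decomposition that most 2-D DFT array designs exploit can be sketched in a few lines of Python (purely illustrative; the paper is about deriving systolic arrays, not this matrix formulation):

    ```python
    import numpy as np

    def dft_matrix(n):
        k = np.arange(n)
        return np.exp(-2j * np.pi * np.outer(k, k) / n)

    def dft2(x):
        # 1-D DFTs along the columns, then along the rows
        f1, f2 = dft_matrix(x.shape[0]), dft_matrix(x.shape[1])
        return f1 @ x @ f2.T

    x = np.random.rand(8, 8)
    print(np.allclose(dft2(x), np.fft.fft2(x)))   # True
    ```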

  • Data-Localization Scheduling inside Processor-Cluster for Multigrain Parallel Processing

    Akimasa YOSHIDA  Ken'ichi KOSHIZUKA  Wataru OGATA  Hironori KASAHARA  

     
    PAPER

    Vol: E80-D No:4  Page(s): 473-479

    This paper proposes a data-localization scheduling scheme inside a processor-cluster for multigrain parallel processing, which hierarchically exploits parallelism among coarse-grain tasks like loops, medium-grain tasks like loop iterations and near-fine-grain tasks like statements. The proposed scheme assigns near-fine-grain or medium-grain tasks inside coarse-grain tasks onto processors inside a processor-cluster so that maximum parallelism can be exploited and inter-processor data transfer can be minimized after data-localization for coarse-grain tasks across processor-clusters. Performance evaluation on the multiprocessor system OSCAR shows that multigrain parallel processing with the proposed data-localization scheduling can reduce execution time for application programs by 10% compared with multigrain parallel processing without data-localization.

  • Parallelized Simulation of Complicated Polymer Structures and Its Efficiency

    Kazuhito SHIDA  Kaoru OHNO  Masayuki KIMURA  Yoshiyuki KAWAZOE  

     
    PAPER

    Vol: E80-D No:4  Page(s): 531-537

    A large scale simulation of polymer chains in good solvent is performed. The implementation techniques for efficient parallel execution, optimization, and load balancing are discussed for this practical application. Finally, a simple performance model is proposed.

  • Vienna Fortran and the Path Towards a Standard Parallel Language

    Barbara M. CHAPMAN  Piyush MEHROTRA  Hans P. ZIMA  

     
    INVITED PAPER

    Vol: E80-D No:4  Page(s): 409-416

    Highly parallel scalable multiprocessing systems (HMPs) are powerful tools for solving large-scale scientific and engineering problems. However, these machines are difficult to program since algorithms must exploit locality in order to achieve high performance. Vienna Fortran was the first fully specified data-parallel language for HMPs that provided features for the specification of data distribution and alignment at a high level of abstraction. In this paper we outline the major elements of Vienna Fortran and compare it to High Performance Fortran (HPF), a de facto standard in this area. A significant weakness of HPF is its lack of support for many advanced applications, which require irregular data distributions and dynamic load balancing. We introduce HPF+, an extension of HPF based on Vienna Fortran, that provides the required functionality.
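
    As a small illustration of the regular data distributions such languages let programmers declare (sketched in Python rather than Fortran syntax; the mapping rules below are the generic BLOCK and CYCLIC conventions, not a quote from either language specification):

    ```python
    def block_owner(i, n, p):
        """Owner of global index i when n elements are BLOCK-distributed over p processors."""
        block = -(-n // p)            # ceil(n / p) elements per processor
        return i // block

    def cyclic_owner(i, p):
        """Owner of global index i under a CYCLIC (round-robin) distribution."""
        return i % p

    n, p = 16, 4
    print([block_owner(i, n, p) for i in range(n)])   # blocks of four: 0,0,0,0,1,1,1,1,...
    print([cyclic_owner(i, p) for i in range(n)])     # round robin: 0,1,2,3,0,1,2,3,...
    ```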

  • The Effect of Optimizing Compilers on Architecture and Programs

    Michael WOLFE  

     
    INVITED PAPER

    Vol: E80-D No:4  Page(s): 403-408

    The first optimizing compiler was developed at IBM in order to prove that high level language programming could be as efficient as hand-coded machine language. Computer architecture and compiler optimization interacted through a feedback loop, from the high-level language computer architectures of the 1970s to the RISC machines of the 1980s. In the supercomputing community, the availability of effective vectorizing compilers has delivered easy-to-use performance from the 1980s to the present. These compilers were successful at least in part because they could predict poor performance spots in the program and report these to users. This fostered a feedback loop between programmers and compilers to develop high performance programs. Future optimizing compilers for high performance computers and supercomputers will have to take advantage of both feedback loops.

  • High-Performance Parallel Computation of Flows Past a Space Plane Using NWT

    Kisa MATSUSHIMA  Susumu TAKANASHI  

     
    PAPER

    Vol: E80-D No:4  Page(s): 524-530

    Compressible viscous flows past a space plane have been elucidated by parallel computation on the NWT. The NWT is a vector-parallel computer system which achieves remarkably high performance in processing speed and memory storage. We have examined the advantages of the NWT in simulating realistic flow problems in engineering, such as the investigation of the global and local aerodynamic characteristics of a space plane. The accuracy of the computational results has been verified by comparison with experimental data. The simplified domain-decomposition technique introduced here is easy to apply in a parallel implementation and significantly improves the acceleration rate of computations. The larger available memory storage enables us to conduct a grid refinement study, through which several findings concerning CFD simulation of a space plane are obtained.
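
    The basic idea of a simple domain decomposition can be sketched as follows (a 1-D toy diffusion update in Python with ghost-cell exchange; the paper's flow solver and its NWT implementation are of course far more involved):

    ```python
    import numpy as np

    def decompose(u, n_domains):
        """Split a 1-D grid into chunks padded with one ghost cell on each side."""
        return [np.concatenate(([0.0], c, [0.0])) for c in np.array_split(u, n_domains)]

    def exchange_ghosts(domains):
        for i, d in enumerate(domains):
            d[0] = domains[i - 1][-2] if i > 0 else d[1]                    # left neighbour / wall
            d[-1] = domains[i + 1][1] if i < len(domains) - 1 else d[-2]    # right neighbour / wall
        return domains

    def diffuse(domains, alpha=0.25):
        for d in domains:
            d[1:-1] = d[1:-1] + alpha * (d[:-2] - 2.0 * d[1:-1] + d[2:])
        return domains

    u = np.zeros(32); u[16] = 1.0                  # initial spike
    domains = decompose(u, n_domains=4)
    for _ in range(50):
        domains = diffuse(exchange_ghosts(domains))
    print(np.concatenate([d[1:-1] for d in domains]).round(3))
    ```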

  • Parallel File Access for Implementing Dynamic Load Balancing on a Massively Parallel Computer

    Masahisa SHIMIZU  Yasuhiro OUE  Kazumasa OHNISHI  Toru KITAMURA  

     
    PAPER

    Vol: E80-D No:4  Page(s): 466-472

    Because a massively parallel computer processes vast amounts of data and generates many access requests from multiple processors simultaneously, parallel secondary storage requires large capacity and high concurrency. One effective way to implement such secondary storage is to use disk arrays in which multiple disks are connected in parallel. In this paper, we propose a parallel file access method named DECODE (dynamic express changing of data entry), in which load balancing across disks is achieved by dynamically determining the position at which data are written. To resolve the data fragmentation caused by relocating data during a write, the concept of an "Equivalent Area" is introduced. We have performed a preliminary performance evaluation using software simulation under various access conditions, changing the access pattern, access size and stripe size, and confirmed the effectiveness of load balancing with this method.
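
    The flavor of dynamically choosing the write position can be sketched as follows (illustrative only; this is not the DECODE algorithm or its Equivalent Area mechanism, just a least-loaded-disk placement with a block directory):

    ```python
    import heapq

    class DynamicPlacementArray:
        def __init__(self, n_disks):
            self.load = [(0, d) for d in range(n_disks)]   # (pending work, disk id) min-heap
            heapq.heapify(self.load)
            self.directory = {}                            # block id -> disk id

        def write(self, block_id, cost=1):
            pending, disk = heapq.heappop(self.load)       # least-loaded disk gets the block
            self.directory[block_id] = disk
            heapq.heappush(self.load, (pending + cost, disk))
            return disk

        def read(self, block_id):
            return self.directory[block_id]

    array = DynamicPlacementArray(n_disks=4)
    placement = [array.write(b, cost=(3 if b % 5 == 0 else 1)) for b in range(12)]
    print(placement)    # blocks spread so that per-disk pending work stays balanced
    ```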

  • Blind Separation of Sources Using Temporal Correlation of the Observed Signals

    Mitsuru KAWAMOTO  Kiyotoshi MATSUOKA  Masahiro OYA  

     
    PAPER-Digital Signal Processing

    Vol: E80-A No:4  Page(s): 695-704

    This paper proposes a new method for recovering the original signals from their linear mixtures observed by the same number of sensors. It is performed by identifying the linear transform from the sources to the sensors, using only the sensor signals. The only assumption about the source signals is essentially that they are statistically mutually independent. In order to perform the 'blind' identification, time-correlational information in the observed signals is utilized. The most important feature of the method is that the full information of the available time-correlation data (second-order statistics) is evaluated, as opposed to the conventional methods. To this end, an information-theoretic cost function is introduced, and the unknown linear transform is found by minimizing it. The proposed method gives a more stable solution than the conventional methods.
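
    A second-order separation procedure in the same spirit, using one time-lagged covariance matrix, can be sketched in a few lines (an AMUSE-style whiten-and-diagonalize sketch, not the paper's information-theoretic cost-function method; the test signals are arbitrary):

    ```python
    import numpy as np

    def separate(x, lag=1):
        """x: (sensors, samples) zero-mean mixtures; returns estimated sources."""
        c0 = x @ x.T / x.shape[1]
        d, e = np.linalg.eigh(c0)
        z = (e @ np.diag(1.0 / np.sqrt(d)) @ e.T) @ x          # whitened mixtures
        c_lag = z[:, lag:] @ z[:, :-lag].T / (z.shape[1] - lag)
        c_lag = (c_lag + c_lag.T) / 2                          # symmetrized lagged covariance
        _, rotation = np.linalg.eigh(c_lag)
        return rotation.T @ z

    rng = np.random.default_rng(0)
    t = np.arange(5000)
    s = np.vstack([np.sin(0.05 * t),                                        # smooth source
                   np.convolve(rng.normal(size=t.size), np.ones(3) / 3, mode="same")])
    x = rng.normal(size=(2, 2)) @ s                                         # unknown mixing
    y = separate(x - x.mean(axis=1, keepdims=True))
    print(np.round(np.corrcoef(np.vstack([s, y]))[:2, 2:], 2))   # each row of y should track one source (up to sign)
    ```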

  • Implementation of a Parallel Prolog System on a Distributed Memory Parallel Computer

    Hideo MATSUDA  Toru KAWABATA  Yukio KANEDA  

     
    PAPER

    Vol: E80-D No:4  Page(s): 504-509

    In this paper we propose a new method for parallel execution of Prolog programs and present its implementation on a distributed memory parallel computer, the Fujitsu AP1000. In our method a number of processes (named Prolog engines) explore different branches of a search tree (named tasks) in parallel, as in OR-parallelism. Unlike OR-parallelism, the mapping between Prolog engines and tasks is statically determined, as in data parallelism. Each Prolog engine can decide which tasks it executes without communicating with the other engines. In many search problems, however, such static task mapping may cause imbalance in the processing time of each engine, since the computational costs of exploring branches may vary substantially. To cope with this issue, we devise a method that adjusts the task imbalance by periodically exchanging the number of tasks processed by each engine. Also, to reduce communication overhead in load balancing, we limit the scope of engines that exchange load information with each other. The effectiveness of our method is evaluated by measuring execution times for N Queens and the Traveling Salesman Problem on the AP1000. Using 512 processors, we obtained 355-fold speedup for N Queens and 343-fold speedup for the Traveling Salesman Problem.
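
    The static mapping idea can be illustrated with a small Python sketch (not a Prolog engine; the search problem, branch depth and modulo mapping below are only one example of assigning search-tree branches to engines without communication):

    ```python
    from itertools import product

    def n_queens_from(prefix, n):
        """Count N-Queens completions of a partial row-by-row placement."""
        def ok(cols, col):
            r = len(cols)
            return all(c != col and abs(c - col) != r - i for i, c in enumerate(cols))
        for k in range(len(prefix)):              # reject prefixes that already conflict
            if not ok(list(prefix[:k]), prefix[k]):
                return 0
        def count(cols):
            if len(cols) == n:
                return 1
            return sum(count(cols + [c]) for c in range(n) if ok(cols, c))
        return count(list(prefix))

    def engine_count(engine_id, n_engines, n, depth=2):
        # every engine enumerates the same top-level branches and keeps only its own
        branches = list(product(range(n), repeat=depth))
        return sum(n_queens_from(prefix, n)
                   for i, prefix in enumerate(branches) if i % n_engines == engine_id)

    n, n_engines = 8, 4
    print(sum(engine_count(e, n_engines, n) for e in range(n_engines)))   # 92 solutions
    ```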

  • A Method of Finding Legal Sequence Number for a Class of Extended Series-Parallel Digraphs

    Qi-Wei GE  Naomi YOSHIOKA  

     
    PAPER

    Vol: E80-A No:4  Page(s): 635-642

    Topological sorting is, given a directed acyclic graph G = (V, E), to find a total ordering of the vertices such that if (u, v) ∈ E then u is ordered before v. Instead of finding total orderings, we wish to find out how many total orderings exist in a given directed acyclic graph G = (V, E). Here we call a total ordering a legal sequence and the problem the legal sequence number problem. In this paper, we first propose theorems on equivalent transformations of graphs with respect to the legal sequence number. Then we give a formula to calculate the legal sequence number of basic series-parallel digraphs and a calculation method for general series-parallel digraphs. Finally, we apply our results to show how to obtain the legal sequence number for a class of extended series-parallel digraphs.
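
    For intuition, the legal sequence number of digraphs built purely by series and parallel composition of single vertices can be computed recursively (a sketch under my own term representation, not the paper's formula or its extended class): a series composition forces all of the first part before the second, while a parallel composition lets the two parts interleave freely.

    ```python
    from math import comb

    def legal_sequences(expr):
        """expr is ('v',), ('s', left, right) or ('p', left, right);
        returns (vertex count, number of legal sequences)."""
        if expr[0] == "v":
            return 1, 1
        n1, c1 = legal_sequences(expr[1])
        n2, c2 = legal_sequences(expr[2])
        if expr[0] == "s":                                   # series: count(P) * count(Q)
            return n1 + n2, c1 * c2
        return n1 + n2, c1 * c2 * comb(n1 + n2, n1)          # parallel: free interleavings

    # Diamond digraph: a source, two independent middle vertices, a sink.
    diamond = ("s", ("v",), ("s", ("p", ("v",), ("v",)), ("v",)))
    print(legal_sequences(diamond))   # (4, 2): only the two middle vertices can swap
    ```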

  • An Efficient Implementation of Term Rewriting System on a Distributed Memory Architecture

    Yoshinari HACHISU  Shinichirou YAMAMOTO  Takeshi HAMAGUCHI  Kiyoshi AGUSA  

     
    PAPER

    Vol: E80-D No:4  Page(s): 510-517

    Term Rewriting System (TRS) is a model of computation and is used in various applications such as algebraic specification. TRS has inherent concurrency and is suitable for parallel computing. We have already proposed BOB (Bundle Of Branches), which is a data management mechanism for parallel rewriting. We have proposed a model of parallel rewriting using BOB and implemented a TRS simulator based on this model on a shared memory parallel computer. Because it fully depends on a feature of the shared memory architecture, namely that a process can access any memory element, it is hard to port to a distributed memory parallel computer. In this paper, we propose an autonomous BOB model. This model is suitable for a distributed memory architecture since processes communicate by message passing and a load balancing method is provided. We implemented a TRS simulator using this model on a distributed memory architecture, and it runs about 30 times faster on 64 processors than on a single processor.
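
    To make the computational model concrete, here is a tiny sequential term rewriting sketch in Python (terms as nested tuples, variables as strings; this illustrates rewriting itself, not BOB or the parallel simulator):

    ```python
    def match(pattern, term, env):
        if isinstance(pattern, str):                       # pattern variable
            if pattern in env:
                return env if env[pattern] == term else None
            return {**env, pattern: term}
        if not isinstance(term, tuple) or pattern[0] != term[0] or len(pattern) != len(term):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            env = match(p, t, env)
            if env is None:
                return None
        return env

    def substitute(term, env):
        if isinstance(term, str):
            return env[term]
        return (term[0],) + tuple(substitute(t, env) for t in term[1:])

    def rewrite(term, rules):
        """Apply rules until no rule matches anywhere (assumes a terminating rule set)."""
        for lhs, rhs in rules:
            env = match(lhs, term, {})
            if env is not None:
                return rewrite(substitute(rhs, env), rules)
        reduced = (term[0],) + tuple(rewrite(t, rules) for t in term[1:])
        return reduced if reduced == term else rewrite(reduced, rules)

    # Peano addition: add(0, y) -> y, add(s(x), y) -> s(add(x, y))
    rules = [(("add", ("0",), "y"), "y"),
             (("add", ("s", "x"), "y"), ("s", ("add", "x", "y")))]
    two, three = ("s", ("s", ("0",))), ("s", ("s", ("s", ("0",))))
    print(rewrite(("add", two, three), rules))   # s(s(s(s(s(0))))), i.e. five
    ```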

  • Parallel Algorithms for Maximal Linear Forests

    Ryuhei UEHARA  Zhi-Zhong CHEN  

     
    PAPER

    Vol: E80-A No:4  Page(s): 627-634

    The maximal linear forest problem is to find, given a graph G = (V, E), a maximal subset of V that induces a linear forest. Three parallel algorithms for this problem are presented. The first one is randomized and runs in O(log n) expected time using n^2 processors on a CRCW PRAM. The second one is deterministic and runs in O(log^2 n) time using n^4 processors on an EREW PRAM. The last one is deterministic and runs in O(log^5 n) time using n^3 processors on an EREW PRAM. The results put the problem in the class NC.
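
    As a point of reference for what is being computed, a simple sequential greedy pass already produces a maximal linear forest (sketch only; the paper's contribution is the parallel NC algorithms, not this straightforward sequential procedure):

    ```python
    def maximal_linear_forest(n, edges):
        adj = [[] for _ in range(n)]
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)

        parent = list(range(n))                     # union-find over induced path components
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        kept, degree = set(), [0] * n
        for v in range(n):
            nbrs = [u for u in adj[v] if u in kept]
            ok = (len(nbrs) <= 2
                  and all(degree[u] <= 1 for u in nbrs)                    # attach only to path endpoints
                  and (len(nbrs) < 2 or find(nbrs[0]) != find(nbrs[1])))   # do not close a cycle
            if ok:
                kept.add(v)
                for u in nbrs:
                    degree[u] += 1
                    degree[v] += 1
                    parent[find(u)] = find(v)
        return kept

    # 4-cycle with a pendant vertex: vertex 3 is rejected because it would close the cycle.
    print(maximal_linear_forest(5, [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)]))   # {0, 1, 2, 4}
    ```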

  • Reproducing the Behavior of a Parallel Program by Using Dataflow Execution Models

    Naohisa TAKAHASHI  Takeshi MIEI  

     
    PAPER

    Vol: E80-D No:4  Page(s): 495-503

    We present a general framework with which we can evaluate the flexibility and efficiency of various replay systems for parallel programs. In our approach, program monitoring is modeled by making a virtual dataflow program graph, referred to as a VDG, that includes all the instructions executed by the program. The behavior of the program replay is modeled on the parallel interpretation of a VDG based on two basic parallel execution models for dataflow program graphs: a data-driven model and a demand-driven model. Previous attempts to replay parallel programs, known as Instant Replay and P-Sequence, are also modeled as variations of data-driven replay, i.e., the data-driven interpretation of a VDG. We show that demand-driven replay, i.e., the demand-driven interpretation of a VDG, is more flexible than data-driven replay since it allows better control of parallelism and a more selective replay. We also show that we can implement a demand-driven replay that requires almost the same amount of data to be saved during program monitoring as the data-driven replay does, and which eliminates any centralized bottleneck during program monitoring by optimizing the demand propagation and using an effective data structure.
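
    The contrast between the two interpretation orders can be sketched on an ordinary dataflow graph (illustrative only; a VDG in the paper records every executed instruction of the monitored program, which this toy graph does not attempt to model):

    ```python
    from graphlib import TopologicalSorter

    # node -> (function, input nodes); leaves take no inputs
    graph = {
        "a": (lambda: 2, []),
        "b": (lambda: 5, []),
        "c": (lambda a, b: a + b, ["a", "b"]),
        "d": (lambda a: a * 10, ["a"]),
        "e": (lambda c: c * c, ["c"]),
    }

    def data_driven(graph):
        """Fire every node once its inputs are available."""
        order = TopologicalSorter({n: deps for n, (_, deps) in graph.items()}).static_order()
        values = {}
        for n in order:
            fn, deps = graph[n]
            values[n] = fn(*(values[d] for d in deps))
        return values

    def demand_driven(graph, node, values=None):
        """Pull only the nodes the demanded output transitively needs (memoized)."""
        values = {} if values is None else values
        if node not in values:
            fn, deps = graph[node]
            values[node] = fn(*(demand_driven(graph, d, values) for d in deps))
        return values[node]

    print(data_driven(graph))           # evaluates a, b, c, d and e
    print(demand_driven(graph, "e"))    # evaluates only a, b, c and e -> 49
    ```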

2341-2360 hits (2741 hits)