Keyword Search Result

[Keyword] multiprocessor(80hit)

41-60hit(80hit)

  • Evaluation of PARAdeg of Acyclic SWITCH-Less Program Nets

    Qi-Wei GE  Kenji ONAGA  

     
    LETTER

    Vol: E83-A  No: 6  Page(s): 1186-1191

    PARAdeg has been defined as a measure of the parallelism inherent in a program net. The computation of PARAdeg has been studied, but how well PARAdeg fits the parallelism of program nets has not been evaluated quantitatively. In this paper, we perform this evaluation by applying a genetic algorithm to measure firing completion times for 400 program nets when exactly PARAdeg processors, fewer processors, and more processors are provided. Our experimental results show that the firing completion times decrease rapidly as the number of processors increases up to PARAdeg and only slowly once the number of processors exceeds PARAdeg, which implies that PARAdeg is a reasonable standard for measuring the parallelism of program nets.

  • An Ordered-Deme Genetic Algorithm for Multiprocessor Scheduling

    Bong-Joon JUNG  Kwang-Il PARK  Kyu Ho PARK  

     
    PAPER-Algorithms

    Vol: E83-D  No: 6  Page(s): 1207-1215

    In static multiprocessor scheduling, heuristic algorithms have been widely used. In exchange for execution speed, most of them yield unpromising solutions because they search only part of the solution space. In this paper, we propose a scheduling algorithm using the genetic algorithm (GA), which is a well-known stochastic search algorithm. The proposed algorithm, named ordered-deme GA (OGA), is based on the multiple-subpopulation GA, where a global population is divided into several subpopulations (demes) and each deme evolves independently. To find better schedules, the OGA orders demes from the highest to the lowest deme and migrates both the best and the worst individuals at the same time. In addition, the OGA adaptively assigns different mutation probabilities to each deme to improve search capability. We compare the OGA with well-known heuristic algorithms and other GAs on random task graphs and on task graphs from real numerical problems. The results indicate that the OGA mostly finds better schedules than the others, although it is slower in terms of execution time.

  • Parallelism-Independent Scheduling Method

    Kirilka NIKOLOVA  Atusi MAEDA  Masahiro SOWA  

     
    PAPER

    Vol: E83-A  No: 6  Page(s): 1138-1150

    All the existing scheduling algorithms order the instructions of the program in such a way that it can be executed in minimal time only for one fixed number of processors. In this paper we propose a new scheduling method, called the Parallelism-Independent Scheduling Method, which enables the execution of the scheduled program on parallel computers with any degree of parallelism in near-optimal time. We propose three Parallelism-Independent algorithms, which have the following phases: obtaining a parallel schedule by using a list scheduling heuristic; optimization of the parallel schedule by rearranging the tasks in each level, so that they can be executed efficiently with different degrees of parallelism; serialization of the parallel schedule; and insertion of markers for the parallel execution limits. The three algorithms differ in their optimization phase. To demonstrate the efficiency of our algorithms, we ran simulations on random directed acyclic graphs of different sizes and degrees of parallelism. We compared the results in terms of schedule length to those obtained using the Critical Path Algorithm separately for each degree of parallelism.
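
The idea of serializing a level-structured schedule and marking where parallel execution must stop can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes unit-time tasks, uses plain topological levels in place of the list-scheduling and optimization phases, and treats every marker as a barrier. The names `levelize`, `serialize`, `schedule_length`, and `MARKER` are hypothetical.

```python
from math import ceil

MARKER = "|"  # hypothetical marker closing a group of mutually independent tasks


def levelize(deps):
    """Group tasks into levels: level = 1 + deepest predecessor level (roots are 0)."""
    level = {}

    def depth(task):
        if task not in level:
            level[task] = 1 + max((depth(p) for p in deps[task]), default=-1)
        return level[task]

    for task in deps:
        depth(task)
    levels = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for task in sorted(deps):
        levels[level[task]].append(task)
    return levels


def serialize(levels):
    """Flatten the levels into one sequence, ending each level with a marker."""
    out = []
    for tasks in levels:
        out.extend(tasks)
        out.append(MARKER)
    return out


def schedule_length(serial, processors):
    """Time steps needed when each marker acts as a barrier and tasks take one step."""
    total, group = 0, 0
    for item in serial:
        if item == MARKER:
            total += ceil(group / processors)
            group = 0
        else:
            group += 1
    return total


# Assumed example graph: each task maps to the list of tasks it depends on.
deps = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}
serial = serialize(levelize(deps))
```

The same serialized sequence can then be evaluated for any processor count, for example `schedule_length(serial, 1)` and `schedule_length(serial, 2)`, without rescheduling the graph.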

  • Approximation Algorithms for Multiprocessor Scheduling Problem

    Satoshi FUJITA  Masafumi YAMASHITA  

     
    INVITED SURVEY PAPER-Approximate Algorithms for Combinatorial Problems

    Vol: E83-D  No: 3  Page(s): 503-509

    In this paper, we consider the static multiprocessor scheduling problem for a class of multiprocessor systems consisting of m (≥ 1) identical processors connected by a complete network. The objective of this survey is to give a panoramic view of the theoretical and practical approaches to solving the problem that have been studied extensively over the past three decades.

  • Data-Parallel Volume Rendering with Adaptive Volume Subdivision

    Kentaro SANO  Hiroyuki KITAJIMA  Hiroaki KOBAYASHI  Tadao NAKAMURA  

     
    PAPER-Computer Graphics

    Vol: E83-D  No: 1  Page(s): 80-89

    A data-parallel processing approach is promising for real-time volume rendering because of the massive parallelism in volume rendering. In data-parallel volume rendering, the local results that processing elements (PEs) generate from their allocated subvolumes are integrated to form a final image. Generally, this integration causes an unavoidable overhead in data-parallel volume rendering due to communication among PEs. This paper proposes a data-parallel shear-warp volume rendering algorithm combined with an adaptive volume subdivision method to reduce the communication overhead and improve processing efficiency. We implement the parallel algorithm on a message-passing multiprocessor system for performance evaluation. The experimental results show that the adaptive volume subdivision method can reduce the overhead and achieve higher efficiency compared with a conventional slab subdivision method.

  • A Single Chip Multiprocessor Integrated with High Density DRAM

    Tadaaki YAMAUCHI  Lance HAMMOND  Oyekunle A. OLUKOTUN  Kazutami ARIMOTO  

     
    PAPER-Electronic Circuits

    Vol: E82-C  No: 8  Page(s): 1567-1577

    A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing memory latency and improving memory bandwidth. In this paper we evaluate the performance of a single chip multiprocessor integrated with DRAM when the DRAM is organized as on-chip main memory and as on-chip cache. We compare the performance of this architecture with that of a more conventional chip which only has SRAM-based on-chip cache. The DRAM-based architecture with four processors outperforms the SRAM-based architecture on floating point applications which are effectively parallelized and have large working sets. This performance difference is significantly better than that possible in a uniprocessor DRAM-based architecture, which performs only slightly faster than an SRAM-based architecture on the same applications. In addition, on multiprogrammed workloads, in which independent processes are assigned to every processor in a single chip multiprocessor, the large bandwidth of on-chip DRAM can handle the inter-access contention better. These results demonstrate that a multiprocessor takes better advantage of the large bandwidth provided by the on-chip DRAM than a uniprocessor.

  • Media Core Processor for Multimedia Application System

    Kosuke YOSHIOKA  Makoto HIRAI  Kozo KIMURA  Tokuzo KIYOHARA  

     
    PAPER

    Vol: E82-A  No: 2  Page(s): 206-214

    In this paper, we introduce a processor called Media Core Processor (MCP), which targets a system solution for consumer multimedia products. MCP is a heterogeneous multiprocessor system designed to guarantee full-frame MPEG decoding and to reduce power consumption. In our processor architecture, each processing unit is optimized to support various characteristics of media processing. All processing units work in parallel in a macro-pipeline manner, thereby achieving high utilization of the processing units. A performance evaluation shows that audio/video full-frame decoding can be realized at an operating frequency of 54 MHz without any support from external hardware or a CPU. In addition, the high programmability of the MCP provides flexibility and reduces the time-to-market.

  • Selective Write-Update: A Method to Relax Execution Constraints in a Critical Section

    Jae Bum LEE  Chu Shik JHON  

     
    PAPER-Computer Systems

    Vol: E81-D  No: 11  Page(s): 1186-1194

    In a shared-memory multiprocessor, shared data are usually accessed in a critical section that is protected by a lock variable. Therefore, the order of accesses by multiple processors to the shared data corresponds to the order of acquiring the ownership of the lock variable. This paper presents a selective write-update protocol, where data modified in a critical section are stored in a write cache and, at a synchronization point, they are transferred only to the processor that will execute the critical section following the current processor. By using QOLB synchronization primitives, the next processor can be determined at the execution time. We prove that the selective write-update protocol ensures data coherency of parallel programs that comply with release consistency, and evaluate the performance of the protocol by analytical modeling and program-driven simulation. The simulation results show that our protocol can reduce the number of coherence misses in a critical section while avoiding the multicast of write-update requests on an interconnection network. In addition, we observe that synchronization latency can be decreased by reducing both the execution time of a critical section and the number of write-update requests. From the simulation results, it is shown that our protocol provides better performance than a write-invalidate protocol and a write-update protocol as the number of processors increases.

  • Processor Pipeline Design for Fast Network Message Handling in RWC-1 Multiprocessor

    Hiroshi MATSUOKA  Kazuaki OKAMOTO  Hideo HIRONO  Mitsuhisa SATO  Takashi YOKOTA  Shuichi SAKAI  

     
    PAPER

    Vol: E81-C  No: 9  Page(s): 1391-1397

    In this paper we describe the pipeline design and enhanced hardware for fast message handling in the RICA-1 processor, a processing element (PE) in the RWC-1 multiprocessor. The RWC-1 is based on the reduced inter-processor communication architecture (RICA), in which communications are combined with computation in the processor pipeline. The pipeline is enhanced with hardware mechanisms to support fine-grain parallel execution. The data paths of the RICA-1 super-scalar processor are shared between communication and instruction execution to minimize implementation cost. A 128-PE system was built in January 1998, and it is currently used for hardware debugging, software development, and performance evaluation.

  • A Parallel and Distributed Genetic Algorithm on Loosely-Coupled Multiprocessor Systems

    Takashi MATSUMURA  Morikazu NAKAMURA  Juma OKECH  Kenji ONAGA  

     
    PAPER

    Vol: E81-A  No: 4  Page(s): 540-546

    In this paper we consider parallel and distributed computation of genetic algorithms on loosely-coupled multiprocessor systems. Loosely-coupled systems are more suitable for massively parallel processing and easier to implement in VLSI than tightly-coupled ones. However, the communication overhead of parallel processing is more serious for loosely-coupled systems. We propose a parallel and distributed execution method for genetic algorithms on loosely-coupled multiprocessor systems with fixed network topologies, in which each processor element carries out genetic operations on its own chromosome set and communicates only with its neighbors in order to reduce communication overhead. We evaluate the proposed method on multiprocessor systems with ring, torus, and hypercube topologies for benchmark problem instances. From the results, we find that the ring topology is more suitable for the proposed parallel and distributed execution, since the variety of chromosomes is preserved much better in the ring than in the others. Moreover, we also propose a new network topology called cone, which is a hierarchical connection of ring topologies. We show its effectiveness by experimental evaluation.
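
Neighbor-only communication depends on how each topology defines a processor's neighbors. The sketch below lists the neighbor sets for the three evaluated topologies. The function names and the processor numbering are assumptions made for illustration and are not taken from the paper; the proposed cone topology is not modeled.

```python
def ring_neighbors(i, n):
    """Neighbors of processor i on a ring of n processors."""
    return sorted({(i - 1) % n, (i + 1) % n} - {i})


def torus_neighbors(i, rows, cols):
    """Neighbors of processor i on a rows x cols 2-D torus, numbered row by row."""
    r, c = divmod(i, cols)
    candidates = {
        ((r - 1) % rows) * cols + c,  # up (wraps around)
        ((r + 1) % rows) * cols + c,  # down (wraps around)
        r * cols + (c - 1) % cols,    # left (wraps around)
        r * cols + (c + 1) % cols,    # right (wraps around)
    }
    return sorted(candidates - {i})


def hypercube_neighbors(i, dim):
    """Neighbors of processor i on a dim-dimensional hypercube (flip one address bit)."""
    return sorted(i ^ (1 << bit) for bit in range(dim))
```

On a ring every processor has at most two neighbors regardless of system size, whereas a hypercube of 2^dim processors gives each processor dim neighbors, so chromosomes exchanged between neighbors spread more slowly around a ring.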

  • Value-Based Scheduling for Multiprocessor Real-Time Database Systems

    Shin-Mu TSENG  Y. H. CHIN  Wei-Pang YANG  

     
    LETTER-Databases

    Vol: E81-D  No: 1  Page(s): 137-143

    We present a new scheduling policy named Value-based Processor Allocation (VPA-k) for scheduling value-based transactions in a multiprocessor real-time database system. The value of a transaction represents the profit the transaction contributes to the system if it is completed before its deadline. Under the VPA-k policy, transactions with higher values are given higher priorities to execute first, while at most k percent of the total processors are dynamically allocated to urgent transactions. Through simulation experiments, the VPA-k policy is shown to outperform other scheduling policies substantially in both maximizing the total value obtained and minimizing the number of missed transactions.

  • A Simple Hardware Prefetching Scheme Using Sequentiality for Shared-Memory Multiprocessors

    Myoung Kwon TCHEUN  Seung Ryoul MAENG  Jung Wan CHO  

     
    PAPER-Computer Hardware and Design

    Vol: E80-D  No: 11  Page(s): 1055-1063

    To reduce the memory access latency on shared-memory multiprocessors, several prefetching schemes have been proposed. The sequential prefetching scheme is a simple hardware-controlled scheme, which exploits the sequentiality of memory accesses to predict which blocks will be read in the near future. Aggressive sequential prefetching prefetches many blocks on each miss to reduce the miss rates and results in good performance for application programs with high sequentiality. In contrast, conservative sequential prefetching prefetches a few blocks on each miss to avoid prefetching useless blocks, and shows better performance than aggressive sequential prefetching for application programs with low sequentiality. We analyze the relationship between the sequentiality of application programs and the effectiveness of sequential prefetching under various memory and network latencies, and propose a new adaptive sequential prefetching scheme. By simply adding a small table to the sequential prefetching scheme, the proposed scheme prefetches a large number of blocks for application programs with high sequentiality, reducing the miss rates significantly, and prefetches a small number of blocks for application programs with low sequentiality, avoiding the loading of useless blocks.

  • MINC: Multistage Interconnection Network with Cache Control Mechanism

    Toshihiro HANAWA  Takayuki KAMEI  Hideki YASUKAWA  Katsunobu NISHIMURA  Hideharu AMANO  

     
    PAPER-Interconnection Networks

    Vol: E80-D  No: 9  Page(s): 863-870

    A novel approach to the cache-coherent Multistage Interconnection Network (MIN), called the MINC (MIN with Cache control mechanism), is proposed. In the MINC, the directory is located only in the shared memory, using the Reduced Hierarchical Bit-map Directory schemes (RHBDs). In the RHBD, the bit-map directory is reduced and carried in the packet header for quick multicasting without accessing the directory in each hierarchy. In order to reduce unnecessary packets caused by compacting the bit map in the RHBD, a small cache called the pruning cache is introduced in the switching element. The simulation reveals that the pruning cache works most effectively when it is provided in every switching element of the first stage, and that it reduces congestion by more than 50% with only 4 entries. The MINC cache control chip with 16 inputs/outputs is implemented on an LPGA (Laser Programmable Gate Array) and works with a 66 MHz clock.

  • Design and Analysis of Multiwave Interconnection Networks for MCM-Based Parallel Processing

    Takafumi AOKI  Shinichi SHIONOYA  Tatsuo HIGUCHI  

     
    PAPER-Novel Concept Devices

    Vol: E80-C  No: 7  Page(s): 935-940

    This paper explores the potential of multiwave interconnections (optical interconnections that employ wavelength components as multiplexable information carriers) for constructing next-generation multiprocessor systems using MCM technology. A hypercube-based multiprocessor network called the multiwave hypercube (MWHC) is proposed, where multiwave interconnections provide highly flexible dynamic communication channels among processing elements. A performance analysis shows that the use of multiwavelength optics makes it possible to reduce network complexity on an MCM substrate, while supporting low-latency message routing.

  • An Efficient Task Scheduling Scheme for Mesh Multicomputers

    Oh Han KANG  

     
    PAPER-Computer Systems

    Vol: E80-D  No: 6  Page(s): 646-652

    In this paper, we propose an efficient task scheduling scheme, called CTS (Class-based Task Scheduling), to obtain high performance in terms of high system utilization and low waiting times for tasks. While a better submesh allocation scheme can improve system performance, an allocation policy alone cannot improve performance significantly. This is because the FCFS task scheduling policy leads to large external fragmentation. The CTS strategy maintains four separate queues, one for each incoming task class. This avoids the blocking property incurred in FCFS scheduling. To reduce the external fragmentation, in the CTS strategy a job tends to wait for an occupied submesh of the same size instead of using a new submesh. Simulation results indicate that the proposed scheduling strategy improves performance compared with the FCFS scheduling policy by significantly reducing the average waiting delay.

  • Achieving Fault Tolerance in Pipelined Multiprocessor Systems

    Jeng-Ping LIN  Sy-Yen KUO  

     
    PAPER-Fault Tolerant Computing

    Vol: E80-D  No: 6  Page(s): 665-671

    This paper focuses on recovering from transient processor faults in pipelined multiprocessor systems. A pipelined machine may employ out-of-order execution and branch prediction techniques to increase performance, so a precise computation state may not be available. We propose an efficient scheme to maintain the precise computation state in a pipelined machine. The goal of this paper is to implement checkpointing and rollback recovery utilizing the technique of precise interrupts in a pipelined system. A detailed analysis is included to demonstrate the effectiveness of this method.

  • Intelligent Memory: An Architecture for Lock-Free Synchronization

    Nakun SEONG  Naihoon JUNG  Byungho KIM  Hyunsoo YOON  

     
    PAPER

    Vol: E80-D  No: 4  Page(s): 441-447

    This paper presents intelligent memory, a new memory architecture capable of providing efficient lock-free synchronization. In the intelligent memory, a sequence of operations on a shared object associated with a memory module can be processed without any intervention, so that an environment for synchronization can be provided by executing the critical section itself in that memory module. For this, we present a memory architecture for the intelligent memory with a minimal instruction set and develop a programming model, called Critical Section Procedure (CSP), which consists of shared data structures and operations on them. Intelligent memory is intended to eliminate wasted processing time, such as busy waiting in spin locks and retries due to process contention in existing lock-free synchronization schemes. Simulation results show that the intelligent memory provides better throughput than the spin lock and the existing lock-free synchronization schemes.

  • Extending SCI on Hierarchical Directory Trees for Large-Scale Multiprocessors

    Ing-Zong LU  Tien-Fu CHEN  

     
    PAPER

    Vol: E80-D  No: 4  Page(s): 434-440

    SCI (Scalable Coherent Interface) is a pointer-based coherent directory scheme for massively parallel multiprocessors. Large message latency is one of the problems with SCI because of its linked-list structure: the search latency of messages can grow linearly with the number of processors. In this paper, we focus on a hierarchical architecture and propose a new scheme, EST (Extending SCI-Tree), which may reduce the message traffic and also take advantage of the topology. Simulation results show that the EST scheme is effective in reducing message latency and communication cost when compared with other schemes.

  • A Lookahead Heuristic for Heterogeneous Multiprocessor Scheduling with Communication Costs

    Dingchao LI  Akira MIZUNO  Yuji IWAHORI  Naohiro ISHII  

     
    PAPER

    Vol: E80-D  No: 4  Page(s): 489-494

    This paper describes a new approach to the scheduling problem that assigns tasks of a parallel program described as a task graph onto parallel machines. The approach handles interprocessor communication and heterogeneity, based on using both the theoretical results developed so far and a lookahead scheduling strategy. The experimental results on randomly generated task graphs demonstrate the effectiveness of this scheduling heuristic.

  • Parallel Genetic Algorithm for Constrained Clustering

    Myung-Mook HAN  Shoji TATSUMI  Yasuhiko KITAMURA  Takaaki OKUMOTO  

     
    LETTER-Modeling and Simulation

    Vol: E80-A  No: 2  Page(s): 416-422

    In this paper we discuss a certain constrained optimization problem which is often encountered in geometrical optimization. Since these kinds of problems occur frequently, constrained genetic optimization has become a very important topic for research. This paper proposes a new methodology to handle constraints using the Genetic Algorithm on a multiprocessor system (FIN) which has a self-similar network.
