The search functionality is under construction.

Author Search Result

[Author] Masahiro SOWA(5hit)

1-5hit
  • Proposition and Evaluation of Parallelism-Independent Scheduling Algorithms for DAGs of Tasks with Non-Uniform Execution Times

    Kirilka NIKOLOVA  Atusi MAEDA  Masahiro SOWA  

     
    PAPER

      Vol:
    E84-A No:6
      Page(s):
    1496-1505

    A parallel program with a fixed degree of parallelism cannot be executed efficiently, or at all, by a parallel computer with a different degree of parallelism. This will cause a problem in the distribution of software applications in the near future when parallel computers with various degrees of parallelism will be widely used. In this paper we propose a way to make the machine code of the programs parallelism-independent, i.e. executable in minimum time on parallel computers with any degree of parallelism. We propose and evaluate three parallelism-independent scheduling algorithms for direct acyclic graphs (DAGs) of tasks with non-uniform execution times. To prove their efficiency, we performed simulations both with random DAGs and DAGs extracted from real applications. We evaluate them in terms of schedule length, computation time and size of the scheduled program. Their results are compared to those of the traditional CP/MISF algorithm which is used separately for each number of processors.

  • Parallelism-Independent Scheduling Method

    Kirilka NIKOLOVA  Atusi MAEDA  Masahiro SOWA  

     
    PAPER

      Vol:
    E83-A No:6
      Page(s):
    1138-1150

    All the existing scheduling algorithms order the instructions of the program in such a way that it can be executed in minimal time only for one fixed number of processors. In this paper we propose a new scheduling method, called Parallelism-Independent Scheduling Method, which enables the execution of the scheduled program on parallel computers with any degree of parallelism in near-optimal time. We propose three Parallelism-Independent algorithms, which have the following phases: obtaining a parallel schedule by using a list scheduling heuristics, optimization of the parallel schedule by rearranging the tasks in each level, so that they can be executed efficiently with different degrees of parallelism, serialization of the parallel schedule, and insertion of markers for the parallel execution limits. The three algorithms differ in their optimization phase. To prove the efficiency of our algorithms, we have made simulations with random directed acyclic graphs with different size and degree of parallelism. We compared the results in terms of schedule length to those obtained using the Critical Path Algorithm separately for each degree of parallelism.

  • Data-Driven Control of Multi-Microprocessor System

    Masahiro SOWA  Motoaki OHKUBO  

     
    PAPER-Computers

      Vol:
    E67-E No:2
      Page(s):
    96-100

    A prototype multi-microprocessor system, GMMCS, with a time-slice common bus is built at University of Gunma. The concept of data-driven control is introduced for making the control program simple and clear. The system can also work as a procedural level data flow computer system with plural master processors. A hardware of the system has already been reported in the transaction of IECE Japan. This paper discusses mainly the software especially the method of introducing the concept into the system and the results of sorting programs executed using the system.

  • An FPGA-Based Information Detection Hardware System Employing Multi-Match Content Addressable Memory

    Duc-Hung LE  Katsumi INOUE  Masahiro SOWA  Cong-Kha PHAM  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:10
      Page(s):
    1708-1717

    A new information detection method has been proposed for a very fast and efficient search engine. This method is implemented on hardware system using FPGA. We take advantages of Content Addressable Memory (CAM) which has an ability of matching mode for designing the system. The CAM blocks have been designed using available memory blocks of the FPGA device to save access times of the whole system. The entire memory can return multi-match results concurrently. The system operates based on the CAMs for pattern matching, in a parallel manner, to output multiple addresses of multi-match results. Based on the parallel multi-match operations, the system can be applied for pattern matching with various required constraint conditions without using any search principles. The very fast multi-match results are achieved at 60 ns with the operation frequency 50 MHz. This increases the search performance of the information detection system which uses this method as the core system.

  • Dynamic Fast Issue (DFI) Mechanism for Dynamic Scheduled Processors

    Abderazek BEN ABDALLAH  Mudar SAREM  Masahiro SOWA  

     
    PAPER-VLSI Architecture

      Vol:
    E83-A No:12
      Page(s):
    2417-2425

    Superscalar processors can achieve increased performance by issuing instructions Out-of-Order (OoO) from the original instruction stream. Implementing an OoO instruction scheme requires a hardware mechanism to prevent incorrectly executed instructions from updating registers values. In addition, performance decreases if data dependencies, a branch or a trap among instructions appears. To this end we propose a new mechanism named Dynamic Fast Issue (DFI) mechanism to issue instructions in an OoO fashion to multiple parallel functional units without considerable hardware complexity. The above system, which will be implemented in our Superscalar Functional Assignments Register Microprocessor(FARM), solves data dependencies, supports precise interrupt and branch prediction, which are the main problems associated with the dynamic scheduling of instructions in superscalar machines. Results are written only once,Write-once, directly into the register file (RF). To ensure that results are written in order in their appropriate output registers, a record of instruction order and state is maintained by a status buffer (STB). A 64 entries integrated register file is implemented to hold both renamed and logical registers. To recover the processor state from an interrupt or a branch miss-prediction, a status buffer (STB) and a recovery list table (RLT) are implemented. Novel aspects of the above system architecture as well as the principle underlying this process and the constraints that must be met is presented. Performance evaluation results are performed through full-pipelined-level architectural simulator and SPECint95 benchmark programs.