
Author Search Result

[Author] Masaharu IMAI(32hit)

1-20hit(32hit)

  • FOREWORD

    Masaharu IMAI  Hitoshi KITAZAWA  

     
    FOREWORD

      Vol:
    E81-A No:12
      Page(s):
    2475-2475
  • Optimal Instruction Set Design through Adaptive Database Generation

    Nguyen Ngoc BINH  Masaharu IMAI  Akichika SHIOMI  Nobuyuki HIKICHI  

     
    PAPER

      Vol:
    E79-A No:3
      Page(s):
    347-353

    This paper proposes a new method for designing an optimal pipelined instruction set processor for ASIP development using a formal HW/SW codesign methodology. First, a HW/SW partitioning algorithm for selecting an optimal pipelined architecture is outlined. Then, an adaptive database approach is presented that enhances the optimality of the design through very accurate estimation of the performance of a pipelined ASIP in the HW/SW partitioning process. The experimental results show that the proposed method is effective and efficient.

  • Synthesizable HDL Generation for Pipelined Processors from a Micro-Operation Description

    Makiko ITOH  Yoshinori TAKEUCHI  Masaharu IMAI  Akichika SHIOMI  

     
    PAPER

      Vol:
    E83-A No:3
      Page(s):
    394-400

    A synthesizable HDL generation method for pipelined processors is proposed. Using the proposed method, data-path and control logic descriptions of a target processor are generated from a clock-based instruction set specification. Experimental results demonstrate the feasibility of the proposed method and show that processor design time is drastically reduced compared with conventional manual RT-level design in HDL.

  • A Double-Tree Structured Multicomputer System and Its Application to Combinatorial Problems

    Masaharu IMAI  

     
    PAPER-Computer System

      Vol:
    E69-E No:9
      Page(s):
    1002-1010

    In this paper, a combinatorial problem oriented multicomputer system called DON (Double-Tree Structured Network Machine) is proposed, and a parallel branch-and-bound program scheme for the DON system is described. The DON system is composed of two binary-tree structured subsystems and a system controller. The DON system works as a post-end processor of a host computer system. The DON system is designed to achieve high parallelism and efficient pipeline ability. One of the most distinctive features of the DON system, compared to a conventional single-tree machine, is that algorithms with pipeline features can be easily implemented and executed more efficiently. Experimental results obtained through simulation suggest that the DON system can solve large-scale combinatorial problems more efficiently than a conventional single-tree machine.

  • Performance Evaluation of STRON: A Hardware Implementation of a Real-Time OS

    Takumi NAKANO  Yoshiki KOMATSUDAIRA  Akichika SHIOMI  Masaharu IMAI  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2375-2382

    In a real-time system, it is required to reduce the response time to an interrupt signal, as well as the execution time of a Real-Time Operating System (RTOS). In order to satisfy this requirement, we have proposed a method of implementing some of the functionalities of an RTOS using hardware. Based on this idea, we have implemented a VLSI chip, called STRON (silicon TRON: The Realtime Operating system Nucleus), to enhance the performance of an RTOS, where the STRON chip works as a peripheral unit of any MPU. In this paper we describe the hardware architecture of the STRON chip and the performance evaluation results of the RTOS using the STRON chip. The following results were obtained. (1) The STRON chip is implemented in only about 10,000 gates when the number of each object (task, event flag, semaphore, and interrupt) is 7. (2) The task scheduler can execute within 8 clocks in a fixed period using the hardware algorithm when the number of tasks is 7. (3) Most of the basic µITRON system calls using the STRON chip can be executed in a fixed period of a few microseconds. (4) The execution time of a system call, measured by a multitask application program model, can be reduced to about one-fifth that in the case of the conventional software RTOS. (5) The total performance, including context switching, is about 2.2 times faster than that of the software RTOS. We conclude that the execution time of the part of the system call implemented by the STRON chip can almost be ignored, but the part of the interface software and context switching related to the architecture of an MPU strongly influences the total performance of an RTOS.

  • A Compiler Generation Method for HW/SW Codesign Based on Configurable Processors

    Shinsuke KOBAYASHI  Kentaro MITA  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER-Hardware/Software Codesign

      Vol:
    E85-A No:12
      Page(s):
    2586-2595

    This paper proposes a compiler generation method for PEAS-III (Practical Environment for ASIP development), which is a configurable processor development environment for application domain specific embedded systems. Using the PEAS-III system, not only the HDL description of a target processor but also its target compiler can be generated. Therefore, execution cycles and dynamic power consumption can be rapidly evaluated. Two processors and their derivatives were designed using the PEAS-III system in the experiment. Experimental results show that the trade-offs among area, performance and power consumption of processors were analyzed in about twelve hours and the optimal processor was selected under the design constraints by using generated compilers and processors.

  • A Performance Optimization Method for Pipelined ASIPs in Consideration of Clock Frequency

    Katsuya SHINOHARA  Norimasa OHTSUKI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2356-2365

    This paper proposes an ASIP performance optimization method taking clock frequency into account. The performance of an instruction set processor can be measured using the execution time of an application program, which can be determined by the clock cycles to perform the application program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of FUs considering clock frequency.
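    The relation stated in this abstract (execution time equals the clock cycles needed by the application divided by the clock frequency) can be illustrated with a minimal sketch. The candidate configurations, cycle counts, and frequencies below are hypothetical values chosen only for illustration; they are not taken from the paper, and the selection step is a plain minimum, not the paper's optimization method.

```python
def execution_time(cycles: int, clock_hz: float) -> float:
    """Execution time in seconds = clock cycles / clock frequency."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return cycles / clock_hz


# Hypothetical candidates: (label, cycles for the application, clock in Hz).
# More functional units (FUs) may cut cycles but lengthen the critical path,
# which lowers the achievable clock frequency.
candidates = [
    ("A: more FUs, slower clock", 1_000_000, 50e6),
    ("B: fewer FUs, faster clock", 1_400_000, 80e6),
]

# Pick the candidate with the shortest execution time, not the fewest cycles.
best = min(candidates, key=lambda c: execution_time(c[1], c[2]))
```

    In this made-up example, candidate B needs more cycles but still finishes sooner because of its higher clock frequency, which is why cycle count alone is an incomplete performance measure.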

  • A New Available Bandwidth Estimation Method Using RTT for a Bottleneck Link

    Masaharu IMAI  Yoshio SUGIZAKI  Koichi ASATANI  

     
    PAPER-Network

      Vol:
    E97-B No:4
      Page(s):
    712-720

    Real-time Internet applications are growing rapidly, and available bandwidth estimation is required. End-host methods for estimating available bandwidth, such as Pathload and pathChirp, have been studied. These methods parameterize probe packet volume and observe the delay variation to estimate available bandwidth. In these methods, the probe packets impose heavy overhead loads on the network. In this paper, we propose a new available bandwidth estimation method based on the frequency of the minimum RTT of probe packets in multi-hop links. This method estimates the bandwidth utilization and available bandwidth of a bottleneck link without significantly increasing network overhead. The accuracy of the available bandwidth estimates is evaluated by implementing the proposed method. The proposed method shows better performance than pathChirp or Pathload, requiring both fewer probe packets and less estimation time.
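    The abstract does not give the estimator itself, so the following is only a rough sketch of one way the frequency of minimum-RTT probes could be turned into an estimate. It assumes that a probe observing the minimum RTT met no queuing at the bottleneck, so the fraction of such probes approximates the link's idle fraction. The function name, the `tolerance_ms` parameter, and the known `capacity_mbps` are all assumptions made for illustration, not details from the paper.

```python
def estimate_available_bandwidth(rtts_ms, capacity_mbps, tolerance_ms=0.05):
    """Return (utilization, available bandwidth in Mbps).

    Assumption (not from the paper): probes whose RTT is within
    tolerance_ms of the minimum RTT saw an idle bottleneck queue.
    """
    if not rtts_ms:
        raise ValueError("at least one RTT sample is required")
    rtt_min = min(rtts_ms)
    # Count probes that experienced (almost) no queuing delay.
    at_min = sum(1 for r in rtts_ms if r - rtt_min <= tolerance_ms)
    idle_fraction = at_min / len(rtts_ms)
    utilization = 1.0 - idle_fraction
    return utilization, capacity_mbps * idle_fraction


# Hypothetical RTT samples in milliseconds on a 100 Mbps bottleneck.
samples = [10.0, 10.0, 10.02, 12.5, 11.0, 10.0, 15.0, 10.01, 10.0, 13.0]
utilization, available = estimate_available_bandwidth(samples, 100.0)
```

    Six of the ten hypothetical samples fall within the tolerance of the minimum, so this sketch reports a utilization of about 0.4 and roughly 60 Mbps available.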

  • Deformable Part Model Based Arrhythmia Detection Using Time Domain Features

    Yuuka HIRAO  Yoshinori TAKEUCHI  Masaharu IMAI  Jaehoon YU  

     
    PAPER-Digital Signal Processing

      Vol:
    E100-A No:11
      Page(s):
    2221-2229

    Heart disease is one of the major causes of death in many advanced countries. For prevention or treatment of heart disease, getting an early diagnosis from long-term electrocardiogram (ECG) examination is necessary. However, analyzing this large amount of data can be a heavy burden on medical experts. To reduce the burden and support the analysis, this paper proposes an arrhythmia detection method based on a deformable part model, which absorbs individual variation in ECG waveforms and enables the detection of various arrhythmias. Moreover, to detect arrhythmia with low processing delay, the proposed method utilizes only time domain features. In experiments, the proposed method achieved an F-measure of 0.91 for arrhythmia detection.

  • Memory Space Controllable Search Strategies for Branch-and-Bound Algorithms

    Masaharu IMAI  Yuuji YOSHIDA  Teruo FUKUMURA  

     
    PAPER-Miscellaneous

      Vol:
    E65-E No:5
      Page(s):
    257-264

    The amount of memory space required by a branch-and-bound algorithm depends on the search strategy used in the algorithm. From the viewpoint of implementing branch-and-bound algorithms, it is desirable that the amount of memory space can be bounded to some feasible size. In this paper, we propose two new search strategies for branch-and-bound algorithms, by which the amount of required memory space is controllable. These strategies are named "pdfs (parallel depth-first search)" and "blis (breadth limited search)", respectively. One of the main results of this paper is that (a) the amount of memory space required by either of these strategies is a linear function of the size of the given problem and (b) the amount of required memory space is controllable by adjusting an appropriate parameter. That is, these search strategies are adaptable to the available memory space. Another result of this paper is that the computational performance of a branch-and-bound algorithm using either of these strategies can be improved by adjusting appropriate parameters.

  • VLSI Architecture for Real-Time Fractal Image Coding Processors

    Hideki YAMAUCHI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER

      Vol:
    E83-A No:3
      Page(s):
    452-458

    This paper proposes an efficient architecture for fractal image coding processors. The proposed architecture achieves high-speed image coding comparable to conventional JPEG processing. This architecture achieves fractal image compression coding of a 512 × 512 pixel image in less than 33.3 msec and enables full-motion fractal image coding. The circuit size of the proposed architecture design is comparable to those of JPEG processors and much smaller than those of previously proposed fractal processors.

  • An Efficient Scheduling Algorithm for Pipelined Instruction Set Processor and Its Application to ASIP Hardware/Software Codesign

    Nguyen Ngoc BINH  Masaharu IMAI  Akichika SHIOMI  Nobuyuki HIKICHI  Yoshimichi HONMA  Jun SATO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E78-A No:3
      Page(s):
    353-362

    In this paper we describe the formal conditions to detect and resolve all kinds of pipeline data hazards and propose a scheduling algorithm for pipelined instruction set processor synthesis. The algorithm deals with multi-cycle operations and tries to minimize the pipeline execution cycles under a given hardware configuration with or without hardware interlock. The main feature that distinguishes the proposed algorithm from existing ones is that it is designed for estimating performance in HW/SW partitioning, with the capability of handling a module library of different FUs and dealing with multi-cycle operations to be implemented in software. Experimental results of application to ASIP HW/SW codesign show that the proposed algorithm is effective and that considerable pipeline execution cycle reduction rates can be achieved. The time complexity of the scheduling algorithm is O(n^2) in the worst case, where n is the number of instructions in a given basic block.
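    The paper's formal hazard conditions are not reproduced in this abstract. As a rough illustration of what detecting pipeline data hazards involves, the sketch below classifies the textbook hazard types (RAW, WAR, WAW) between an earlier and a later instruction by comparing register names. The `Instr` type and `data_hazards` function are hypothetical names introduced here, and the sketch ignores pipeline stage timing and multi-cycle operations, which the paper's algorithm takes into account.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction model: one destination, several sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


def data_hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Return the textbook data hazards of `later` with respect to `earlier`."""
    hazards = set()
    # RAW: the later instruction reads a register the earlier one writes.
    if earlier.dest is not None and earlier.dest in later.srcs:
        hazards.add("RAW")
    # WAR: the later instruction writes a register the earlier one reads.
    if later.dest is not None and later.dest in earlier.srcs:
        hazards.add("WAR")
    # WAW: both instructions write the same register.
    if earlier.dest is not None and earlier.dest == later.dest:
        hazards.add("WAW")
    return hazards


add = Instr(dest="r1", srcs=("r2", "r3"))   # r1 = r2 + r3
sub = Instr(dest="r4", srcs=("r1", "r5"))   # r4 = r1 - r5
raw_example = data_hazards(add, sub)
```

    Here `sub` reads `r1` before `add` may have written it, so the pair is reported as a RAW hazard; a scheduler would have to keep the two instructions far enough apart or rely on a hardware interlock.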

  • Efficient Method to Generate an Energy Efficient Schedule Using Operation Shuffling

    Yuki KOBAYASHI  Murali JAYAPALA  Praveen RAGHAVAN  Francky CATTHOOR  Masaharu IMAI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:2
      Page(s):
    604-612

    Clustering L0 buffers is effective for energy reduction in the instruction memory caches of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule of the target application. To improve the energy efficiency of L0 clusters, operation shuffling has been proposed, which explores the assignment of operations to each cycle, generates various schedules, and evaluates them to find an energy efficient schedule. This approach can find energy efficient schedules, but it takes a long time to obtain the final result. In this paper, we propose a new method to directly generate an energy efficient schedule without iterations of operation shuffling. In the proposed method, a compiler schedules operations using the result of a single operation shuffling as a constraint. We propose some optimization algorithms to generate an energy efficient schedule for a given L0 cluster organization. The proposed method can drastically reduce the computational effort since it performs the operation shuffling only once. The experimental results show that the proposed method achieves comparable energy reduction while significantly reducing the computational effort relative to conventional operation shuffling.

  • Generation of Pack Instruction Sequence for Media Processors Using Multi-Valued Decision Diagram

    Hiroaki TANAKA  Yoshinori TAKEUCHI  Keishi SAKANUSHI  Masaharu IMAI  Hiroki TAGAWA  Yutaka OTA  Nobu MATSUMOTO  

     
    PAPER-System Level Design

      Vol:
    E90-A No:12
      Page(s):
    2800-2809

    SIMD instructions are often implemented in modern multimedia oriented processors. Although SIMD instructions are useful for many digital signal processing applications, most compilers do not exploit SIMD instructions. The difficulty in the utilization of SIMD instructions stems from data parallelism in registers. In assembly code generation, the positions of data in registers must be tracked. A technique for generating pack instructions, which pack or reorder data in registers, is essential for the exploitation of SIMD instructions. This paper presents a code generation technique for SIMD instructions with pack instructions. SIMD instructions are generated by finding and grouping the same operations in programs. After the SIMD instruction generation, pack instructions are generated. In the pack instruction generation, a Multi-valued Decision Diagram (MDD) is introduced to represent and manipulate sets of packed data. Experimental results show that the proposed code generation technique can generate assembly code with SIMD and pack instructions performing repacking of 8 packed data in registers for a RISC processor with a dual-issue coprocessor which supports SIMD and pack instructions. The proposed method achieved a speedup ratio of up to about 8.5 through SIMD instructions and the multiple-issue mechanism of the target processor.

  • Two-Stage Configurable Decoder Model for Domain Specific FEC Decoder Design

    Ittetsu TANIGUCHI  Ayataka KOBAYASHI  Keishi SAKANUSHI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E94-A No:12
      Page(s):
    2659-2668

    Forward error correction (FEC) is one of the important and computationally heavy tasks in wireless communication. Leading edge mobile embedded systems usually support not just one FEC standard but multiple FEC standards in order to adapt to various wireless communication standards. In this paper, we propose a two-stage configurable decoder model (2-Stage CDM) for multiple FEC standards for Viterbi and Turbo coding, which vary in constraint length, coding rate, etc. The proposed decoder model realizes a decoder instance that supports a dedicated set of FEC standards, enabling rapid design of domain specific decoders. The proposed decoder model is configurable in two stages, at hardware generation time and at runtime, and designers can easily specify these configurations through various design parameters. Experimental results show that the proposed two-stage configurable decoder model supports various domain specific FEC decoders, including existing decoders, and that the decoder instances based on the proposed 2-Stage CDM have sufficient throughput for each communication standard and reasonable area overhead compared with existing decoders.

  • An Instruction Set Optimization Algorithm for Pipelined ASIPs

    Nguyen Ngoc BINH  Masaharu IMAI  Akichika SHIOMI  Nobuyuki HIKICHI  

     
    PAPER

      Vol:
    E78-A No:12
      Page(s):
    1707-1714

    This paper proposes a new method to design an optimal pipelined instruction set processor using a formal HW/SW codesign methodology. A HW/SW partitioning algorithm for selecting an optimal pipelined architecture is introduced. The codesign task addressed in this paper is to find a set of hardware implemented operations that achieves the highest performance of an ASIP with a pipelined architecture under given gate count and power consumption constraints. The problem formalization as well as the proposed algorithm can be considered an extension of our previous work toward a pipelined architecture. The experimental results show that the proposed method is quite effective and efficient.

  • A Small-Area and Low-Power SoC for Less-Invasive Pressure Sensing Capsules in Ambulatory Urodynamic Monitoring

    Hirofumi IWATO  Keishi SAKANUSHI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    487-494

    To measure the detrusor pressure for diagnosing lower urinary tract symptoms, we designed a small-area and low-power System on a Chip (SoC). The SoC should be small and low power because it is encapsulated in tiny air-tight capsules which are simultaneously inserted in the urinary bladder and rectum for several days. Since the SoC is also required to be programmable, we designed an Application Specific Instruction set Processor (ASIP) for pressure measurement and wireless communication, and implemented almost all required functions on the ASIP. The SoC was fabricated using a 0.18 µm CMOS mixed-signal process and the chip size is 2.5 × 2.5 mm². Evaluation results show that the power consumption of the SoC is 93.5 µW, and that it can operate the capsule for seven days with a tiny battery.

  • Reconfigurable AGU: An Address Generation Unit Based on Address Calculation Pattern for Low Energy and High Performance Embedded Processors

    Ittetsu TANIGUCHI  Praveen RAGHAVAN  Murali JAYAPALA  Francky CATTHOOR  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E92-A No:4
      Page(s):
    1161-1173

    Low energy and high performance embedded processors are crucial in the design of future nomadic embedded systems. Improving memory accesses, especially their spatial and temporal locality, is a well-known technique to reduce energy and increase performance. However, after transformations that improve locality, address calculation often becomes a bottleneck. In this paper, we propose a novel AGU (Address Generation Unit) exploration and mapping technique based on a reconfigurable AGU model. Experimental results show that the proposed techniques help explore AGU architectures effectively and that designers can obtain trade-offs for real-life applications in about 10 hours.

  • Optimal Scheme for Search State Space and Scheduling on Multiprocessor Systems

    Hassan A. YOUNESS  Keishi SAKANUSHI  Yoshinori TAKEUCHI  Ashraf SALEM  Abdel-Moneim WAHDAN  Masaharu IMAI  

     
    PAPER

      Vol:
    E92-A No:4
      Page(s):
    1088-1095

    A scheduling algorithm aims to minimize the overall execution time of a program by properly allocating the tasks to the processor cores and arranging their execution order such that the precedence constraints among the tasks are preserved. In this paper, we present a new scheduling algorithm that applies geometric analysis of the Task Precedence Graph (TPG) within an A* search. It uses a computationally efficient cost function of reduced complexity to guide the search, together with pruning techniques, to produce an optimal solution to the allocation/scheduling problem of mapping a parallel application onto parallel and multiprocessor architectures. The main goal of this work is to significantly reduce the search space while achieving an optimal or near-optimal solution. We applied the algorithm to general task graph problems that are used in most related search work and obtained optimal schedules with a small number of states. The proposed algorithm reduced the search space of exhaustive search by at least 50%. The viability and potential of the proposed algorithm are demonstrated by an illustrative example.

  • Proposal of a New Design Environment for Application Specific Integrated Processor: IDEAS

    Jun SATO  Masaharu IMAI  Tetsuya HAKATA  Nobuyuki HIKICHI  

     
    LETTER-VLSI Design

      Vol:
    E74-A No:5
      Page(s):
    1014-1016

    This letter proposes a new framework for ASIP (Application Specific Integrated Processor) development. The system is called IDEAS (Integrated Design Environment for Application Specific Integrated Processor). IDEAS accepts a set of application programs and their expected data as input, and profiles these programs both statically and dynamically. According to the profiled results, the system decides the architecture of the ASIP, synthesizes the CPU core design of the ASIP, and generates software development tools for the ASIP such as a compiler and a simulator.
