IEICE global.ieice.org Site

Author Search Result

[Author] Masaharu IMAI(32hit)

1-20hit(32hit)

FOREWORD
Masaharu IMAI Hitoshi KITAZAWA

FOREWORD

Vol:
E81-A No:12
Page(s):
2475-2475
Optimal Instruction Set Design through Adaptive Detabase Generation
Nguyen Ngoc BINH Masaharu IMAI Akichika SHIOMI Nobuyuki HIKICHI

PAPER

Vol:
E79-A No:3
Page(s):
347-353
This paper proposes a new method to design an optimal pipelined instruction set processor for ASIP development using a formal HW/SW codesign methodology. First, a HW/SW partitioning algorithm for selecting an optimal pipelined architecture is outlined. Then, an adaptive database approach is presented that enhances the optimality of the design through very accurate estimation of the performance of a pipelined ASIP in the HW/SW partitioning process. The experimental results show that the proposed method is effective and efficient.
Synthesizable HDL Generation for Pipelined Processors from a Micro-Operation Description
Makiko ITOH Yoshinori TAKEUCHI Masaharu IMAI Akichika SHIOMI

PAPER

Vol:
E83-A No:3
Page(s):
394-400
A synthesizable HDL generation method for pipelined processors is proposed. Using the proposed method, data-path and control logic descriptions of a target processor are generated from a clock-based instruction set specification. The feasibility of the proposed method is evaluated through experiments, and processor design time was drastically reduced compared with conventional RT-level manual design in HDL.
A Double-Tree Structured Multicomputer System and Its Application to Combinatorial Problems
Masaharu IMAI

PAPER-Computer System

Vol:
E69-E No:9
Page(s):
1002-1010
In this paper, a combinatorial-problem-oriented multicomputer system called DON (Double-Tree Structured Network Machine) is proposed, and a parallel branch-and-bound program scheme for the DON system is described. The DON system is composed of two binary-tree structured subsystems and a system controller. The DON system works as a post-end processor of a host computer system. The DON system is designed to achieve high parallelism and efficient pipeline ability. One of the most distinctive features of the DON system, compared to a conventional single-tree machine, is that algorithms with pipeline features can be easily implemented and executed more efficiently. Experimental results obtained through simulation indicate that the DON system can solve large-scale combinatorial problems more efficiently than a conventional single-tree machine.
Performance Evaluation of STRON: A Hardware Implementation of a Real-Time OS
Takumi NAKANO Yoshiki KOMATSUDAIRA Akichika SHIOMI Masaharu IMAI

PAPER

Vol:
E82-A No:11
Page(s):
2375-2382
In a real-time system, it is required to reduce the response time to an interrupt signal, as well as the execution time of a Real-Time Operating System (RTOS). In order to satisfy this requirement, we have proposed a method of implementing some of the functionalities of an RTOS using hardware. Based on this idea, we have implemented a VLSI chip, called STRON (silicon TRON: The Realtime Operating system Nucleus), to enhance the performance of an RTOS, where the STRON chip works as a peripheral unit of any MPU. In this paper we describe the hardware architecture of the STRON chip and the performance evaluation results of the RTOS using the STRON chip. The following results were obtained. (1) The STRON chip is implemented in only about 10,000 gates when the number of each object (task, event flag, semaphore, and interrupt) is 7. (2) The task scheduler can execute within 8 clocks in a fixed period using the hardware algorithm when the number of tasks is 7. (3) Most of the basic µITRON system calls using the STRON chip can be executed in a fixed period of a few microseconds. (4) The execution time of a system call, measured by a multitask application program model, can be reduced to about one-fifth that in the case of the conventional software RTOS. (5) The total performance, including context switching, is about 2.2 times faster than that of the software RTOS. We conclude that the execution time of the part of the system call implemented by the STRON chip can almost be ignored, but the interface software and the context switching, which depend on the architecture of an MPU, strongly influence the total performance of an RTOS.
A Compiler Generation Method for HW/SW Codesign Based on Configurable Processors
Shinsuke KOBAYASHI Kentaro MITA Yoshinori TAKEUCHI Masaharu IMAI

PAPER-Hardware/Software Codesign

Vol:
E85-A No:12
Page(s):
2586-2595
This paper proposes a compiler generation method for PEAS-III (Practical Environment for ASIP development), which is a configurable processor development environment for application domain specific embedded systems. Using the PEAS-III system, not only the HDL description of a target processor but also its target compiler can be generated. Therefore, execution cycles and dynamic power consumption can be rapidly evaluated. Two processors and their derivatives were designed using the PEAS-III system in the experiment. Experimental results show that the trade-offs among area, performance and power consumption of processors were analyzed in about twelve hours and the optimal processor was selected under the design constraints by using generated compilers and processors.
A Performance Optimization Method for Pipelined ASIPs in Consideration of Clock Frequency
Katsuya SHINOHARA Norimasa OHTSUKI Yoshinori TAKEUCHI Masaharu IMAI

PAPER

Vol:
E82-A No:11
Page(s):
2356-2365
This paper proposes an ASIP performance optimization method taking clock frequency into account. The performance of an instruction set processor can be measured using the execution time of an application program, which can be determined by the clock cycles to perform the application program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of FUs considering clock frequency.
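The relation stated in this abstract, execution time equals clock cycles divided by clock frequency, can be illustrated with a minimal sketch. The two configurations and their cycle counts and frequencies below are hypothetical values chosen for illustration; they are not taken from the paper, and the paper's actual optimization procedure is not reproduced here.

```python
def execution_time_seconds(clock_cycles: int, clock_hz: float) -> float:
    """Execution time = clock cycles / clock frequency."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return clock_cycles / clock_hz


# Hypothetical configurations (assumed numbers): (clock cycles, clock frequency in Hz).
# Adding functional units may cut cycles but lengthen the critical path,
# which lowers the achievable clock frequency.
configs = {
    "few_FUs_fast_clock": (1_200_000, 100e6),
    "many_FUs_slow_clock": (900_000, 60e6),
}

# Pick the configuration with the shortest execution time.
best = min(configs, key=lambda name: execution_time_seconds(*configs[name]))
print(best)
```

In this made-up case the configuration with fewer cycles is not the faster one, which is why the cycle count alone is an incomplete performance measure.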
A New Available Bandwidth Estimation Method Using RTT for a Bottleneck Link
Masaharu IMAI Yoshio SUGIZAKI Koichi ASATANI

PAPER-Network

Vol:
E97-B No:4
Page(s):
712-720
Real-time Internet applications are growing rapidly, and available bandwidth estimation is required. Available bandwidth estimation methods run by end hosts, e.g., Pathload and pathChirp, have been studied. These methods parameterize probe packet volume and observe the delay variation to estimate available bandwidth. In these methods, the probe packets impose heavy overhead loads on the network. In this paper, we propose a new available bandwidth estimation method based on the frequency of minimum RTT of probe packets in multi-hop links. This method estimates bandwidth utilization and available bandwidth of a bottleneck link without significantly increasing network overhead. Estimation accuracies are evaluated for available bandwidth by implementing the proposed method. The proposed method shows better performance than pathChirp or Pathload, requiring both fewer probe packets and less estimation time.
Deformable Part Model Based Arrhythmia Detection Using Time Domain Features
Yuuka HIRAO Yoshinori TAKEUCHI Masaharu IMAI Jaehoon YU

PAPER-Digital Signal Processing

Vol:
E100-A No:11
Page(s):
2221-2229
Heart disease is one of the major causes of death in many advanced countries. For prevention or treatment of heart disease, an early diagnosis based on long-term electrocardiogram (ECG) examination is necessary. However, analyzing this large amount of data can be a heavy burden on medical experts. To reduce the burden and support the analysis, this paper proposes an arrhythmia detection method based on a deformable part model, which absorbs individual variation of the ECG waveform and enables the detection of various arrhythmias. Moreover, to detect arrhythmia with low processing delay, the proposed method utilizes only time domain features. In experiments, the proposed method achieved an F-measure of 0.91 for arrhythmia detection.
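For readers unfamiliar with the metric, the sketch below computes the F-measure as the harmonic mean of precision and recall. It assumes the balanced (F1) form, which the abstract does not specify, and the detection counts are hypothetical numbers chosen only so that the result comes out at 0.91; they are not the paper's data.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (assumed, not from the paper): 91 correct detections,
# 9 false alarms, 9 missed arrhythmias.
score = f_measure(true_pos=91, false_pos=9, false_neg=9)
print(f"{score:.2f}")
```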
Memory Space Controllable Search Strategies for Branch-and-Bound Algorithms
Masaharu IMAI Yuuji YOSHIDA Teruo FUKUMURA

PAPER-Miscellaneous

Vol:
E65-E No:5
Page(s):
257-264
The amount of memory space required by a branch-and-bound algorithm depends on the search strategy used in the algorithm. From the viewpoint of implementing branch-and-bound algorithms, it is desirable that the amount of memory space can be bounded to some feasible size. In this paper, we propose two new search strategies for branch-and-bound algorithms, by which the amount of required memory space is controllable. These strategies are named "pdfs (parallel depth-first search)" and "blis (breadth limited search)", respectively. One of the main results of this paper is that (a) the amount of required memory space of any of these strategies is a linear function of the size of the given problem and (b) the amount of required memory space is controllable by adjusting an appropriate parameter. That is, these search strategies are adaptable to the available memory space. Another result of this paper is that the computational performance of a branch-and-bound algorithm, using any of these strategies, can be improved by adjusting appropriate parameters.
VLSI Architecture for Real-Time Fractal Image Coding Processors
Hideki YAMAUCHI Yoshinori TAKEUCHI Masaharu IMAI

PAPER

Vol:
E83-A No:3
Page(s):
452-458
This paper proposes an efficient architecture for fractal image coding processors. The proposed architecture achieves high-speed image coding comparable to conventional JPEG processing. This architecture achieves fractal image compression coding of a 512 × 512 pixel image in less than 33.3 msec and enables full-motion fractal image coding. The circuit size of the proposed architecture design is comparable to those of JPEG processors and much smaller than those of previously proposed fractal processors.
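The link between the 33.3 msec figure and full-motion coding is simple arithmetic: a per-frame time of 33.3 msec corresponds to roughly 30 frames per second. The sketch below works this out for a 512 by 512 image; the function names are illustrative only and say nothing about how the processor itself is organized.

```python
def frames_per_second(frame_time_ms: float) -> float:
    """Frame rate sustained when each frame takes frame_time_ms to encode."""
    if frame_time_ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / frame_time_ms


def pixels_per_second(width: int, height: int, frame_time_ms: float) -> float:
    """Pixel throughput needed to encode width x height frames at that rate."""
    return width * height * frames_per_second(frame_time_ms)


fps = frames_per_second(33.3)
rate = pixels_per_second(512, 512, 33.3)
print(f"{fps:.2f} frames/s, {rate / 1e6:.2f} Mpixel/s")
```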
An Efficient Scheduling Algorithm for Pipelined Instruction Set Processor and Its Application to ASIP Hardware/Software Codesign
Nguyen Ngoc BINH Masaharu IMAI Akichika SHIOMI Nobuyuki HIKICHI Yoshimichi HONMA Jun SATO

PAPER-VLSI Design Technology and CAD

Vol:
E78-A No:3
Page(s):
353-362
In this paper we describe the formal conditions to detect and resolve all kinds of pipeline data hazards and propose a scheduling algorithm for pipelined instruction set processor synthesis. The algorithm deals with multi-cycle operations and tries to minimize the pipeline execution cycles under a given hardware configuration with/without hardware interlock. The main feature that distinguishes the proposed algorithm from existing ones is that it is intended for estimating performance in HW/SW partitioning, with the capability of handling a module library of different FUs and dealing with multi-cycle operations to be implemented in software. Experimental results of application to ASIP HW/SW codesign show that the proposed algorithm is effective and that considerable pipeline execution cycle reduction rates can be achieved. The time complexity of the scheduling algorithm is O(n^2) in the worst case, where n is the number of instructions in a given basic block.
Efficient Method to Generate an Energy Efficient Schedule Using Operation Shuffling
Yuki KOBAYASHI Murali JAYAPALA Praveen RAGHAVAN Francky CATTHOOR Masaharu IMAI

PAPER-VLSI Design Technology and CAD

Vol:
E91-A No:2
Page(s):
604-612
Clustering L0 buffers is effective for energy reduction in the instruction memory caches of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule of the target application. To improve the energy efficiency of L0 clusters, operation shuffling has been proposed, which explores the assignment of operations for each cycle, generates various schedules, and evaluates them to find an energy efficient schedule. This approach can find energy efficient schedules; however, it takes a long time to obtain the final result. In this paper, we propose a new method to directly generate an energy efficient schedule without iterations of operation shuffling. In the proposed method, a compiler schedules operations using the result of a single operation shuffling as a constraint. We propose some optimization algorithms to generate an energy efficient schedule for a given L0 cluster organization. The proposed method can drastically reduce the computational effort since it performs the operation shuffling only once. The experimental results show that comparable energy reduction is achieved by using the proposed method while the computational effort can be reduced significantly compared with conventional operation shuffling.
Generation of Pack Instruction Sequence for Media Processors Using Multi-Valued Decision Diagram
Hiroaki TANAKA Yoshinori TAKEUCHI Keishi SAKANUSHI Masaharu IMAI Hiroki TAGAWA Yutaka OTA Nobu MATSUMOTO

PAPER-System Level Design

Vol:
E90-A No:12
Page(s):
2800-2809
SIMD instructions are often implemented in modern multimedia oriented processors. Although SIMD instructions are useful for many digital signal processing applications, most compilers do not exploit SIMD instructions. The difficulty in the utilization of SIMD instructions stems from data parallelism in registers. In assembly code generation, the positions of data in registers must be noted. A technique for generating pack instructions, which pack or reorder data in registers, is essential for the exploitation of SIMD instructions. This paper presents a code generation technique for SIMD instructions with pack instructions. SIMD instructions are generated by finding and grouping the same operations in programs. After the SIMD instruction generation, pack instructions are generated. In the pack instruction generation, a Multi-valued Decision Diagram (MDD) is introduced to represent and manipulate sets of packed data. Experimental results show that the proposed code generation technique can generate assembly code with SIMD and pack instructions performing repacking of 8 packed data in registers for a RISC processor with a dual-issue coprocessor which supports SIMD and pack instructions. The proposed method achieved a speedup ratio of up to about 8.5 through SIMD instructions and the multiple-issue mechanism of the target processor.
Two-Stage Configurable Decoder Model for Domain Specific FEC Decoder Design
Ittetsu TANIGUCHI Ayataka KOBAYASHI Keishi SAKANUSHI Yoshinori TAKEUCHI Masaharu IMAI

PAPER-High-Level Synthesis and System-Level Design

Vol:
E94-A No:12
Page(s):
2659-2668
Forward error correction (FEC) is one of the important and heavy tasks in wireless communication. Leading-edge mobile embedded systems usually support not just one FEC standard but multiple FEC standards in order to adapt to various wireless communication standards. In this paper, we propose a two-stage configurable decoder model (2-Stage CDM) for multiple FEC standards for Viterbi and Turbo coding, which vary in constraint length, coding rate, etc. The proposed decoder model realizes a decoder instance that supports dedicated multiple FEC standards, enabling rapid design of domain specific decoders. The proposed decoder model is configurable in two stages, at hardware generation time and at runtime, and designers can easily specify these specifications through various design parameters. Experimental results show that the proposed two-stage configurable decoder model supports various domain specific FEC decoders, including existing decoders, and that decoder instances based on the proposed 2-Stage CDM have sufficient throughput for each communication standard and reasonable area overhead compared with existing decoders.
An Instruction Set Optimization Algorithm for Pipelined ASIPs
Nguyen Ngoc BINH Masaharu IMAI Akichika SHIOMI Nobuyuki HIKICHI

PAPER

Vol:
E78-A No:12
Page(s):
1707-1714
This paper proposes a new method to design an optimal pipelined instruction set processor using formal HW/SW codesign methodology. A HW/SW partitioning algorithm for selecting an optimal pipelined architecture is introduced. The codesign task addressed in this paper is to find a set of hardware implemented operations to achieve the highest performance of an ASIP with pipelined architecture under given gate count and power consumption constraints. The problem formalization as well as the proposed algorithm can be considered as an extension of our previous work toward a pipelined architecture. The experimental results show that the proposed method is quite effective and efficient.
A Small-Area and Low-Power SoC for Less-Invasive Pressure Sensing Capsules in Ambulatory Urodynamic Monitoring
Hirofumi IWATO Keishi SAKANUSHI Yoshinori TAKEUCHI Masaharu IMAI

PAPER

Vol:
E95-C No:4
Page(s):
487-494
To measure the detrusor pressure for diagnosing lower urinary tract symptoms, we designed a small-area and low-power System on a Chip (SoC). The SoC should be small and low power because it is encapsulated in tiny air-tight capsules which are simultaneously inserted in the urinary bladder and rectum for several days. Since the SoC is also required to be programmable, we designed an Application Specific Instruction set Processor (ASIP) for pressure measurement and wireless communication, and implemented almost all required functions on the ASIP. The SoC was fabricated using a 0.18 µm CMOS mixed-signal process and the chip size is 2.5 × 2.5 mm². Evaluation results show that the power consumption of the SoC is 93.5 µW, and that it can operate the capsule for seven days with a tiny battery.
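To put the reported power figure in perspective, the sketch below estimates the energy the SoC would draw over seven days at a constant 93.5 µW. It is a back-of-the-envelope calculation only: it assumes constant power, ignores other capsule components and conversion losses, and the 3.0 V battery voltage used for the capacity estimate is an assumption, not a value from the abstract.

```python
def energy_mwh(power_uw: float, hours: float) -> float:
    """Energy in milliwatt-hours consumed at constant power (microwatts)."""
    return power_uw * hours / 1000.0


def capacity_mah(energy_mwh_value: float, battery_volts: float) -> float:
    """Battery capacity in mAh needed to supply that energy at a fixed voltage."""
    return energy_mwh_value / battery_volts


hours = 7 * 24                          # seven days of operation
energy = energy_mwh(93.5, hours)        # SoC power figure from the abstract
capacity = capacity_mah(energy, 3.0)    # 3.0 V is an assumed battery voltage
print(f"{energy:.3f} mWh, about {capacity:.2f} mAh at 3.0 V")
```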
Reconfigurable AGU: An Address Generation Unit Based on Address Calculation Pattern for Low Energy and High Performance Embedded Processors
Ittetsu TANIGUCHI Praveen RAGHAVAN Murali JAYAPALA Francky CATTHOOR Yoshinori TAKEUCHI Masaharu IMAI

PAPER-VLSI Design Technology and CAD

Vol:
E92-A No:4
Page(s):
1161-1173
Low-energy, high-performance embedded processors are crucial in future nomadic embedded systems design. Improving memory accesses, especially spatial and temporal locality, is a well-known technique to reduce energy and increase performance. However, after transformations that improve locality, address calculation often becomes a bottleneck. In this paper, we propose a novel AGU (Address Generation Unit) exploration and mapping technique based on a reconfigurable AGU model. Experimental results show that the proposed techniques help explore AGU architectures effectively and that designers can obtain trade-offs for real-life applications in about 10 hours.
Optimal Scheme for Search State Space and Scheduling on Multiprocessor Systems
Hassan A. YOUNESS Keishi SAKANUSHI Yoshinori TAKEUCHI Ashraf SALEM Abdel-Moneim WAHDAN Masaharu IMAI

PAPER

Vol:
E92-A No:4
Page(s):
1088-1095
A scheduling algorithm aims to minimize the overall execution time of a program by properly allocating the tasks to the processor cores and arranging their execution order such that the precedence constraints among the tasks are preserved. In this paper, we present a new scheduling algorithm that applies geometry analysis of the Task Precedence Graph (TPG) based on the A* search technique and uses a computationally efficient cost function to guide the search, with reduced complexity and pruning techniques, to produce an optimal solution for the allocation/scheduling problem of a parallel application on parallel and multiprocessor architectures. The main goal of this work is to significantly reduce the search space while achieving an optimal or near-optimal solution. We applied the algorithm to general task graph problems that are used in most related search work and obtained the optimal schedule with a small number of states. The proposed algorithm reduced the search space of exhaustive search by at least 50%. The viability and potential of the proposed algorithm are demonstrated by an illustrative example.
Proposal of a New Design Environment for Application Specific Integrated Processor: IDEAS
Jun SATO Masaharu IMAI Tetsuya HAKATA Nobuyuki HIKICHI

LETTER-VLSI Design

Vol:
E74-A No:5
Page(s):
1014-1016
This letter proposes a new framework for ASIP (Application Specific Integrated Processor) development. The system is called IDEAS (Integrated Design Environment for Application Specific Integrated Processor). IDEAS accepts a set of application programs and their expected data as input, and profiles these programs both statically and dynamically. According to the profiled results, the system decides the architecture of the ASIP, synthesizes the CPU core design of the ASIP, and generates the software development tools for the ASIP, such as a compiler and a simulator.
