IEICE global.ieice.org Site

Keyword Search Result

[Keyword] hardware(260hit)

201-220hit(260hit)

A Hardware/Software Cosynthesis System for Digital Signal Processor Cores with Two Types of Register Files
Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER

Vol: E83-A No:3 Page(s): 442-451
In digital signal processing, the bit width of intermediate variables should be longer than that of input and output variables in order to execute intermediate operations with high precision. A processor core for digital signal processing is therefore required to have two types of register files, one used by input and output variables and the other by intermediate variables. This paper proposes a hardware/software cosynthesis system for digital signal processor cores with two types of register files. Given an application program and its data, the system synthesizes a hardware description of a processor core, object code running on the processor core, and software environments. A synthesized processor core can be composed of a processor kernel, multiple data memory buses, hardware loop units, addressing units, and multiple functional units. Furthermore, it can have two types of register files, RF1 and RF2. The bit width and number of registers in RF1 and RF2 are determined based on a given application program. Thus a synthesized processor core has a small area while keeping high precision of intermediate operations, compared with a processor core with only one register file. The experimental results demonstrate the effectiveness of the proposed system.
System LSI Design Methods for Low Power LSIs
Hiroto YASUURA Tohru ISHIHARA

INVITED PAPER

Vol: E83-C No:2 Page(s): 143-152
Low-power design has emerged as both a practically and theoretically attractive theme in modern LSI system design. This paper presents system-level power optimization techniques. A brief survey of system-level low-power design approaches is given, and several examples are described in detail. The paper reviews some techniques that have been proposed to overcome the power issue and gives guidelines for prospective system-level solutions.
Three-Layer Cooperative Architecture for MPEG-2 Video Encoder LSI
Mitsuo IKEDA Toshio KONDO Koyo NITTA Kazuhito SUGURI Takeshi YOSHITOME Toshihiro MINAMI Jiro NAGANUMA Takeshi OGURA

PAPER

Vol: E83-C No:2 Page(s): 170-178
This paper presents an architecture for a single-chip MPEG-2 video encoder and demonstrates its flexibility and usefulness. The architecture, based on three-layer cooperation, provides flexible data transfer that improves the encoder from the standpoints of versatility, scalability, and video quality. The LSI was successfully fabricated in a 0.25-µm four-metal CMOS process. Its small size and low power consumption make it ideal for a wide range of applications, such as DVD recorders, PC-card encoders, and HDTV encoders.
An Adaptive List-Output Viterbi Equalizer with Fast Compare-Select Operation
Kazuo TANADA Hiroshi KUBO Atsushi IWASE Makoto MIYAKE

PAPER

Vol: E82-B No:12 Page(s): 2004-2011
This paper proposes an adaptive list-output Viterbi equalizer (LVE) with fast compare-select operation, in order to achieve a good trade-off between bit error rate (BER) performance and processing speed. An LVE, which keeps several survivors for each state, has good BER performance in the presence of wide-spread intersymbol interference. However, the LVE suffers from large processing delay due to its sorting-based compare-select operation. The proposed adaptive LVE greatly reduces its processing delay, because it simplifies compare-select operation. In addition, computer simulation shows that the proposed LVE causes only slight BER performance degradation due to its simplification of compare-select operation. Thus, the proposed LVE achieves better BER performance than decision-feedback sequence estimation (DFSE) without an increase in processing delay.
A Built-in Self-Reconfigurable Scheme for 3D Mesh Arrays
Itsuo TAKANAMI Tadayoshi HORITA

PAPER-Fault Tolerant Computing

Vol: E82-D No:12 Page(s): 1554-1562
We propose a model for fault-tolerant 3D processor arrays using one-and-half track switches. Spare processors are laid on the two opposite surfaces of the 3D array. The fault compensation process is performed by shifting processors on a continuous straight line (called a compensation path) from a faulty processor to a spare on the surfaces. Compensation paths are not allowed to be in the near-miss relation with each other. Consequently, switches with only 4 states are needed to preserve the 3D mesh topology after compensating for faults. We give an algorithm, in a form convenient for hardware implementation, for reconfiguring 3D mesh arrays with faults. The algorithm can reconfigure the 3D mesh arrays in polynomial time. By computer simulation, we show the survival rates and the reliabilities of arrays, which express the efficiency of reconfiguration according to the algorithm. The reliabilities are compared with those of the model using double tracks, for which the near-miss relation among compensation paths is allowed but whose hardware overhead is almost double that of the proposed model using one-and-half tracks. Finally, we design a logic circuit for hardware realization of the algorithm. Using the circuit, we can construct a built-in self-reconfigurable 3D mesh array in which reconfiguration is done very quickly without the aid of a host computer.
A Memory Power Optimization Technique for Application Specific Embedded Systems
Tohru ISHIHARA Hiroto YASUURA

PAPER

Vol: E82-A No:11 Page(s): 2366-2374
In this paper, a novel application-specific power optimization technique is proposed that utilizes a small instruction ROM placed between an instruction cache or a main program memory and the CPU core. Our optimization technique targets embedded systems which assume the following: (i) instruction memories are organized as two on-chip memories, a main program memory and a subprogram memory, (ii) these two memories can be independently powered up or powered down by a special instruction of the core processor, and (iii) a compiler optimizes the allocation of object code to these two memories so as to minimize the average read energy consumption. In many application programs, only a few basic blocks are frequently executed. Therefore, allocating these frequently executed basic blocks to the low-power subprogram memory leads to a significant energy reduction. Our experiments with actual ROM (Read Only Memory) modules created with 0.5 µm CMOS process technology and an MPEG2 codec program demonstrate energy reductions of more than 50% in the best case over the previous approach, which applies only a divided bit and word line structure.
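As a rough illustration of assumption (iii), the sketch below places the most frequently executed basic blocks into a small subprogram memory with a simple greedy pass and compares the resulting read energy. The block names, sizes, execution counts, capacity, and per-fetch energy values are all hypothetical, and the greedy heuristic is an assumption made for illustration, not the allocation method of the paper.

```python
# Hypothetical basic blocks: (name, size in instruction words, execution count).
BLOCKS = [
    ("init", 40, 1),
    ("loop", 10, 1000),
    ("dct", 20, 500),
    ("exit", 30, 1),
]


def allocate(blocks, capacity):
    """Greedily pick the most frequently executed blocks that still fit
    into a subprogram memory holding `capacity` instruction words."""
    chosen = []
    used = 0
    for name, size, count in sorted(blocks, key=lambda b: b[2], reverse=True):
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen


def read_energy(blocks, chosen, e_main, e_sub):
    """Total read energy, assuming every word of a block is fetched once per
    execution and costs e_sub in the subprogram memory, e_main otherwise."""
    total = 0
    for name, size, count in blocks:
        per_fetch = e_sub if name in chosen else e_main
        total += size * count * per_fetch
    return total


chosen = allocate(BLOCKS, capacity=32)
baseline = read_energy(BLOCKS, [], e_main=4, e_sub=1)
optimized = read_energy(BLOCKS, chosen, e_main=4, e_sub=1)
print(chosen, baseline, optimized)
```

With these made-up numbers, the two hot blocks fit into the small memory and account for nearly all fetches, which is the effect the abstract describes.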
A Hardware/Software Cosynthesis System for Digital Signal Processor Cores
Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER

Vol: E82-A No:11 Page(s): 2325-2337
This paper proposes a hardware/software cosynthesis system for digital signal processor cores and a hardware/software partitioning algorithm which is one of the key issues for the system. The target processor has a VLIW-type core which can be composed of a processor kernel, multiple data memory buses (X-bus and Y-bus), hardware loop units, addressing units, and multiple functional units. The processor kernel includes five pipeline stages (RISC-type kernel) or three pipeline stages (DSP-type kernel). Given an application program written in the C language and a set of application data, the system synthesizes a processor core by selecting an appropriate kernel (RISC-type or DSP-type kernel) and required hardware units according to the application program/data and the hardware costs. The system also generates the object code for the application program and a software environment (compiler and simulator) for the processor core. The experimental results demonstrate that the system synthesizes processor cores effectively according to the features of an application program and the synthesized processor cores execute most application programs with the minimum number of clock cycles compared with several existing processors.
Performance Evaluation of STRON: A Hardware Implementation of a Real-Time OS
Takumi NAKANO Yoshiki KOMATSUDAIRA Akichika SHIOMI Masaharu IMAI

PAPER

Vol: E82-A No:11 Page(s): 2375-2382
In a real-time system, it is necessary to reduce the response time to an interrupt signal, as well as the execution time of a Real-Time Operating System (RTOS). In order to satisfy this requirement, we have proposed a method of implementing some of the functionalities of an RTOS in hardware. Based on this idea, we have implemented a VLSI chip, called STRON (silicon TRON: The Realtime Operating system Nucleus), to enhance the performance of an RTOS, where the STRON chip works as a peripheral unit of any MPU. In this paper we describe the hardware architecture of the STRON chip and the performance evaluation results of the RTOS using the STRON chip. The following results were obtained. (1) The STRON chip is implemented in only about 10,000 gates when the number of each object (task, event flag, semaphore, and interrupt) is 7. (2) The task scheduler can execute within 8 clocks in a fixed period using the hardware algorithm when the number of tasks is 7. (3) Most of the basic µITRON system calls using the STRON chip can be executed in a fixed period of a few microseconds. (4) The execution time of a system call, measured by a multitask application program model, can be reduced to about one-fifth that in the case of the conventional software RTOS. (5) The total performance, including context switching, is about 2.2 times faster than that of the software RTOS. We conclude that the execution time of the part of the system call implemented by the STRON chip can almost be ignored, but the interface software and the context switching, which depend on the architecture of an MPU, strongly influence the total performance of an RTOS.
Hardware Synthesis from C Programs with Estimation of Bit Length of Variables
Osamu OGAWA Kazuyoshi TAKAGI Yasufumi ITOH Shinji KIMURA Katsumasa WATANABE

PAPER

Vol: E82-A No:11 Page(s): 2338-2346
In hardware synthesis methods using high-level languages such as C, the optimization quality of the compilers has a great influence on the area and speed of the synthesized circuits. Among the hardware-oriented optimization methods required in such compilers, minimization of the bit length of the data paths is one of the most important issues. In this paper, we propose an algorithm for estimating the necessary bit length of variables for this purpose. The algorithm analyzes the control/data-flow graph translated from C programs and decides the bit length of each variable. In several experiments, the bit length of variables was reduced by half with respect to the declared length. This method is effective not only for reducing the circuit area but also for reducing the delay of operation units such as adders.
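To make the idea of bit-length estimation concrete, here is a minimal sketch based on plain interval (value range) propagation. It is an assumed, simplified stand-in and not the algorithm of the paper, which works on a control/data-flow graph; the helper names and the example value ranges are hypothetical.

```python
def bits_needed(lo, hi):
    """Smallest width that can hold every integer in [lo, hi]:
    unsigned if lo >= 0, two's complement otherwise."""
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    # Signed case: magnitude bits for both ends plus one sign bit.
    return max((-lo - 1).bit_length(), hi.bit_length()) + 1


def add_range(a, b):
    """Value range of x + y when x lies in range a and y in range b."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Value range of x * y: the extremes occur at the corner products."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


# Two inputs declared as 32-bit ints but known to hold 4-bit samples.
x = (0, 15)
y = (0, 15)
s = add_range(x, y)   # range of x + y
p = mul_range(x, y)   # range of x * y
print(bits_needed(*s), bits_needed(*p))
```

In this toy case the adder needs 5 bits and the multiplier output 8 bits, far fewer than the declared 32.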
FPGA-Based Hash Circuit Synthesis with Evolutionary Algorithms
Ernesto DAMIANI Valentino LIBERALI Andrea G. B. TETTAMANZI

PAPER

Vol: E82-A No:9 Page(s): 1888-1896
An evolutionary algorithm is used to evolve a digital circuit which computes a simple hash function mapping a 16-bit address space into an 8-bit one. The target technology is FPGA, where the search space of the algorithm is made of the combinational functions computed by cells and of the interconnections among cells. The evolutionary technique has been applied to five different interconnection topologies, specified by neighbourhood graphs. This circuit is readily applicable to the design of set-associative cache memories. Possible use of the evolutionary approach presented in the paper for on-line tuning of the function during cache operation is also discussed.
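For orientation, the sketch below shows one simple hand-written hash of the kind described, mapping a 16-bit address to an 8-bit value by XOR-folding the two bytes. This is only an assumed baseline for illustration; it is not the evolved circuit from the paper, and the function name is hypothetical.

```python
from collections import Counter


def xor_fold_hash(addr):
    """Map a 16-bit address to 8 bits by XOR-ing its high and low bytes."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return (addr >> 8) ^ (addr & 0xFF)


# Count how many of the 65536 addresses land in each of the 256 buckets.
load = Counter(xor_fold_hash(a) for a in range(0x10000))
print(hex(xor_fold_hash(0x1234)), min(load.values()), max(load.values()))
```

Over the full address space this fold spreads addresses evenly; the interest of a tuned hash lies in how well it spreads the addresses a real workload actually uses.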
A Novel Computationally Adaptive Hardware Algorithm for Video Motion Estimation
Vasily G. MOSHNYAGA

PAPER-Imaging Circuits and Algorithms

Vol: E82-C No:9 Page(s): 1749-1754
A new hardware algorithm for block-matching video motion estimation is presented. The algorithm works in full-search fashion, but unlike the Full-Search Block Matching Algorithm (FSBMA), it adjusts the number of computations dynamically to variable picture contents. Owing to an incorporated mechanism of data-driven thresholding, the proposed algorithm performs four times fewer operations than the FSBMA while maintaining the same quality of results. Its hardware implementation is simple and compact. A supporting hardware design and simulation results on benchmarks are outlined.
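The abstract does not spell out the thresholding mechanism, so the sketch below uses a well-known stand-in: a full search over all candidate positions in which each sum of absolute differences (SAD) is abandoned as soon as it reaches the best SAD found so far. The function names, the row-wise cutoff check, and the synthetic frame are assumptions for illustration only.

```python
def sad_with_cutoff(block, candidate, cutoff):
    """Accumulate the SAD row by row; give up once it reaches `cutoff`.
    Returns (sad or None if abandoned, number of pixel operations)."""
    total = 0
    ops = 0
    for row_b, row_c in zip(block, candidate):
        for b, c in zip(row_b, row_c):
            total += abs(b - c)
            ops += 1
        if total >= cutoff:
            return None, ops
    return total, ops


def full_search(block, frame, top, left, radius):
    """Visit every displacement within +/-radius of (top, left) and return
    (best displacement, best SAD, total pixel operations)."""
    n = len(block)
    best_sad = float("inf")
    best_mv = (0, 0)
    ops_total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(frame) or x + n > len(frame[0]):
                continue  # candidate would fall outside the frame
            cand = [row[x:x + n] for row in frame[y:y + n]]
            sad, ops = sad_with_cutoff(block, cand, best_sad)
            ops_total += ops
            if sad is not None:  # strictly better than the best so far
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad, ops_total


# Synthetic 8x8 frame; the 4x4 block is cut out at row 3, column 4.
frame = [[100 * y + x for x in range(8)] for y in range(8)]
block = [row[4:8] for row in frame[3:7]]
mv, sad, ops = full_search(block, frame, top=2, left=2, radius=2)
print(mv, sad, ops)
```

Because a candidate is only dropped once it can no longer beat the current best, the result matches an exhaustive search while the operation count depends on the picture content.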
Evolutionary Design of Arithmetic Circuits
Takafumi AOKI Naofumi HOMMA Tatsuo HIGUCHI

PAPER

Vol: E82-A No:5 Page(s): 798-806
This paper presents a new approach to designing arithmetic circuits by using a graph-based evolutionary optimization technique called Evolutionary Graph Generation (EGG). The key idea of the proposed method is to introduce a higher level of abstraction for arithmetic algorithms, in which arithmetic circuit structures are modeled as data-flow graphs associated with specific number representation systems. The EGG system employs evolutionary operations to transform the structure of graphs directly, which makes it possible to generate the desired circuit structure efficiently. The potential capability of EGG is demonstrated through an experiment of generating constant-coefficient multipliers.
A Method for Circular Pattern Recognition in a Binary Image and Its Implementation onto an FPGA
Yusuke TOKUNAGA Takahiro INOUE

PAPER

Vol: E82-A No:2 Page(s): 246-254
A method for circular pattern recognition in a binary image and its implementation on an FPGA are described. The proposed method is based on the template matching method using a modified matching degree. This method can be implemented on an FPGA and can realize a real-time system. The usefulness of the proposed method was confirmed by numerical simulations. The real-time performance was confirmed by experiments on an FPGA designed using a Verilog-HDL CAD tool.
Efficient and Flexible Cosimulation Environment for DSP Applications
Wonyong SUNG Soonhoi HA

PAPER-Co-design

Vol: E81-A No:12 Page(s): 2605-2611
Hardware/software codesign using various hardware and software implementation possibilities requires a cosimulation environment which has both flexibility and efficiency. In this paper, a hardware/software cosimulation environment is developed using the backplane approach and optimized synchronization. To seamlessly integrate a new simulator, this paper defines and implements the backplane protocol for communication and synchronization between client simulators. An automatic interface generation facility is also devised for a more effective cosimulation environment. To enhance the performance of the cosimulation backplane, a series of optimized hardware/software synchronization methods are introduced. Efforts are focused on reducing control packets between simulators as well as on concurrent execution of simulators without roll-back. The environment is implemented based on Ptolemy and validated with a QAM example run on different configurations. With the optimized synchronization method, we have achieved about a 7 times speed-up compared with lock-step synchronization.
Language and Compiler for Optimizing Datapath Widths of Embedded Systems
Akihiko INOUE Hiroyuki TOMIYAMA Takanori OKUMA Hiroyuki KANBARA Hiroto YASUURA

PAPER-Co-design

Vol: E81-A No:12 Page(s): 2595-2604
The datapath width of a core processor has a strong effect on the cost, power consumption, and performance of an embedded system integrated with memories into a single chip. However, it is difficult for designers to appropriately determine the datapath width for each application because of the limited reusability of software and the lack of compilation techniques. The purpose of this paper is to clarify the support required from software for optimal datapath width determination. As a solution, an embedded programming language, called Valen-C, and a retargetable Valen-C compiler are proposed. This paper describes the syntax and semantics of Valen-C, the mechanism of the retargetable Valen-C compiler, and how to preserve the accuracy of computation of programs across various datapath widths. Experiments with practical applications show that the total cost of the system including a core processor, ROM, and RAM is drastically reduced with little performance loss by reducing the datapath width.
Effectiveness of a High Speed Context Switching Method Using Register Bank
Jun-ichi ITO Takumi NAKANO Yoshinori TAKEUCHI Masaharu IMAI

PAPER-LSI Architecture

Vol: E81-A No:12 Page(s): 2661-2667
This paper proposes a method to reduce the context switching time using a register bank to store the contexts of working tasks. Hardware cost and performance were measured by modeling the register bank and controller in VHDL. The following results were obtained: (1) The controller can be implemented with a much lower hardware cost than the register bank, which is realized by an SRAM module. (2) Context switching time can be reduced to less than 50% of that of a software implementation. (3) Combining the proposed architecture with our previous work (RTOS implemented in HW) gives much higher performance for a hard real-time system.
Program Slicing on VHDL Descriptions and Its Evaluation
Shigeru ICHINOSE Mizuho IWAIHARA Hiroto YASUURA

PAPER-Design Reuse

Vol: E81-A No:12 Page(s): 2585-2594
Providing various kinds of assistance for design modifications of HDL source code is important for design reuse and a quick design cycle in VLSI CAD. Program slicing is a software-engineering technique for analyzing, abstracting, and transforming programs. We show algorithms for extracting/removing the behaviors of specified signals in VHDL descriptions. We also describe a VHDL slicing system and show experimental results of efficiently extracting components from VHDL descriptions.
Towards the IC Implementation of Adaptive Fuzzy Systems
Iluminada BATURONE Santiago SANCHEZ-SOLANO Jose L. HUERTAS

PAPER-Control and Adaptive Systems

Vol: E81-A No:9 Page(s): 1877-1885
The required building blocks of CMOS fuzzy chips capable of performing as adaptive fuzzy systems are described in this paper. The building blocks are designed with mixed-signal current-mode cells that contain low-resolution A/D and D/A converters based on current mirrors. These cells provide the chip with an analog-digital programming interface. They also perform as computing elements of the fuzzy inference engine that calculate the output signal in either analog or digital formats, thus easing communication of the chip with digital processing environments and analog actuators. Experimental results of a 9-rule prototype integrated in a 2.4-µm CMOS process are included. It has a digital interface to program the antecedents and consequents and a mixed-signal output interface. The proposed design approach enables the CMOS realization of low-cost and high-inference fuzzy systems able to cope with complex processes through adaptation. This is illustrated with simulated results of an application to the on-line identification of a nonlinear dynamical plant.
Plastic Cell Architecture: A Scalable Device Architecture for General-Purpose Reconfigurable Computing
Kouichi NAGAMI Kiyoshi OGURI Tsunemichi SHIOZAWA Hideyuki ITO Ryusuke KONISHI

PAPER

Vol: E81-C No:9 Page(s): 1431-1437
We propose an architectural reference of programmable devices that we call Plastic Cell Architecture (PCA). PCA is a reference for implementing a device with autonomous reconfigurability, which we also introduce in this paper. This reconfigurability is a further step toward new reconfigurable computing, which introduces variable- and programmable-grained parallelism to wired logic computing. This computing follows the Object-Oriented paradigm: it regards configured circuits as objects. These objects will be described in a new hardware description language dealing with the semantics of dynamic module instantiation. PCA is the fusion of SRAM-based FPGAs and cellular automata (CA), where the CA are dedicated to supporting run-time activities of objects. This paper mainly focuses on autonomous reconfigurability and PCA. The following discussions examine a research direction towards general-purpose reconfigurable computing.
6.144Mbit/s Burst Modem with an Adaptive Equalizer for TDMA Mobile Radio Communications
Satoshi DENNO Yushi SHIRATO

PAPER

Vol: E81-B No:7 Page(s): 1453-1461
This paper describes methods used in the design of a high-speed burst modem for mobile communication systems. The modem has burst mode operations including burst mode AGC (automatic gain control), burst mode BTR (bit timing recovery), adaptive equalization, and diversity based on a selection algorithm to achieve higher performance in multipath fading channels. Moreover, the performance of the burst modem, which is developed using analog signal processing devices, DSPs (digital signal processors), and FPGAs (field programmable gate arrays), is analyzed experimentally. Results show that the modem can suppress irreducible BER values below 1.0e-6 and attains a 2 dB implicit diversity gain over multipath fading channels modeled by a two-ray impulse response system with independent Rayleigh fading.
