IEICE global.ieice.org Site

Author Search Result

[Author] Kazuhito ITO(19hit)


Energy Minimization of Double Modular Redundant Conditional Processing by Common Condition Dependency
Kazuhito ITO

BRIEF PAPER-Integrated Electronics

Vol:
E103-C No:4
Page(s):
181-185
Double modular redundancy (DMR) executes operations twice and detects soft errors by comparing the operation results. An error is corrected by executing the necessary operations again. For the DMR design of conditional processing, a method is proposed which makes the secondary executions of the duplicated operations dependent on the primary execution of the condition operation, thereby widening the schedule solution space and allowing better results to be derived. The energy minimization with the proposed method is formulated as ILP models and the optimum solution is obtained by using an ILP solver.
A Processor Accelerator for Software Decoding of BCH Codes
Kazuhito ITO

PAPER-VLSI Design Technology and CAD

Vol:
E93-A No:7
Page(s):
1329-1337
The BCH code is one of the well-known error correction codes and its decoding contains many operations in a Galois field. These operations require many instruction steps or a large memory area for look-up tables on ordinary processors. While dedicated hardware BCH decoders achieve higher decoding speed than software, the advantage of software decoding is its flexibility to decode BCH codes of variable parameters. In this paper, an auxiliary circuit to be embedded in a pipelined processor is proposed which accelerates software decoding of various BCH codes.
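To illustrate why Galois field operations take many instruction steps on an ordinary processor, the following is a minimal sketch of shift-and-add multiplication in GF(2^m). The field size m = 8 and the polynomial 0x11D are assumptions chosen for illustration and are not taken from the paper, and the function name `gf_mul` is hypothetical.

```python
def gf_mul(a: int, b: int, m: int = 8, poly: int = 0x11D) -> int:
    """Multiply a and b in GF(2^m) by shift-and-add (bit-serial) steps.

    m and poly are illustrative assumptions: poly is the field polynomial
    including its x^m term (0x11D = x^8 + x^4 + x^3 + x^2 + 1).
    """
    result = 0
    overflow_bit = 1 << m
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & overflow_bit:
            a ^= poly            # reduce modulo the field polynomial
    return result


if __name__ == "__main__":
    # (x + 1) * (x^2 + x + 1) = x^3 + 1
    print(gf_mul(3, 7))
```

Each multiplication loops up to m times with a test, a shift, and conditional XORs per step, which is the kind of per-operation cost that a look-up table or dedicated circuit avoids.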
Bits Truncation Adaptive Pyramid Algorithm for Motion Estimation of MPEG2
Li JIANG Kazuhito ITO Hiroaki KUNIEDA

PAPER

Vol:
E80-A No:8
Page(s):
1438-1445
In this paper, a new bits truncation adaptive pyramid (BTAP) algorithm for motion estimation is presented. The method truncates the gray level from 8 bits to far fewer bits in the searching algorithm. Compared with conventional fast block matching algorithms, this method drastically improves the speed of motion estimation on reduced gray-level images while preserving reasonable performance and algorithm reliability. The bits truncation concept is combined with the hierarchical pyramid algorithm in order to truncate adaptively according to image characteristics. The computational complexity is much less than that of the pyramid algorithm and the 3-step motion estimation algorithm because of the bit-truncated search and low-overhead adaptation. Nevertheless, the PSNR is comparable with that of these two algorithms for various video sequences.
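The basic idea of matching on truncated gray levels can be sketched as follows. This is only an illustration under assumptions: the sum of absolute differences (SAD) is assumed as the block matching cost, the function names are hypothetical, and the adaptive selection of the bit count and the pyramid hierarchy described in the abstract are not modeled.

```python
def truncate_bits(pixel: int, keep_bits: int) -> int:
    """Keep only the top keep_bits bits of an 8-bit gray level."""
    return pixel >> (8 - keep_bits)


def truncated_sad(block_a, block_b, keep_bits: int) -> int:
    """SAD between two equally sized blocks after bit truncation.

    Blocks are lists of rows of 8-bit pixel values. SAD is an assumed
    matching cost; the paper's exact cost function is not stated here.
    """
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(truncate_bits(pa, keep_bits) - truncate_bits(pb, keep_bits))
    return total


if __name__ == "__main__":
    a = [[200, 16], [0, 255]]
    b = [[184, 31], [15, 240]]
    print(truncated_sad(a, b, 8))  # full 8-bit precision
    print(truncated_sad(a, b, 4))  # top 4 bits only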
An Area-Time Efficient Key Equation Solver with Euclidean Algorithm for Reed-Solomon Decoders
Kazuhito ITO

PAPER-VLSI Design Technology and CAD

Vol:
E96-A No:2
Page(s):
609-617
Reed-Solomon (RS) code is one of the well-known and widely used error correction codes. Among the components of a hardware RS decoder, the key equation solver (KES) unit occupies a relatively large portion of the hardware. It is important to develop an efficient KES architecture to implement efficient RS decoders. In this paper, a novel polynomial division technique used in the Euclidean algorithm (EA) of the KES is presented which achieves the short critical path delay of one Galois multiplier and one Galois adder. Then a KES architecture with the EA is proposed which is efficient in the sense of the product of area and time.
Valid Digit and Overflow Information to Reduce Energy Dissipation of Functional Units in General Purpose Processors
Kazuhito ITO Takuya NUMATA

PAPER

Vol:
E96-C No:4
Page(s):
463-472
In order to reduce the dynamic energy dissipation in CMOS LSIs, it is effective to reduce the frequency of value changes of the signals. In this paper, a data expression with the valid digit and lower digit overflow information is proposed to suppress unnecessary signal changes in integer functional units and registers of general purpose processors. Experimental results show that the proposed method reduces the energy dissipation by 9.8% for benchmark programs.
Low Complexity Reed-Solomon Decoder Design with Pipelined Recursive Euclidean Algorithm
Kazuhito ITO

PAPER

Vol:
E99-A No:12
Page(s):
2453-2462
A Reed-Solomon (RS) decoder is designed based on the pipelined recursive Euclidean algorithm in the key equation solution. While the Euclidean algorithm uses fewer Galois multipliers than the modified Euclidean (ME) and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, division between two elements in the Galois field is required. By implementing the division with a multi-cycle Galois inverter and a serial Galois multiplier, the proposed key equation solver architecture achieves lower complexity than the conventional ME and RiBM based architectures. The proposed RS (255,239) decoder reduces the hardware complexity by 25.9% with a 6.5% increase in decoding latency.
Hardware-Efficient Local Extrema Detection for Scale-Space Extrema Detection in SIFT Algorithm
Kazuhito ITO Hiroki HAYASHI

LETTER

Vol:
E99-A No:12
Page(s):
2507-2510
In this paper a hardware-efficient local extrema detection (LED) method used for scale-space extrema detection in the SIFT algorithm is proposed. By reformulating the reuse of the intermediate results in taking the local maximum and minimum, the necessary operations in LED are reduced without degrading the detection accuracy. The proposed method requires 25% to 35% less logic resources than the conventional method when implemented in an FPGA with a slight increase in latency.
Hardware Efficient and Low Latency Implementations of Look-Ahead ACS Computation for Viterbi Decoders
Kazuhito ITO Ryoto SHIRASAKA

PAPER-High-Level Synthesis and System-Level Design

Vol:
E96-A No:12
Page(s):
2680-2688
The throughput rate of Viterbi decoding (VD) is not limited by the speed of functional units when look-ahead computation techniques are used. The disadvantages of the look-ahead computation in VD are the hardware complexity and the decode latency. In this paper, implementation methods of the look-ahead ACS computation are proposed to improve the hardware efficiency and reduce the latency where the hardware efficiency and the latency can be balanced with a single parameter.
New Rate Control Method with Minimum Skipped Frames for Very Low Delay in H.263+ Codec
Trio ADIONO Tsuyoshi ISSHIKI Chawalit HONSAWEK Kazuhito ITO Dongju LI Hiroaki KUNIEDA

PAPER-Image

Vol:
E85-A No:6
Page(s):
1396-1407
A new H.263+ rate control method that has very low encoder-decoder delay, a small buffer, and low computational complexity for hardware realization is proposed in this paper. This method focuses on producing low encoder-decoder delay in order to solve the lip synchronization problem. Low encoder-decoder delay is achieved by improving target bit rate achievement and reducing processing delay. The target bit rate achievement is improved by allocating optimum frame encoding bits and employing a new adaptive threshold of zero vector motion estimation. The processing delay is reduced by simplifying quantization parameter computation, applying a new non-zero coefficient distortion measure, and utilizing previous frame information in current frame encoding. The simulation results indicate a very large reduction in the number of skipped frames in comparison with the test model TMN8. There were 80 fewer skipped frames than with TMN8 within a 380-frame sequence during encoding of a very high movement video sequence. The 27 kbps target bit rate is achieved with insignificant difference for various types of video sequences. The simulation results also show that our method successfully allocates encoding bits, keeps the amount of data in the encoder buffer small, and prevents buffer overflow and underflow.
Modularization and Processor Placement for DSP Neo-Systolic Array
Kazuhito ITO Kesami HAGIWARA Takashi SHIMIZU Hiroaki KUNIEDA

PAPER

Vol:
E76-A No:3
Page(s):
349-361
A further study on a VLSI system compiler, named VEGA (VLSI Embodiment for General Algorithms), is presented. It maps a general digital signal processing algorithm onto a neo-systolic array, which is a VLSI-oriented multiprocessor array. The highly complicated mapping problem is divided into subproblems such as modularization, operation grouping, processor placement, scheduling, control logic synthesis, and mask pattern generation. In this paper, a modularization technique is proposed which homogenizes all the operations of the processing algorithm into multiply-add operations. A processor placement algorithm that maps the processing algorithm onto a neo-systolic array so as to minimize data transfer time is also proposed.
A Low Power and Hardware Efficient Syndrome Key Equation Solver Architecture and Its Folding with Pipelining
Kazuhito ITO

PAPER-VLSI Design Technology and CAD

Vol:
E98-A No:5
Page(s):
1058-1066
Syndrome key equation solution is one of the important processes in the decoding of Reed-Solomon codes. This paper proposes a low power key equation solver (KES) architecture where the power consumption is reduced by decreasing the required number of multiplications without degrading the decoding throughput and latency. The proposed method employs a smaller number of multipliers than a conventional low power KES architecture. The critical path in the proposed KES circuit is minimized so that operation at a high clock frequency is possible. A low power folded KES architecture is also proposed to further reduce the hardware complexity by executing folded operations in a pipelined manner with a slight increase in decoding latency.
Reduction of LSI Maximum Power Consumption with Standard Cell Library of Stack Structured Cells
Yuki IMAI Shinichi NISHIZAWA Kazuhito ITO

PAPER

Publicized:
2021/09/01
Vol:
E105-A No:3
Page(s):
487-496
Environmental power generation devices such as solar cells are used as power sources for IoT devices. Due to the large internal resistance of such power sources, LSIs in the IoT devices may malfunction when the LSI operates at high speed, a large current flows, and the voltage drops. In this paper, a standard cell library of stacked structured cells is proposed to increase the delay of logic circuits within the range not exceeding the clock cycle, thereby reducing the maximum current of the LSIs. We show that the maximum power consumption of LSIs can be reduced without increasing the energy consumption of the LSIs.
Register Minimization and its Application in Schedule Exploration for Area Minimization for Double Modular Redundancy LSI Design
Yuya KITAZAWA Kazuhito ITO

PAPER

Publicized:
2021/09/01
Vol:
E105-A No:3
Page(s):
530-539
Double modular redundancy (DMR) executes an operation twice and detects a soft error by comparing the duplicated operation results. The soft error is corrected by re-executing the necessary operations. The re-execution requires error-free input data, and registers are needed to store such error-free data. In this paper, a method to minimize the required number of registers is proposed where an appropriate subgraph partitioning of operation nodes is searched. In addition, using the proposed register minimization method, a minimization of the area of functional units and registers required to implement the DMR design is proposed.
An Overlapped Scheduling Method for an Iterative Processing Algorithm with Conditional Operations
Kazuhito ITO Tatsuya KAWASAKI

PAPER

Vol:
E81-A No:3
Page(s):
429-438
One of the ways to execute a processing algorithm at high speed is parallel processing on multiple computing resources such as processors and functional units. To identify the minimum number of computing resources, the most important step is the scheduling, which determines when each operation in the processing algorithm is executed. Among feasible schedules satisfying all the data dependencies in the processing algorithm, an overlapped schedule can achieve the fastest execution speed for an iterative processing algorithm. In the case of processing algorithms with operations which are executed only under certain conditions, computing resources can be shared by those conditional operations. In this paper, we propose a scheduling method which derives an overlapped schedule where the required number of computing resources is minimized by considering the sharing by conditional operations.
Energy Minimization of Full TMR Design with Optimized Selection of Temporal/Spatial TMR Mode and Supply Voltage
Kazuhito ITO

PAPER-High-Level Synthesis and System-Level Design

Vol:
E97-A No:12
Page(s):
2530-2539
While triple modular redundancy (TMR) is effective in eliminating soft errors in LSIs, the overhead of the triplicated area as well as the triplicated energy consumption is a problem. In addition to the spatial TMR mode, where executions are simply triplicated and the majority is taken, the temporal TMR mode is available, where only two copies of an operation are executed and the results are compared; if the results differ, the third copy is executed to get the correct result. Appropriately selecting the power supply voltage is also an effective technique to reduce the energy consumption. In this paper, a method to derive a TMR design is proposed which selects the TMR mode and supply voltage for each operation to minimize the energy consumption within the time and area constraints.
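The temporal TMR mode described above can be sketched in software as follows. This is a behavioral illustration only, not the paper's design method: the names `temporal_tmr` and `FlakyAdder` are hypothetical, and the fault injector assumes at most one faulty execution.

```python
def temporal_tmr(op, *args):
    """Run op twice; run a third copy only if the two results differ.

    Returns (result, number_of_executions). Assumes at most one of the
    executions is corrupted, so the third copy agrees with a correct one.
    """
    first = op(*args)
    second = op(*args)
    if first == second:
        return first, 2
    third = op(*args)
    if third == first or third == second:
        return third, 3          # majority of the three copies
    raise RuntimeError("no two copies agree")


class FlakyAdder:
    """Hypothetical fault injector: corrupts the result of one chosen call."""

    def __init__(self, faulty_call):
        self.faulty_call = faulty_call
        self.calls = 0

    def __call__(self, x, y):
        self.calls += 1
        result = x + y
        if self.calls == self.faulty_call:
            result ^= 1          # flip the least significant bit
        return result


if __name__ == "__main__":
    print(temporal_tmr(FlakyAdder(faulty_call=0), 2, 3))  # no fault injected
    print(temporal_tmr(FlakyAdder(faulty_call=2), 2, 3))  # second copy corrupted
```

In the fault-free case only two executions are spent, which is the source of the energy saving over the spatial mode, at the cost of extra time when a mismatch occurs.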
A Trace-Back Method with Source States for Viterbi Decoding of Rate-1/n Convolutional Codes
Kazuhito ITO

PAPER-VLSI Design Technology and CAD

Vol:
E95-A No:4
Page(s):
767-775
The Viterbi algorithm is widely used for decoding of convolutional codes. The trace-back method is preferable to the register exchange method because of its lower power consumption, especially for convolutional codes with many states. A drawback of the conventional trace-back is that it generally requires long latency to obtain the decoded data. In this paper, a method of trace-back with source states instead of decision bits is proposed which reduces the number of memory accesses. A dedicated memory which supports the proposed trace-back method is also presented. The reduced memory accesses result in smaller power consumption and a shorter decode latency than the conventional method.
System-MSPA Design of H.263+ Video Encoder/Decoder LSI for Videotelephony Applications
Chawalit HONSAWEK Kazuhito ITO Tomohiko OHTSUKA Trio ADIONO Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA

PAPER-VLSI Design

Vol:
E84-A No:11
Page(s):
2614-2622
In this paper, an LSI design of a video encoder and decoder for H.263+ video compression is presented. The LSI operates at a clock frequency of 27 MHz to compress QCIF (176×144 pixels) video at a frame rate of 30 frames per second. The core size is 4.6 × 4.6 mm² in a 0.35 µm process. The architecture is based on bus-connected heterogeneous dedicated modules and is named the System-MSPA architecture. It employs fast, small-chip-area dedicated modules at the lower level and controls them with a slow but flexible programmable device and an external DRAM. The design achieves a real-time encoder in a quite compact size without losing flexibility and expandability. Real-time emulation and easy test capability with an external PC are also implemented.
A Processor Accelerator for Software Decoding of Reed-Solomon Codes
Kazuhito ITO Keisuke NASU

PAPER-VLSI Design Technology and CAD

Vol:
E95-A No:5
Page(s):
884-893
Decoding of Reed-Solomon (RS) codes requires many arithmetic operations in the Galois field. While the software decoding of RS codes has the advantage of its flexibility to support RS codes of variable parameters, the speed of the software decoding is slower than dedicated hardware RS decoders because arithmetic operations in the Galois field on an ordinary processor require many instruction steps. To achieve fast software decoding of RS codes, it is effective to accelerate Galois operations by both dedicated circuitry and parallel processing. In this paper, an accelerator is proposed which is attached to the base processor to speed up the software decoding of RS codes by parallel execution of Galois operations.
Minimization of Vote Operations for Soft Error Detection in DMR Design with Error Correction by Operation Re-Execution
Kazuhito ITO Yuto ISHIHARA Shinichi NISHIZAWA

PAPER

Vol:
E101-A No:12
Page(s):
2271-2279
As LSI chips integrate more transistors and the operating power supply voltage decreases, LSI chips are becoming more vulnerable to the soft error caused by neutrons induced from cosmic rays. The soft error is detected by comparing the duplicated operation results in double modular redundancy (DMR) and the error is corrected by re-executing necessary operations. In this paper, based on the error recovery scheme of re-executing necessary operations, the minimization of the vote operations for error checking with respect to given resource constraints is considered. An ILP model for the optimal solution to the problem is presented and a heuristic algorithm is proposed to minimize the vote operations.
