
Author Search Result

[Author] Kazuhito ITO(19hit)

  • Energy Minimization of Double Modular Redundant Conditional Processing by Common Condition Dependency

    Kazuhito ITO  

     
    BRIEF PAPER-Integrated Electronics

      Vol:
    E103-C No:4
      Page(s):
    181-185

    Double modular redundancy (DMR) executes each operation twice and detects soft errors by comparing the two results. An error is corrected by executing the necessary operations again. For the DMR design of conditional processing, a method is proposed which makes the secondary executions of the duplicated operations dependent on the primary execution of the condition operation, thereby widening the schedule solution space and allowing better results to be derived. The energy minimization with the proposed method is formulated as ILP models and the optimum solution is obtained by using an ILP solver.

  • A Processor Accelerator for Software Decoding of BCH Codes

    Kazuhito ITO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E93-A No:7
      Page(s):
    1329-1337

    The BCH code is one of the well-known error correction codes and its decoding contains many operations in the Galois field. These operations require many instruction steps or a large memory area for look-up tables on ordinary processors. While dedicated hardware BCH decoders achieve higher decoding speed than software, the advantage of software decoding is its flexibility to decode BCH codes of variable parameters. In this paper, an auxiliary circuit to be embedded in a pipelined processor is proposed which accelerates software decoding of various BCH codes.

  • Bits Truncation Adaptive Pyramid Algorithm for Motion Estimation of MPEG2

    Li JIANG  Kazuhito ITO  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E80-A No:8
      Page(s):
    1438-1445

    In this paper, a new bits truncation adaptive pyramid (BTAP) algorithm for motion estimation is presented. The method truncates the gray level from 8 bits to far fewer bits in the searching algorithm. Compared with conventional fast block matching algorithms, this method drastically improves speed for motion estimation of reduced gray-level images and preserves reasonable performance and algorithm reliability. The bits truncation concept is combined with the hierarchical pyramid algorithm in order to truncate adaptively according to image characteristics. The computational complexity is much less than that of the pyramid algorithm and the 3-Step motion estimation algorithm because of the bit-truncated search and low-overhead adaptation. Nevertheless, the PSNR property is comparable with these two algorithms for various video sequences.

  • An Area-Time Efficient Key Equation Solver with Euclidean Algorithm for Reed-Solomon Decoders

    Kazuhito ITO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E96-A No:2
      Page(s):
    609-617

    Reed-Solomon (RS) code is one of the well-known and widely used error correction codes. Among the components of a hardware RS decoder, the key equation solver (KES) unit occupies a relatively large portion of the hardware. It is important to develop an efficient KES architecture to implement efficient RS decoders. In this paper, a novel polynomial division technique used in the Euclidean algorithm (EA) of the KES is presented which achieves the short critical path delay of one Galois multiplier and one Galois adder. Then a KES architecture with the EA is proposed which is efficient in the sense of the product of area and time.

  • Valid Digit and Overflow Information to Reduce Energy Dissipation of Functional Units in General Purpose Processors

    Kazuhito ITO  Takuya NUMATA  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    463-472

    In order to reduce the dynamic energy dissipation in CMOS LSIs, it is effective to reduce the frequency of value changes of the signals. In this paper, a data expression with the valid digit and lower digit overflow information is proposed to suppress unnecessary signal changes in integer functional units and registers of general purpose processors. Experimental results show that the proposed method reduces the energy dissipation by 9.8% for benchmark programs.

  • Low Complexity Reed-Solomon Decoder Design with Pipelined Recursive Euclidean Algorithm

    Kazuhito ITO  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2453-2462

    A Reed-Solomon (RS) decoder is designed based on the pipelined recursive Euclidean algorithm in the key equation solution. While the Euclidean algorithm uses fewer Galois multipliers than the modified Euclidean (ME) and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, division between two elements in the Galois field is required. By implementing the division with a multi-cycle Galois inverter and a serial Galois multiplier, the proposed key equation solver architecture achieves lower complexity than the conventional ME and RiBM based architectures. The proposed RS (255,239) decoder reduces the hardware complexity by 25.9% with a 6.5% increase in decoding latency.

  • Hardware-Efficient Local Extrema Detection for Scale-Space Extrema Detection in SIFT Algorithm

    Kazuhito ITO  Hiroki HAYASHI  

     
    LETTER

      Vol:
    E99-A No:12
      Page(s):
    2507-2510

    In this paper a hardware-efficient local extrema detection (LED) method used for scale-space extrema detection in the SIFT algorithm is proposed. By reformulating the reuse of the intermediate results in taking the local maximum and minimum, the necessary operations in LED are reduced without degrading the detection accuracy. The proposed method requires 25% to 35% less logic resources than the conventional method when implemented in an FPGA with a slight increase in latency.

  • Hardware Efficient and Low Latency Implementations of Look-Ahead ACS Computation for Viterbi Decoders

    Kazuhito ITO  Ryoto SHIRASAKA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2680-2688

    The throughput rate of Viterbi decoding (VD) is not limited by the speed of functional units when look-ahead computation techniques are used. The disadvantages of the look-ahead computation in VD are the hardware complexity and the decode latency. In this paper, implementation methods of the look-ahead ACS computation are proposed to improve the hardware efficiency and reduce the latency where the hardware efficiency and the latency can be balanced with a single parameter.

  • New Rate Control Method with Minimum Skipped Frames for Very Low Delay in H.263+ Codec

    Trio ADIONO  Tsuyoshi ISSHIKI  Chawalit HONSAWEK  Kazuhito ITO  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER-Image

      Vol:
    E85-A No:6
      Page(s):
    1396-1407

    A new H.263+ rate control method that has very low encoder-decoder delay, a small buffer and low computational complexity for hardware realization is proposed in this paper. This method focuses on producing low encoder-decoder delay in order to solve the lip synchronization problem. Low encoder-decoder delay is achieved by improving target bit rate achievement and reducing processing delay. The target bit rate achievement is improved by allocating optimum frame encoding bits and employing a new adaptive threshold of zero vector motion estimation. The processing delay is reduced by simplifying quantization parameter computation, applying a new non-zero coefficient distortion measure and utilizing previous frame information in current frame encoding. The simulation results indicate a very large reduction in the number of skipped frames in comparison with the test model TMN8. There were 80 fewer skipped frames than with TMN8 within a 380 frame sequence during encoding of a very high movement video sequence. The 27 kbps target bit rate is achieved with insignificant difference for various types of video sequences. The simulation results also show that our method successfully allocates encoding bits, maintains small data at the encoder buffer and prevents buffer overflow and underflow.

  • Modularization and Processor Placement for DSP Neo-Systolic Array

    Kazuhito ITO  Kesami HAGIWARA  Takashi SHIMIZU  Hiroaki KUNIEDA  

     
    PAPER

      Vol:
    E76-A No:3
      Page(s):
    349-361

    A further study on a VLSI system compiler, named VEGA (VLSI Embodiment for General Algorithms), is presented. It maps a general digital signal processing algorithm onto a neo-systolic array, which is a VLSI oriented multiprocessor array. The highly complicated mapping problem is divided into subproblems such as modularization, operation grouping, processor placement, scheduling, control logic synthesis, and mask pattern generation. In this paper, a modularization technique is proposed which homogenizes all the operations of the processing algorithm to multiply-add operations. A processor placement algorithm that maps a processing algorithm onto a neo-systolic array so as to minimize data transfer time is also proposed.

  • A Low Power and Hardware Efficient Syndrome Key Equation Solver Architecture and Its Folding with Pipelining

    Kazuhito ITO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E98-A No:5
      Page(s):
    1058-1066

    Syndrome key equation solution is one of the important processes in the decoding of Reed-Solomon codes. This paper proposes a low power key equation solver (KES) architecture where the power consumption is reduced by decreasing the required number of multiplications without degrading the decoding throughput and latency. The proposed method employs a smaller number of multipliers than a conventional low power KES architecture. The critical path in the proposed KES circuit is minimized so that operation at a high clock frequency is possible. A low power folded KES architecture is also proposed to further reduce the hardware complexity by executing folded operations in a pipelined manner with a slight increase in decoding latency.

  • Reduction of LSI Maximum Power Consumption with Standard Cell Library of Stack Structured Cells

    Yuki IMAI  Shinichi NISHIZAWA  Kazuhito ITO  

     
    PAPER

      Publicized:
    2021/09/01
      Vol:
    E105-A No:3
      Page(s):
    487-496

    Environmental power generation devices such as solar cells are used as power sources for IoT devices. Due to the large internal resistance of such power sources, LSIs in the IoT devices may malfunction when the LSI operates at high speed, a large current flows, and the voltage drops. In this paper, a standard cell library of stacked structured cells is proposed to increase the delay of logic circuits within the range not exceeding the clock cycle, thereby reducing the maximum current of the LSIs. We show that the maximum power consumption of LSIs can be reduced without increasing the energy consumption of the LSIs.

  • Register Minimization and its Application in Schedule Exploration for Area Minimization for Double Modular Redundancy LSI Design

    Yuya KITAZAWA  Kazuhito ITO  

     
    PAPER

      Publicized:
    2021/09/01
      Vol:
    E105-A No:3
      Page(s):
    530-539

    Double modular redundancy (DMR) executes an operation twice and detects a soft error by comparing the duplicated operation results. The soft error is corrected by re-executing the necessary operations. The re-execution requires error-free input data, and registers are needed to store such error-free data. In this paper, a method to minimize the required number of registers is proposed in which an appropriate subgraph partitioning of operation nodes is searched for. In addition, using the proposed register minimization method, a minimization of the area of functional units and registers required to implement the DMR design is proposed.

  • An Overlapped Scheduling Method for an Iterative Processing Algorithm with Conditional Operations

    Kazuhito ITO  Tatsuya KAWASAKI  

     
    PAPER

      Vol:
    E81-A No:3
      Page(s):
    429-438

    One of the ways to execute a processing algorithm at high speed is parallel processing on multiple computing resources such as processors and functional units. To identify the minimum number of computing resources, the most important step is scheduling, which determines when each operation in the processing algorithm is executed. Among feasible schedules satisfying all the data dependencies in the processing algorithm, an overlapped schedule can achieve the fastest execution speed for an iterative processing algorithm. In the case of processing algorithms with operations which are executed only under certain conditions, computing resources can be shared by those conditional operations. In this paper, we propose a scheduling method which derives an overlapped schedule where the required number of computing resources is minimized by considering the sharing by conditional operations.

  • Energy Minimization of Full TMR Design with Optimized Selection of Temporal/Spatial TMR Mode and Supply Voltage

    Kazuhito ITO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E97-A No:12
      Page(s):
    2530-2539

    While triple modular redundancy (TMR) is effective in eliminating soft errors in LSIs, the overhead of the triplicated area as well as the triplicated energy consumption is a problem. In addition to the spatial TMR mode, where executions are simply triplicated and the majority is taken, the temporal TMR mode is available, where only two copies of an operation are executed and the results are compared; if the results differ, the third copy is executed to get the correct result. Appropriately selecting the power supply voltage is also an effective technique to reduce the energy consumption. In this paper, a method to derive a TMR design is proposed which selects the TMR mode and supply voltage for each operation to minimize the energy consumption within the time and area constraints.

  • A Trace-Back Method with Source States for Viterbi Decoding of Rate-1/n Convolutional Codes

    Kazuhito ITO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:4
      Page(s):
    767-775

    The Viterbi algorithm is widely used for decoding of convolutional codes. The trace-back method is preferable to the register exchange method because of lower power consumption, especially for convolutional codes with many states. A drawback of the conventional trace-back is that it generally requires long latency to obtain the decoded data. In this paper, a method of trace-back with source states instead of decision bits is proposed which reduces the number of memory accesses. A dedicated memory which supports the proposed trace-back method is also presented. The reduced memory accesses result in smaller power consumption and a shorter decode latency than the conventional method.

  • System-MSPA Design of H.263+ Video Encoder/Decoder LSI for Videotelephony Applications

    Chawalit HONSAWEK  Kazuhito ITO  Tomohiko OHTSUKA  Trio ADIONO  Dongju LI  Tsuyoshi ISSHIKI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design

      Vol:
    E84-A No:11
      Page(s):
    2614-2622

    In this paper, an LSI design for a video encoder and decoder for H.263+ video compression is presented. The LSI operates at a clock frequency of 27 MHz to compress QCIF (176×144 pixels) at a frame rate of 30 frames per second. The core size is 4.6 × 4.6 mm² in a 0.35 µm process. The architecture is based on bus connected heterogeneous dedicated modules, named the System-MSPA architecture. It employs fast and small-chip-area dedicated modules at the lower level and controls them with a slow and flexible programmable device and an external DRAM. The design succeeds in achieving a real-time encoder of quite compact size without losing flexibility and expandability. Real-time emulation and easy test capability with an external PC are also implemented.

  • A Processor Accelerator for Software Decoding of Reed-Solomon Codes

    Kazuhito ITO  Keisuke NASU  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:5
      Page(s):
    884-893

    Decoding of Reed-Solomon (RS) codes requires many arithmetic operations in the Galois field. While the software decoding of RS codes has the advantage of its flexibility to support RS codes of variable parameters, the speed of the software decoding is slower than dedicated hardware RS decoders because arithmetic operations in the Galois field on an ordinary processor require many instruction steps. To achieve fast software decoding of RS codes, it is effective to accelerate Galois operations by both dedicated circuitry and parallel processing. In this paper, an accelerator is proposed which is attached to the base processor to speed up the software decoding of RS codes by parallel execution of Galois operations.

  • Minimization of Vote Operations for Soft Error Detection in DMR Design with Error Correction by Operation Re-Execution

    Kazuhito ITO  Yuto ISHIHARA  Shinichi NISHIZAWA  

     
    PAPER

      Vol:
    E101-A No:12
      Page(s):
    2271-2279

    As LSI chips integrate more transistors and the operating power supply voltage decreases, LSI chips are becoming more vulnerable to soft errors caused by neutrons induced by cosmic rays. A soft error is detected by comparing the duplicated operation results in double modular redundancy (DMR) and the error is corrected by re-executing the necessary operations. In this paper, based on the error recovery scheme of re-executing necessary operations, the minimization of the vote operations for error checking with respect to given resource constraints is considered. An ILP model for the optimal solution to the problem is presented and a heuristic algorithm is proposed to minimize the vote operations.
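
    Several abstracts in this list rely on the same DMR recovery scheme: execute an operation twice, compare the two results, and re-execute when they differ. The following is only a minimal software sketch of that idea, not the scheduling or ILP methods of the listed papers. The names dmr_execute, FlakyAdder, and max_retries are hypothetical and introduced here for illustration, and the fault injection (flipping the least significant bit on chosen calls) is an assumed stand-in for a soft error.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def dmr_execute(op: Callable[[], T], max_retries: int = 3) -> Tuple[T, int]:
    """Run `op` twice and compare; on mismatch, re-execute both copies.

    Returns (agreed result, number of re-executions that were needed).
    `max_retries` is an illustrative safety bound, not part of any paper.
    """
    for retries in range(max_retries + 1):
        primary = op()      # primary execution
        secondary = op()    # duplicated (secondary) execution
        if primary == secondary:  # comparison plays the role of the check
            return primary, retries
    raise RuntimeError("results still disagree after re-execution")


class FlakyAdder:
    """Hypothetical adder that corrupts its result on selected call numbers."""

    def __init__(self, fault_calls: Iterable[int]) -> None:
        self.fault_calls = set(fault_calls)
        self.calls = 0

    def add(self, a: int, b: int) -> int:
        self.calls += 1
        result = a + b
        if self.calls in self.fault_calls:
            result ^= 1  # assumed soft-error model: flip the lowest bit
        return result
```

    For example, with adder = FlakyAdder({2}), calling dmr_execute(lambda: adder.add(3, 4)) sees a mismatch on the first pair of executions, re-executes once, and returns the agreed sum together with a retry count of 1, for four executions of the operation in total.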