IEICE global.ieice.org Site

Author Search Result

[Author] Shinji KIMURA(43hit)

1-20hit(43hit)

Dual-Stage Pseudo Power Gating with Advanced Clustering Algorithm for Gate Level Power Optimization
Yu JIN Zhe DU Shinji KIMURA

PAPER-Logic Synthesis, Test and Verification

Vol: E96-A No:12
Page(s): 2568-2575
Pseudo Power Gating (Pseudo PG) is a gate-level power reduction method for combinational circuits that stops unnecessary input changes of gates. In Pseudo PG, an extra control signal may be added to a gate, and other input changes of the gate are deactivated when the control signal takes the controlling value. To improve the power reduction capability, the paper introduces dual-stage Pseudo PG with an advanced clustering algorithm, where up to two extra control signals are added to a gate if effective. The advanced clustering algorithm selects the first control signal to be compatible with the second control signal based on the propagation of the controlling condition via a path, with which candidates of controllable gates excluded by the maximum depth constraint can be controlled. Experimental results show that the proposed dual-stage Pseudo PG method obtains 23.23% average power reduction with 5.28% delay penalty with respect to the original circuits, and 10.46% more power reduction with 2.75% delay penalty compared with circuits applying the original single-stage Pseudo PG.
Hardware Synthesis from C Programs with Estimation of Bit Length of Variables
Osamu OGAWA Kazuyoshi TAKAGI Yasufumi ITOH Shinji KIMURA Katsumasa WATANABE

PAPER

Vol: E82-A No:11
Page(s): 2338-2346
In hardware synthesis methods using high-level languages such as C, the optimization quality of the compilers has a great influence on the area and speed of the synthesized circuits. Among the hardware-oriented optimization methods required in such compilers, minimization of the bit length of the data-paths is one of the most important issues. In this paper, we propose an estimation algorithm of the necessary bit length of variables for this aim. The algorithm analyzes the control/data-flow graph translated from C programs and decides the bit length of each variable. In several experiments, the bit length of variables can be reduced by half with respect to the declared length. This method is effective not only for reducing the circuit area but also for reducing the delay of operation units such as adders.
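As a rough illustration of what estimating the necessary bit length of a variable can mean, the sketch below propagates value ranges through an addition and a multiplication and derives a width from each range. It is a generic interval-arithmetic sketch, not the algorithm of the paper; the function names `bits_for_range`, `add_range`, and `mul_range` and the example ranges are assumptions made for illustration.

```python
def bits_for_range(lo, hi):
    """Smallest width holding every integer in [lo, hi].

    Unsigned if lo >= 0, two's complement otherwise (assumed convention).
    """
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    # Signed: need n with -2**(n-1) <= lo and hi <= 2**(n-1) - 1.
    return max((-lo - 1).bit_length(), hi.bit_length()) + 1


def add_range(a, b):
    """Range of x + y for x in a, y in b (ranges are (lo, hi) tuples)."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Range of x * y for x in a, y in b, from the four corner products."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


# Hypothetical input ranges for two variables declared as 32-bit ints.
x = (0, 100)
y = (0, 15)
print(bits_for_range(*add_range(x, y)))  # width needed for x + y
print(bits_for_range(*mul_range(x, y)))  # width needed for x * y
```

Under these assumed ranges, both results need far fewer bits than a declared 32-bit length.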
Power Optimization of Sequential Circuits Using Switching Activity Based Clock Gating
Xin MAN Takashi HORIYAMA Shinji KIMURA

PAPER-Logic Synthesis, Test and Verification

Vol: E93-A No:12
Page(s): 2472-2480
Clock gating is the insertion of control signals for registers to switch off unnecessary clock signals selectively, without violating the functional correctness of the original design, so as to reduce the dynamic power consumption. Commercial EDA tools usually have a mechanism to generate clock gating logic based on the structural method, where the control signals specified by designers are used, and the effectiveness of the clock gating depends on the specified control signals. In this research, we focus on automatic clock gating logic generation and propose a method based on candidate extraction and control signal selection. We formalize the control signal selection using linear formulae and devise an optimization method based on BDDs. The method is effective for circuits with many candidates shared by different registers. The method is applied to counter circuits, to check the correlation with power simulation results, and to a set of benchmark circuits. A 19.1-71.9% power reduction has been found on counter circuits after layout, and a 2.3-18.0% cost reduction on benchmark circuits.
Bit Length Optimization of Fractional Part on Floating to Fixed Point Conversion for High-Level Synthesis
Nobuhiro DOI Takashi HORIYAMA Masaki NAKANISHI Shinji KIMURA Katsumasa WATANABE

PAPER-Logic and High Level Synthesis

Vol: E86-A No:12
Page(s): 3184-3191
In the hardware synthesis from a high-level language such as C, the bit length of variables is one of the key issues for the area and speed optimization. Usually, designers are required to optimize the bit-length of each variable manually using the time-consuming simulation on huge-data. In this paper, we propose an optimization method of the fractional bit length in the conversion from floating-point variables to fixed-point variables. The method is based on error propagation and the backward propagation of the accuracy limitation. The method is fully analytical and fast compared to simulation based methods.
Approximate FPGA-Based Multipliers Using Carry-Inexact Elementary Modules
Yi GUO Heming SUN Ping LEI Shinji KIMURA

PAPER

Vol: E103-A No:9
Page(s): 1054-1062
Approximate multiplier design is an effective technique to improve hardware performance at the cost of accuracy loss. The current approximate multipliers are mostly ASIC-based and are dedicated for one particular application. In contrast, FPGA has been an attractive choice for many applications because of its high performance, reconfigurability, and fast development round. This paper presents a novel methodology for designing approximate multipliers by employing the FPGA-based fabrics (primarily look-up tables and carry chains). The area and latency are significantly reduced by applying approximation on carry results and cutting the carry propagation path in the multiplier. Moreover, we explore higher-order multipliers on architectural space by using our proposed small-size approximate multipliers as elementary modules. For different accuracy-hardware requirements, eight configurations for approximate 8×8 multiplier are discussed. In terms of mean relative error distance (MRED), the error of the proposed 8×8 multiplier is as low as 1.06%. Compared with the exact multiplier, our proposed design can reduce area by 43.66% and power by 24.24%. The critical path latency reduction is up to 29.50%. The proposed multiplier design has a better accuracy-hardware tradeoff than other designs with comparable accuracy. Moreover, image sharpening processing is used to assess the efficiency of approximate multipliers on application.
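The mean relative error distance (MRED) used above can be sketched as the average of |approximate - exact| / exact over all operand pairs. The code below is a minimal sketch under stated assumptions: `truncated_mul` is a hypothetical stand-in approximate multiplier (it simply zeroes low operand bits) and is not the carry-inexact design of the paper, and skipping zero operands to avoid division by zero is an assumed convention.

```python
def truncated_mul(a, b, k=2):
    """Hypothetical approximate multiplier: zero the k low bits of each operand."""
    mask = ~((1 << k) - 1)
    return (a & mask) * (b & mask)


def mred(approx, n_bits=8):
    """Mean relative error distance over all nonzero n_bits x n_bits operand pairs."""
    total = 0.0
    count = 0
    for a in range(1, 1 << n_bits):
        for b in range(1, 1 << n_bits):
            exact = a * b
            total += abs(approx(a, b) - exact) / exact
            count += 1
    return total / count


print(f"MRED of the stand-in multiplier: {mred(truncated_mul):.4f}")
```

An exact multiplier gives an MRED of zero, so the metric directly measures how far an approximate design drifts from it on average.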
Robust Heuristics for Multi-Level Logic Simplification Considering Local Circuit Structure
Qiang ZHU Yusuke MATSUNAGA Shinji KIMURA Katsumasa WATANABE

PAPER-Logic Synthesis

Vol: E83-A No:12
Page(s): 2520-2527
Combinational logic circuits are usually implemented as multi-level networks of logic nodes. Multi-level logic simplification using the don't cares on each node is widely used. Large don't cares give good simplification results, but suffer from huge memory area and computation time. Extraction of useful don't cares and reduction of the size of the don't cares are important problems on the simplification using don't cares. In the paper, we propose a new robust heuristic method for the selection of don't cares. We consider an adaptive subnetwork for each simplified node in the network and introduce a stepwise enhancement method of the subnetwork considering the memory area and the network structure. The don't cares extracted from the adaptive subnetworks are called the local don't cares. We have implemented our method for satisfiability don't cares and observability don't cares. We have applied the method on MCNC89 benchmarks, and compared the experimental results with those of the SIS system. The results demonstrate the superiority of our method on the quality of the results and on the size of applicable circuits.
Finite Input-Memory Automaton Based Checker Synthesis of SystemVerilog Assertions for FPGA Prototyping
Chengjie ZANG Shinji KIMURA

PAPER-VLSI Design Technology and CAD

Vol: E92-A No:6
Page(s): 1454-1463
Checker synthesis for assertion-based verification has become popular because of recent progress in FPGA prototyping environments. In the paper, we propose a checker synthesis method based on the finite input-memory automaton, suitable for embedded RAM modules in FPGAs. Medium-size FPGAs contain more than 1 Mbit of memory, and such embedded memory cells can be used as shift registers. The main idea is to construct a checker circuit using finite input-memory automata and to implement the shift register chain with logic elements or embedded RAM modules. When a RAM module is used, the method does not consume any logic element for storing the value. Note that the shift register chain of the input memory can be shared among different assertions, which reduces the hardware resources significantly. We have checked the effectiveness of the proposed method using several assertions.
A 7-Die 3D Stacked 3840×2160@120 fps Motion Estimation Processor
Shuping ZHANG Jinjia ZHOU Dajiang ZHOU Shinji KIMURA Satoshi GOTO

PAPER

Vol: E100-C No:3
Page(s): 223-231
In this paper, a hamburger architecture with a 3D stacked reconfigurable memory is proposed for a 4K motion estimation (ME) processor. By positioning the memory dies on both the top and bottom sides of the processor die, the proposed hamburger architecture can reduce the usage of the signal through-silicon via (TSV), and balance the power delivery network and the clock tree of the entire system. It results in 1/3 reduction of the usage of signal TSVs. Moreover, a stacked reconfigurable memory architecture is proposed to reduce the fabrication complexity and further reduce the number of signal TSVs by more than 1/2. The reduction of signal TSVs in the entire design is 71.24%. Finally, we address unique issues that occur in electronic design automation (EDA) tools during 3D large-scale integration (LSI) designs. As a result, a 4K ME processor with 7-die stacking 3D system-on-chip design is implemented. The proposed design can support real time 3840 × 2160 @ 120 fps encoding at 130 MHz with less than 540 mW.
The Least-Fixed-Point of Feedback-Loops of Logic Circuits for a Set of Input Strings
Shinji KIMURA Hiromasa HANEDA

PAPER-VLSI Design Technology

Vol: E72-E No:12
Page(s): 1344-1349
This paper discusses the simulation of logic circuits with feedback loops for a set of input strings. Logic circuits are modeled as Mealy machines which convert an input string set to an output string set. By the simulation, we obtain a set of output strings which shows the input-output relation of the simulated circuit. The behavior of feedback loops of a logic circuit is shown to be the least-fixed-point on a lattice of string sets. The characteristics of the lattice we have used are also shown in the paper.
Exact Minimization of Free BDDs and Its Application to Pass-Transistor Logic Optimization
Kazuyoshi TAKAGI Hiroshi HATAKEDA Shinji KIMURA Katsumasa WATANABE

PAPER

Vol: E82-A No:11
Page(s): 2407-2413
In several design methods for Pass-transistor Logic (PTL) circuits, Boolean functions are expressed as OBDDs in decomposed form and then the component OBDDs are directly mapped to PTL cells. The total size of OBDDs (number of nodes) corresponds to the circuit size. In this paper, we investigate a method for PTL synthesis based on exact minimization of Free BDDs (FBDDs). FBDDs are a well-studied extension of OBDDs with free variable ordering on each path. We present statistics showing that more than 56% of the 616126 NPN-equivalence classes of 5-variable Boolean functions have minimum FBDDs smaller than their OBDDs. This result can be used for PTL synthesis as libraries. We also applied the exact minimization algorithm of FBDDs to the minimization of subcircuits in the synthesis for MCNC benchmarks and found up to 5% size reduction.
Timing Verification of Logic Circuits with Combined Delay Model
Shinji KIMURA Shigemi KASHIMA Hiromasa HANEDA

PAPER

Vol: E75-A No:10
Page(s): 1230-1238
The paper proposes a combined delay model to manipulate the variance of the delay time of logic elements and a new timing verification method based on the theory of regular expressions. For logic elements such as the TTL SN7400, the minimum delay time (dm), the maximum delay time (dM), and the typical delay time are specified in the manual, and the delay time of an element lies in the interval between dm and dM. Here we assume a discrete time, and we manipulate the variance of the delay time as a set of output strings corresponding to each delay time. We call this model the combined delay model. Since many output strings are generated from a single input string, the usual timing simulation method cannot be applied. We propose a timing verification method using a behavior extraction method of logic circuits with respect to a time string set: with respect to the specified input set, the method extracts the output string set of each element in the circuit. We devised (1) a mechanism to keep the correspondence between a primary input string and an output string with respect to the primary input string, (2) a mechanism to manipulate the nondeterminism included in the combined delay model, and (3) an event-driven-like data compaction method in representing finite automata. We focused on the hazard detection problem and the verification of asynchronous circuits, and show the effectiveness of our method on medium-sized circuits with 100 elements or so. The method is subject to state explosion, but the data compaction method and the extraction for only the specified input set are useful to control the state explosion.
Multi-Cycle Path Detection Based on Propositional Satisfiability with CNF Simplification Using Adaptive Variable Insertion
Kazuhiro NAKAMURA Shinji MARUOKA Shinji KIMURA Katsumasa WATANABE

PAPER-Test

Vol: E83-A No:12
Page(s): 2600-2607
Multi-cycle paths are paths between registers where 2 or more clock cycles are allowed to propagate signals, and the detection of multi-cycle paths is important in deciding proper clock period, timing verification and logic optimization. This paper presents a satisfiability-based multi-cycle path detection method, where the detection problems are reduced to CNF formulae and the satisfiability is checked using SAT provers. We also show heuristics on conversion from multi-level circuits into CNF formulae. We have applied our method to ISCAS'89 benchmarks and other sample circuits. Experimental results show the remarkable improvements on the size of manipulatable circuits.
A Selective Scan Chain Reconfiguration through Run-Length Coding for Test Data Compression and Scan Power Reduction
Youhua SHI Shinji KIMURA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER-Test

Vol: E87-A No:12
Page(s): 3208-3215
Test data volume and power consumption for scan-based designs are two major concerns in system-on-a-chip testing. However, test set compaction by filling the don't-cares will invariably increase the scan-in power dissipation for scan testing, so the goals of test data reduction and low-power scan testing appear to conflict. Therefore, in this paper we present a selective scan chain reconfiguration method for test data compression and scan-in power reduction. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. After the scan chain reconfiguration, a dictionary is built to indicate the run-length of each compatible class, and only the scan-in data for each class needs to be transferred from the ATE to the CUT, which reduces the test data volume. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach overcomes the limitations of traditional run-length coding techniques, and leads to highly reduced test data volume with significant power savings during scan testing in all cases.
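For readers unfamiliar with the run-length coding that this abstract builds on, the sketch below encodes a bit string as (symbol, run length) pairs and decodes it again. It shows only plain run-length coding; the scan chain reconfiguration and the dictionary of compatible classes described above are not reproduced, and the function names and the sample string are assumptions for illustration.

```python
from itertools import groupby


def run_length_encode(bits):
    """Collapse a string such as '0000011100' into (symbol, run length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]


def run_length_decode(runs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)


scan_in = "0000011100"  # hypothetical scan-in data
runs = run_length_encode(scan_in)
print(runs)
print(run_length_decode(runs) == scan_in)
```

Long runs of identical values compress well, which is why grouping scan cells so that their data forms long runs can reduce the transferred volume.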
Automatic Multi-Stage Clock Gating Optimization Using ILP Formulation
Xin MAN Takashi HORIYAMA Shinji KIMURA

PAPER-VLSI Design Technology and CAD

Vol: E95-A No:8
Page(s): 1347-1358
Clock gating is supported by commercial tools as a power optimization feature based on the guard signal described in HDL (structural method). However, the identification of control signals for gated registers is hard and designer-intensive work. Besides, since the clock gating cells also consume power, it is imperative to minimize the number of inserted clock gating cells and their switching activities for power optimization. In this paper, we propose an automatic multi-stage clock gating algorithm with ILP (Integer Linear Programming) formulation, including clock gating control candidate extraction, constraints construction and optimum control signal selection. By multi-stage clock gating, unnecessary clock pulses to clock gating cells can be avoided by other clock gating cells, so that the switching activity of clock gating cells can be reduced. We find that any multi-stage control signals are also single-stage control signals, and any combination of signals can be selected from single-stage candidates. The proposed method can be applied to 3 or more cascaded stages. The multi-stage clock gating optimization problem is formulated as constraints in LP format for the selection of the cascaded clock-gating order of multi-stage candidate combinations, and a commercial ILP solver (IBM CPLEX) is applied to obtain the control signals for each register with minimum switching activity. Those signals are used to generate a gate-level description with guarded registers from the original design, and commercial synthesis and layout tools are applied to obtain the circuit with multi-stage clock gating. The proposed method is applied to a set of benchmark circuits and a Low Density Parity Check (LDPC) decoder (6.6k gates, 212 F.F.s), and the actual power consumption is estimated using Synopsys NanoSim after layout. On average, 31% actual power reduction has been obtained compared with original designs with structural clock gating, and more than 10% improvement has been achieved for some circuits compared with the single-stage optimization method. CPU time for optimum multi-stage control selection is several seconds for up to 25k variables in LP format. By applying the proposed clock gating, area can also be reduced since the multiplexors controlling register inputs are eliminated.
Coverage Estimation Using Transition Perturbation for Symbolic Model Checking in Hardware Verification
Xingwen XU Shinji KIMURA Kazunari HORIKAWA Takehiko TSUCHIYA

PAPER-Simulation and Verification

Vol: E89-A No:12
Page(s): 3451-3457
Lack of complete formal specification is one of the major obstacles to the deployment of model checking. Coverage estimation addresses this issue by revealing the unverified part of the design according to the specified properties. In this paper we propose a new transition-based coverage metric to evaluate the completeness of properties for symbolic model checking. Our coverage metric pinpoints the transitions through which the values of signals are checked. An efficient symbolic algorithm is presented for computing the transition coverage for a subset of ACTL. Our coverage estimator has been applied to the model checking of a cache coherence protocol. We uncovered several coverage holes including one that eventually led to the discovery of a design bug.
A Hybrid Dictionary Test Data Compression for Multiscan-Based Designs
Youhua SHI Shinji KIMURA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER-Test

Vol: E87-A No:12
Page(s): 3193-3199
In this paper, we present a test data compression technique to reduce test data volume for multiscan-based designs. In our method, the internal scan chains are divided into equal-sized groups and two dictionaries are built to encode either an entire slice or a subset of the slice. Depending on the codeword, the decompressor may load all scan chains or may load only a group of the scan chains, which can enhance the effectiveness of dictionary-based compression. In contrast to previous dictionary coding techniques, even for a CUT with a large number of scan chains, the proposed approach can achieve a satisfactory reduction in test data volume with a reasonably small dictionary. Experimental results show that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.
Accelerating HEVC Inter Prediction with Improved Merge Mode Handling
Zhengxue CHENG Heming SUN Dajiang ZHOU Shinji KIMURA

PAPER-VIDEO CODING

Vol: E100-A No:2
Page(s): 546-554
High Efficiency Video Coding (HEVC/H.265) achieves a 50% bit rate reduction compared with the H.264/AVC standard at comparable quality, at the cost of high computational complexity. Merge mode is one of the most important new features introduced in HEVC's inter prediction. Merge mode and traditional inter mode consume about 90% of the total encoding time. To address this high complexity, this paper utilizes the merge mode to accelerate inter prediction by four strategies. 1) A merge candidate decision is proposed based on the sum of absolute transformed difference (SATD) cost. 2) An early merge termination is presented with more than 90% accuracy. 3) Due to the compensation effect of merge candidates, symmetric motion partition (SMP) mode is disabled for non-8×8 coding units (CUs). 4) A fast coding unit filtering strategy is proposed to reduce the number of CUs which need to be fine-processed. Experimental results demonstrate that our fast strategies can achieve 35.4%-58.7% time reduction with 0.68%-1.96% BD-rate increment in the RA case. Compared with similar works, the proposed strategies are not only among the best performing in average-case complexity reduction, but also notably outperform them in the worst cases.
A Built-in Reseeding Technique for LFSR-Based Test Pattern Generation
Youhua SHI Zhe ZHANG Shinji KIMURA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER-Timing Verification and Test Generation

Vol: E86-A No:12
Page(s): 3056-3062
Reseeding techniques have been proposed to improve the fault coverage in pseudo-random testing. However, most previous work on reseeding is based on storing the seeds in an external tester or in a ROM. In this paper we present a built-in reseeding technique for LFSR-based test pattern generation. The proposed structure can run both in pseudorandom mode and in reseeding mode. Besides, our method requires no storage for the seeds since in reseeding mode the seeds can be generated automatically in hardware. In this paper we also propose an efficient grouping algorithm based on simulated annealing to optimize test vector grouping. Experimental results for benchmark circuits indicate the superiority of our technique over other reseeding methods with respect to test length and area overhead. Moreover, since the theoretical properties of LFSRs are preserved, our method could be beneficially used in conjunction with any other techniques proposed so far.
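To make the notions of LFSR-based pattern generation and reseeding concrete, the sketch below models a small 4-bit LFSR in software and shows that loading a different seed starts the pattern sequence at a different point. It is a generic illustration only: the 4-bit width, the feedback taps on the two lowest bits, and the names `lfsr4_states` and `patterns` are assumptions, and the built-in seed generation hardware and grouping algorithm of the paper are not modeled.

```python
def lfsr4_states(seed):
    """Yield successive states of a 4-bit right-shifting LFSR.

    Feedback is the XOR of the two lowest bits (assumed tap choice).
    """
    if not 1 <= seed <= 15:
        raise ValueError("seed must be a nonzero 4-bit value")
    state = seed
    while True:
        yield state
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 3)


def patterns(seed, count):
    """Return the first `count` test patterns produced from `seed`."""
    generator = lfsr4_states(seed)
    return [next(generator) for _ in range(count)]


# Pseudorandom mode: one seed cycles through all 15 nonzero patterns.
print(patterns(1, 15))
# Reseeding: a new seed jumps directly to another part of the sequence.
print(patterns(9, 3))
```

With this tap choice the register visits every nonzero 4-bit state before repeating, so a reseed selects where in that cycle the generated patterns begin.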
Fine-Grained Power Gating Based on the Controlling Value of Logic Elements
Lei CHEN Takashi HORIYAMA Yuichi NAKAMURA Shinji KIMURA

PAPER-Logic Synthesis, Test and Verification

Vol: E91-A No:12
Page(s): 3531-3538
Leakage power consumption of logic elements has become a serious problem, especially in the sub-100-nanometer process. In this paper, a novel power gating approach using the controlling value of logic elements is proposed. In the proposed method, sleep signals of the power-gated blocks are extracted completely from the original circuits without any extra logic element. A basic algorithm and a probability-based heuristic algorithm have been developed to implement the basic idea. The steady maximum delay constraint has also been introduced to handle the delay issues. Experiments on the ISCAS'85 benchmarks show that on average 15-36% of logic elements could be power gated at a time for random input patterns, and 3-31% of elements could be stopped under the steady maximum delay constraints. We also show a power optimization method for AND/OR tree circuits, in which more than 80% of gates can be power-gated.
Look Up Table Compaction Based on Folding of Logic Functions
Shinji KIMURA Atsushi ISHII Takashi HORIYAMA Masaki NAKANISHI Hirotsugu KAJIHARA Katsumasa WATANABE

PAPER-Logic Synthesis

Vol: E85-A No:12
Page(s): 2701-2707
The paper describes a folding method for logic functions that reduces the size of the memories storing the functions. The folding is based on the relation between fractions of logic functions. If the logic function includes 2 or 3 identical parts, then only one part needs to be kept and the other parts can be omitted. We show that the logic function of 1-bit addition can be reduced to half size using the bit-wise NOT relation and the bit-wise OR relation. The paper also introduces 3-1 LUTs with the folding mechanism. A full adder can be implemented using only one 3-1 LUT with the folding. Multi-bit AND and OR operations can be mapped to our LUTs without using the extra cascading circuit, by using the carry circuit for addition. We have also tested the mapping capability of 4-input functions to our 3-1 LUTs with folding and carry propagation mechanisms. We have shown the reduction of the area consumption when using our LUTs compared to the case using 4-1 LUTs on several benchmark circuits.
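The bit-wise NOT relation mentioned above can be checked on the sum output of a 1-bit full adder: the half of the truth table with carry-in 1 is the complement of the half with carry-in 0, so only one half needs to be stored. The sketch below demonstrates only this one relation; the table indexing, the function names, and the lookup scheme are assumptions for illustration and do not reproduce the LUT architecture or the OR relation of the paper.

```python
def full_adder_sum_table():
    """Truth table of sum = a XOR b XOR cin, indexed by (cin << 2) | (a << 1) | b."""
    return [((i >> 2) ^ (i >> 1) ^ i) & 1 for i in range(8)]


def folded_lookup(half, index):
    """Look up the sum bit using only the cin = 0 half of the table.

    The cin bit (index >> 2, for index < 8) selects whether the stored
    bit is returned as-is or complemented.
    """
    return half[index & 0b11] ^ (index >> 2)


table = full_adder_sum_table()
lower, upper = table[:4], table[4:]
print(lower, upper)
print(upper == [1 - bit for bit in lower])
```

Storing `lower` plus a single complement control therefore replaces the full eight-entry table for the sum output.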
