The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] register(102hit)

41-60hit(102hit)

  • Generalized Feed Forward Shift Registers and Their Application to Secure Scan Design

    Katsuya FUJIWARA  Hideo FUJIWARA  

     
    PAPER-Dependable Computing

      Vol:
    E96-D No:5
      Page(s):
    1125-1133

    In this paper, we introduce generalized feed-forward shift registers (GF2SR) to apply them to secure and testable scan design. Previously, we introduced SR-equivalents and SR-quasi-equivalents which can be used in secure and testable scan design, and showed that inversion-inserted linear feed-forward shift registers (I2LF2SR) are useful circuits for the secure and testable scan design. GF2SR is an extension of I2LF2SR and the class is much wider than that of I2LF2SR. Since the cardinality of the class of GF2SR is much larger than that of I2LF2SR, the security level of scan design with GF2SR is much higher than that of I2LF2SR. We consider how to control/observe GF2SR to guarantee easy scan-in/out operations, i.e., state-justification and state-identification problems are considered. Both scan-in and scan-out operations can be overlapped in the same way as the conventional scan testing, and hence the test sequence for the proposed scan design is of the same length as the conventional scan design. A program called WAGSR (Web Application for Generalized feed-forward Shift Registers) is presented to solve those problems.

  • Register Indirect Jump Target Forwarding

    Ryota SHIOYA  Naruki KURATA  Takashi TOYOSHIMA  Masahiro GOSHIMA  Shuichi SAKAI  

     
    PAPER-Computer System

      Vol:
    E96-D No:2
      Page(s):
    278-288

    Object-oriented languages have recently become common, making register indirect jumps more important than ever. In object-oriented languages, virtual functions are heavily used because they improve programming productivity greatly. Virtual function calls usually consist of register indirect jumps, and consequently, programs written in object-oriented languages contain many register indirect jumps. The prediction of the targets of register indirect jumps is more difficult than the prediction of the direction of conditional branches. Many predictors have been proposed for register indirect jumps, but they cannot predict the jump targets with high accuracy or require very complex hardware. We propose a method that resolves jump targets by forwarding execution results. Our proposal dynamically finds the producers of register indirect jumps in virtual function calls. After the execution of the producers, the execution results are forwarded to the processor's front-end. The jump targets can be resolved by the forwarded execution results without requiring prediction. Our proposal improves the performance of programs that include unpredictable register indirect jumps, because it does not rely on prediction but instead uses actual execution results. Our evaluation shows that the IPC improvement using our proposal is as high as 5.4% on average and 9.8% at maximum.

  • A Novel Change Detection Method for Unregistered Optical Satellite Images

    Wang LUO  Hongliang LI  Guanghui LIU  Guan GUI  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E95-B No:5
      Page(s):
    1890-1893

    In this letter, we propose a novel method for change detection in multitemporal optical satellite images. Unlike the tradition methods, the proposed method is able to detect changed region even from unregistered images. In order to obtain the change detection map from the unregistered images, we first compute the sum of the color difference (SCD) of a pixel to all pixels in an input image. Then we calculate the SCD of this pixel to all pixels in the other input image. Finally, we use the difference of the two SCDs to represent the change detection map. Experiments on the multitemporal images demonstrates the good performance of the proposed method on the unregistered images.

  • Performance-Driven Architectural Synthesis for Distributed Register-File Microarchitecture with Inter-Island Delay

    Juinn-Dar HUANG  Chia-I CHEN  Wan-Ling HSU  Yen-Ting LIN  Jing-Yang JOU  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:2
      Page(s):
    559-566

    In deep-submicron era, wire delay is becoming a bottleneck while pursuing higher system clock speed. Several distributed register (DR) architectures are proposed to cope with this problem by keeping most wires local. In this article, we propose the distributed register-file microarchitecture with inter-island delay (DRFM-IID). Though DRFM-IID is also one of the DR-based architectures, it is considered more practical than the previously proposed DRFM, in terms of delay model. With such delay consideration, the synthesis task is inherently more complicated than the one without inter-island delay concern since uncertain interconnect latency is very likely to seriously impact on the whole system performance. Therefore we also develop a performance-driven architectural synthesis framework targeting DRFM-IID. Several factors for evaluating the quality of results, such as number of inter-island transfers, timing-criticality of transfer, and resource utilization balancing, are adopted as the guidance while performing architectural synthesis for better optimization outcomes. The experimental results show that the latency and the number of inter-cluster transfers can be reduced by 26.9% and 37.5% on average; and the latter is commonly regarded as an indicator for power consumption of on-chip communication.

  • Variation-Tolerance of a 65-nm Error-Hardened Dual-Modular-Redundancy Flip-Flop Measured by Shift-Register-Based Monitor Structures

    Chikara HAMANAKA  Ryosuke YAMAMOTO  Jun FURUTA  Kanto KUBOTA  Kazutoshi KOBAYASHI  Hidetoshi ONODERA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E94-A No:12
      Page(s):
    2669-2675

    We show measurement results of variation-tolerance of an error-hardened dual-modular-redundancy flip-flop fabricated in a 65-nm process. The proposed error-hardened FF called BCDMR is very strong against soft errors and also robust to process variations. We propose a shift-register-based test structure to measure variations. The proposed test structure has features of constant pin count and fast measurement time. A 65 nm chip was fabricated including 40k FFs to measure variations. The variations of the proposed BCDMR FF are 74% and 55% smaller than those of the conventional BISER FF on the twin-well and triple-well structures respectively.

  • Differential Behavior Equivalent Classes of Shift Register Equivalents for Secure and Testable Scan Design

    Katsuya FUJIWARA  Hideo FUJIWARA  Hideo TAMAMOTO  

     
    PAPER-Dependable Computing

      Vol:
    E94-D No:7
      Page(s):
    1430-1439

    It is important to find an efficient design-for-testability methodology that satisfies both security and testability, although there exists an inherent contradiction between security and testability for digital circuits. In our previous work, we reported a secure and testable scan design approach by using extended shift registers that are functionally equivalent but not structurally equivalent to shift registers, and showed a security level by clarifying the cardinality of those classes of shift register equivalents (SR-equivalents). However, SR-equivalents are not always secure for scan-based side-channel attacks. In this paper, we consider a scan-based differential-behavior attack and propose several classes of SR-equivalent scan circuits using dummy flip-flops in order to protect the scan-based differential-behavior attack. To show the security level of those SR-equivalent scan circuits, we introduce a differential-behavior equivalent relation and clarify the number of SR-equivalent scan circuits, the number of differential-behavior equivalent classes and the cardinality of those equivalent classes.

  • Short Term Cell-Flipping Technique for Mitigating SNM Degradation Due to NBTI

    Yuji KUNITAKE  Toshinori SATO  Hiroto YASUURA  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    520-529

    Negative Bias Temperature Instability (NBTI) is one of the major reliability problems in advanced technologies. NBTI causes threshold voltage shift in a PMOS transistor. When the PMOS transistor is biased to negative voltage, threshold voltage shifts to negatively. On the other hand, the threshold voltage recovers if the PMOS transistor is positively biased. In an SRAM cell, due to NBTI, threshold voltage degrades in the load PMOS transistors. The degradation has the impact on Static Noise Margin (SNM), which is a measure of read stability of a 6-T SRAM cell. In this paper, we discuss the relationship between NBTI degradation in an SRAM cell and the dynamic stress and recovery condition. There are two important characteristics. One is a stress probability, which is defined as the rate that the PMOS transistor is negatively biased. The other is a stress and recovery cycle, which is defined as the switching interval of an SRAM value. In our observations, in order to mitigate the NBTI degradation, the stress probability should be small and the stress and recovery cycle should be shorter than 10 msec. Based on the observations, we propose a novel cell-flipping technique, which makes the stress probability close to 50%. In addition, we show results of the case studies, which apply the cell-flipping technique to register file and cache memories.

  • Backward-Data-Direction Clocking and Relevant Optimal Register Assignment in Datapath Synthesis

    Keisuke INOUE  Mineo KANEKO  Tsuyoshi IWAGAKI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E94-A No:4
      Page(s):
    1067-1081

    For recent and future nanometer-technology VLSIs, static and dynamic delay variations become a serious problem. In many cases, the hold timing constraint, as well as the setup timing constraint, becomes critical for latching a correct signal under delay variations. While the timing violation due to the fail of the setup timing constraint can be fixed by tuning a clock frequency or using a delayed latch, the timing violation due to the fail of the hold timing constraint cannot be fixed by those methods in general. Our approach to delay variations (in particular, the hold timing constraint) proposed in this paper is a novel register assignment strategy in high-level synthesis, which guarantees safe clocking by Backward-Data-Direction (BDD) clocking. One of the drawbacks of the proposed register assignment is the increase in the number of required registers. After the formulation of this new register minimization problem, we prove NP-hardness of the problem, and then derive an integer linear programming formulation for the problem. The proposed method receives a scheduled data flow graph, and generates a datapath having (1) robustness against delay variations, which is ensured by BDD-based register assignment, and (2) the minimum possible number of registers. Experimental results show the effectiveness of the proposed method for some benchmark circuits.

  • Communication Synthesis for Interconnect Minimization Targeting Distributed Register-File Microarchitecture

    Juinn-Dar HUANG  Chia-I CHEN  Yen-Ting LIN  Wan-Ling HSU  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E94-A No:4
      Page(s):
    1151-1155

    In deep-submicron era, wire delay is becoming a bottleneck while pursuing even higher system clock speed. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this article, we propose a new resource-constrained communication synthesis algorithm for optimizing both inter-island connections (IICs) and latency targeting on distributed register-file microarchitecture (DRFM). The experimental results show that up to 24.7% and 12.7% reduction on IIC and latency can be achieved respectively as compared to the previous work.

  • Register File Size Reduction through Instruction Pre-Execution Incorporating Value Prediction

    Yusuke TANAKA  Hideki ANDO  

     
    PAPER-Computer System

      Vol:
    E93-D No:12
      Page(s):
    3294-3305

    Two-step physical register deallocation (TSD) is an architectural scheme that enhances memory-level parallelism (MLP) by pre-executing instructions. Ideally, TSD allows exploitation of MLP under an unlimited number of physical registers, and consequently only a small register file is needed for MLP. In practice, however, the amount of MLP exploitable is limited, because there are cases where either 1) pre-execution is not performed; or 2) the timing of pre-execution is delayed. Both are due to data dependencies among the pre-executed instructions. This paper proposes the use of value prediction to solve these problems. This paper proposes the use of value prediction to solve these problems. Evaluation results using the SPECfp2000 benchmark confirm that the proposed scheme with value prediction for predicting addresses achieves equivalent IPC, with a smaller register file, to the previous TSD scheme. The reduction rate of the register file size is 21%.

  • Multi-Stage Threshold Decoding for Self-Orthogonal Convolutional Codes

    Muhammad AHSAN ULLAH  Kazuma OKADA  Haruo OGIWARA  

     
    PAPER-Coding Theory

      Vol:
    E93-A No:11
      Page(s):
    1932-1941

    This paper describes a least complex, high speed decoding method named multi-stage threshold decoding (MTD-DR). Each stage of MTD-DR is formed by the traditional threshold decoder with a special shift register, called difference register (DR). After flipping each information bit, DR helps to shorten the Hamming and the Euclidian distance between a received word and the decoded codeword for hard and soft decoding, respectively. However, the MTD-DR with self-orthogonal convolutional codes (SOCCs), type 1 in this paper, makes an unavoidable error group, which depends on the tap connection patterns in the encoder, and limits the error performance. This paper introduces a class of SOCCs type 2 which can breakdown that error group, as a result, MTD-DR gives better error performance. For a shorter code (code length = 4200), hard and soft decoding MTD-DR achieves 4.7 dB and 6.5 dB coding gain over the additive white Gaussian noise (AWGN) channel at the bit error rate (BER) 10-5, respectively. In addition, hard and soft decoding MTD-DR for a longer code (code length = 80000) give 5.3 dB and 7.1 dB coding gain under the same condition, respectively. The hard and the soft decoding MTD-DR experiences error flooring at high Eb/N0 region. For improving overall error performance of MTD-DR, this paper proposes parity check codes concatenation with soft decoding MTD-DR as well.

  • On Feedback Functions of Maximum Length Nonlinear Feedback Shift Registers

    Çağdaş ÇALIK  Meltem SÖNMEZ TURAN  Ferruh ÖZBUDAK  

     
    PAPER-Cryptography and Information Security

      Vol:
    E93-A No:6
      Page(s):
    1226-1231

    Feedback shift registers are basic building blocks for many cryptographic primitives. Due to the insecurities of Linear Feedback Shift Register (LFSR) based systems, the use of Nonlinear Feedback Shift Registers (NFSRs) became more popular. In this work, we study the feedback functions of NFSRs with period 2n. First, we provide two new necessary conditions for feedback functions to be maximum length. Then, we consider NFSRs with k-monomial feedback functions and focus on two extreme cases where k=4 and k=2n-1. We study construction methods for these special cases.

  • Floorplan-Aware High-Level Synthesis for Generalized Distributed-Register Architectures

    Akira OHCHI  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E92-A No:12
      Page(s):
    3169-3179

    As device feature size decreases, interconnection delay becomes the dominating factor of circuit total delay. Distributed-register architectures can reduce the influence of interconnection delay. They may, however, increase circuit area because they require many local registers. Moreover original distributed-register architectures do not consider control signal delay, which may be the bottleneck in a circuit. In this paper, we propose a high-level synthesis method targeting generalized distributed-register architecture in which we introduce shared/local registers and global/local controllers. Our method is based on iterative improvement of scheduling/binding and floorplanning. First, we prepare shared-register groups with global controllers, each of which corresponds to a single functional unit. As iterations proceed, we use local registers and local controllers for functional units on a critical path. Shared-register groups physically located close to each other are merged into a single group. Accordingly, global controllers are merged. Finally, our method obtains a generalized distributed-register architecture where its scheduling/binding as well as floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 4.7% while maintaining the performance of the circuit equal with that using original distributed-register architectures.

  • Optimal Register Assignment with Minimum-Path Delay Compensation for Variation-Aware Datapaths

    Keisuke INOUE  Mineo KANEKO  Tsuyoshi IWAGAKI  

     
    PAPER

      Vol:
    E92-A No:4
      Page(s):
    1096-1105

    For recent and future nanometer-technology VLSIs, static and dynamic delay variations become a serious problem. In many cases, the hold constraint, as well as the setup constraint, becomes critical for latching a correct signal under delay variations. This paper treats the hold constraint in a datapath circuit, and discusses a register assignment in high level synthesis considering delay variations. Our approach to ensure the hold constraint under delay variations is to enlarge the minimum-path delay between registers, which is called minimum-path delay compensation (MDC) in this paper. MDC can be done by inserting delay elements mainly in non-critical paths of a functional unit (FU). One of our contributions is to show that the minimization of the number of minimum-path delay compensated FUs is NP-hard in general, and it is in the class P if the number of FUs is a constant. A polynomial time algorithm for the latter is also shown in this paper. In addition, an integer linear programming (ILP) formulation is also presented. The proposed method generates a datapath having (1) robustness against delay variations, which is ensured partly by MDC technique and partly by SRV-based register assignment, and (2) the minimum possible numbers of MDCs and registers.

  • A Fast Gate-Level Register Relocation Method for Circuit Size Reduction in General-Synchronous Framework

    Yukihide KOHIRA  Atsushi TAKAHASHI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:10
      Page(s):
    3030-3037

    Under the assumption that the clock can be inputted to each register at an arbitrary timing, the minimum feasible clock period might be reduced by register relocation while maintaining the circuit behavior and topology. However, if the minimum feasible clock period is reduced, then the number of registers tends to be increased. In this paper, we propose a gate-level register relocation method that reduces the number of registers while keeping the target clock period. In experiments, the proposed method reduces the number of registers in the practical time in most circuits.

  • Novel Register Sharing in Datapath for Structural Robustness against Delay Variation

    Keisuke INOUE  Mineo KANEKO  Tsuyoshi IWAGAKI  

     
    PAPER

      Vol:
    E91-A No:4
      Page(s):
    1044-1053

    As the feature size of VLSI becomes smaller, delay variations become a serious problem in VLSI. In this paper, we propose a novel class of robustness for a datapath against delay variations, which is named structural robustness against delay variation (SRV), and propose sufficient conditions for a datapath to have SRV. A resultant circuit designed under these conditions has a larger timing margin to delay variations than previous designs without sacrificing effective computation time. In addition, under any degree of delay variations, we can always find an available clock frequency for a datapath having SRV property to operate correctly, which could be a preferable characteristic in IP-based design.

  • Clock Driver Design for Low-Power High-Speed 90-nm CMOS Register Array

    Tadayoshi ENOMOTO  Suguru NAGAYAMA  Hiroaki SHIKANO  Yousuke HAGIWARA  

     
    PAPER

      Vol:
    E91-C No:4
      Page(s):
    553-561

    The delay time (tdT), power dissipation (PT) and circuit volume of a CMOS register array were minimized. Seven test circuits, each of which had a register array and a single clock tree that generated a pair of complement clock pulses, and a conventional register were fabricated using 90-nm CMOS technology. The register array was constructed with M delay flip-flops (FFs) and the clock tree, which consisted of 2 driver stages. Each driver stage had m inverters, each of which drove M/m FFs where M was fixed at 40 and m varied from 1 to 40. The minimum values of tdT and PT were 0.25 ns and 17.88 µW, respectively, and were both obtained when m was 10. These values were 71.4% and 70.4% of tdT and PT for the conventional register, for which m is 40, respectively. The number of inverters in the clock tree when m was 10 was 21 which was only 25.9% that for the conventional register. The measured results agreed well with SPICE-simulated results. Furthermore, for values of M from 20 to 320, both the minimum tdT and the minimum PT were obtained when m was approximately 1.5 times the square root of M.

  • Low Power LDPC Code Decoder Architecture Based on Intermediate Message Compression Technique

    Kazunori SHIMIZU  Nozomu TOGAWA  Takeshi IKENAGA  Satoshi GOTO  

     
    PAPER

      Vol:
    E91-A No:4
      Page(s):
    1054-1061

    Reducing the power dissipation for LDPC code decoder is a major challenging task to apply it to the practical digital communication systems. In this paper, we propose a low power LDPC code decoder architecture based on an intermediate message-compression technique which features as follows: (i) An intermediate message compression technique enables the decoder to reduce the required memory capacity and write power dissipation. (ii) A clock gated shift register based intermediate message memory architecture enables the decoder to decompress the compressed messages in a single clock cycle while reducing the read power dissipation. The combination of the above two techniques enables the decoder to reduce the power dissipation while keeping the decoding throughput. The simulation results show that the proposed architecture improves the power efficiency up to 52% and 18% compared to that of the decoder based on the overlapped schedule and the rapid convergence schedule without the proposed techniques respectively.

  • Low Power Gated Clock Tree Driven Placement

    Weixiang SHEN  Yici CAI  Xianlong HONG  Jiang HU  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E91-A No:2
      Page(s):
    595-603

    As power consumption of the clock tree dominates over 40% of the total power in modern high performance VLSI designs, measures must be taken to keep it under control. One of the most effective methods is based on clock gating to shut off the clock when the modules are idle. However, previous works on gated clock tree power minimization are mostly focused on clock routing and the improvements are often limited by the given registers placement. The purpose of this work is to navigate the registers during placement to further reduce the clock tree power based on clock gating. Our method performs activity-aware register clustering that reduces the clock tree power not only by clumping the registers into a smaller area, but also by pulling the registers with the similar activity patterns closely to shut off the clock more time for the resultant subtrees. In order to reduce the impact of signal nets wirelength and power due to register clustering, we apply the timing and activity based net weighting in [14], which reduces the nets switching power by assigning a combination of activity and timing weights to the nets with higher switching rates or more critical timing. To tradeoff the power dissipated by the clock tree and the control signal, we extend the idea of local ungating in [6] and propose an algorithm of gate control signal optimization, which still sets the gate enable signal high if a register is active for a number of consecutive clock cycles. Experimental results on a set of MCNC benchmarks show that our approach is able to reduce the power and total wirelength of clock tree greatly with minimal overheads.

  • Maximal-Period Sequences Generated by Feedback-Limited Nonlinear Shift Registers

    Akio TSUNEDA  Kunihiko KUDO  Daisaburo YOSHIOKA  Takahiro INOUE  

     
    PAPER-Communications and Sequences

      Vol:
    E90-A No:10
      Page(s):
    2079-2084

    We propose feedback-limited NFSRs (nonlinear feedback shift registers) which can generate periodic sequences of period 2k-1, where k is the length of the register. We investigate some characteristics of such periodic sequences. It is also shown that the scale of such NFSRs can be reduced by the feedback limitation. Some simulation and experimental results are shown including comparison with LFSRs (linear feedback shift registers) for conventional M-sequences and Gold sequences.

41-60hit(102hit)