The search functionality is under construction.

Keyword Search Result

[Keyword] dual-rail(19hit)

1-19hit
  • A SOI Cache-Tag Memory with Dual-Rail Wordline Scheme

    Nobutaro SHIBATA  Takako ISHIHARA  

     
    PAPER-Integrated Electronics

      Vol:
    E99-C No:2
      Page(s):
    316-330

    Cache memories are the major application of high-speed SRAMs, and they are frequently installed in high performance logic VLSIs including microprocessors. This paper presents a 4-way set-associative, SOI cache-tag memory. To obtain higher operating speed with less power dissipation, we devised an I/O-separated memory cell with a dual-rail wordline, which is used to transmit complementary selection signals. The address decoding delay was shortened using CMOS dual-rail logic. To enhance the maximum operating frequency, bitline's recovery operations after writing data were eliminated using a memory array configuration without half-selected cells. Moreover, conventional, sensitive but slow differential amplifiers were successfully removed from the data I/O circuitry with a hierarchical bitline scheme. As regards the stored data management, we devised a new hardware-oriented LRU-data replacement algorithm on the basis of 6-bit directed graph. With the experimental results obtained with a test chip fabricated with a 0.25-µm CMOS/SIMOX process, the core of the cache-tag memory with a 1024-set configuration can achieve a 1.5-ns address access time under typical conditions of a 2-V power supply and 25°C. The power dissipation during standby was less than 14 µW, and that at the 500-MHz operation was 13-83 mW, depending on the bit-stream data pattern.

  • High-Throughput Partially Parallel Inter-Chip Link Architecture for Asynchronous Multi-Chip NoCs

    Naoya ONIZAWA  Akira MOCHIZUKI  Hirokatsu SHIRAHAMA  Masashi IMAI  Tomohiro YONEDA  Takahiro HANYU  

     
    PAPER-Dependable Computing

      Vol:
    E97-D No:6
      Page(s):
    1546-1556

    This paper introduces a partially parallel inter-chip link architecture for asynchronous multi-chip Network-on-Chips (NoCs). The multi-chip NoCs that operate as a large NoC have been recently proposed for very large systems, such as automotive applications. Inter-chip links are key elements to realize high-performance multi-chip NoCs using a limited number of I/Os. The proposed asynchronous link based on level-encoded dual-rail (LEDR) encoding transmits several bits in parallel that are received by detecting the phase information of the LEDR signals at each serial link. It employs a burst-mode data transmission that eliminates a per-bit handshake for a high-speed operation, but the elimination may cause data-transmission errors due to cross-talk and power-supply noises. For triggering data retransmission, errors are detected from the embedded phase information; error-detection codes are not used. The throughput is theoretically modelled and is optimized by considering the bit-error rate (BER) of the link. Using delay parameters estimated for a 0.13 µm CMOS technology, the throughput of 8.82 Gbps is achieved by using 10 I/Os, which is 90.5% higher than that of a link using 9 I/Os without an error-detection method operating under negligible low BER (<10-20).

  • Data Convertors Design for Optimization of the DDPL Family

    Song JIA  Li LIU  Xiayu LI  Fengfeng WU  Yuan WANG  Ganggang ZHANG  

     
    PAPER-Electronic Circuits

      Vol:
    E96-C No:9
      Page(s):
    1195-1200

    Information security has been seriously threatened by the differential power analysis (DPA). Delay-based dual-rail precharge logic (DDPL) is an effective solution to resist these attacks. However, conventional DDPL convertors have some shortcomings. In this paper, we propose improved convertor pairs based on dynamic logic and a sense amplifier (SA). Compared with the reference CMOS-to-DDPL convertor, our scheme could save 69% power consumption. As to the comparison of DDPL-to-CMOS convertor, the speed and power performances could be improved by 39% and 54%, respectively.

  • Design of High-Performance Asynchronous Pipeline Using Synchronizing Logic Gates

    Zhengfan XIA  Shota ISHIHARA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E95-C No:8
      Page(s):
    1434-1443

    This paper introduces a novel design method of an asynchronous pipeline based on dual-rail dynamic logic. The overhead of handshake control logic is greatly reduced by constructing a reliable critical datapath, which offers the pipeline high throughput as well as low power consumption. Synchronizing Logic Gates (SLGs), which have no data dependency problem, are used in the design to construct the reliable critical datapath. The design targets latch-free and extremely fine-grain or gate-level pipeline, where the depth of every pipeline stage is only one dual-rail dynamic logic. HSPICE simulation results, in a 65 nm design technology, indicate that the proposed design increases the throughput by 120% and decreases the power consumption by 54% compared with PS0, a classic dual-rail asynchronous pipeline implementation style, in 4-bit wide FIFOs. Moreover, this method is applied to design an array style multiplier. It shows that the proposed design reduces power by 37.9% compared to classic synchronous design when the workloads are 55%. A chip has been fabricated with a 44 multiplier function, which works well at 2.16G data-set/s (Post-layout simulation).

  • Implementation of a Low-Power FPGA Based on Synchronous/Asynchronous Hybrid Architecture

    Shota ISHIHARA  Ryoto TSUCHIYA  Yoshiya KOMATSU  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Electronic Circuits

      Vol:
    E94-C No:10
      Page(s):
    1669-1679

    This paper presents a low-power FPGA based on mixed synchronous/asynchronous design. The proposed FPGA consists of several sections which consist of logic blocks, and each section can be used as either a synchronous circuit or an asynchronous circuit according to its workload. An asynchronous circuit is power-efficient for a low-workload section since it does not require the clock tree which always consumes the power. On the other hand, a synchronous circuit is power-efficient for a high-workload section because of its simple hardware. The major consideration is designing an area-efficient synchronous/asynchronous hybrid logic block. This is because the hardware amount of the asynchronous circuit is about double that of the synchronous circuit, and the typical implementation wastes half of the hardware in synchronous mode. To solve this problem, we propose a hybrid logic block that can be used as either a single asynchronous logic block or two synchronous logic blocks. The proposed FPGA is fabricated using a 65-nm CMOS process. When the workload of a section is below 22%, asynchronous mode is more power-efficient than synchronous mode. Otherwise synchronous mode is more power-efficient.

  • An Asynchronous FPGA Based on LEDR/4-Phase-Dual-Rail Hybrid Architecture

    Shota ISHIHARA  Yoshiya KOMATSU  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Electronic Circuits

      Vol:
    E93-C No:8
      Page(s):
    1338-1348

    This paper presents an asynchronous FPGA that combines 4-phase dual-rail encoding and LEDR (Level-Encoded Dual-Rail) encoding. 4-phase dual-rail encoding is employed to achieve small area and low power for function units, while LEDR encoding is employed to achieve high throughput and low power for the data transfer using programmable interconnection resources. Area-efficient protocol converters and their control circuits are also proposed in transistor-level implementation. The proposed FPGA is designed using the e-Shuttle 65nm CMOS process. Compared to the 4-phase-dual-rail-based FPGA, the throughput is increased by 69% with almost the same transistor count. Compared to the LEDR-based FPGA, the transistor count is reduced by 47% with almost the same throughput. In terms of power consumption, the proposed FPGA achieves the lowest power compared to the 4-phase-dual-rail-based and the LEDR-based FPGAs. Compared to the synchronous FPGA, the proposed FPGA has lower power consumption when the workload is below 35%.

  • Evaluation of a Field-Programmable VLSI Based on an Asynchronous Bit-Serial Architecture

    Masanori HARIYAMA  Shota ISHIHARA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E91-C No:9
      Page(s):
    1419-1426

    This paper presents a novel asynchronous architecture of Field-programmable gate arrays (FPGAs) to reduce the power consumption. In the dynamic power consumption of the conventional FPGAs, the power consumed by the switch blocks and clock distribution is dominant since FPGAs have complex switch blocks and the large number of registers for high programmability. To reduce the power consumption of switch blocks and clock distribution, asynchronous bit-serial architecture is proposed. To ensure the correct operation independent of data-path lengths, we use the level-encoded dual-rail encoding and propose its area-efficient implementation. The proposed field-programmable VLSI is implemented in a 90 nm CMOS technology. The delay and the power consumption of the proposed FPVLSI are respectively 61% and 58% of those of 4-phase dual-rail encoding which is the most common encoding in delay insensitive encoding.

  • Power-Aware Asynchronous Peer-to-Peer Duplex Communication System Based on Multiple-Valued One-Phase Signaling

    Kazuyasu MIZUSAWA  Naoya ONIZAWA  Takahiro HANYU  

     
    PAPER

      Vol:
    E91-C No:4
      Page(s):
    581-588

    This paper presents a design of an asynchronous peer-to-peer half-duplex/full-duplex-selectable data-transfer system on-chip interconnected. The data-transfer method between channels is based on a 1-phase signaling scheme realized by using multiple-valued current-mode (MVCM) circuits and encoding, which performs high-speed communication. A data transmission is selectable by adding a mode-detection circuit that observes data-transmission modes; full-duplex, half-duplex and standby modes. Especially, since current sources are completely cut off during the standby mode, the power dissipation can be greatly reduced. Moreover, both half-duplex and full-duplex communication can be realized by sharing a common circuit except a signal-level conversion circuit. The proposed interface is implemented using 0.18-µm CMOS, and its performance improvement is discussed in comparison with those of the other ordinary asynchronous methods.

  • An Analysis of Leakage Factors for Dual-Rail Pre-Charge Logic Style

    Daisuke SUZUKI  Minoru SAEKI  

     
    PAPER-Side Channel Attacks

      Vol:
    E91-A No:1
      Page(s):
    184-192

    In recent years, certain countermeasures against differential power analysis (DPA) at the logic level have been proposed. Recently, Popp and Mangard proposed a new countermeasure-masked dual-rail pre-charge logic (MDPL); this countermeasure combines dual-rail circuits with random masking to improve the wave dynamic differential logic (WDDL). They claimed that it could implement secure circuits using a standard CMOS cell library without special constraints for the place-and-route method because the difference between the loading capacitances of all the pairs of complementary logic gates in MDPL can be compensated for by the random masking. In this paper, we particularly focus on the signal transition of MDPL gates and evaluate the DPA-resistance of MDPL in detail. Our evaluation results reveal that when the input signals have different delay times, leakage occurs in the MDPL as well as WDDL gates, even if MDPL is effective in reducing the leakage caused by the difference in loading capacitances. Furthermore, in order to validate our evaluation, we demonstrate a problem with different input signal delays by conducting measurements for an FPGA.

  • Security Evaluations of MRSL and DRSL Considering Signal Delays

    Minoru SAEKI  Daisuke SUZUKI  

     
    PAPER-Side Channel Attacks

      Vol:
    E91-A No:1
      Page(s):
    176-183

    In recent years, some countermeasures have been proposed against differential power analysis (DPA) at the basic composition element level of logic circuits. We propose a countermeasure named random switching logic (RSL). RSL involves computation with data masking using a single logic gate and suppression of transient transitions using ENABLE signals generated independently of input data. Recently, some countermeasures that were proposed against DPA, such as MRSL and DRSL, adopted the concept of RSL. Although MRSL is based on RSL, it uses a different method to suppress the transient transitions. DRSL uses RSL to avoid the possibility of leakage caused by a difference in delays occurring in MDPL that combines dual-rail circuits with random masking. The important difference between these countermeasures and RSL is that they can vary the output transition timing depending on the input data patterns. In this paper, we focus on this feature to evaluate the DPA resistance of MRSL and DRSL. Experiments are also conducted on DPA resistance by using an FPGA to verify the evaluation results. It is confirmed that in both MRSL and DRSL, there is a possibility of leakage if a sufficient difference in delays exists in input signals.

  • Logic Synthesis Method for Dual-Rail RSFQ Digital Circuits Using Root-Shared Binary Decision Diagrams

    Koji OBATA  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E90-A No:1
      Page(s):
    257-266

    We propose a new method of logic synthesis for dual-rail RSFQ (rapid single-flux-quantum) digital circuits. RSFQ circuit technology is one of the strongest candidates for the next generation technology of digital circuits. For representing logic functions, we use a root-shared binary decision diagram (RSBDD) which is a directed acyclic graph constructed from binary decision diagrams. In the method, first we construct an RSBDD from given logic functions, and then reduce the number of nodes in the constructed RSBDD by variable re-ordering. Finally, we synthesize a dual-rail RSFQ circuit from the reduced RSBDD. We have implemented the method and have synthesized benchmark circuits. We have synthesized dual-rail circuits that consist of about 27% fewer logic elements than those synthesized by a Transduction-based method on average.

  • Design Method of High Performance and Low Power Functional Units Considering Delay Variations

    Kouichi WATANABE  Masashi IMAI  Masaaki KONDO  Hiroshi NAKAMURA  Takashi NANYA  

     
    PAPER-Circuit Synthesis

      Vol:
    E89-A No:12
      Page(s):
    3519-3528

    As VLSI technology advances, delay variations will become more serious. Delay-insensitive asynchronous dual-rail circuits tolerate any delay variation, but their energy consumption is more than double that of the single-rail circuits because signal transitions occur every cycle in all bits regardless of the input bit pattern. However, in functional units, a significant number of input bits may not change from the previous input in many cases. In such a situation, calculation of these bits is not required. Thus, we propose a method, called unflip-bits control, makes use of the above situation, to reduce energy consumption. We evaluate the energy consumption and performance penalty for the method using HSPICE and the verilog-XL simulator, and compare the method with the conventional dual-rail circuit and a synchronous circuit. Our evaluation results reveal that the proposed asynchronous dual-rail circuit has a 12-60% lower energy consumption compared with a conventional asynchronous dual-rail circuit.

  • Implementation of a High-Speed Asynchronous Data-Transfer Chip Based on Multiple-Valued Current-Signal Multiplexing

    Tomohiro TAKAHASHI  Takahiro HANYU  

     
    PAPER

      Vol:
    E89-C No:11
      Page(s):
    1598-1604

    This paper presents an asynchronous multiple-valued current-mode data-transfer controller chip based on a 1-phase dual-rail encoding technique. The proposed encoding technique enables "one-way delay" asynchronous data transfer because request and acknowledge signals can be transmitted simultaneously and valid states are detected by calculating the sum of dual-rail codewords. Since a key component, a current-to-voltage conversion circuit in a valid-state detector, is tuned so as to obtain a sufficient voltage range to improve switching speed of a comparator, signal detection can be performed quickly in spite of using 6-level signals. It is evaluated using HSPICE simulation with a 0.18-µm CMOS that the throughput of the proposed circuit based on the 1-phase dual-rail scheme attains 435 Mbps/wire which is 2.9 times faster than that of a CMOS circuit based on a conventional 4-phase dual-rail scheme. The test chip is fabricated, and the asynchronous data-transfer behavior of the proposed scheme is confirmed.

  • Differential Operation Oriented Multiple-Valued Encoding and Circuit Realization for Asynchronous Data Transfer

    Tomohiro TAKAHASHI  Naoya ONIZAWA  Takahiro HANYU  

     
    PAPER

      Vol:
    E87-C No:11
      Page(s):
    1928-1934

    This paper presents an asynchronous data transfer scheme using 2-color 2-phase dual-rail encoding based on a differential operation and its circuit realization. The proposed encoding enables seamless asynchronous data transfer without inserting a spacer, because each logic value is represented by two kinds of codewords with dual-rail, called "color" data. Since the difference x-x between components of a codeword (x,x) becomes constant in every valid state, the data-arrival state can be detected by calculating the difference x-x. From the viewpoint of circuit implementation, during the state transition, since the dual-rail x and x are defined so as to transit differentially, the compatibility with a comparator using a differential amplifier becomes high, which results in reduction of the cycle time. It is evaluated using HSPICE simulation with a 0.18 µm CMOS technology that communication speed using the proposed dual-rail encoding becomes 1.4 times faster than that using conventional dual-rail encoding.

  • A High-Speed and Area-Efficient Dual-Rail PLA Using Divided and Interdigitated Column Circuits

    Hiroaki YAMAOKA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Integrated Electronics

      Vol:
    E87-C No:6
      Page(s):
    1069-1077

    This paper presents a new high-speed and area-efficient dual-rail PLA. The proposed circuit includes three schemes: 1) a divided column scheme (DCS), 2) a programmable sense-amplifier activation scheme (PSAS), and 3) an interdigitated column scheme (ICS). In the DCS, a column circuit of a PLA is divided and each circuit operates in parallel. This enhances the performance of the PLA, and this scheme becomes more effective as input data bandwidth increases. The PSAS is used to generate an activation pulse for sense amplifiers in the PLA. In this scheme, the proposed delay generators enable to minimize a timing margin depending on process variations and operating conditions. The ICS is used to enhance the area-efficiency of the PLA, where a method of physical compaction is employed. This scheme is effective for circuits which have the regularity in logic function such as arithmetic circuits. As applications of the proposed PLA, a comparator, a priority encoder, and an incrementor for 128-bit data processing were designed. The proposed circuit design schemes achieved a 22.2% delay reduction and a 37.5% area reduction on average over the conventional high-speed and low-power PLA in a 0.13-µm CMOS technology with a supply voltage of 1.2 V.

  • A Logic-Cell-Embedded PLA (LCPLA): An Area-Efficient Dual-Rail Array Logic Architecture

    Hiroaki YAMAOKA  Hiroaki YOSHIDA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Integrated Electronics

      Vol:
    E87-C No:2
      Page(s):
    238-245

    This paper describes an area-efficient dual-rail array logic architecture, a logic-cell-embedded PLA (LCPLA), which has 2-input logic cells in the structure. The 2-input logic cells composed of pass-transistors can realize any 2-input Boolean function and are embedded in a dual-rail PLA. The logic cells can be designed by connecting some local wires and do not require additional transistors over logic cells of the conventional dual-rail PLA. By using the logic cells, some classes of logic functions can be implemented efficiently, so that high-speed and low-power operations are also achieved. The advantages over the conventional PLAs and standard-cell-based designs were demonstrated by using benchmark circuits, and the LCPLA is shown to be effective to reduce the number of product terms. In a structure with a 64-bit input and a 1-bit output including 220 product terms, the LCPLA achieved an area reduction by 35% compared to the conventional high-speed dual-rail PLA, and the power-delay product was reduced by 74% and 46% compared to the conventional high-speed single-rail PLA and the conventional high-speed dual-rail PLA, respectively. A test chip of this configuration was fabricated using a 0.35-µm, 3-metal-layer CMOS technology, and was verified with a functional test using a logic tester and an electron-beam tester at frequencies of up to 100 MHz with a supply voltage of 3.3 V.

  • A Fine Grain Cooled Logic Architecture for Low-Power Processors

    Hiroyuki MATSUBARA  Takahiro WATANABE  Tadao NAKAMURA  

     
    PAPER

      Vol:
    E84-A No:3
      Page(s):
    735-740

    In this paper, we propose a fine grain Cooled Logic architecture for low-power oriented processors. Cooled Logic detects, in novel hardware method with dual-rail logic, functional blocks to be active, and stops clocks to each of the functional blocks in order to make it inactive at certain periods. To confirm the effectiveness of our approach, we design a 4-bit and a 16-bit event-driven array multipliers, and analyze their power consumption by the HSPICE simulator. As a result, it is shown that Cooled Logic has a tendency to reduce power consumptions in both the functional blocks and the clock drivers of the multipliers.

  • Boolean Single Flux Quantum Circuits

    Yoichi OKABE  Chen Kong TEH  

     
    INVITED PAPER-Digital Applications

      Vol:
    E84-C No:1
      Page(s):
    9-14

    This paper reviews the recent development of the Boolean Single Flux Quantum (BSFQ) circuits. BSFQ circuits perform Boolean operation based on the superconducting flux level, and let digital bits propagate in the form of 'set' and 'reset' pulses using dual-rail Josephson transmission line (JTL). Just the same as CMOS circuits BSFQ circuits do not require any local clock system for the operation gates, and thus are delay insensitive, and comparably simple in terms of the number of Josephson junctions. Implementation of basic BSFQ circuits, namely 'NOT,' 'AND,' 'OR,' 'XOR' gate, is described. These circuits have been experimentally tested, and their workability has been proven.

  • Regenerative Pass-Transistor Logic: A Circuit Technique for High Speed Digital Design

    Tsz Shing CHEUNG  Kunihiro ASADA  

     
    PAPER-Integrated Electronics

      Vol:
    E79-C No:9
      Page(s):
    1274-1284

    Regenerative Pass-transistor Logic (RPL), a modular dual-rail circuit technique for high speed logic design that gives reasonably low power consumption, was developed. The technique can be applied to basic logic gates, full adders, multiplier units, and more complicated arithmetic logics like Conditional Carry Select (CCS) circuit. The magnitude of propagation delay time of RPL is smaller than the conventional CPL(Complementary Pass-transistor Logic), or DPL (Double Pass-transistor Logic). Low power consumption can also be achieved by reduced number of transistors and metal interconnections. Simulation and layout data also proved that RPL is advantageous over existing dual-rail logics while considering speed, power consumption and layout area.