An energy-efficient nonvolatile FPGA with assuring highly-reliable backup operation using a self-terminated power-gating scheme is proposed. Since the write current is automatically cut off just after the temporal data in the flip-flop is successfully backed up in the nonvolatile device, the amount of write energy can be minimized with no write failure. Moreover, when the backup operation in a particular cluster is completed, power supply of the cluster is immediately turned off, which minimizes standby energy due to leakage current. In fact, the total amount of energy consumption during the backup operation is reduced by 66% in comparison with that of a conventional worst-case-based approach where the long time write current pulse is used for the reliable write.
Akira MOCHIZUKI Hirokatsu SHIRAHAMA Takahiro HANYU
A new static storage component, a quaternary flip-flop which consists of two-bit storage elements and three four-level voltage comparators, is proposed for a high-performance multiple-valued VLSI-processor datapath. A key circuit, a differential-pair circuit (DPC), is used to realize a high-speed multi-level voltage comparator. Since PMOS cross-coupled transistors are utilized as not only active loads of the DPC-based comparator but also parts of each storage element, the critical delay path of the proposed flip-flop can be shortened. Moreover, a dynamic logic style is also used to cut steady current paths through current sources in DPCs, which results in great reduction of its power dissipation. It is evaluated with HSPICE simulation in 0.18 µm CMOS that the power dissipations of the proposed quaternary flip-flop is reduced to 50 percent in comparison with that of a corresponding binary CMOS one.
Akira MOCHIZUKI Takashi TAKEUCHI Takahiro HANYU
A new common-bus architecture with temporal and spatial parallel access capabilities under wire-resource constraint is proposed to transfer vast quantities of data between modules inside a VLSI chip. Since bus controllers are distributed into modules, the proposed bus architecture can directly transfer data from one module to another without any central bus control unit like a Direct Memory Access (DMA) controller, which enables to reduce communication steps for data transfer between modules. Moreover, when a start address and the number of block data in both source/destination modules are determined at the first step of a data-transfer scheme, no additional address setting for the data transfer is required in the rest of the scheme, which allows us to use all the wire resources as only the "data bus." Therefore, the bus function is dynamically programmed, which results in achieving high throughput of bus communication. For example, in case of a 64-line common bus, it is evaluated that the maximum data throughput in the proposed architecture with dynamic bus-function programming is four times higher than that in the conventional DMA bus architecture with fixed 32-bit-address/32-bit-data buses.
A nonvolatile field-programmable gate array (NV-FPGA), where the circuit-configuration information still remains without power supply, offers a powerful solution against the standby power issue. In this paper, an NV-FPGA is proposed where the programmable logic and interconnect function blocks are described in a hardware description language and are pushed through a standard-cell-based design flow with nonvolatile flip-flops. The use of the standard-cell-based design flow makes it possible to migrate any arbitrary process technology and to perform architecture-level simulation with physical information. As a typical example, the proposed NV-FPGA is designed under 55nm CMOS/100nm magnetic tunnel junction (MTJ) technologies, and the performance of the proposed NV-FPGA is evaluated in comparison with that of a CMOS-only volatile FPGA.
Shunsuke KOSHITA Naoya ONIZAWA Masahide ABE Takahiro HANYU Masayuki KAWAMATA
This paper presents FIR digital filters based on stochastic/binary hybrid computation with reduced hardware complexity and high computational accuracy. Recently, some attempts have been made to apply stochastic computation to realization of digital filters. Such realization methods lead to significant reduction of hardware complexity over the conventional filter realizations based on binary computation. However, the stochastic digital filters suffer from lower computational accuracy than the digital filters based on binary computation because of the random error fluctuations that are generated in stochastic bit streams, stochastic multipliers, and stochastic adders. This becomes a serious problem in the case of FIR filter realizations compared with the IIR counterparts because FIR filters usually require larger number of multiplications and additions than IIR filters. To improve the computational accuracy, this paper presents a stochastic/binary hybrid realization, where multipliers are realized using stochastic computation but adders are realized using binary computation. In addition, a coefficient-scaling technique is proposed to further improve the computational accuracy of stochastic FIR filters. Furthermore, the transposed structure is applied to the FIR filter realization, leading to reduction of hardware complexity. Evaluation results demonstrate that our method achieves at most 40dB improvement in minimum stopband attenuation compared with the conventional pure stochastic design.
Hirokatsu SHIRAHAMA Takashi MATSUURA Masanori NATSUI Takahiro HANYU
A multiple-valued current-mode (MVCM) circuit using current-flow control is proposed for a power-greedy sequential linear-array system. Whenever operation is completed in processing element (PE) at the present stage, every possible current source in the PE at the previous stage is cut off, which greatly reduces the wasted power dissipation due to steady current flows during standby states. The completion of the operation can be easily detected using "operation monitor" that observes input and output signals at latches, and that generates control signal immediately at the time completed. Since the wires of data and control signals are shared in the proposed MVCM circuit, no additional wires are required for current-flow control. In fact, it is demonstrated that the power consumption of the MVCM circuit using the proposed method is reduced to 53 percent in comparison with that without current-source control.
Akira MOCHIZUKI Daisuke NISHINOHARA Takahiro HANYU
A new circuit technique based on pass-gate logic with dynamic supply-voltage and clock-frequency control is proposed for a low-power motion-vector detection VLSI processor. Since the pass-gate logic style has potential advantages that have small equivalent stray capacitance and small number of short-circuit paths, its circuit implementation makes it possible to reduce the power dissipation with maintaining high-speed switching capability. In case the calculation result is obtained on the way of calculation steps, additional power saving is also achieved by combining the pass-gate logic circuitry with a mechanism that dynamically scales down the supply voltage and the clock frequency while maintaining the calculation throughput. As a typical example, a sum of absolute differences (SAD) unit in a motion-vector detection VLSI processor is implemented and its efficiency in power saving is demonstrated.
Tomohiro TAKAHASHI Naoya ONIZAWA Takahiro HANYU
This paper presents an asynchronous data transfer scheme using 2-color 2-phase dual-rail encoding based on a differential operation and its circuit realization. The proposed encoding enables seamless asynchronous data transfer without inserting a spacer, because each logic value is represented by two kinds of codewords with dual-rail, called "color" data. Since the difference x-x between components of a codeword (x,x) becomes constant in every valid state, the data-arrival state can be detected by calculating the difference x-x. From the viewpoint of circuit implementation, during the state transition, since the dual-rail x and x are defined so as to transit differentially, the compatibility with a comparator using a differential amplifier becomes high, which results in reduction of the cycle time. It is evaluated using HSPICE simulation with a 0.18 µm CMOS technology that communication speed using the proposed dual-rail encoding becomes 1.4 times faster than that using conventional dual-rail encoding.
Masashi KAMIYANAGI Fumitaka IGA Shoji IKEDA Katsuya MIURA Jun HAYAKAWA Haruhiro HASEGAWA Takahiro HANYU Hideo OHNO Tetsuo ENDOH
In this paper, it is shown that our fabricated MTJ of 60180 nm2, which is connected to the MOSFET in series by 3 levels via and 3 levels metal line, can dynamically operate with the programming current driven by 0.14 µm CMOSFET. In our measurement of transient characteristic of fabricated MTJ, the pulse current, which is generated by the MOSFET with an applied pulse voltage of 1.5 V to its gate, injected to the fabricated MTJ connected to the MOSFET in series. By using the current measurement technique flowing in MTJ with sampling period of 10 nsec, for the first time, we succeeded in monitor that the transition speed of the resistance change of 60180 nm2 MTJ is less than 30 ns with its programming current of 500 µA and the resistance change of 1.2 kΩ.
Fumitaka IGA Masashi KAMIYANAGI Shoji IKEDA Katsuya MIURA Jun HAYAKAWA Haruhiro HASEGAWA Takahiro HANYU Hideo OHNO Tetsuo ENDOH
In this paper, we have succeeded in the fabrication of high performance Magnetic Tunnel Junction (MTJ) which is integrated in CMOS circuit with 4-Metal/ 1-poly Gate 0.14 µm CMOS process. We have measured the DC characteristics of the MTJ that is fabricated on via metal of 3rd layer metal line. This MTJ of 60180 nm2 achieves a large change in resistance of 3.52 kΩ (anti-parallel) with TMR ratio of 151% at room temperature, which is large enough for sensing scheme of standard CMOS logic. Furthermore, the write current is 320 µA that can be driven by a standard MOS transistor. As the results, it is shown that the DC performance of our fabricated MTJ integrated in CMOS circuits is very good for our novel spin logic (MTJ-based logic) device.
A NULL-convention circuit based on dual-rail current-mode differential logic is proposed for a high-performance asynchronous VLSI. Since input/output signals are mapped to dual-rail current signals, the NULL-convention circuit can be directly implemented based on the dual-rail differential logic, which results in the reduction of the device counts. As a typical example, a NULL-convention logic based full adder using the proposed circuit is implemented by a 0.18 µm CMOS technology. Its delay, power dissipation and area are reduced to 61 percent, 60 percent and 62 percent, respectively, in comparison with those of a corresponding CMOS implementation.
Xiaowei DENG Takahiro HANYU Michitaka KAMEYAMA
The investigation of device functions required from the systems point of view will be important for the development of the next generation of VLSI devices and systems. In this paper, a super pass transistor (SPT) model is presented as a quantum device candidate for future VLSI systems based on multiple-valued logic. A possible quantum device structure for the SPT model is also described, which employs the concepts of a lateral-resonant-tunneling quantum-dot transistor and a heterostructure field-effect transistor. Since it has the powerful capability of detecting multiple signal levels, the SPT will be useful for the implementation of highly compact multiple-valued VLSI systems. To exploit the functionality of the SPT, a super pass gate (SP-gate) corresponding to a single SPT is proposed as a multiple-valued universal logic module. The mathematical properties of the SP-gate are discussed. A design method for a multiple-valued SP-gate network is presented. An application of SP-gates to a multiple-valued image processing system is also demonstrated. The SP-gate network for the multiple-valued image processing system is evaluated in comparison with the corresponding NMOS implementation in terms of the number of transistors, interconnections and cascaded transistor stages. The size of a generalized series-parallel SP-gate network is also evaluated in comparison with a functionally equivalent multiple-valued series-parallel MOS pass transistor network. The results show that highly compact multiple-valued VLSI systems can be achieved if the SPT-model can be realized by an actual quantum device.
Naoya ONIZAWA Warren J. GROSS Takahiro HANYU Vincent C. GAUDET
Stochastic decoding provides ultra-low-complexity hardware for high-throughput parallel low-density parity-check (LDPC) decoders. Asynchronous stochastic decoding was proposed to demonstrate the possibility of low power dissipation and high throughput in stochastic decoders, but decoding might stop before convergence due to “lock-up”, causing error floors that also occur in synchronous stochastic decoding. In this paper, we introduce a wire-delay dependent (WDD) scheduling algorithm for asynchronous stochastic decoding in order to reduce the error floors. Instead of assigning the same delay to all computation nodes in the previous work, different computation delay is assigned to each computation node depending on its wire length. The variation of update timing increases switching activities to decrease the possibility of the “lock-up”, lowering the error floors. In addition, the WDD scheduling algorithm is simplified for the hardware implementation in order to eliminate time-averaging and multiplication functions used in the original WDD scheduling algorithm. BER performance using a regular (1024, 512) (3,6) LDPC code is simulated based on our timing model that has computation and wire delay estimated under ASPLA 90nm CMOS technology. It is demonstrated that the proposed asynchronous decoder achieves a 6.4-9.8× smaller latency than that of the synchronous decoder with a 0.25-0.3 dB coding gain.
Akira MOCHIZUKI Hirokatsu SHIRAHAMA Yuma WATANABE Takahiro HANYU
An energy-efficient intra-chip communication link circuit with ternary current signaling is proposed for an asynchronous Network-on-Chip. The data signal encoded by an asynchronous three-state protocol is represented by a small-voltage-swing three-level intermediate signal, which results in the reduction of transition delay and achieving energy-efficient data transfer. The three-level voltage is generated by using a combination of dynamically controlled current sources with feedback loop mechanism. Moreover, the proposed circuit contains a power-saving scheme where the dynamically controlled transistors also are utilized. By cutting off the current paths when the data transfer on the communication link is inactive, the power dissipation can be greatly reduced. It is demonstrated that the average data-transfer speed is about 1.5 times faster than that of a binary CMOS implementation using a 130nm CMOS technology at the supply voltage of 1.2V.
Satoshi ARAGAKI Takahiro HANYU Tatsuo HIGUCHI
This paper presents a high-density multiple-valued content-addressable memory (MVCAM) based on a floating-gate MOS device. In the proposed CAM, a basic operation performed in each cell is a threshold function that is a kind of inverter whose threshold value is programmable. Various multiple-valued operations for data retrieval can be easily performed using threshold functions. Moreover, each cell circuit in the MVCAM can be implemented using only a single floating-gate MOS transistor. As a result, the cell area of the four-valued CAM are reduced to 37% in comparison with that of the conventional dynamic CAM cell.
In this paper, a multiple-valued current-mode (MVCM) circuit based on active-load dual-rail differential logic is proposed for a high-performance arithmetic VLSI system with crosstalk-noise immunity. The use of dual-rail complementary differential-pair circuits (DPCs), whose outputs are summed up by wiring makes it possible to reduce the common-mode noise, and yet enhance the switching speed. By using the diode-connected cross-coupled PMOS active loads, the rapid transition of switching in the DPC is relaxed appropriately, which can also eliminate spiked input noise. It is demonstrated that the noise reduction ratio and the switching delay of the proposed MVCM circuit in a 90 nm CMOS technology is superior to those of the corresponding ordinary implementation.
Takahiro HANYU Hiroto ISHII Tatsuo HIGUCHI
This paper presents a design of a new high-density multiple-valued associative memory with incomplete information proessing capability. The degree of similarity between an input data and each memory data is evaluated by several discrete values (called multi-level matching), so that any incomplete input data can be recognized surely as a certain memory data in the associateve memory. The multiple-valued associative processing can be performmed systematically by the superposition of a new multiple-valued logic function, called multi-level matching function. The multiple-valued data is directly performmed using floatin-gate MOS device whose threshold voltage is programmable, so that the multi-level matching function can be simply implemented. It is demonstrated that the chip area and the processing time of an 8-level matching function circuit can be reduced to 3.2 % and 25 %, respectively in comparison with the corresponding binary implementation using 2-µm CMOS process.
Tomohiro TAKAHASHI Takahiro HANYU
This paper presents an asynchronous multiple-valued current-mode data-transfer controller chip based on a 1-phase dual-rail encoding technique. The proposed encoding technique enables "one-way delay" asynchronous data transfer because request and acknowledge signals can be transmitted simultaneously and valid states are detected by calculating the sum of dual-rail codewords. Since a key component, a current-to-voltage conversion circuit in a valid-state detector, is tuned so as to obtain a sufficient voltage range to improve switching speed of a comparator, signal detection can be performed quickly in spite of using 6-level signals. It is evaluated using HSPICE simulation with a 0.18-µm CMOS that the throughput of the proposed circuit based on the 1-phase dual-rail scheme attains 435 Mbps/wire which is 2.9 times faster than that of a CMOS circuit based on a conventional 4-phase dual-rail scheme. The test chip is fabricated, and the asynchronous data-transfer behavior of the proposed scheme is confirmed.
Takahiro HANYU Michitaka KAMEYAMA
A new logic-in-memory VLSI architecture based on multiple-valued floating-gate-MOS pass-transistor logic is proposed to solve the communication bottleneck between memory and logic modules. Multiple-valued stored data are represented by the threshold voltage of a floating-gate MOS transistor, so that a single floating-gate MOS transistor is effectively employed to merge multiple-valued threshold-literal and pass-switch functions. As an application, a four-valued logic-in-memory VLSI for high-speed pattern recognition is also presented. The proposed VLSI detects a stored reference word with the minimum Manhattan distance between a 16-bit input word and 16-bit stored reference words. The effective chip area, the switching delay and the power dissipation of a new four-valued full adder, which is a key component of the proposed logic-in-memory VLSI, are reduced to about 33 percent, 67 percent and 24 percent, respectively, in comparison with those of the corresponding binary CMOS implementation under a 0.5-µm flash EEPROM technology.
Akira MOCHIZUKI Hiromitsu KIMURA Mitsuru IBUKI Takahiro HANYU
A tunneling magnetoresistive(TMR)-based logic-in- memory circuit, where storage functions are distributed over a logic-circuit plane, is proposed for a low-power VLSI system. Since the TMR device is regarded as a variable resistor with a non-volatile storage capability, any logic functions with external inputs and stored inputs can be performed by using the TMR-based resistor/transistor network. The combination of dynamic current-mode circuitry and a TMR-based logic network makes it possible to perform any switching operations without steady current, which results in power saving. A design example of an SAD unit for MPEG encoding is discussed, and its advantages are demonstrated.