1-10hit |
Naoya ONIZAWA Warren J. GROSS Takahiro HANYU Vincent C. GAUDET
Stochastic decoding provides ultra-low-complexity hardware for high-throughput parallel low-density parity-check (LDPC) decoders. Asynchronous stochastic decoding was proposed to demonstrate the possibility of low power dissipation and high throughput in stochastic decoders, but decoding might stop before convergence due to “lock-up”, causing error floors that also occur in synchronous stochastic decoding. In this paper, we introduce a wire-delay dependent (WDD) scheduling algorithm for asynchronous stochastic decoding in order to reduce the error floors. Instead of assigning the same delay to all computation nodes in the previous work, different computation delay is assigned to each computation node depending on its wire length. The variation of update timing increases switching activities to decrease the possibility of the “lock-up”, lowering the error floors. In addition, the WDD scheduling algorithm is simplified for the hardware implementation in order to eliminate time-averaging and multiplication functions used in the original WDD scheduling algorithm. BER performance using a regular (1024, 512) (3,6) LDPC code is simulated based on our timing model that has computation and wire delay estimated under ASPLA 90nm CMOS technology. It is demonstrated that the proposed asynchronous decoder achieves a 6.4-9.8× smaller latency than that of the synchronous decoder with a 0.25-0.3 dB coding gain.
Naoya ONIZAWA Takahiro HANYU Vincent C. GAUDET
This paper presents a high-throughput bit-serial low-density parity-check (LDPC) decoder that uses an asynchronous interleaver. Since consecutive log-likelihood message values on the interleaver are similar, node computations are continuously performed by using the most recently arrived messages without significantly affecting bit-error rate (BER) performance. In the asynchronous interleaver, each message's arrival rate is based on the delay due to the wire length, so that the decoding throughput is not restricted by the worst-case latency, which results in a higher average rate of computation. Moreover, the use of a multiple-valued data representation makes it possible to multiplex control signals and data from mutual nodes, thus minimizing the number of handshaking steps in the asynchronous interleaver and eliminating the clock signal entirely. As a result, the decoding throughput becomes 1.3 times faster than that of a bit-serial synchronous decoder under a 90 nm CMOS technology, at a comparable BER.
Naoya ONIZAWA Akira MOCHIZUKI Hirokatsu SHIRAHAMA Masashi IMAI Tomohiro YONEDA Takahiro HANYU
This paper introduces a partially parallel inter-chip link architecture for asynchronous multi-chip Network-on-Chips (NoCs). The multi-chip NoCs that operate as a large NoC have been recently proposed for very large systems, such as automotive applications. Inter-chip links are key elements to realize high-performance multi-chip NoCs using a limited number of I/Os. The proposed asynchronous link based on level-encoded dual-rail (LEDR) encoding transmits several bits in parallel that are received by detecting the phase information of the LEDR signals at each serial link. It employs a burst-mode data transmission that eliminates a per-bit handshake for a high-speed operation, but the elimination may cause data-transmission errors due to cross-talk and power-supply noises. For triggering data retransmission, errors are detected from the embedded phase information; error-detection codes are not used. The throughput is theoretically modelled and is optimized by considering the bit-error rate (BER) of the link. Using delay parameters estimated for a 0.13 µm CMOS technology, the throughput of 8.82 Gbps is achieved by using 10 I/Os, which is 90.5% higher than that of a link using 9 I/Os without an error-detection method operating under negligible low BER (<10-20).
Kazuyasu MIZUSAWA Naoya ONIZAWA Takahiro HANYU
This paper presents a design of an asynchronous peer-to-peer half-duplex/full-duplex-selectable data-transfer system on-chip interconnected. The data-transfer method between channels is based on a 1-phase signaling scheme realized by using multiple-valued current-mode (MVCM) circuits and encoding, which performs high-speed communication. A data transmission is selectable by adding a mode-detection circuit that observes data-transmission modes; full-duplex, half-duplex and standby modes. Especially, since current sources are completely cut off during the standby mode, the power dissipation can be greatly reduced. Moreover, both half-duplex and full-duplex communication can be realized by sharing a common circuit except a signal-level conversion circuit. The proposed interface is implemented using 0.18-µm CMOS, and its performance improvement is discussed in comparison with those of the other ordinary asynchronous methods.
This paper presents highly reliable multiple-valued one-phase signalling for an asynchronous on-chip communication link under process, supply-voltage and temperature variations. New multiple-valued dual-rail encoding, where each code is represented by the minimum set of three values, makes it possible to perform asynchronous communication between modules with just two wires. Since an appropriate current level is individually assigned to the logic value, a sufficient dynamic range between adjacent current signals can be maintained in the proposed multiple-valued current-mode (MVCM) circuit, which improves the robustness against the process variation. Moreover, as the supply-voltage and the temperature variations in smaller dimensions of circuit elements are dominated as the common-mode variation, a local reference voltage signal according to the variations can be adaptively generated to compensate characteristic change of the MVCM-circuit component. As a result, the proposed asynchronous on-chip communication link is correctly operated in the operation range from 1.1 V to 1.4 V of the supply voltage and that from -50 to 75 under the process variation of 3σ. In fact, it is demonstrated by HSPICE simulation in a 0.13-µm CMOS process that the throughput of the proposed circuit is enhanced to 435% in comparison with that of the conventional 4-phase asynchronous communication circuit under a comparable energy dissipation.
Naoya ONIZAWA Atsushi MATSUMOTO Takahiro HANYU
This paper introduces open-wire fault-resilient multiple-valued codes for reliable asynchronous point-to-point global communication links. In the proposed encoding, two communication modules assign complementary codewords that change between two valid states without an open-wire fault. Under an open-wire fault, at each module, the codewords don't reach to one of the two valid states and remains as “invalid” states. The detection of the invalid states makes it possible to stop sending wrong codewords caused by an open-wire fault. The detectability of the open-wire fault based on the proposed encoding is proven for m-of-n codes. The proposed code used in the multiple-valued asynchronous global communication link is capable of detecting a single open-wire fault with 3.08-times higher coding efficiency compared with a conventional multiple-valued code used in a triple-modular redundancy (TMR) link that detects an open-wire fault under the same dynamic range of logical values.
Naoya ONIZAWA Atsushi MATSUMOTO Takahiro HANYU
We have developed a long-range asynchronous on-chip data-transmission link based on multiple-valued single-track signaling for a highly reliable asynchronous Network-on-Chip. In the proposed signaling, 1-bit data with control information is represented by using a one-digit multi-level signal, so serial data can be transmitted asynchronously using only a single wire. The small number of wires alleviates the routing complexity of wiring long-range interconnects. The use of current-mode signaling makes it possible to transmit data at high speed without buffers or repeaters over a long interconnect wire because of the low-voltage swing of signaling, and it leads to low-latency data transmission. We achieve a latency of 0.45 ns, a throughput of 1.25 Gbps, and energy dissipation of 0.58 pJ/bit with a 10-mm interconnect wire under a 0.13 µm CMOS technology. This represents an 85% decrease in latency, a 150% increase in throughput, and a 90% decrease in energy dissipation compared to a conventional serial asynchronous data-transmission link.
Shunsuke KOSHITA Naoya ONIZAWA Masahide ABE Takahiro HANYU Masayuki KAWAMATA
This paper presents FIR digital filters based on stochastic/binary hybrid computation with reduced hardware complexity and high computational accuracy. Recently, some attempts have been made to apply stochastic computation to realization of digital filters. Such realization methods lead to significant reduction of hardware complexity over the conventional filter realizations based on binary computation. However, the stochastic digital filters suffer from lower computational accuracy than the digital filters based on binary computation because of the random error fluctuations that are generated in stochastic bit streams, stochastic multipliers, and stochastic adders. This becomes a serious problem in the case of FIR filter realizations compared with the IIR counterparts because FIR filters usually require larger number of multiplications and additions than IIR filters. To improve the computational accuracy, this paper presents a stochastic/binary hybrid realization, where multipliers are realized using stochastic computation but adders are realized using binary computation. In addition, a coefficient-scaling technique is proposed to further improve the computational accuracy of stochastic FIR filters. Furthermore, the transposed structure is applied to the FIR filter realization, leading to reduction of hardware complexity. Evaluation results demonstrate that our method achieves at most 40dB improvement in minimum stopband attenuation compared with the conventional pure stochastic design.
Tomohiro TAKAHASHI Naoya ONIZAWA Takahiro HANYU
This paper presents an asynchronous data transfer scheme using 2-color 2-phase dual-rail encoding based on a differential operation and its circuit realization. The proposed encoding enables seamless asynchronous data transfer without inserting a spacer, because each logic value is represented by two kinds of codewords with dual-rail, called "color" data. Since the difference x-x between components of a codeword (x,x) becomes constant in every valid state, the data-arrival state can be detected by calculating the difference x-x. From the viewpoint of circuit implementation, during the state transition, since the dual-rail x and x are defined so as to transit differentially, the compatibility with a comparator using a differential amplifier becomes high, which results in reduction of the cycle time. It is evaluated using HSPICE simulation with a 0.18 µm CMOS technology that communication speed using the proposed dual-rail encoding becomes 1.4 times faster than that using conventional dual-rail encoding.
A NULL-convention circuit based on dual-rail current-mode differential logic is proposed for a high-performance asynchronous VLSI. Since input/output signals are mapped to dual-rail current signals, the NULL-convention circuit can be directly implemented based on the dual-rail differential logic, which results in the reduction of the device counts. As a typical example, a NULL-convention logic based full adder using the proposed circuit is implemented by a 0.18 µm CMOS technology. Its delay, power dissipation and area are reduced to 61 percent, 60 percent and 62 percent, respectively, in comparison with those of a corresponding CMOS implementation.