Shin'ichi HATAKENAKA Takashi NANYA
Strongly Fault-Secure (SFS) circuits are known to achieve the totally self-checking (TSC) goal of producing a non-codeword as the first erroneous output due to a fault. Strongly Code-Disjoint (SCD) circuits always map non-codeword inputs to non-codeword outputs, even in the presence of faults, as long as those faults are undetectable. This paper presents a new generalized design method for the SFS and SCD realization of combinational circuits. The proposed design is simple and always yields an SFS and SCD combinational circuit that implements any given logic function. The resulting SFS/SCD circuits can be cascaded to construct a larger SFS/SCD circuit, provided that each interface is fully exercised.
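To make "code-disjoint" concrete, the minimal Python sketch below uses the standard two-rail checker cell, a classic TSC building block; it is not the design method of this paper. It assumes dual-rail pairs in which (0, 1) and (1, 0) are codewords and (0, 0) and (1, 1) are non-codewords, and it only checks the fault-free mapping. The SCD property under faults is not modeled here.

```python
from itertools import product


def is_codeword(pair):
    """A dual-rail pair is a codeword when its two rails differ."""
    return pair[0] != pair[1]


def two_rail_cell(a, b):
    """Standard two-rail checker cell (fault-free gate-level equations)."""
    a0, a1 = a
    b0, b1 = b
    z0 = (a0 & b0) | (a1 & b1)
    z1 = (a0 & b1) | (a1 & b0)
    return (z0, z1)


def is_code_disjoint(cell):
    """Exhaustively check that the output is a codeword iff both inputs are."""
    pairs = list(product((0, 1), repeat=2))
    for a, b in product(pairs, repeat=2):
        inputs_are_code = is_codeword(a) and is_codeword(b)
        if inputs_are_code != is_codeword(cell(a, b)):
            return False
    return True


print(is_code_disjoint(two_rail_cell))  # prints True
```

Because the check is exhaustive over all 16 input combinations, it confirms that every non-codeword input pair is mapped to a non-codeword output when the cell itself is fault-free.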
This paper proposes a synthesis method that obtains speed-independent asynchronous circuits directly from signal transition graph (STG) specifications containing single-cycle signals, which may be non-persistent, and free-choice operations. The resulting circuits are implemented with basic gates and asynchronous latches, and they operate correctly under the assumptions of finite but unbounded gate delays and zero wire delays. The proposed method introduces five types of lock relations to implement a non-persistent STG: a non-persistent STG can be implemented if every signal that is non-persistent with respect to a signal t is super-locked with t. The resulting circuits are optimized by literal extraction, mapping onto asymmetric C-elements, and similar techniques. Experimental results show that the proposed synthesis method outperforms existing synthesis systems such as SYN and SIS.
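The optimization step above maps logic onto asymmetric C-elements. As a hedged illustration of what those cells do (a behavioral sketch, not the paper's mapping procedure), the Python model below assumes the usual convention: "plus" inputs gate only the rising transition, "minus" inputs gate only the falling transition, and at least one common input is present.

```python
def c_element(inputs, prev):
    """Muller C-element: switch when all inputs agree, otherwise hold prev."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def asymmetric_c_element(common, plus, minus, prev):
    """Behavioral model of an asymmetric C-element.

    common: inputs that gate both transitions
    plus:   inputs that gate only the rising transition
    minus:  inputs that gate only the falling transition
    At least one common input is assumed, so rise and fall never both apply.
    """
    if not common:
        raise ValueError("at least one common input is assumed")
    if all(common) and all(plus):
        return 1
    if not any(common) and not any(minus):
        return 0
    return prev


# A 'plus' input at 0 blocks the rise but has no effect on the fall.
print(asymmetric_c_element([1], plus=[0], minus=[], prev=0))   # prints 0
print(asymmetric_c_element([0], plus=[1], minus=[0], prev=1))  # prints 0
```

Removing an input from one of the two transition conditions is what lets an asymmetric C-element replace a small amount of surrounding logic.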
In the data path circuits of asynchronous systems, logical faults may first manifest as undetectable, transient wrong codewords, even when the inputs and outputs are encoded and the circuit is organized so that faults propagate to the primary outputs as non-codewords. Consequently, conventional concurrent error detection (CED) methods based on logic (voltage) monitoring are not effective. In this paper, we propose a mixed-signal approach to achieve CED for a class of asynchronous circuits known as self-timed circuits. First, we show that CED cannot be guaranteed by logic monitoring of the primary outputs, despite proper encoding and organization of self-timed circuits. Then, we discuss the different manifestations of single stuck-at faults that occur during normal operation in these circuits. Finally, we demonstrate the feasibility of achieving CED by using a built-in current sensor (BICS) together with encoding techniques.
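The limitation of logic monitoring described above can be illustrated with a toy Python model. It assumes a dual-rail output written as (rail_true, rail_false), with (0, 0) as the spacer, and a hypothetical faulty trace; it only shows why a transient wrong codeword escapes a monitor that flags non-codewords, and it does not model the BICS.

```python
def classify(pair):
    """Classify a dual-rail pair (rail_true, rail_false)."""
    if pair == (0, 0):
        return "spacer"
    if pair in ((1, 0), (0, 1)):
        return "codeword"
    return "non-codeword"


def logic_monitor_alarms(trace):
    """Indices at which a pure logic monitor raises an alarm (non-codewords only)."""
    return [i for i, pair in enumerate(trace) if classify(pair) == "non-codeword"]


# Hypothetical faulty trace for an output whose correct value is 1, i.e. (1, 0):
# spacer, then a transient WRONG codeword (0, 1), then back to spacer.
faulty = [(0, 0), (0, 1), (0, 0)]
print(logic_monitor_alarms(faulty))  # prints []
```

The wrong value is a legal codeword, so the monitor stays silent; only a fault that drives the pair to (1, 1) would be flagged.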
Nattha SRETASEREEKUL Hiroshi SAITO Euiseok KIM Metehan OZCAN Masashi IMAI Hiroshi NAKAMURA Takashi NANYA
Asynchronous controllers can effectively control highly concurrent datapath operations to achieve high speed. Signal Transition Graphs (STGs) can effectively represent such concurrent events. However, highly concurrent STGs cause the state explosion problem in asynchronous synthesis tools, and many small but highly concurrent STGs cannot be synthesized into control circuits at all. Moreover, STG-based controllers also incur control-time overhead from the four-phase handshake protocol. In this paper, we propose a method for deriving serial control nodes from Control Data Flow Graphs (CDFGs) such that the concurrency of datapath operations is still preserved. The STGs derived from the serialized control nodes are serial STGs, which are simpler to synthesize than concurrent STGs. We also propose an implementation that uses these serialized controllers to generate local clocks whenever they are needed. This implementation results in very small control-time overhead. The experimental results show that the number of synthesis states is proportional to the number of control signals, and that circuits with satisfactorily small control-time overhead are obtained.
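Why serialization tames state explosion can be seen in a toy Python model. It assumes n independent four-phase handshakes, each with 4 phases, and counts reachable states by breadth-first search: once with all handshakes free to interleave, and once with them forced to run one after another in a ring. This is only an illustration of the counting argument, not the paper's CDFG-based derivation.

```python
from collections import deque

PHASES = 4  # req+, ack+, req-, ack- of one four-phase handshake


def count_reachable(start, successors):
    """Breadth-first search returning the number of reachable states."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


def concurrent_states(n):
    """n handshakes that may advance independently (interleaving semantics)."""
    def successors(phases):
        for i in range(n):
            yield phases[:i] + ((phases[i] + 1) % PHASES,) + phases[i + 1:]
    return count_reachable((0,) * n, successors)


def serial_states(n):
    """The same n handshakes forced to run one after another in a ring."""
    def successors(state):
        channel, phase = state
        if phase + 1 < PHASES:
            yield (channel, phase + 1)
        else:
            yield ((channel + 1) % n, 0)
    return count_reachable((0, 0), successors)


for n in (2, 4, 6):
    print(n, concurrent_states(n), serial_states(n))
# 2 16 8
# 4 256 16
# 6 4096 24
```

In this toy model the concurrent state count grows as 4^n while the serialized count grows as 4n, i.e. linearly in the number of handshakes.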
Kouichi WATANABE Masashi IMAI Masaaki KONDO Hiroshi NAKAMURA Takashi NANYA
As VLSI technology advances, delay variations will become more serious. Delay-insensitive asynchronous dual-rail circuits tolerate any delay variation, but their energy consumption is more than double that of single-rail circuits, because signal transitions occur in every bit in every cycle regardless of the input bit pattern. In functional units, however, a significant number of input bits often remain unchanged from the previous input, and these bits do not need to be recalculated. We therefore propose a method, called unflip-bits control, that exploits this situation to reduce energy consumption. We evaluate the energy consumption and performance penalty of the method using HSPICE and the Verilog-XL simulator, and compare it with a conventional dual-rail circuit and a synchronous circuit. Our evaluation results show that the proposed asynchronous dual-rail circuit consumes 12-60% less energy than a conventional asynchronous dual-rail circuit.
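The idea behind unflip-bits control can be sketched with a toy transition count in Python. It assumes that a conventional dual-rail bit makes 2 rail transitions per cycle (data, then return to spacer) and that, under the hypothetical `skip_unflipped` flag, only bits that differ from the previous input switch. Transition counts are used as a rough energy proxy; this does not reproduce the paper's circuit or its measured 12-60% savings.

```python
def flipped_mask(prev, curr, width):
    """Bits that differ from the previous input (the ones that must be recomputed)."""
    return (prev ^ curr) & ((1 << width) - 1)


def dual_rail_transitions(prev, curr, width, skip_unflipped):
    """Toy transition count for one dual-rail cycle.

    Conventional dual-rail: every bit raises one rail for data and lowers it
    again for the spacer, i.e. 2 transitions per bit per cycle.
    With the hypothetical skip_unflipped flag, only flipped bits switch.
    """
    active_bits = width
    if skip_unflipped:
        active_bits = bin(flipped_mask(prev, curr, width)).count("1")
    return 2 * active_bits


prev, curr = 0b1011_0000, 0b1011_0011
print(dual_rail_transitions(prev, curr, 8, skip_unflipped=False))  # prints 16
print(dual_rail_transitions(prev, curr, 8, skip_unflipped=True))   # prints 4
```

A real implementation must also reuse the previous result for the unflipped bits and detect completion for them, which this count ignores.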
Nattha SRETASEREEKUL Takashi NANYA
The Quasi-Delay-Insensitive (QDI) model assumes that all forks are isochronic. The isochronic-fork assumption requires uniform wire delays and uniform switching thresholds for the gates associated with the forking branches. This paper presents a method for identifying forks that do not have to satisfy the isochronic-fork requirements, and presents experimental results showing that many of the forks assumed to be isochronic in existing QDI circuits do not actually have to be "isochronic" or can even be ignored.
Masaaki KONDO Takuro HAYASHIDA Masashi IMAI Hiroshi NAKAMURA Takashi NANYA Atsushi HORI
Cluster systems are becoming widely used because of their good performance/cost ratio. However, their reliability in practical environments has not yet been well studied. As the number of commodity components in a cluster system increases, reliability support by system software becomes indispensable. SCore cluster system software is a parallel programming environment for High Performance Computing (HPC). SCore provides checkpointing and rollback-recovery mechanisms for high availability. In this paper, we analyze and evaluate the checkpointing and rollback-recovery mechanisms of SCore quantitatively. The experimental results reveal that the time required for checkpointing scales very well with respect to the number of computing nodes. However, the required time is quite long because of the low effective network bandwidth. Based on these results, we modify SCore and make checkpointing and recovery 1.8-2.8 times and 3.7-5.0 times faster, respectively. This is very helpful for cluster systems to achieve both high performance and high availability.
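For readers unfamiliar with the mechanism being evaluated, the following is a generic, single-process Python sketch of checkpointing and rollback-recovery. All names (`CheckpointedJob`, `run`, `fail_at`) are hypothetical and unrelated to SCore's actual interfaces; the sketch only shows that rolling back to the last checkpoint and re-executing reproduces the fault-free result, at the cost of recomputing the steps since that checkpoint.

```python
import copy


class CheckpointedJob:
    """Toy job with in-memory checkpoints (hypothetical, not SCore's API)."""

    def __init__(self):
        self.state = {"step": 0, "acc": 0}
        self._saved = copy.deepcopy(self.state)

    def advance(self):
        self.state["step"] += 1
        self.state["acc"] += self.state["step"]

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def last_checkpoint_step(self):
        return self._saved["step"]

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def run(total_steps, interval, fail_at=None):
    """Checkpoint every `interval` steps; inject one failure after step `fail_at`."""
    job = CheckpointedJob()
    recomputed = 0
    failed = False
    while job.state["step"] < total_steps:
        job.advance()
        if not failed and job.state["step"] == fail_at:
            failed = True
            recomputed = job.state["step"] - job.last_checkpoint_step()
            job.rollback()  # discard work done since the last checkpoint
            continue
        if job.state["step"] % interval == 0:
            job.checkpoint()
    return job.state, recomputed


state, lost = run(total_steps=10, interval=4, fail_at=7)
print(state, lost)  # prints {'step': 10, 'acc': 55} 3
```

In a real cluster the cost of `checkpoint()` is dominated by writing each node's state over the network or to disk, which is why effective bandwidth governs the checkpoint time measured in the paper.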
Motokazu OZAWA Masashi IMAI Yoichiro UENO Hiroshi NAKAMURA Takashi NANYA
Wire delays, rather than gate delays, are becoming dominant in modern VLSI design. In current synchronous processors, the critical path lies not in the ALU but in the cache access. Since cache performance is limited by the memory access delay, which consists mainly of wire delays, a reduction in gate delays may no longer improve processor performance. To solve this problem, this paper presents a novel architecture called the Cascade ALU. The Cascade ALU allows super-scalar processors built with future technologies to move the critical path into the ALU. Therefore, the Cascade ALU can benefit from the expected progress in future device speed. Since the delay of the Cascade ALU varies depending on the executed instructions, an asynchronous system is shown to be suitable for implementing it. However, because an asynchronous system may have a large handshake overhead, this paper also presents an asynchronous Fine Grain Pipeline technique that hides this overhead. Finally, this paper presents performance and area evaluation results for an asynchronous implementation of the Cascade ALU. The results show that the Cascade ALU architecture scales well in performance as the ALU latency is reduced, and that it imposes little area penalty compared with current synchronous processors.
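The argument that instruction-dependent delay favors an asynchronous implementation can be shown with simple arithmetic. The Python sketch below assumes hypothetical per-instruction delays in picoseconds and a fixed per-instruction handshake overhead, and it ignores pipelining (which the Fine Grain Pipeline technique addresses): a clocked design pays the worst-case delay for every instruction, while a self-timed design pays each instruction's own delay plus the handshake.

```python
def synchronous_time(delays):
    """Clocked design: every instruction takes one cycle sized for the worst case."""
    return len(delays) * max(delays)


def asynchronous_time(delays, handshake_overhead):
    """Self-timed design: each instruction takes its own delay plus a handshake."""
    return sum(delays) + len(delays) * handshake_overhead


# Hypothetical per-instruction ALU delays in picoseconds.
delays = [100, 100, 200, 100]
print(synchronous_time(delays))       # prints 800
print(asynchronous_time(delays, 10))  # prints 540
```

The advantage shrinks as the handshake overhead grows, which is why hiding that overhead matters.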
Hiroshi SAITO Alex KONDRATYEV Jordi CORTADELLA Luciano LAVAGNO Alex YAKOVLEV Takashi NANYA
Deep submicron technology calls for new design techniques in which wire delays and gate delays are considered to have equal or nearly equal effects on circuit behavior. Asynchronous speed-independent (SI) circuits, whose behavior is robust only to gate delay variations, rest on a delay model that may be too optimistic. On the other hand, building circuits that are totally delay-insensitive (DI), for both gates and wires, is impractical because of the lack of effective synthesis methods. This paper presents a new approach for the synthesis of globally DI and locally SI circuits. The method works in two possible design scenarios, starting either from a behavioral specification called a Signal Transition Graph (STG) or from an SI implementation of the STG specification. The method locally modifies the initial model so that the resulting behavior of the system does not depend on delays in the input wires. This guarantees delay-insensitivity of the system-environment interface. The suggested approach was successfully tested on a set of benchmarks. Experimental results show that DI interfacing is achieved at a relatively moderate cost in area and speed (about 40% area penalty and 20% speed penalty).
Hiroshi SAITO Naohiro HAMADA Nattha JINDAPETCH Tomohiro YONEDA Chris MYERS Takashi NANYA
This paper proposes new scheduling methods for asynchronous circuits with bundled-data implementations. Because an operation in an asynchronous circuit starts only after the completion of a previous operation, the method first approximates the set of start times of each operation using the delays of the resources. Next, the method determines control steps from the approximated sets of start times, and these control steps are used in the scheduling algorithms. This paper extends two scheduling algorithms used for synchronous circuits so that they use the approximated sets of start times and the determined control steps. Finally, this paper demonstrates the effectiveness of the proposed methods by comparing their scheduling results with those obtained by the original two scheduling algorithms.
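One plausible way to approximate start times from resource delays is sketched below in Python; it is an assumption for illustration, not the paper's exact procedure. Each resource is given a hypothetical (min, max) delay, and an operation starts when its last predecessor completes, giving an [earliest, latest] interval per operation. Deriving control steps from these intervals is not shown. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def start_time_intervals(preds, delay):
    """Approximate [earliest, latest] start time of each operation.

    preds: op -> list of operations that must complete first
    delay: op -> (min_delay, max_delay) of the resource executing op
    An operation starts when its last predecessor completes.
    """
    start = {}
    for op in TopologicalSorter(preds).static_order():
        ps = preds.get(op, [])
        earliest = max((start[p][0] + delay[p][0] for p in ps), default=0)
        latest = max((start[p][1] + delay[p][1] for p in ps), default=0)
        start[op] = (earliest, latest)
    return start


# Hypothetical data-flow graph: (mul1, mul2) -> add -> sub
preds = {"mul1": [], "mul2": [], "add": ["mul1", "mul2"], "sub": ["add"]}
delay = {"mul1": (3, 5), "mul2": (2, 4), "add": (1, 2), "sub": (1, 2)}
intervals = start_time_intervals(preds, delay)
print(intervals["add"], intervals["sub"])  # prints (3, 5) (4, 7)
```

The widening of the interval along the chain shows why a bundled-data schedule cannot simply assume fixed start times as a synchronous schedule does.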
Rafael K. MORIZAWA Takashi NANYA
A known problem of the four-phase handshaking protocol is that the signals involved in the handshake must go through a return-to-zero phase, in which usually no useful work is done, before another cycle can start. In this paper, we first define an easy-to-write specification style for four-phase handshaking asynchronous controllers; specifications in this style can be translated into an STG to obtain a gate-level implementation using existing synthesis methods. Then, we propose an algorithm that takes a specification written in this style and finds an optimized timing that reduces the idle-phase overhead of its gate-level implementation.
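For reference, the four-phase protocol ordering can be captured by a small Python checker. It assumes a single req/ack channel and a trace of (signal, new_value) events; the names are hypothetical, and the sketch does not implement the paper's specification style or timing optimization. The last two events of each cycle (req falling, then ack falling) are the return-to-zero phase referred to above.

```python
FOUR_PHASE = [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
# The last two events form the return-to-zero phase.


def count_handshakes(trace):
    """Count complete four-phase cycles in a trace of (signal, new_value) events.

    Raises ValueError if an event arrives out of protocol order.
    """
    expected = 0
    cycles = 0
    for event in trace:
        if event != FOUR_PHASE[expected]:
            raise ValueError(f"unexpected {event}, expected {FOUR_PHASE[expected]}")
        expected = (expected + 1) % len(FOUR_PHASE)
        if expected == 0:
            cycles += 1
    return cycles


print(count_handshakes(FOUR_PHASE * 2))  # prints 2
```

In every complete trace, half of the events are return-to-zero transitions, which is the overhead that the proposed timing optimization targets.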