1-8hit |
Guihai YAN Yinhe HAN Xiaowei LI Hui LIU
Crosstalk delay within an on-chip bus can induce severe transmission performance penalties. The Bus-grouping Asynchronous Transmission (BAT) scheme is proposed to mitigate the performance degradation. Furthermore, considering the distinct spatial locality of transition distribution on some types of buses, we use the locality to optimize the BAT. In terms of the implementation, we propose the Differential Counter Cluster (DCC) synchronous mechanism to synchronize the data transmission, and the Delay Active Shielding (DAS) to protect some critical signals from crosstalk and optimize the routing area overhead. The BAT is scalable with the variation of bus width with little extra implementation complexity. The effectiveness of the BAT is evaluated by focusing on the on-chip buses of a superscalar microprocessor simulator using the SPEC CPU2000 benchmarks. When applied to a 64-bit on-chip instruction bus, the BAT scheme, compared with the conservative approach, Codec and Variable Cycle Transmission (DYN) approaches, improves performance by 55+%, 10+%, 30+%, respectively, at the expense of 13% routing area overhead.
Yu HU Jing YE Zhiping SHI Xiaowei LI
Process variation has become prominent in the advanced CMOS technology, making the timing of fabricated circuits more uncertain. In this paper, we propose a Layout-Aware Path Selection (LAPS) technique to accurately estimate the circuit timing variation from a small set of paths. Three features of paths are considered during the path selection. Experiments conducted on benchmark circuits with process variation simulated with VARIUS show that, by selecting only hundreds of paths, the fitting errors of timing distribution are kept below 5.3% when both spatial correlated and spatial uncorrelated process variations exist.
Yinhe HAN Yu HU Xiaowei LI Huawei LI Anshuman CHANDRA Xiaoqing WEN
Connection of internal scan chains in core wrapper design (CWD) is necessary to handle the width match of TAM and internal scan chains. However, conventional serial connection of internal scan chains incurs power and time penalty. Study shows that the distribution and high density of don't care bits (X-bits) in test patterns make scan slices overlapping and partial overlapping possible. A novel parallel CWD (pCWD) approach is presented in this paper for lowering test power by shortening wrapper scan chains and adjusting test patterns. In order to achieve shift time reduction from overlapping in pCWD, a two-phase process on test pattern: partition and fill, is presented. Experimental results on d695 of ITC2002 benchmark demonstrated the shift time and test power have been decreased by 1.5 and 15 times, respectively. In addition, the proposed pCWD can be used as a stand-alone time reduction technique, which has better performance than previous techniques.
Critical path selection is very important in delay testing. Critical paths found by conventional static timing analysis (STA) tools are inadequate to represent the real timing of the circuit, since neither the testability of paths nor the statistical variation of cell delays caused by process variation is considered. This paper proposed a novel path selection method considering process variation. The circuit is firstly simplified by eliminating non-critical edges under statistical timing model, and then divided into sub-circuits, while each sub-circuit has only one prime input (PI) and one prime output (PO). Critical paths are selected only in critical sub-circuits. The concept of partially critical edges (PCEs) and completely critical edges (CCEs) are introduced to speed up the path selection procedure. Two path selection strategies are also presented to search for a testable critical path set to cover all the critical edges. The experimental results showed that the proposed circuit division approach is efficient in path number reduction, and PCEs and CCEs play an important role as a guideline during path selection.
Jianliang GAO Yinhe HAN Xiaowei LI
Bugs are becoming unavoidable in complex integrated circuit design. It is imperative to identify the bugs as soon as possible through post-silicon debug. For post-silicon debug, observability is one of the biggest challenges. Scan-based debug mechanism provides high observability by reusing scan chains. However, it is not feasible to scan dump cycle-by-cycle during program execution due to the excessive time required. In fact, it is not necessary to scan out the error-free states. In this paper, we introduce Suspect Window to cover the clock cycle in which the bug is triggered. Then, we present an efficient approach to determine the suspect window. Based on Suspect Window, we propose a novel debug mechanism to locate the bug both temporally and spatially. Since scan dumps are only taken in the suspect window with the proposed mechanism, the time required for locating the bug is greatly reduced. The approaches are evaluated using ISCAS'89 and ITC'99 benchmark circuits. The experimental results show that the proposed mechanism can significantly reduce the overall debug time compared to scan-based debug mechanism while keeping high observability.
Test data volume and test power are two major concerns when testing modern large circuits. Recently, selective encoding of scan slices is proposed to compress test data. This encoding technique, unlike many other compression techniques encoding all the bits, only encodes the target-symbol by specifying a single bit index and copying group data. In this paper, we propose an extended selective encoding which presents two new techniques to optimize this method: a flexible grouping strategy, X bits exploitation and filling strategy. Flexible grouping strategy can decrease the number of groups which need to be encoded and improve test data compression ratio. X bits exploitation and filling strategy can exploit a large number of don't care bits to reduce testing power with no compression ratio loss. Experimental results show that the proposed technique needs less test data storage volume and reduces average weighted switching activity by 25.6% and peak weighted switching activity by 9.68% during scan shift compared to selective encoding.
Binzhang FU Yinhe HAN Huawei LI Xiaowei LI
The Network-on-Chip (NoC) is limited by the reliability constraint, which impels us to exploit the fault-tolerant routing. Generally, there are two main design objectives: tolerating more faults and achieving high network performance. To this end, we propose a new multiple-round dimension-order routing (NMR-DOR). Unlike existing solutions, besides the intermediate nodes inter virtual channels (VCs), some turn-legally intermediate nodes inside each VC are also utilized. Hence, more faults are tolerated by those new introduced intermediate nodes without adding extra VCs. Furthermore, unlike the previous solutions where some VCs are prioritized, the NMR-DOR provides a more flexible manner to evenly distribute packets among different VCs. With extensive simulations, we prove that the NMR-DOR maximally saves more than 90% unreachable node pairs blocked by faults in previous solutions, and significantly reduces the packet latency compared with existing solutions.
Yu HU Yinhe HAN Xiaowei LI Huawei LI Xiaoqing WEN
LSI testing is critical to guarantee chips are fault-free before they are integrated in a system, so as to increase the reliability of the system. Although full-scan is a widely adopted design-for-testability technique for LSI design and testing, there is a strong need to reduce the test data Volume, scan-in Power dissipation, and test application Time (VPT) of full-scan testing. Based on the analysis of the characteristics of the variable-to-fixed run-length coding technique and the random access scan architecture, this paper presents a novel design scheme to tackle all VPT issues simultaneously. Experimental results on ISCAS'89 benchmarks have shown on average 51.2%, 99.5%, 99.3%, and 85.5% reduction effects in test data volume, average scan-in power dissipation, peak scan-in power dissipation, and test application time, respectively.