Jiaxin WU Bing LI Li ZHAO Xinzhou XU
Maaki SAKAI Kanon HOKAZONO Yoshiko HANADA
Xuecheng SUN Zheming LU
Yuanhe WANG Chao ZHANG
Jinfeng CHONG Niu JIANG Zepeng ZHUO Weiyu ZHANG
Xiangrun LI Qiyu SHENG Guangda ZHOU Jialong WEI Yanmin SHI Zhen ZHAO Yongwei LI Xingfeng LI Yang LIU
Meiting XUE Wenqi WU Jinfeng LUO Yixuan ZHANG Bei ZHAO
Rong WANG Changjun YU Zhe LYU Aijun LIU
Huijuan ZHOU Zepeng ZHUO Guolong CHEN
Feifei YAN Pinhui KE Zuling CHANG
Manabu HAGIWARA
Ziqin FENG Hong WAN Guan GUI
Sungryul LEE
Feng WANG Xiangyu WEN Lisheng LI Yan WEN Shidong ZHANG Yang LIU
Yanjun LI Jinjie GAO Haibin KAN Jie PENG Lijing ZHENG Changhui CHEN
Ho-Lim CHOI
Feng WEN Haixin HUANG Xiangyang YIN Junguang MA Xiaojie HU
Shi BAO Xiaoyan SONG Xufei ZHUANG Min LU Gao LE
Chen ZHONG Chegnyu WU Xiangyang LI Ao ZHAN Zhengqiang WANG
Izumi TSUNOKUNI Gen SATO Yusuke IKEDA Yasuhiro OIKAWA
Feng LIU Helin WANG Conggai LI Yanli XU
Hongtian ZHAO Hua YANG Shibao ZHENG
Kento TSUJI Tetsu IWATA
Yueying LOU Qichun WANG
Menglong WU Jianwen ZHANG Yongfa XIE Yongchao SHI Tianao YAO
Jiao DU Ziwei ZHAO Shaojing FU Longjiang QU Chao LI
Yun JIANG Huiyang LIU Xiaopeng JIAO Ji WANG Qiaoqiao XIA
Qi QI Liuyi MENG Ming XU Bing BAI
Nihad A. A. ELHAG Liang LIU Ping WEI Hongshu LIAO Lin GAO
Dong Jae LEE Deukjo HONG Jaechul SUNG Seokhie HONG
Tetsuya ARAKI Shin-ichi NAKANO
Shoichi HIROSE Hidenori KUWAKADO
Yumeng ZHANG
Jun-Feng Liu Yuan Feng Zeng-Hui Li Jing-Wei Tang
Keita EMURA Kaisei KAJITA Go OHTAKE
Xiuping PENG Yinna LIU Hongbin LIN
Yang XIAO Zhongyuan ZHOU Mingjie SHENG Qi ZHOU
Kazuyuki MIURA
Yusaku HIRAI Toshimasa MATSUOKA Takatsugu KAMATA Sadahiro TANI Takao ONOYE
Ryuta TAMURA Yuichi TAKANO Ryuhei MIYASHIRO
Nobuyuki TAKEUCHI Kosei SAKAMOTO Takuro SHIRAYA Takanori ISOBE
Shion UTSUMI Kosei SAKAMOTO Takanori ISOBE
You GAO Ming-Yue XIE Gang WANG Lin-Zhi SHEN
Zhimin SHAO Chunxiu LIU Cong WANG Longtan LI Yimin LIU Zaiyan ZHOU
Xiaolong ZHENG Bangjie LI Daqiao ZHANG Di YAO Xuguang YANG
Takahiro IINUMA Yudai EBATO Sou NOBUKAWA Nobuhiko WAGATSUMA Keiichiro INAGAKI Hirotaka DOHO Teruya YAMANISHI Haruhiko NISHIMURA
Takeru INOUE Norihito YASUDA Hidetomo NABESHIMA Masaaki NISHINO Shuhei DENZUMI Shin-ichi MINATO
Zhan SHI
Hakan BERCAG Osman KUKRER Aykut HOCANIN
Ryoto Koizumi Xiaoyan Wang Masahiro Umehira Ran Sun Shigeki Takeda
Hiroya Hachiyama Takamichi Nakamoto
Chuzo IWAMOTO Takeru TOKUNAGA
Changhui CHEN Haibin KAN Jie PENG Li WANG
Pingping JI Lingge JIANG Chen HE Di HE Zhuxian LIAN
Ho-Lim CHOI
Akira KITAYAMA Goichi ONO Hiroaki ITO
Koji NUIDA Tomoko ADACHI
Yingcai WAN Lijin FANG
Yuta MINAMIKAWA Kazumasa SHINAGAWA
Sota MORIYAMA Koichi ICHIGE Yuichi HORI Masayuki TACHI
Sendren Sheng-Dong XU Albertus Andrie CHRISTIAN Chien-Peng HO Shun-Long WENG
Zhikui DUAN Xinmei YU Yi DING
Hongbo LI Aijun LIU Qiang YANG Zhe LYU Di YAO
Yi XIONG Senanayake THILAK Yu YONEZAWA Jun IMAOKA Masayoshi YAMAMOTO
Feng LIU Qian XI Yanli XU
Yuling LI Aihuang GUO
Mamoru SHIBATA Ryutaroh MATSUMOTO
Haiyang LIU Xiaopeng JIAO Lianrong MA
Ruixiao LI Hayato YAMANA
Riaz-ul-haque MIAN Tomoki NAKAMURA Masuo KAJIYAMA Makoto EIKI Michihiro SHINTANI
Kundan LAL DAS Munehisa SEKIKAWA Tadashi TSUBONE Naohiko INABA Hideaki OKAZAKI
Kenichi OKADA Kento YAMAOKA Hidetoshi ONODERA
This paper proposes a model to calculate statistical gate-delay variation caused by intra-chip and inter-chip variabilities. The variation of each gate delay directly influences the circuit-delay variation, so it is important to characterize each gate-delay variation accurately. Every transistor in a gate affects transient characteristics of the gate, so it is indispensable to consider an intra-gate variability for the modeling of gate-delay variation. This effect is not captured in a statistical delay analysis reported so far. Our model considers the intra-gate variability by sensitivity constants. We evaluate our modeling accuracy, and we show some simulated results of a circuit delay variation.
Sadahiro TANI Yoshihiro UCHIDA Makoto FURUIE Shuji TSUKIYAMA BuYeol LEE Shuji NISHI Yasushi KUBOTA Isao SHIRAKAWA Shigeki IMAI
The problem of calculating parasitic capacitances between two interconnects is investigated dedicatedly for liquid crystal displays, with the main focus put on the approximate expressions of the capacitances caused at the intersection and the parallel running of two interconnects. To derive simple and accurate approximate expressions, the interconnects in these structures are divided into a few basic coupling regions in such a way that the electro-magnetic field in each region can be calculated by a 2-D capacitance model. Then the capacitance in such a region is represented by a simple expression adjusted to the results computed by an electro-magnetic field solver. The total capacitance obtained by summing the capacitances in all regions is evaluated in comparison with the one obtained by using a 3-D field solver, resulting in a relative error of less than 5%.
Atsushi KUROKAWA Takashi SATO Hiroo MASUDA
We present a new and efficient approach for extracting on-chip mutual inductances of VLSI interconnects by applying approximation formulae. The equations are based on the assumption of filaments or bars of finite width and zero thickness and are derived through Taylor's expansion of the exact formula for mutual inductance between filaments. Despite the assumption of uniform current density in each of the bars, the model is sufficiently accurate for the interconnections of current and future LSIs because the skin and proximity effects do not affect most wires. Expression of the equations in polynomial form provides a balance between accuracy and computational complexity. These equations are mapped according to the geometric structures for which they are most suitable in minimizing the runtime of inductance calculation while retaining the required accuracy. Within geometrical constraints, the wires are of arbitrary specification. Results of a comprehensive evaluation based on the ITRS-specified global wiring structure for 2003 shows that the inductance values were extracted by using the proposed approach, and they were within several percent of the values obtained by using commercial three-dimensional (3-D) field solvers. The efficiency of the proposed approach is also demonstrated by extraction from a real layout design that has 300-k interconnecting segments.
Akira TSUCHIYA Masanori HASHIMOTO Hidetoshi ONODERA
This paper discusses the frequency to extract RLC values from interconnects. In circuit design, frequency-independent equivalent circuit is widely used, and many design and analysis techniques based on this equivalent circuit are proposed so far. However in reality, characteristics of interconnects are frequency-dependent. Also pulse waveforms in digital circuits contain multiple frequency components. The frequency used for RLC extraction affects the accuracy of interconnect characterization, and hence careful determination of extraction frequency is critical. We propose a representative frequency for RLC extraction. Conventionally, representative frequencies are determined by input pulse. The proposed method decides the representative frequency based on the interconnect length, whereas conventional representative frequencies are determined by input pulse shape, period and patterns. We verify that the extraction at the proposed frequency provides the most accurate transition waveform against various input signals and interconnect structures in digital circuits.
Herng-Jer LEE Chia-Chi CHU Wu-Shiung FENG
A novel method is presented to compute moments of high-speed VLSI interconnects, which are modeled as coupled RLC trees. Recursive formulae of moments of coupled RC trees are extended to those for coupled RLC trees by considering both self inductances and mutual inductances. Analytical formulae for voltage moments at each node are derived explicitly. The formulae can be efficiently used for estimating delay and crosstalk noise. The inductive crosstalk noise waveform can be accurately and efficiently estimated using the moment computation technique in conjunction with the projection-based order reduction method. Fundamental aspects of the proposed approach are described in details. Experimental results show the increased accuracy of the proposed method over that of the traditional ones.
Masanori HASHIMOTO Masao TAKAHASHI Hidetoshi ONODERA
We propose an estimation method of crosstalk noise for generic RC trees. The proposed method derives an analytic waveform of crosstalk noise in a 2-π equivalent circuit. The peak voltage is calculated from the closed-form expression. We also develop a transformation method from generic RC trees with branches into the 2-π model circuit. The proposed method can hence estimate crosstalk noise for any RC trees. Our estimation method is evaluated in a 0.13 µm technology. The peak noise of two partially-coupled interconnects is estimated with the average error of 11%. Our method transforms generic RC interconnects with branches into the 2-π model with 14% error on average.
Hiroyuki TSUJIKAWA Shozo HIRANO Kenji SHIMAZAKI
Large-scale integration (LSI) microchips are widely used in many types of modern electronic products including electric appliances, cellular phones, toys, electronic games, and automobiles. The electromagnetic interference (EMI) noise produced by these micro devices can cause significant operational problems in other devices in the system. Some methods that have been proposed for such analysis estimates the EMI noise characteristic through transistor-level power simulation. However, in these methods, transistor-level circuit simulation is performed by combining the power-supply impedance model and the power-supply source model. In general, transistor-level simulators are too slow for practical application-specific integrated circuit (ASIC) design. In this paper, a total solution for reducing EMI noise in LSI microchips was presented. The proposed design methodology integrates fast and accurate estimation, reduction, and verification. The method was successfully applied to the design of a 32-bit microprocessor, achieving a 2-dB noise reduction in the FM frequency band and 10-dB reduction at 1 GHz. The proposed design methodology is a powerful solution for LSI designers as a tool for minimizing EMI noise and achieve higher levels of reliability for the microelectronic products.
Akihiko HYODO Masanori MUROYAMA Hiroto YASUURA
This paper presents a variable pipeline depth processor, which can dynamically adjust its pipeline depth and operating voltage at run-time, we call dynamic pipeline and voltage scaling (DPVS), depending on the workload characteristics under timing constraints. The advantage of adjusting pipeline depth is that it can eliminate the useless energy dissipation of the additional stalls, or NOPs and wrong-path instructions which would increase as the pipeline depth grow deeper in excess of the inherent parallelism. Although dynamic voltage scaling (DVS) is a very effective technique in itself for reducing energy dissipation, lowering supply voltage also causes performance degradation. By combining with dynamic pipeline scaling (DPS), it would be possible to retain performance at required level while reducing energy dissipation much further. Experimental results show the effectiveness of our DPVS approach for a variety of benchmarks, reducing total energy dissipation by up to 64.90% with an average of 27.42% without any effect on performance, compared with a processor using only DVS.
Takeshi FUJINO Akira YAMAZAKI Yasuhiko TAITO Mitsuya KINOSHITA Fukashi MORISHITA Teruhiko AMANO Masaru HARAGUCHI Makoto HATAKENAKA Atsushi AMO Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
A low power 16 Mb embedded DRAM (eDRAM) macro is fabricated using 0.15 µm logic -based embedded DRAM process technology. A 0.5 µm2 CUB (
Satoshi KOMATSU Masahiro FUJITA
The power dissipation at the off-chip bus has become a significant part of the overall power dissipation in micro-processor based digital systems. This paper presents irredundant address bus encoding methods which reduce signal transitions on the instruction address buses by using adaptive codebook methods. These methods are based on the temporal locality and spatial locality of instruction address. Since applications tend to JUMP/BRANCH to limited sets of addresses, proposed encoding methods assign the least signal transition codes to the addresses of JUMP/BRANCH operations in the past. In addition, our methods can be easily applicable for conventional digital systems since they are irredundant encoding methods. Our encoding methods reduce the signal transitions on the instruction address buses, which results in the reduction of total power dissipation of digital systems. Experimental results show that our methods can reduce the signal transition by an average of 88%.
Jun SAKIYAMA Naofumi HOMMA Takafumi AOKI Tatsuo HIGUCHI
This paper presents a unified representation of fast addition algorithms based on Counter Tree Diagrams (CTDs). By using CTDs, we can describe and analyze various adder architectures in a systematic way without using specific knowledge about underlying arithmetic algorithms. Examples of adder architectures that can be handled by CTDs include Redundant-Binary (RB) adders, Signed-Digit (SD) adders, Positive-Digit (PD) adders, carry-save adders, parallel counters (e.g., 3-2 counters and 4-2 counters) and networks of such basic adders/counters. This paper also discusses the CTD-based analysis of carry-propagation-free adders using various number representations.
Hiroyuki OCHI Tatsuya SUZUKI Sayaka MATSUNAGA Yoichi KAWANO Takao TSUDA
Floating-point units (FPUs) are indispensable in processors, 3D-graphic engines, etc. To improve design productivity of these LSIs, FPU IPs are strongly desired. However, it is impossible to cover wide range of needs by an FPU IP, because there are various kind of options in specifications (e.g., operating frequency, latency, and ability of pipeline operation) and implementations (e.g., hardware algorithms). Thus, multiple IPs are needed even for the same functionality. In this paper, we propose to build an IP Library which consists of large number of FPU IPs with various kind of specifications and implementations, and which has catalogue data that shows not only specifications but also post-layout area and power dissipation of each IP. As the first step of the project, we have developed an IP Library targeted to Rohm 0.35 µm triple-metal process, which consists of 20 IPs for IEEE-754-standard single-precision floating-point division with 5 operating frequencies (50 MHz, 75 MHz, 100 MHz, 125 MHz, and 150 MHz), with two options whether pipelined or not, and with two hardware algorithms (the restoring method and the SRT method). We have also developed a catalogue for the IP Library, which shows post-layout area and power dissipation as well as specification of each IP. We have introduced two metrics "performance-area ratio (MFLOPS/mm2)" and "performance-power ratio (MFLOPS/W)" to afford a good insight into efficiency of implementations. From the catalogue data, the restoring method is, on the average, 1.4 times and 2.3 times better than the SRT method in terms of performance-area ratio and performance-power ratio, respectively. The developed catalogue is usable not only for selection of the optimal IP for a specific application, but also for quantitative analysis at the early stage of architecture design. It is also expected that the catalogue data based on an actual process technology is valuable for education.
Nattha SRETASEREEKUL Hiroshi SAITO Euiseok KIM Metehan OZCAN Masashi IMAI Hiroshi NAKAMURA Takashi NANYA
Asynchronous controllers effectively control high concurrence of datapath operations for high speed. Signal Transition Graphs (STGs) can effectively represent these concurrent events. However, highly concurrent STGs cause the state explosion problem in asynchronous synthesis tools. Many small but highly concurrent STGs cannot be synthesized to obtain control circuits. Moreover, STGs also lead to some control-time overhead of the four-phase handshake protocol. In this paper, we propose a method for deriving the serial control nodes from Control Data Flow Graphs (CDFGs) such that the concurrence of datapath operations is still preserved. The STGs derived from the serialized control nodes are serial STGs which are simpler for synthesis than the concurrent STGs. We also propose an implementation using these serialized controllers to generate local clocks at any necessary times. The implementation results in very small control-time overhead. The experimental results show that the number of synthesis states is proportional to the number of control signals, and the circuits with satisfiable small control-time overhead are obtained.
Jing-Jia LIOU Li-C. WANG Angela KRSTIĆ Kwang-Ting (Tim) CHENG
Critical path selection is an indispensable step for AC delay test and timing validation. Traditionally, this step relies on the construction of a set of worse-case paths based upon discrete timing models. However, the assumption of discrete timing models can be invalidated by timing defects and process variation in the deep sub-micron domain, which are often continuous in nature. As a result, critical paths defined in a traditional timing analysis approach may not be truly critical in reality. In this paper, we propose using a statistical delay evaluation framework for estimating the quality of a path set. Based upon the new framework, we demonstrate how the traditional definition of a critical path set may deviate from the true critical path set in the deep sub-micron domain. To remedy the problem, we discuss improvements to the existing path selection strategies by including new objectives. We then compare statistical approaches with traditional approaches based upon experimental analysis of both defect-free and defect-injected cases.
Yasuo SATO Motoyuki SATO Koki TSUTSUMIDA Kazumi HATAYAMA Kazuyuki NOMOTO
We analyze the timing design methodology for testing chips using a multiple-clock domain scheme. We especially focus on the layout design of the design-for-test (DFT) circuits and the clock network. First, we demonstrate the built-in-self-testing (BIST) scheme for multiple-clock domains. Then, we discuss the layout method that achieves a low clock-skew between different clock domains with a small modification of the original user logic layout. Finally, we evaluate the fault coverage of our large ASIC chips designed using our new methodology. The short design period and high fault coverage of our methodology are confirmed using actual industrial designs. We introduce a viable approach for industrial designs because designers don't have to pay much attention to DFT. Our approach also provides designers with an easy method for LSI debugging and diagnostics.
Youhua SHI Zhe ZHANG Shinji KIMURA Masao YANAGISAWA Tatsuo OHTSUKI
Reseeding technique is proposed to improve the fault coverage in pseudo-random testing. However most of previous works on reseeding is based on storing the seeds in an external tester or in a ROM. In this paper we present a built-in reseeding technique for LFSR-based test pattern generation. The proposed structure can run both in pseudorandom mode and in reseeding mode. Besides, our method requires no storage for the seeds since in reseeding mode the seeds can be generated automatically in hardware. In this paper we also propose an efficient grouping algorithm based on simulated annealing to optimize test vector grouping. Experimental results for benchmark circuits indicate the superiority of our technique against other reseeding methods with respect to test length and area overhead. Moreover, since the theoretical properties of LFSRs are preserved, our method could be beneficially used in conjunction with any other techniques proposed so far.
Kenichi ICHINO Ko-ichi WATANABE Masayuki ARAI Satoshi FUKUMOTO Kazuhiko IWASAKI
We propose a technique of selecting seeds for the LFSR-based test pattern generators that are used in VLSI BISTs. By setting the computed seed as an initial value, target fault coverage, for example 100%, can be accomplished with minimum test length. We can also maximize fault coverage for a given test length. Our method can be used for both test-per-clock and test-per-scan BISTs. The procedure is based on vector representations over GF(2m), where m is the number of LFSR stages. The results indicate that test lengths derived through selected seeds are about sixty percent shorter than those derived by simple seeds, i.e. 00
A test generation method with time-expansion model can achieve high fault efficiency for acyclic sequential circuits, which can be obtained by partial scan design. This method, however, requires combinational test pattern generation algorithm that can deal with multiple stuck-at faults, even if the target faults are single stuck-at faults. In this paper, we propose a test generation method for acyclic sequential circuits with a circuit model, called MS-model, which can express multiple stuck-at faults in time-expansion model as single stuck-at faults. Our procedure can generate test sequences for acyclic sequential circuits with just combinational test pattern generation algorithm for single stuck-at faults. Experimental results show that test sequences for acyclic sequential circuits with high fault efficiency are generated in small computational effort.
Motoki KIMURA Morgan Hirosuke MIKI Takao ONOYE Isao SHIRAKAWA
A Java execution environment is implemented, in which a hardware engine is operated in parallel with an embedded processor. This pair of hardware facilities together with an additional software kernel are devised for existing embedded systems, so as to execute Java applications more efficiently in such a way that 39 instructions are added to the original Java Virtual Machine to implement the software kernel. The exploration of design parameters is also attempted to attain a low hardware cost and high performance. The proposed hardware engine of a 6-stage pipeline can be integrated in a single chip using 30 k gates together with the instruction and data cache memories. The proposed approach improves the execution speed by a factor of 5 in comparison with the J2ME software implementation.
Jun Kyoung KIM Ho Young KIM Tag Gon KIM
This paper proposes a retargetable framework for rapid evaluation of processor architecture, which represents abstraction levels of architecture in a hierarchical manner. The basis for such framework is a hierarchical architecture description language, called XR2, which describes architecture at three abstraction levels: instruction set architecture, pipeline architecture and micro-architecture. In addition, a token-level computational model for fast pipeline simulation is proposed, which considers the minimal information required for the given performance measurement of the pipeline. Experimental result shows that token-level simulation is faster than the traditional cycle-accurate one by 50% to 80% in pipeline architecture evaluation.
Nozomu TOGAWA Kyosuke KASAHARA Yuichiro MIYAOKA Jinku CHOI Masao YANAGISAWA Tatsuo OHTSUKI
A packed SIMD type operation or a SIMD operation is n-parallel b/n-bit sub-operations executed by the modified n-bit functional unit. Such a functional unit is called a SIMD functional unit and a processor core which can execute SIMD operations is called a SIMD processor core. SIMD operations can be effectively applied to image processing applications. This paper focuses on hardware/software cosynthesis of SIMD processor cores and particularly proposes a new simulator generator which simulates pipelined instructions for a SIMD processor. Generally, a SIMD functional unit has many options and then we can have so many different SIMD functional unit instances. However, since our hardware/software cosynthesis system synthesizes a special-purpose processor core for an input application program, it uses very limited SIMD functional unit instances. In the proposed approach, we consider a SIMD operation to be a set of SIMD sub-operations. By adding up the appropriate SIMD sub-operations, we construct a single SIMD operation. Then a SIMD functional unit behavior can be characterized by a collection of SIMD operations. This approach has the advantage that: if we have a small number of behavior libraries for SIMD sub-operations, we can instantiate a particular SIMD functional unit behavior. Experimental results demonstrate the effectiveness of the proposed approach.
Sheldon X.-D. TAN C.-J. Richard SHI
A systematic and efficient approach is presented to generating simple yet accurate symbolic expressions for transfer functions and characteristics of large linear analog circuits. The approach is based on a compact determinant decision diagram (DDD) representation of exact transfer functions and characteristics. Several key tasks of generating interpretable symbolic expressions--DDD graph simplification, term de-cancellation, and dominant-term generation--are shown to be able to perform linearly by means of DDD graph operations. An efficient algorithm for generating dominant terms is presented based on the concepts of finding the k-shortest paths in a DDD graph. Experimental results show that our approach outperforms other start-of-the-art approaches, and is capable of generating interpretable expressions for typical analog blocks in minutes on modern computer workstations.
Montree SIRIPRUCHYANUN Paramote WARDKEIN
A simple circuit scheme, able to generate square/triangular wave, is proposed. Its advantages are that oscillation frequency and amplitudes of the proposed circuit do have a small range of temperature drift. Electronic adjustments of them can be obtained with a wide sweep range and DC offset adjustment available. In addition, the proposed scheme can produce frequency-constant derivative of PWM signal without an additional device requirement. It is very appropriate for, with simple circuit details, not only circuit implementation but also monolithic fabrication. The PSPICE simulation results through bipolar technology are given here, they show good performance of the proposed circuit.
Sarat C. MARUVADA Karthik KRISHNAMOORTHY Florin BALASA Lucian M. IONESCU
The traditional way of approaching device-level placement problems for analog layout is to explore a huge search space of absolute placement representations, where cells are allowed to illegally overlap during their moves. This paper presents a novel exploration technique for analog placement, operating on a subset of tree representations of the layout, where the typical presence of an arbitrary number of symmetry groups of devices is directly taken into account during the search of the solution space. The efficiency of the novel approach is due to the use of red-black interval trees, data structures employed to support operations on dynamic sets of intervals.
Sheqin DONG Xianlong HONG Song CHEN Xin QI Ruijie WANG Jun GU
Solution space smoothing allows a local search heuristic to escape from a poor, local minimum. In this paper, we propose a technique that can smooth the rugged terrain surface of the solution space of a placement problem. We test the smoothing heuristics for MCNC benchmarks, and for VLSI placement with pre-placed modules and placement with consideration of congestion. Experiment results demonstrated that solution space smoothing is very efficient for VLSI module placement, and it can be applied to all floorplanning representations proposed so far.
Kazuya WAKATA Hiroaki SAITO Kunihiro FUJIYOSHI Keishi SAKANUSHI Takayuki OBATA Chikaaki KODAMA
In this paper, for convex rectilinear block packing problem, we propose 1) a novel algorithm to obtain a packing based on a given sequence-pair in O(n2) time (conventional method needs O(n3) time), where n is the number of rectangle sub-blocks made from convex blocks, 2) a move operation for Simulated Annealing which is symmetric and can guarantee reachability for the first time, and 3) a method to generate a random adjacent sequence-pair in O(n2) time. By using 1), 2) and 3) together, the time complexity of the inner loop in Simulated Annealing becomes surely O(n2) time. Experimental results show that the proposed algorithm is faster than the conventional ones in practical and the wire length as well as packing area is taken into consideration in the proposed method.
Jingyu XU Xianlong HONG Tong JING Yici CAI Jun GU
As the CMOS technology enters the very deep submicron era, inter-wire coupling capacitance becomes the dominant part of load capacitance. The coupling effects have brought new challenges to routing algorithms on both delay estimation and optimization. In this paper, we propose a timing-driven global routing algorithm with consideration of coupling effects. Our two-phase algorithm based on timing-relax method includes a heuristic Steiner tree algorithm to guarantee the timing performance of the initial solution and an optimization algorithm based on coupling-effect-transference. Experimental results are given to demonstrate the efficiency and accuracy of the algorithm.
Shinobu NAGAYAMA Tsutomu SASAO
In this paper, we propose a compact representation of logic functions using Multi-valued Decision Diagrams (MDDs) called heterogeneous MDDs. In a heterogeneous MDD, each variable may take a different domain. By partitioning binary input variables and representing each partition as a single multi-valued variable, we can produce a heterogeneous MDD with 16% smaller memory size than a Reduced Ordered Binary Decision Diagram (ROBDD), and with comparable memory size to Free Binary Decision Diagrams (FBDDs). And also, heterogeneous MDDs have shorter Average Path Length (APL) than ROBDDs and FBDDs. We minimized a large number of benchmark functions to show the compactness of heterogeneous MDDs.
This paper proposes a fast multi-cycle path detection method for large sequential circuits. The proposed method is based on ATPG techniques, especially on implication techniques, to use circuit structures and multi-cycle path conditions directly. The method also checks whether or not a multi-cycle path may be invalidated by static hazards at the inputs of flip-flops. Then we explain how to apply the proposed algorithm to real industrial designs. Experimental results show that our method is much faster than conventional ones and that it is efficient enough to handle large industrial designs.
Nobuhiro DOI Takashi HORIYAMA Masaki NAKANISHI Shinji KIMURA Katsumasa WATANABE
In the hardware synthesis from a high-level language such as C, the bit length of variables is one of the key issues for the area and speed optimization. Usually, designers are required to optimize the bit-length of each variable manually using the time-consuming simulation on huge-data. In this paper, we propose an optimization method of the fractional bit length in the conversion from floating-point variables to fixed-point variables. The method is based on error propagation and the backward propagation of the accuracy limitation. The method is fully analytical and fast compared to simulation based methods.
Thanyapat SAKUNKONCHAK Satoshi KOMATSU Masahiro FUJITA
SpecC language is designated to handle the design of entire system from specification to implementation and of hardware/software co-design. Concurrency is one of the features of SpecC which expresses the parallel execution of processes. Describing the systems which contain concurrent behaviors would have some data exchanging or transferring among them. Therefore, the synchronization semantics (notify/wait) of events should be incorporated. The actual design, which is usually sophisticated by its characteristic and functionalities, may contain a bunch of event synchronization codes. This will make the design difficult and time-consuming to verify. In this paper, we introduce a technique which helps verifying the synchronization of events in SpecC. The original SpecC code containing synchronization semantics is parsed and translated into a Boolean SpecC code. The difference decision diagrams (DDDs) is used to verify for event synchronization on Boolean SpecC code. The counter examples for tracing back to the original source are given when the verification results turn out to be unsatisfied. Here we also introduce idea on automatically refinement when the results are unsatisfied and preset some preliminary results.
This paper addresses bitwidth optimization focusing on leakage power reduction for system-level low-power design. By means of tuning the design parameter, bitwidth tailored to a given application requirements, the datapath width of processors and size of memories are optimized resulting in significant leakage power reduction besides dynamic power reduction. Experimental results for several real embedded applications, show power reduction without performance penalty range from about 21.5% to 66.2% of leakage power, and 14.5% to 59.2% of dynamic power.
Masanori HASHIMOTO Yoshiteru HAYASHI Hidetoshi ONODERA
This paper experimentally investigates the effectiveness of regularly-placed bit-slice layout and transistor-level optimization to datapath circuit performance. We focus on cell-base design flows with transistor-level circuit optimization. We examine the effectiveness through design experiments of 32-bit carry select adder and 16-bit tree-style multiplier in a 0.35 µm technology. From the experimental results, we can scarcely observe that manual cell placement contributes to improve circuit performance. On the other hand, transistor-level circuit optimization is so effective that circuit delay is reduced by 11-20% and power dissipation decreases to 42-62%. We can see that, in the case of cell-base design, transistor-level optimization is also important as well as in the case of custom design, whereas cell-base bit-slice layout has less importance to circuit performance.
Masayasu FUKUNAGA Seiji KAJIHARA Sadami TAKEOKA Shinichi YOSHIMURA
Since a logic circuit often has too many paths to test delay of all paths, it is necessary for path delay testing to limit the number of paths to be tested. The paths to be tested should have large delay because such paths more likely cause a fault. Additionally, a test set for the paths are required to detect other models of faults as many as possible. In this paper, we investigate two typical criteria of path selection for path delay testing. From our experiments, we observe that test patterns for the longest paths cannot cover many local delay defects such as transition faults.
This work proposed a high-performance charge-pump circuit for phase-locked-loop (PLL) applications. The proposed charge-pump circuit is composed of a pair of wide-swing current mirror and symmetric pump circuits which can provide wide output range and have no jump phenomenon. The proposed charge-pump circuit has been designed and simulated by using the TSMC 0.35 µm 1P4M CMOS technology. Simulation results show the feasibility of proposed structure for low-voltage high-frequency applications.
Hiroshi INOUE Takahiro IWASAKI Toshifumi SUGANE Masahiro NUMA Keisuke YAMAMOTO
In an LSI design process, Engineering Change Orders (ECO's) are often given even after the layout process. This letter presents an approach to change the design to satisfy the new specification with ECO's by employing an error diagnosis technique. Our approach performs incremental synthesis using spare cells embedded on the original layout. Experimental results show that applying the error diagnosis technique to incremental synthesis is effective to suppress increase in delay time caused by ECO's.
Nozomu TOGAWA Koichi TACHIKAKE Yuichiro MIYAOKA Masao YANAGISAWA Tatsuo OHTSUKI
This letter proposes a new hardware/software partitioning algorithm for processor cores with SIMD instructions. Given a compiled assembly code including SIMD instructions and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with a new assembly code. Firstly, we assume for each operation type a super SIMD functional unit which can execute all the SIMD instructions. Secondly we reduce a SIMD instruction or "sub-function" of each super functional unit, one by one, while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new processor configuration. By repeating this process, we finally find SIMD functional unit configuration as well as a processor core architecture. The promising experimental results are also shown.
This paper presents the implementation of sfl2vl, a new free tool for SFL to Verilog conversion. Also this paper will discuss the performance of the conversion and the logic simulation of the sfl2vl+Icarus Verilog (free-ware compiler) versus PARTHENON with some MPU designs.
Hack-Soo OH Chang-Gene WOO Pyung CHOI Geunbae LIM Jang-Kyoo SHIN Jong-Hyun LEE
Delta-sigma modulators (DSMs) are commonly use in high-resolution analog-to-digital converters, and band-pass delta-sigma modulators have recently been used to convert IF signals into digital signals. In particular, a quadrature band-pass delta-sigma modulator can achieve a lower total order, higher signal-to-noise ratio (SNR), and higher bandwidth when compared with conventional band-pass modulators. The current paper proposes a second-order three-bit quadrature band-pass delta-sigma modulator that can achieve a lower power consumption and better performance with a similar die size to a conventional fourth-order quadrature band-pass delta-sigma modulator (QBPDSM). The proposed system is integrated using CMOS 0.35 µm, double-poly, four-metal technology. The system operates at 13 MHz and can digitize a 200 kHz bandwidth signal centered at 4.875 MHz with an SNR of 85 dB. The power consumption is 35 mW at 3.3 V and 38 mW at 5 V, and the die size is 2
Suppose that we need to design a controller for the system x
Hiroshi NAGAMOCHI Yukihiro NISHIDA Toshihide IBARAKI
Given an edge-weighted graph G, the minimum maximal matching problem asks to find a minimum weight maximal matching. The problem is known to be NP-hard even if the graph is planar and unweighted. In this paper, we consider the problem in planar graphs. First, we prove a strong inapproximability for the problem in weighted planar graphs. Second, in contrast with the first result, we show that a polynomial time approximation scheme (PTAS) for the problem in unweighted planar graphs can be obtained by a divide-and-conquer method based on the planar separator theorem. For a given ε > 0, our scheme delivers in
Mira KIM Junji SHIKATA Hirofumi MURATANI Hideki IMAI
In this paper, we deal with c-secure codes in a fingerprinting scheme, which encode user ID to be embedded into the contents. If a pirate copy appears, c-secure codes allow the owner of the contents to trace the source of the illegal redistribution under collusion attacks. However, when dealing in practical applications, most past proposed codes are failed to obtain a good efficiency, i.e. their codeword length are too large to be embedded into digital contents. In this paper, we propose a construction method of c-secure CRT codes based on polynomials over finite fields and it is shown that the codeword length in our construction is shorter than that of Muratani's scheme. We compare the codeword length of our construction and that of Muratani's scheme by numerical experiments and present some theoretical results which supports the results obtained by numerical experiments. As a result, we show that our construction is especially efficient in respect to a large size of any coalition c. Furthermore, we discuss the influence of the random error on the traceability and formally define the Weak IDs in respect to our construction.
Minoru KURIBAYASHI Hatsukazu TANAKA
One of the important topics of watermarking technique is a robustness against geometrical transformations. In the previous schemes, a template matching is performed or an additional signal is embedded for the recovery of a synchronization loss. However, the former requires the original template, and the latter degrades the quality of image because both a watermark and a synchronization signal must be embedded. In the proposed scheme only a synchronization signal is embedded for the recovery of both a watermark and a synchronization loss. Then the embedded information depends on the distance between two embedded signal positions. The distance is not changed seriously by random geometrical transformations like StirMark attack unless the embedded signal is disturbed. Therefore, a watermark can be extracted correctly from such geometrically transformed image if the synchronization signal can be recovered.
Mitsuru YAMADA Akinori NISHIHARA
We propose a stochastic model for signals generated through the electron multiplying effect of detectors in charged particle beam equipments. This model is based on a stochastic variable characterized by a log-normal type distribution. The model is simple and can be used to represent a wide dynamic range of signals from pulse-like signals when the primary beam current is small to continuous signals when the primary beam current is large. For the model base reference a normalization of actual signal detectors is presented. This base reference yields the unique stochastic parameter used in our model. The proposed model better approximates the actual signals in the power spectrum distribution as compared to the filtered Poisson method presented elsewhere.
Chulmin CHOI Lae-Hoon KIM Yangki OH Sejin DOO Koeng-Mo SUNG
The measurement of the 3-dimensional behavior of early reflections in a sound field has been an important issue in auditorium acoustics since the reflection profile has been found to be strongly correlated with the subjective responsiveness of a listener. In order to detect the incidence angle and relative amplitude of reflections, a 4-point microphone system has conventionally been used. A new measurement system is proposed in this paper, which has 5 microphones. Microphones are located on each four apex of a tetrahedron and at the center of gravity. Early reflections, including simultaneously incident reflections,which previous 4-point microphone system could not discriminate as individual wavefronts, were successfully found with the new system. In order to calculate accurate image source positions, it is necessary to determine the exact peak positions from measured impulse responses composed of highly deformed and overlapped impulse trains. For this purpose, a peak-detecting algorithm, which finds dominant peaks in the impulse response by an iteration method, is introduced. In this paper, the theoretical background and features of the 5-microphone system are described. Also, some results of experiments using this system are described.
Sung-Kyo JUNG Hong-Goo KANG Dae-Hee YOUN
This letter presents the advantages of a cascaded algebraic codebook structure at relatively high bit-rates. The cascaded structure that consists of two stages provides flexible pulse combinations due to an additional gain term in the second stage. The perceptual quality of the cascaded structure can be further improved by using a gain re-estimation scheme. Experiments confirm that the cascaded structure has a big advantage in terms of quality and complexity as the bit-rate becomes higher.
One conventional technique for source localization is to utilize the time-difference-of-arrival (TDOA) measurements of a signal received at spatially separated sensors. A simple TDOA-based location algorithm that combines the advantages of two efficient positioning methods is developed. It is demonstrated that the proposed approach can give optimum performance in geolocation via satellites at different noise conditions.
Muneo KUSHIMA Koichi TANNO Okihiko ISHIZUKA
In this letter, a linear variable resistor circuit using an FG-MOSFET (floating-gate MOSFET) is proposed. This is based on Schlarmann's variable resistor and is very simple. The advantage of the proposed circuit is a wide-input range. The utility of the proposed circuit was confirmed by HSPICE simulation with 1.2 µm CMOS process parameters. The simulation results are reported in this letter.
This paper proposes an efficient method for design space exploration of the global optimum configuration for parameterized ASIPs. The method not only guarantees the optimum configuration, but also provides robust speedup for a wide range of processor architectures such as SoC, ASIC as well as ASIP. The optimization procedure within this method takes a two-steps approach. Firstly, design parameters are partitioned into clusters of inter-dependent parameters using parameter dependency information. Secondly, parameters are optimized for each cluster, the results of which are merged for global optimum. In such optimization, inferior configurations are extensively pruned with a detailed optimality mapping between dependent parameters. Experimental results with mediabench applications show an optimization speedup of 4.1 times faster than the previous work on average, which is significant improvement for practical use.
A novel audio watermarking based on wavelet modulation is presented. The watermark signals are constructed by M-band wavelet modulation that can increase redundancy to improve the detection performance. In order to maximize the watermarking strength within the perceptual constraints, the watermark signals synthesized from different subbands are separately masked using a frequency auditory model. CDMA technique is implemented to achieve watermarking capacity. Experimental results show that this method is very robust.
A new algorithm for the LogMAP decoding of linear block codes is considered. The decoding complexity is evaluated analytically and by computer simulation. The proposed algorithm is an improvement of the recursive LogMAP algorithm proposed by the authors. The recursive LogMAP algorithm is more efficient than the BCJR algorithm for low-rate codes, but the complexity grows considerably large for high-rate codes. The aim of the proposed algorithm is to solve the complexity explosion of the recursive LogMAP algorithm for high-rate codes. The proposed algorithm is more efficient than the BCJR algorithm for well-known linear block codes.
Young-Hwan YOU Won-Gi JEON Jeong-Wook SEO Byoung-Chul SONG Hyeok-Koo JUNG
In this letter, a simple peak-to-average power ratio (PAPR) reduction scheme by using a cyclic-shifted sequence mapping is addressed in OFDM-CDMA systems. The PAPR reduction approach is very simple because of no additional complexity and no side information. Also, this simple approach can be easily combined with a modified selective mapping (SLM) approach, which outperforms the original SLM approach at the expense of one additional side information, guaranteeing approximately same transmitter complexity.
Toshimichi SAITO Hiroshi IMAMURA Masaaki NAKA
This letter presents a simple A/D converter based on the circle map. The converter encodes a dc input into a binary output sequence and has the trapping window that extracts an available part of the output sequence. Using the available part, the decoder provides an estimation by a fraction with variable denominator: it can realize higher resolution. Theoretical evidences for the estimation characteristics are given.