Zhenhai TAN Yun YANG Xiaoman WANG Fayez ALQAHTANI
Chenrui CHANG Tongwei LU Feng YAO
Takuma TSUCHIDA Rikuho MIYATA Hironori WASHIZAKI Kensuke SUMOTO Nobukazu YOSHIOKA Yoshiaki FUKAZAWA
Shoichi HIROSE Kazuhiko MINEMATSU
Toshimitsu USHIO
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Qingping YU Yuan SUN You ZHANG Longye WANG Xingwang LI
Qiuyu XU Kanghui ZHAO Tao LU Zhongyuan WANG Ruimin HU
Lei Zhang Xi-Lin Guo Guang Han Di-Hui Zeng
Meng HUANG Honglei WEI
Yang LIU Jialong WEI Shujian ZHAO Wenhua XIE Niankuan CHEN Jie LI Xin CHEN Kaixuan YANG Yongwei LI Zhen ZHAO
Ngoc-Son DUONG Lan-Nhi VU THI Sinh-Cong LAM Phuong-Dung CHU THI Thai-Mai DINH THI
Lan XIE Qiang WANG Yongqiang JI Yu GU Gaozheng XU Zheng ZHU Yuxing WANG Yuwei LI
Jihui LIU Hui ZHANG Wei SU Rong LUO
Shota NAKAYAMA Koichi KOBAYASHI Yuh YAMASHITA
Wataru NAKAMURA Kenta TAKAHASHI
Chunfeng FU Renjie JIN Longjiang QU Zijian ZHOU
Masaki KOBAYASHI
Shinichi NISHIZAWA Masahiro MATSUDA Shinji KIMURA
Keisuke FUKADA Tatsuhiko SHIRAI Nozomu TOGAWA
Yuta NAGAHAMA Tetsuya MANABE
Baoxian Wang Ze Gao Hongbin Xu Shoupeng Qin Zhao Tan Xuchao Shi
Maki TSUKAHARA Yusaku HARADA Haruka HIRATA Daiki MIYAHARA Yang LI Yuko HARA-AZUMI Kazuo SAKIYAMA
Guijie LIN Jianxiao XIE Zejun ZHANG
Hiroki FURUE Yasuhiko IKEMATSU
Longye WANG Lingguo KONG Xiaoli ZENG Qingping YU
Ayaka FUJITA Mashiho MUKAIDA Tadahiro AZETSU Noriaki SUETAKE
Xingan SHA Masao YANAGISAWA Youhua SHI
Jiqian XU Lijin FANG Qiankun ZHAO Yingcai WAN Yue GAO Huaizhen WANG
Sei TAKANO Mitsuji MUNEYASU Soh YOSHIDA Akira ASANO Nanae DEWAKE Nobuo YOSHINARI Keiichi UCHIDA
Kohei DOI Takeshi SUGAWARA
Yuta FUKUDA Kota YOSHIDA Takeshi FUJINO
Mingjie LIU Chunyang WANG Jian GONG Ming TAN Changlin ZHOU
Hironori UCHIKAWA Manabu HAGIWARA
Atsuko MIYAJI Tatsuhiro YAMATSUKI Tomoka TAKAHASHI Ping-Lun WANG Tomoaki MIMOTO
Kazuya TANIGUCHI Satoshi TAYU Atsushi TAKAHASHI Mathieu MOLONGO Makoto MINAMI Katsuya NISHIOKA
Masayuki SHIMODA Atsushi TAKAHASHI
Yuya Ichikawa Naoko Misawa Chihiro Matsui Ken Takeuchi
Katsutoshi OTSUKA Kazuhito ITO
Rei UEDA Tsunato NAKAI Kota YOSHIDA Takeshi FUJINO
Motonari OHTSUKA Takahiro ISHIMARU Yuta TSUKIE Shingo KUKITA Kohtaro WATANABE
Iori KODAMA Tetsuya KOJIMA
Yusuke MATSUOKA
Yosuke SUGIURA Ryota NOGUCHI Tetsuya SHIMAMURA
Tadashi WADAYAMA Ayano NAKAI-KASAI
Li Cheng Huaixing Wang
Beining ZHANG Xile ZHANG Qin WANG Guan GUI Lin SHAN
Sicheng LIU Kaiyu WANG Haichuan YANG Tao ZHENG Zhenyu LEI Meng JIA Shangce GAO
Kun ZHOU Zejun ZHANG Xu TANG Wen XU Jianxiao XIE Changbing TANG
Soh YOSHIDA Nozomi YATOH Mitsuji MUNEYASU
Ryo YOSHIDA Soh YOSHIDA Mitsuji MUNEYASU
Nichika YUGE Hiroyuki ISHIHARA Morikazu NAKAMURA Takayuki NAKACHI
Ling ZHU Takayuki NAKACHI Bai ZHANG Yitu WANG
Toshiyuki MIYAMOTO Hiroki AKAMATSU
Yanchao LIU Xina CHENG Takeshi IKENAGA
Kengo HASHIMOTO Ken-ichi IWATA
Shota TOYOOKA Yoshinobu KAJIKAWA
Kyohei SUDO Keisuke HARA Masayuki TEZUKA Yusuke YOSHIDA
Hiroshi FUJISAKI
Tota SUKO Manabu KOBAYASHI
Akira KAMATSUKA Koki KAZAMA Takahiro YOSHIDA
Tingyuan NIE Jingjing NIE Kun ZHAO
Xinyu TIAN Hongyu HAN Limengnan ZHOU Hanzhou WU
Shibo DONG Haotian LI Yifei YANG Jiatianyi YU Zhenyu LEI Shangce GAO
Kengo NAKATA Daisuke MIYASHITA Jun DEGUCHI Ryuichi FUJIMOTO
Jie REN Minglin LIU Lisheng LI Shuai LI Mu FANG Wenbin LIU Yang LIU Haidong YU Shidong ZHANG
Ken NAKAMURA Takayuki NOZAKI
Yun LIANG Degui YAO Yang GAO Kaihua JIANG
Guanqun SHEN Kaikai CHI Osama ALFARRAJ Amr TOLBA
Zewei HE Zixuan CHEN Guizhong FU Yangming ZHENG Zhe-Ming LU
Bowen ZHANG Chang ZHANG Di YAO Xin ZHANG
Zhihao LI Ruihu LI Chaofeng GUAN Liangdong LU Hao SONG Qiang FU
Kenji UEHARA Kunihiko HIRAISHI
David CLARINO Shohei KURODA Shigeru YAMASHITA
Qi QI Zi TENG Hongmei HUO Ming XU Bing BAI
Ling Wang Zhongqiang Luo
Zongxiang YI Qiuxia XU
Donghoon CHANG Deukjo HONG Jinkeon KANG
Xiaowu LI Wei CUI Runxin LI Lianyin JIA Jinguo YOU
Zhang HUAGUO Xu WENJIE Li LIANGLIANG Liao HONGSHU
Seonkyu KIM Myoungsu SHIN Hanbeom SHIN Insung KIM Sunyeop KIM Donggeun KWON Deukjo HONG Jaechul SUNG Seokhie HONG
Manabu HAGIWARA
Yukihiro NAKAMURA Takashi KAMBE
Kazutoshi KOBAYASHI Masanao YAMAOKA Yukifumi KOBAYASHI Hidetoshi ONODERA Keikichi TAMARU
We propose a functional memory for addition (FMA), which is a memory-merged logic LSI. It is a memory as well as a SIMD parallel processor. To minimize the area, a processing element (PE) consists of several DRAM words and a bit-serial ALU. The ALU performs addition bit by bit. This paper describes two experimental FMA LSIs. One is for general-purpose use, and the other is for full-search block matching in image compression. We estimate that a 0.18 µm process realizes 57,000 PEs in a 50 mm² die, achieving 205 GOPS at 1.36 W.
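The bit-by-bit addition performed by each PE can be illustrated with a minimal sketch. It assumes little-endian bit lists and an 8-bit word width; the function names (`bit_serial_add`, `simd_add`) are hypothetical and are not taken from the paper. The sketch models only the arithmetic, not the DRAM organization or timing of the LSI.

```python
def to_bits(value, width):
    """Little-endian bit list of a non-negative integer (assumed word format)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two equal-length bit lists one bit per step, as a bit-serial ALU would."""
    carry = 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)            # full-adder sum bit
        carry = (a & b) | (carry & (a ^ b))       # full-adder carry out
    return sum_bits, carry


def simd_add(xs, ys, width=8):
    """Every PE applies the same bit-serial addition to its own operand pair."""
    results = []
    for x, y in zip(xs, ys):
        sum_bits, _ = bit_serial_add(to_bits(x, width), to_bits(y, width))
        results.append(from_bits(sum_bits))       # result wraps modulo 2**width
    return results
```

In this model the number of steps per addition grows with the word width, while the number of operand pairs handled at once grows with the number of PEs.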
Norbert IMLIG Tsunemichi SHIOZAWA Ryusuke KONISHI Kiyoshi OGURI Kouichi NAGAMI Hideyuki ITO Minoru INAMORI Hiroshi NAKADA
This paper introduces a flexible, stream-oriented dataflow processing model based on the "Communicating Logic (CL)" framework. As the target architecture, we adopt the dual-layered "Plastic Cell Architecture (PCA)." Datapath processing functionality is encapsulated in asynchronous hardware objects of variable granularity and implemented using look-up tables. Communication (i.e., connectivity and control) between the distributed processing objects is achieved by means of inter-object message passing. The key point of the CL approach is that it combines the merits of a scalable-performance, low-power hardware implementation with the user-friendly compilation and linking capabilities unique to software.
Abderazek BEN ABDALLAH Mudar SAREM Masahiro SOWA
Superscalar processors can achieve increased performance by issuing instructions out of order (OoO) with respect to the original instruction stream. Implementing an OoO issue scheme requires a hardware mechanism that prevents incorrectly executed instructions from updating register values. In addition, performance decreases when data dependencies, branches, or traps appear among instructions. To this end, we propose a new mechanism, named Dynamic Fast Issue (DFI), that issues instructions out of order to multiple parallel functional units without considerable hardware complexity. The mechanism, which will be implemented in our superscalar Functional Assignments Register Microprocessor (FARM), resolves data dependencies and supports precise interrupts and branch prediction, which are the main problems associated with dynamic instruction scheduling in superscalar machines. Results are written only once (write-once), directly into the register file (RF). To ensure that results are written in order to their appropriate output registers, a record of instruction order and state is maintained by a status buffer (STB). A 64-entry integrated register file holds both renamed and logical registers. To recover the processor state after an interrupt or a branch misprediction, the STB and a recovery list table (RLT) are used. Novel aspects of the system architecture, the principle underlying the mechanism, and the constraints that must be met are presented. Performance is evaluated using a full-pipeline-level architectural simulator and the SPECint95 benchmark programs.
This paper describes a new algorithm for configuring the array of adders used to add the partial products in a multiplier circuit. The new algorithm reduces not only the number of half adders in an adder tree, but also the number of operands passed to the block generating the final product in a multiplier. The arrays obtained with this algorithm are smaller than Wallace's and have fewer outputs than Dadda's. We show some evaluation figures and preliminary simulation results for 4-, 8-, and 16-bit tree configurations.
Heng-Liang HUANG Jiing-Yuan LIN Wen-Zen SHEN Jing-Yang JOU
As system functions become more complex, IP (Intellectual Property) reuse is becoming the prevailing system design style. Designers need to evaluate the performance and features of every candidate IP block that can be used in their design, while IP providers want to keep the structure of their IP blocks secret. An IP-level power model is a model that takes only the primary input statistics as parameters and does not reveal any information about the sizes of the transistors or the structure of the circuit. This paper proposes a new method for constructing a power model that is suitable for IP-level circuit blocks. It is a nominal point selection method for power models based on power sensitivities. By analyzing the relationship between the dynamic power consumption of CMOS circuits and their input signal statistics, a guideline for selecting the nominal points is proposed. Based on our analysis, the first nominal point is selected to minimize the average estimation error, and two other nominal points are selected to minimize the maximum estimation error. Our experimental results on a number of benchmark circuits show the effectiveness of the proposed method. Average estimation accuracy within 5.78% of transistor-level simulations is achieved. The proposed method can be applied to build a system-level power estimation environment without revealing the contents of the IP blocks inside. It is therefore a promising method for IP-level power model construction.
Kazuyoshi TAKEMURA Masanobu MIZUNO Akira MOTOHARA
This paper presents a system-level bus architecture validation technique and shows its application to a consumer product design. This technique enables the entire system to be validated with bus-cycle accuracy using bus-architecture-level models derived from their corresponding behavioral-level models. Experimental results from a digital still camera (DSC) system design show that our approach offers much faster simulation than register transfer level (RTL) simulators. Using this fast and accurate validation technique, bus architecture design, validation, and optimization can be carried out effectively at the system level, and the total turnaround time of system designs can be reduced dramatically.
Rafael K. MORIZAWA Takashi NANYA
A known problem of the four-phase handshaking protocol is that the signals involved in the handshake must go through a return-to-zero phase, in which usually no useful work is done, before another cycle can start. In this paper, we first define an easy-to-write specification style for four-phase handshaking asynchronous controllers that can be translated to an STG to obtain a gate-level implementation using existing synthesis methods. Then, we propose an algorithm that takes a specification written in our style and finds an optimized timing in which the idle-phase overhead of its gate-level implementation is reduced.
Mizuki TAKAHASHI Nagisa ISHIURA Akihisa YAMADA Takashi KAMBE
This paper presents a method of thread composition in the hardware compiler Bach. Bach synthesizes RT-level circuits from a system description written in the Bach-C language, where a system is modeled as communicating processes running in parallel. The system description is decomposed into threads, i.e., strings of sequential processes, by grouping processes that are not executed in parallel. The set of threads is then converted into behavioral VHDL models and passed to a behavioral synthesizer. The proposed method attempts to find a thread configuration that maximizes resource sharing among the processes in the threads. Experiments on two real designs show that the circuit sizes were reduced by 3.7% and 14.7%. We also show detailed statistics and an analysis of the sizes of the resulting gate-level circuits.
Nozomu TOGAWA Tatsuhiko WAKUI Tatsuhiko YODEN Makoto TERAJIMA Masao YANAGISAWA Tatsuo OHTSUKI
CAM (Content Addressable Memory) units are generally designed so that they can be applied to a variety of application programs. However, when a particular application runs on CAM units, some functions of the CAM units may be used often while others may never be used. We therefore consider that CAM units should be designed according to the requirements of a given application program. This paper proposes a CAM processor synthesis system based on behavioral descriptions. The input of the system is an application program written in C including CAM functions, and its output is a hardware description of a synthesized processor and the binary code executed on it. Since the system determines the functions of the CAM units and synthesizes a CAM processor according to the requirements of an application program, we expect that a synthesized CAM processor can execute the application program with small processor area and delay. Experimental results demonstrate its efficiency and effectiveness.
At the beginning of the new century, many information appliance (IA) products will replace traditional electronic appliances to help people in smart, efficient, and low-cost ways. These successful products must be capable of communicating multimedia information, a capability that is embedded into the electronic appliances with high integration, innovation, and a power-throughput tradeoff. In this paper, we develop a codesign procedure to analyze, compare, and emulate multimedia communication applications in order to find candidate implementations under different criteria. The experimental results demonstrate that, in general, memory technology dominates the optimal tradeoff and ALU improvements have a great impact on particular applications. The results also show that the proposed procedure is effective and quite efficient.
Tae-Suh PARK Chong-Ho LEE Duck-Jin CHUNG
This paper presents an evolutionary technique to build and maintain fault-recoverable digital circuits. Because the synthesis of a circuit by a genetic algorithm proceeds according to the behavioral objectives of the circuit and its interactions with the environment, knowledge of the architecture and of the placement and routing processes is not a major concern of the proposed method. The evolutionary behavior of the circuit also protects it from stuck-at faults by continuously modifying the neighboring circuit blocks accordingly. Owing to the evolutionary nature of the method, this is done without prior knowledge of where and how the faults occur. Thus, the overhead circuit blocks for fault diagnosis and redundancy are minimized with this design. Fault-recoverable evolvable hardware circuits implementing a few combinational logic functions are synthesized by evolution, and their fault recovery capabilities are demonstrated on a reconfigurable FPGA.
Hafiz Md. HASAN BABU Tsutomu SASAO
In this paper, we propose a method to minimize multiple-valued decision diagrams (MDDs) for multiple-output functions. We consider the following: (1) a heuristic for encoding the 2-valued inputs; and (2) a heuristic for ordering the multiple-valued input variables based on sampling, where each sample is a group of outputs. We first generate a 4-valued input 2-valued multiple-output function from the given 2-valued input 2-valued functions. Then, we construct an MDD for each sample and find a good variable ordering. Finally, we generate a variable ordering from the orderings of MDDs representing the samples, and minimize the entire MDDs. Experimental results show that the proposed method is much faster, and for many benchmark functions, it produces MDDs with fewer nodes than sifting. Especially, the proposed method generates much smaller MDDs in a short time for benchmark functions when several 2-valued input variables are grouped to form multiple-valued variables.
Tetsushi KATAYAMA Hiroyuki OCHI Takao TSUDA
Binary Decision Diagrams (BDDs) are graph representations of Boolean functions. In particular, Ordered BDDs (OBDDs) are useful in many situations because they provide a canonical representation and can be manipulated efficiently. BDD packages that automatically generate OBDDs have been developed, and they are now widely used in logic design, including formal verification and logic synthesis. Synthesis of pass-transistor circuits is one of the successful applications of such BDD packages. Pass-transistor circuits are generated from BDDs by mapping each node to a selector that consists of two or four pass transistors. If circuits are generated from smaller BDDs, the generated circuits have fewer transistors and hence save chip area and power. In this paper, more generic BDDs, which have no restrictions on variable ordering or on the number of times a variable appears on a path, are called Generic BDDs (GBDDs), and an algorithm for generating GBDDs is proposed for the synthesis of pass-transistor circuits. The proposed algorithm consists of two steps. In the first step, parse trees (PTs) for the given Boolean formulas are generated, where a PT is a directed tree representation of one or more Boolean formulas and consists of literal nodes and operation nodes. In this step, our algorithm attempts to reduce the number of literal nodes in the PTs. In the second step, a GBDD is generated for the PTs using the Concatenation Method, which generates a GBDD by connecting GBDDs vertically. In this step, our algorithm attempts to share isomorphic subgraphs. In experiments on ISCAS'89 and MCNC benchmark circuits, our program successfully generated GBDDs smaller than the corresponding OBDDs for 32 out of 680 single-output functions and for 4 out of 49 multi-output functions. GBDD size is reduced by 23.1% in the best case compared with the OBDD.
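The node-to-selector mapping mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' GBDD generation algorithm: the `Node` class and both functions are hypothetical names, terminals are assumed to be Python booleans, and each non-terminal node is treated as one 2:1 selector controlled by its variable.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """One decision node; in a pass-transistor circuit it becomes one selector."""
    var: str
    low: "Union[Node, bool]"    # branch taken when var is 0
    high: "Union[Node, bool]"   # branch taken when var is 1


def evaluate(node, assignment):
    """Follow selector outputs from the root until a terminal is reached."""
    while isinstance(node, Node):
        node = node.high if assignment[node.var] else node.low
    return node


def count_selectors(root):
    """Count distinct node objects; shared subgraphs are counted once."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, Node) and id(node) not in seen:
            seen.add(id(node))
            stack.extend((node.low, node.high))
    return len(seen)


# Example: x XOR y needs three selectors (one for x, two for y).
y_pos = Node("y", False, True)
y_neg = Node("y", True, False)
xor_root = Node("x", y_pos, y_neg)
```

Because the selector count equals the node count in this model, any reduction in diagram size translates directly into fewer selectors.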
Hiroshi SAWADA Shigeru YAMASHITA Akira NAGOYA
This paper presents a new method that efficiently generates all of the kernels of a sum-of-products expression. Its main feature is the memorization of the kernel generation process by using a graph structure and implicit cube set representations. We also show its application to common logic extraction. With several extensions, our extraction method produces smaller circuits than the extraction method based on two-cube divisors, which is known as the best to date.
Qiang ZHU Yusuke MATSUNAGA Shinji KIMURA Katsumasa WATANABE
Combinational logic circuits are usually implemented as multi-level networks of logic nodes. Multi-level logic simplification using the don't cares of each node is widely used. Large don't-care sets give good simplification results but require a huge amount of memory and computation time. Extracting useful don't cares and reducing the size of the don't-care sets are important problems in simplification using don't cares. In this paper, we propose a new robust heuristic method for the selection of don't cares. We consider an adaptive subnetwork for each simplified node in the network and introduce a stepwise enhancement method for the subnetwork that takes the memory area and the network structure into account. The don't cares extracted from the adaptive subnetworks are called local don't cares. We have implemented our method for satisfiability don't cares and observability don't cares. We have applied the method to the MCNC89 benchmarks and compared the experimental results with those of the SIS system. The results demonstrate the superiority of our method in the quality of the results and in the size of applicable circuits.
Barry SHACKLEFORD Etsuko OKUSHI Mitsuhiro YASUDA Hisao KOIZUMI Katsuhiko SEO Hiroto YASUURA
The problem of synthesizing a minimum-cost logic network is formulated for a genetic algorithm (GA). When benchmarked against a commercial logic synthesis tool, an odd parity circuit required 24 basic cells (BCs) versus 28 BCs for the design produced by the commercial system. A magnitude comparator required 20 BCs versus 21 BCs for the commercial system's design. Poor temporal performance, however, is the main disadvantage of the GA-based approach. The design of a hardware-based cost function that would accelerate the GA by several thousand times is described.
Tomonori IZUMI Ryuji KAN Yukihiro NAKAMURA
Recently, Plastic Cell Architecture (PCA) has been proposed as a hard-wired general-purpose autonomously reconfigurable processor. PCA consists of two layers, the plastic part on which sequential logic circuits are implemented, and the built-in part which induces the plastic part to dynamically reconfigure the circuits and transports messages among the circuits. The plastic part consists of an array of LUT-based reconfigurable logic primitives, each of which is connected only to adjacent ones. Combining logic and layout synthesis, we propose a new array-based algorithm to map logic functions into the PCA plastic part. This algorithm produces a folded array of sum-of-multi-input-complex-terms, especially for the PCA plastic part.
Hsien-Ho CHUANG Jing-Yang JOU C. Bernard SHUNG
A delay-optimal technology mapping algorithm is developed for a general model of FPGAs with hard-wired non-homogeneous logic block architectures, which are composed of look-up tables (LUTs) of different sizes hard-wired together. This architecture has the advantages of the short delay of hard-wired connections and the area efficiency of a non-homogeneous structure. The Xilinx XC4000 is one commercial example, in which two 4-LUTs are hard-wired to one 3-LUT. In this paper, we present a two-dimensional labeling approach and a level-2 node cut algorithm to handle the hard-wired feature. The experimental results show that our algorithm generates favorable results for Xilinx XC4000 CLBs. Over a set of MCNC benchmarks, our algorithm produces results with 17% smaller CLB depth on average than FlowMap in similar CPU time, and with 4% smaller CLB depth on average than PDDMAP, while PDDMAP needs 15 times more CPU time.
Tomoyuki YODA Atsushi TAKAHASHI
A semi-synchronous circuit is a circuit in which the clock is assumed to be distributed periodically to each individual register, though not necessarily to all registers simultaneously. In this paper, we propose an algorithm that achieves the target clock period while modifying a given target clock schedule as little as possible, where the realization cost of the target clock schedule is assumed to be minimum. The proposed algorithm iteratively improves a feasible clock schedule. It finds a set of registers whose clock timings can be changed by the same amount to reduce the cost, and changes those clock timings by the optimal amount. Experiments show that the algorithm achieves the target clock period with fewer modifications.
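What a "feasible clock schedule" means can be illustrated with a small checker. The abstract does not state the constraints, so this sketch assumes the commonly used formulation for semi-synchronous circuits: for every register-to-register path from u to v with minimum delay d_min and maximum delay d_max, the clock timings s must satisfy s(v) - s(u) <= d_min (no hold violation) and s(u) - s(v) <= T - d_max (no setup violation) for clock period T. The function name and data layout are hypothetical, and setup/hold margins of the registers themselves are ignored.

```python
def schedule_is_feasible(schedule, paths, period):
    """Check an assumed pair of timing constraints for every register-to-register path.

    schedule: dict mapping register name -> clock arrival time
    paths:    iterable of (source, sink, d_min, d_max) tuples
    period:   clock period T
    """
    for source, sink, d_min, d_max in paths:
        # Hold-type constraint: the earliest new data must not arrive
        # before the sink register has captured the previous value.
        if schedule[sink] - schedule[source] > d_min:
            return False
        # Setup-type constraint: the latest data must arrive
        # before the sink register's next clock edge.
        if schedule[source] - schedule[sink] > period - d_max:
            return False
    return True


# Two registers in a loop: r1 -> r2 is slow, r2 -> r1 is fast.
example_paths = [("r1", "r2", 2, 9), ("r2", "r1", 1, 5)]
zero_skew = {"r1": 0, "r2": 0}
skewed = {"r1": 0, "r2": 1}
```

With these example numbers, the zero-skew schedule cannot meet a period of 8 because the slow path needs 9 time units, whereas delaying the clock of `r2` by one unit makes the same period feasible.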
Masanori HASHIMOTO Hidetoshi ONODERA
This paper discusses a gate resizing method for performance enhancement based on statistical static timing analysis. The proposed method focuses on timing uncertainties caused by local random fluctuation. Our method aims to remove both over-design and under-design of a circuit and to realize high-performance, high-reliability LSI design. The effectiveness of our method is examined on six benchmark circuits. We verify that our method can further reduce the delay of circuits that were optimized for minimum delay without considering delay fluctuation.
Jun'ichiro MINAMI Tetsushi KOIDE Shin'ichi WAKABAYASHI
This paper presents a timing-driven iterative improvement circuit partitioning algorithm under path delay constraints for the general delay model. The proposed algorithm is an extension of the Fiduccia & Mattheyses (FM) method so as to handle path delay constraints and consists of the clustering and iterative improvement phases. In the first phase, we reduce the size of a given circuit, with a new clustering algorithm to obtain a partition in a short computation time. Next, the iterative improvement phase based on the FM method is applied, and then a new path-based timing violation removal algorithm is also performed so as to remove all the timing violations. From experimental results for ISCAS89 benchmarks, we have demonstrated that the proposed algorithm can produce the partitions which mostly satisfy the timing constraints.
Kazuhisa OKADA Takayuki YAMANOUCHI Takashi KAMBE
In this paper, we propose a cell synthesis method for a Salicide process. Our method utilizes the local interconnect between adjacent transistors, which is available in some Salicide processes, and optimizes the transistor placement of a cell considering both area and the number of local interconnects. In this way we reduce the number of metal wires and contacts. The circuit model is not restricted to conventional series-parallel CMOS logic, and our method enables us to synthesize CMOS pass-transistor circuits. Experimental results show that our method uses the local interconnect effectively, and optimizes both cell area and metal wire length.
Shunji SAIKA Masahiro FUKUI Masahiko TOYONAGA Toshiro AKINO
We propose another high-performance simulated annealing method, which we call widely stepping simulated annealing (WSSA). It moves from a high starting temperature to a low finishing temperature, staying at only twenty or so temperatures to approach thermal equilibrium. We survey the phase transition in the simulated annealing process and estimate the major cost variation (dEc) at the critical temperature. The WSSA uses a function H(t) that represents the probability that a hill-climbing move with a cost increase of dEc is accepted in Metropolis' Monte Carlo simulation at temperature t. We have applied the first version of WSSA to one-dimensional transistor placement optimization for several industrial standard cells and compared its performance with simulated annealing using a geometric cooling schedule. The solutions obtained by the WSSA are as good as, and sometimes much better than, those obtained by the conventional simulated annealing, while the time consumed by the WSSA is under one-thirtieth of that consumed by the conventional simulated annealing.
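The acceptance probability H(t) described above can be sketched using the standard Metropolis rule, in which a cost increase of dE is accepted with probability exp(-dE / t). This is an assumption: the abstract does not give the formula, and any constant factor is taken to be absorbed into the temperature t. The function names are hypothetical, and the sketch does not reproduce how WSSA chooses its roughly twenty temperatures.

```python
import math
import random


def hill_climb_acceptance(d_ec, t):
    """H(t): probability of accepting a cost increase d_ec at temperature t
    under the standard Metropolis rule (assumed form)."""
    if t <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(-d_ec / t)


def metropolis_accept(delta_cost, t, rng=random.random):
    """Accept improving moves always, worsening moves with probability exp(-delta/t)."""
    if delta_cost <= 0:
        return True
    return rng() < math.exp(-delta_cost / t)
```

Under this rule, H(t) is close to 1 when t is much larger than dEc and close to 0 when t is much smaller, so a small number of temperatures around dEc spans most of the useful range of acceptance probabilities.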
Tomohiro FUJITA Hidetoshi ONODERA
In this paper, we present a case study of hierarchical statistical analysis. The method used here bridges the statistical information between the process level and the system level, and enables us to determine the effect of process variation on system performance. We use two modeling techniques (an intermediate model and a response surface model) to link the statistical information between adjacent design levels. We describe an experiment in which the hierarchical statistical analysis is applied to a phase-locked loop (PLL) circuit, and show that the hierarchical statistical analysis is practical with respect to both accuracy and simulation cost. The following three applications are also presented to show the advantages of this linking method: Monte Carlo analysis, worst-case analysis, and sensitivity analysis. The results of the Monte Carlo and worst-case analyses indicate that the method is a realistic statistical one. The result of the sensitivity analysis enables us to evaluate the effect of process variation at the system level. We can also derive constraints on the process variation from a performance requirement.
Kazuhiro NAKAMURA Shinji MARUOKA Shinji KIMURA Katsumasa WATANABE
Multi-cycle paths are paths between registers on which signals are allowed two or more clock cycles to propagate. The detection of multi-cycle paths is important in determining a proper clock period, in timing verification, and in logic optimization. This paper presents a satisfiability-based multi-cycle path detection method, in which the detection problems are reduced to CNF formulae and their satisfiability is checked using SAT provers. We also show heuristics for the conversion from multi-level circuits into CNF formulae. We have applied our method to the ISCAS'89 benchmarks and other sample circuits. Experimental results show remarkable improvements in the size of circuits that can be handled.
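The circuit-to-CNF step can be illustrated with the standard Tseitin-style gate encoding. This sketch is not the authors' conversion heuristic or their multi-cycle formulation: it only shows how gates become clauses and how a satisfiability query is posed. The function names are hypothetical, literals are signed integers in the usual DIMACS style, and a brute-force enumeration stands in for a real SAT prover.

```python
from itertools import product


def and_gate(z, a, b):
    """Clauses asserting z <-> (a AND b)."""
    return [[-z, a], [-z, b], [z, -a, -b]]


def not_gate(z, a):
    """Clauses asserting z <-> (NOT a)."""
    return [[-z, -a], [z, a]]


def satisfiable(clauses, num_vars):
    """Brute-force stand-in for a SAT prover; only practical for tiny formulas."""
    for bits in product([False, True], repeat=num_vars):
        # A literal l is true when variable |l| has the polarity of l.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


# Circuit: na = NOT a, z = a AND na. Variables: 1 = a, 2 = na, 3 = z.
circuit = not_gate(2, 1) + and_gate(3, 1, 2)
```

Adding the unit clause `[3]` asks whether the output can ever be 1; the formula is unsatisfiable, which proves that `z` is constant 0. Queries in a detection method take a similar shape, with the property to refute added as extra clauses.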
The use of the column rank of the system sensitivity matrix as a testability measure for parametric faults in linear analog circuits was pioneered by Sen and Saeks in the 1970s and later reintroduced by several others. Its practical use has been limited by the difficulty of calculating it. Numerical algorithms suffer from inevitable round-off errors, while traditional symbolic techniques can only handle very small circuits. In this paper, an efficient method is introduced for the analysis of Sen and Saeks' analog testability. The method employs symbolic circuit analysis based on determinant decision diagrams. Experimental results demonstrate that the new method is capable of handling much larger analog circuits.
High-speed systems require a wide-frequency-range clock system for data processing. A phase-locked loop (PLL) is used in systems that require a wide-range variable-frequency clock. A frequency calibration method enables the voltage-controlled oscillator (VCO) in a PLL to cover the expected frequency range for high-speed applications that require a wide locking range. Frequency range adjustment is implemented by means of a current digital-to-analog converter (DAC), which controls the performance curves of a VCO and a bias circuit. This method adjusts the VCO's frequency-voltage performance curves before functional operation so that the PLL can cover the requested frequency range under its best condition. Both the limit of the control voltage and its target reference voltage are given by the same voltage reference. This ensures correct performance after frequency adjustment even under temperature fluctuation. It eliminates post-production physical adjustment such as fuse trimming, which increases the cost and turnaround time (TAT) of manufacturing and testing. A high-speed, wide-locking-range VCO with an automatic frequency performance calibration circuit is implemented in a small area in a high-speed hard disk drive channel using a 0.25-µm, 2.5 V, four-layer-metal CMOS technology.
A 3.3 V CMOS PLL (Phase-Locked Loop) with a self-feedback VCO (Voltage-Controlled Oscillator) is designed for high-frequency, low-voltage, and low-power applications. This paper proposes a new PLL architecture that improves the voltage-to-frequency linearity of the VCO with a new delay cell. The proposed VCO with a self-feedback path operates over a wide frequency range of 30 MHz to 1 GHz with good linearity. A DC-DC voltage up/down converter is newly designed to regulate the control voltage of the two-stage VCO. The designed PLL architecture is implemented in a 0.6 µm n-well CMOS process. The simulation results show a locking time of 2.6 µs at 1 GHz, a lock-in range of 100 MHz to 1 GHz, and a power dissipation of 112 mW.
This paper presents the design and simulation of a power-efficient 1:4 interpolation FIR filter with a partitioned look-up table (LUT) structure. Using the symmetry of the filter coefficients and the contents of the LUT, the area of the proposed filter is minimized. The two filters share the partitioned LUT and activate the LUT selectively to realize low-power operation. Experimental results suggest that the proposed filter reduces the power consumption by 25% and simultaneously reduces the gate area by 7% compared to the previously proposed single-architecture dual-channel filter.
Luc RYNDERS Patrick SCHAUMONT Serge VERNALDE Ivo BOLSENS
Timing verification of digital synchronous designs is a complex process that is traditionally carried out deep in the design cycle, at the gate level. A method, embodied in a C++ based design system, is presented that allows modeling and verification of clock regions at a higher level. By combining event-driven, clock-cycle true and behavioral simulation, we are able to perform static and dynamic timing analysis of the clock regions.
Sumitaka SAKAUCHI Yoichi HANEDA Shoji MAKINO Masashi TANAKA Yutaka KANEDA
We investigated the dependence of the desired echo return loss on frequency for various hands-free telecommunication conditions by subjective assessment. The desired echo return loss as a function of frequency (DERLf) is an important factor in the design and performance evaluation of a subband echo canceller, and it is a measure of what is considered an acceptable echo caused by electrical loss in the transmission line. The DERLf during single-talk was obtained as attenuated band-limited echo levels that subjects did not find objectionable when listening to the near-end speech and its band-limited echo under various hands-free telecommunication conditions. When we investigated the DERLf during double-talk, subjects also heard the speech in the far-end room from a loudspeaker. The echo was limited to a 250-Hz bandwidth assuming the use of a subband echo canceller. The test results showed that: (1) when the transmission delay was short (30 ms), the echo component around 2 to 3 kHz was the most objectionable to listeners; (2) as the transmission delay rose to 300 ms, the echo component around 1 kHz became the most objectionable; (3) when the room reverberation time was relatively long (about 500 ms), the echo component around 1 kHz was the most objectionable, even if the transmission delay was short; and (4) the DERLf during double-talk was about 5 to 10 dB lower than that during single-talk. Use of these DERLf values will enable the design of more efficient subband echo cancellers.
Shoichi TAKEDA Shuichi KATO Koki TORIUMI
Aged people who live alone are in particular need of a daily health check, medication, and warm communication with family and friends. The authors have been developing a life-support computer system with such functions. Among them, a daily health check function capable of measuring blood pressure, detecting diseases from coughing, and so on would be particularly powerful for primary care. As a first step toward quick daily health check services on a personal computer, the use of cough information is considered. Features of cough data are analyzed with the aim of developing an automatic cough detection method. This paper proposes a novel method for extracting cough signals from other types of signals. The differential coefficient of a low-pass filtered waveform is first shown to be an effective parameter for discriminating between vowel and cough signals, and the relationship between cut-off frequency and cough detection rate is clarified. This parameter is then applied to cough signals mixed with vowel signals or white noise to evaluate robustness. The evaluation tests show that the cough feature can be perfectly detected at a 20 dB S/N ratio when the cut-off frequency is set to 24 Hz. The experimental results suggest that the proposed cough detection method can be a useful primary care tool for aged people with conditions such as asthmatic bronchitis and bronchopneumonia.
Sung-Wook JUNG Chang-Gene WOO Sang-Won OH Hae-Moon SEO Pyung CHOI
The delta-sigma modulator (DSM) is an excellent choice for high-resolution analog-to-digital converters. Recently, a band-pass DSM has been a desirable choice for direct conversion of an IF signal into a digital bit stream. This paper proposes a quadrature band-pass DSM for digitizing a narrow-band IF signal. This modulator can achieve a lower total order, higher signal-to-noise ratio (SNR), and higher bandwidth when compared with conventional band-pass modulators. An experimental prototype employing the quadrature topology has been integrated in 0.6 µm, double-poly, double-metal CMOS technology with capacitors synthesized from a stacked poly structure. This system clocked at 13 MHz and digitized a 200 kHz bandwidth signal centered at 4.875 MHz with 100 dB of dynamic range. Power consumption is 190 mW at 5 V.
Chung-Hsin LIU Nen-Fu HUANG Chiou-Yng LEE
This study presents two new bit-parallel cellular multipliers based on an irreducible all one polynomial (AOP) over the finite field GF(2^m). Using the property of the AOP, this work also proposes an efficient inner-product multiplication algorithm for computing AB^2 multiplications, with a structure that can reduce the time and space complexity of hardware implementations. The first structure employs the new inner-product multiplication algorithm to construct the bit-parallel cellular architecture. The designed multiplier requires a computational delay of only (m+1)(T_AND + T_XOR). The second proposed structure is a modification of the first, and it requires (m+2) T_XOR delays. Moreover, the proposed multipliers can perform A^(2^i) B^(2^j) computations, for integers i and j, by shuffling the coefficients. For computing multiplication in GF(2^m), the novel multipliers turn out to be efficient, as they simplify the architecture and accelerate computation. The two novel architectures are highly regular, simpler, and have shorter computation delays than the conventional cellular multipliers.
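The AOP property mentioned above can be illustrated in software. The following is a minimal sketch, not the authors' cellular architecture: it assumes the AOP p(x) = x^m + ... + x + 1 is irreducible (for example m = 4), and uses the identity (x + 1) p(x) = x^(m+1) + 1 so that products are formed as a cyclic convolution of length m+1 and squaring becomes a coefficient shuffle. The function names `aop_mul`, `aop_square`, and `aop_ab2` are hypothetical.

```python
def _reduce(ext, m):
    """Map an (m+1)-coefficient value modulo x^(m+1)+1 to m coefficients modulo the AOP."""
    # x^m = 1 + x + ... + x^(m-1) modulo the AOP, so a set top bit flips every coefficient.
    if ext[m]:
        ext = [bit ^ 1 for bit in ext]
    return ext[:m]


def aop_mul(a, b, m):
    """Multiply two elements of GF(2^m), given as lists of m bits, low order first."""
    n = m + 1
    ea, eb = a + [0], b + [0]          # extend to m+1 coefficients
    c = [0] * n
    for i in range(n):
        if ea[i]:
            for j in range(n):
                c[(i + j) % n] ^= eb[j]  # cyclic convolution: x^(m+1) = 1
    return _reduce(c, m)


def aop_square(a, m):
    """Square an element by shuffling coefficients: x^i maps to x^(2i mod (m+1))."""
    n = m + 1
    ea = a + [0]
    s = [0] * n
    for i in range(n):
        s[(2 * i) % n] ^= ea[i]
    return _reduce(s, m)


def aop_ab2(a, b, m):
    """Compute A * B^2 (assumed composition of the two helpers above)."""
    return aop_mul(a, aop_square(b, m), m)
```

For m = 4, `aop_mul([0, 1, 0, 0], [0, 0, 0, 1], 4)` multiplies x by x^3 and reduces x^4 to x^3 + x^2 + x + 1. The loops model the arithmetic only; they say nothing about the gate delays quoted in the abstract.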
Mohd Abdur RASHID Masao KODAMA
Debye's asymptotic series is frequently used for the calculation of cylindrical functions. However, it seems that until now this series has not been used in all-purpose programs for numerical calculation of the cylindrical functions. The authors attempt to develop such all-purpose programs and present some improvements for the numerical calculation. As a result, Debye's series can be used in all-purpose programs, and it is found that the series gives sufficient accuracy if certain conditions are satisfied.
Shin-ichi NAKAYAMA Shigeru MASUYAMA
This paper presents an O(n^2)-time algorithm for constructing two edge-disjoint paths connecting two given pairs of vertices in a given tournament graph. It improves the time complexity of a previously known O(n^4)-time algorithm.
Manjula SANDIRIGAMA Akihiro SHIMIZU Matu-Tarow NODA
In this paper we propose SAS-Coin, a very practical micropayment scheme based on a hash chain and a simple one-time password authentication protocol called SAS. While it has many desirable features of a coin (anonymity etc.), it has no public key operations at any stage and has very little overhead. Moreover, authentication is also available, and a session key can be generated for encrypted information supply without any additional cost. Since there are no public key operations, the scheme is extremely useful for mobile telephone applications. It has sufficient security even for larger payments. A comparative analysis with some previously proposed systems is also given.
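The hash-chain idea underlying such schemes can be sketched as follows. This is a generic illustration of hash-chain coins verified with hash operations only; it is not the SAS-Coin protocol itself, and it omits the SAS authentication step entirely. The names `make_chain` and `Vendor`, and the choice of SHA-256, are assumptions made for the example.

```python
import hashlib


def make_chain(seed, n):
    """Build a chain of n coins; chain[0] is the public anchor, chain[i] is the i-th coin."""
    chain = [seed]
    for _ in range(n):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()  # now sha256(chain[i + 1]) == chain[i]
    return chain


class Vendor:
    """Accepts coins in order, checking each one with a single hash."""

    def __init__(self, anchor):
        self.last = anchor   # most recent verified value
        self.received = 0    # number of coins accepted so far

    def accept(self, coin):
        if hashlib.sha256(coin).digest() != self.last:
            return False     # out of order, replayed, or forged
        self.last = coin
        self.received += 1
        return True


chain = make_chain(b"customer secret seed", 5)
vendor = Vendor(chain[0])
first_ok = vendor.accept(chain[1])      # valid first coin
skipped = vendor.accept(chain[3])       # rejected: coin 2 has not been spent yet
```

Because each coin is the preimage of the previous one, the vendor needs no public key operation per payment, which is the property the abstract highlights for mobile use.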
Kunihiko SADAKANE Hiroshi IMAI
Two new algorithms for improving the speed of LZ77 compression are proposed. One is based on a new hashing algorithm named two-level hashing that enables fast longest match searching from a sliding dictionary, and the other uses suffix sorting. The former is suitable for small dictionaries and significantly improves the speed of gzip, which uses a naive hashing algorithm. The latter is suitable for large dictionaries, which improve the compression ratio for large files. We also experiment on the compression ratio and the speed of block sorting compression, which uses suffix sorting in its compression algorithm. The results show that LZ77 using the two-level hash is suitable for small dictionaries, LZ77 using suffix sorting is good for large dictionaries when fast decompression speed and efficient use of memory are necessary, and block sorting is good for large dictionaries.
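For orientation, a baseline longest-match search over a sliding dictionary can be sketched as below. It uses a single-level hash table keyed on three bytes, roughly in the spirit of the naive hashing the abstract attributes to gzip; it is not the proposed two-level hashing or the suffix-sorting variant. All names and parameter defaults (`window`, `max_len`, `max_chain`) are assumptions for illustration.

```python
from collections import defaultdict

MIN_MATCH = 3  # shortest match worth encoding (assumed)


def lz77_compress(data, window=4096, max_len=258, max_chain=32):
    """Return a token list: an int is a literal byte, a tuple is (offset, length)."""
    table = defaultdict(list)   # 3-byte key -> earlier positions, oldest first
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_off = 0, 0
        if i + MIN_MATCH <= n:
            candidates = table[data[i:i + MIN_MATCH]][-max_chain:]
            for p in reversed(candidates):       # newest candidates first
                if i - p > window:
                    break                        # older ones are even further away
                length = 0
                while length < max_len and i + length < n and data[p + length] == data[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_off = length, i - p
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))
            step = best_len
        else:
            tokens.append(data[i])
            step = 1
        for k in range(i, i + step):             # index every position just consumed
            if k + MIN_MATCH <= n:
                table[data[k:k + MIN_MATCH]].append(k)
        i += step
    return tokens


def lz77_decompress(tokens):
    """Rebuild the original bytes; copying byte by byte handles overlapping matches."""
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                out.append(out[-off])
        else:
            out.append(t)
    return bytes(out)
```

The inner candidate loop is where search time is spent, so it is the part a faster hashing scheme or a suffix-sorted index would replace.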
Kiattichai SAOWAPA Haruhiko KANEKO Eiji FUJIWARA
This paper presents a class of binary block codes capable of correcting single synchronization errors and single reversal errors with 3 fewer check bits than the existing codes. It also presents a decoding circuit and analyzes its complexity.
DC-free error-correcting codes based on a partition chain are presented in this paper. The partition chain can be constructed from the code partition chain of Reed-Muller codes. The line coding parameters for the partition chain, such as maximum runlength and running digital sum, are obtained. The trellis and multilevel code structure can be used to design the DC-free error-correcting codes. In particular, by adopting DC-free trellis codes as constituent codes, DC-free turbo codes can be designed. As a result, the presented DC-free error-correcting codes have good coding characteristics.
Seungjin CHOI Shunichi AMARI Andrzej CICHOCKI
Spatio-temporal decorrelation is the task of eliminating correlations between associated signals in the spatial domain as well as in the time domain. In this paper, we present a simple but efficient adaptive algorithm for spatio-temporal decorrelation. For this task, we consider a dynamic recurrent network and calculate the associated natural gradient for the minimization of an appropriate optimization function. The natural gradient based spatio-temporal decorrelation algorithm is applied to the task of blind deconvolution of a linear single-input multiple-output (SIMO) system, and its performance is compared to that of the spatio-temporal anti-Hebbian learning rule.
Yeon-Dae KWON Yasunori ISHIHARA Shougo SHIMIZU Minoru ITO
Data mining analyzes the data in a huge database to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database, which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called k-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, k-sparsity of a database means that the supports of itemsets of size k or more are sufficiently low in the database. In this paper, we introduce the notion of (k,c)-sparsity, which is strictly weaker than the k-sparsity in our previous work. The value of c represents a degree of sparsity. Using (k,c)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. We define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under each of the measures. Furthermore, based on the notion of (k,c)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.
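To make the terms "support" and "large itemset" concrete, here is a minimal level-wise (Apriori-style) search. It is a generic textbook sketch, not the authors' algorithm for k-sparse or (k,c)-sparse databases, and it assumes that "large" means the support meets a user-chosen threshold. The names `support` and `large_itemsets` and the sample data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def large_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets whose support is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({x for t in transactions for x in t})
    level = [frozenset([x]) for x in items
             if support([x], transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        k += 1
        prev = set(level)
        # Join step: unions of two large (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be large.
        level = [c for c in candidates
                 if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = large_itemsets(baskets, min_support=0.5)
```

In the worst case the number of candidates grows exponentially with the number of items, which is consistent with the NP-completeness result cited in the abstract and motivates restricting attention to sparse databases.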
Ishtiaq Rasool KHAN Ryoji OHBA
New designs of MAXFLAT discrete and differentiating Hilbert transformers are presented using their interrelationships with digital differentiators. The new designs have the explicit formulas for their tap-coefficients, which are further modified to obtain a new class of narrow transition band filters, with a performance comparable to the Chebyshev filters.
Kensaku FUJII Yoshinori TANAKA
The signed regressor algorithm, a variation of the least mean square (LMS) algorithm, is characterized by the estimation way of using the clipped reference signals, namely, its sign (
This paper introduces a generalized cyclic convolution which can be implemented via the conventional cyclic convolution system by the discrete Fourier transform (DFT) with pre-multiplication for the input and post-multiplication for the output. The generalized cyclic convolution is applied for computing a negacyclic convolution. Comparison shows that the proposed implementation is more efficient and simpler in structure than other methods. The modified Fermat number transform (MFNT) is known to be useful for computing a linear convolution of integer-valued sequences. The generalized cyclic convolution is also applied for generalizing the linear convolution system by MFNT, and easing the signal length restriction imposed by the system.
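The pre- and post-multiplication idea for the negacyclic case can be sketched as follows. This is a floating-point illustration using a naive complex DFT and the weights w^k with w = exp(j*pi/N), so that w^N = -1. It is not the paper's MFNT-based integer system, and the function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def cyclic_conv(a, b):
    """Conventional cyclic convolution via pointwise products in the DFT domain."""
    fa, fb = dft(a), dft(b)
    return dft([p * q for p, q in zip(fa, fb)], inverse=True)


def negacyclic_conv(a, b):
    """Negacyclic convolution: pre-multiply by w^k, convolve cyclically, post-multiply by w^-k."""
    n = len(a)
    w = [cmath.exp(1j * cmath.pi * k / n) for k in range(n)]
    pa = [x * wk for x, wk in zip(a, w)]
    pb = [x * wk for x, wk in zip(b, w)]
    c = cyclic_conv(pa, pb)
    return [ck / wk for ck, wk in zip(c, w)]


def negacyclic_direct(a, b):
    """Reference definition: polynomial product modulo x^N + 1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]   # wrapped terms change sign
    return c


result = [round(v.real) for v in negacyclic_conv([1, 2, 3, 4], [5, 6, 7, 8])]
```

A term a[i]*b[j] that wraps around picks up the factor w^N = -1 after the post-multiplication, which produces the sign change of the negacyclic definition. Rounding the real part is only appropriate here because the inputs are small integers.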
Hee-Suk PANG SeongJoon BAEK Koeng-Mo SUNG
A simple but effective fundamental frequency estimation method is proposed using parametric cubic convolution. The performance of the method is shown to be good not only for stationary signals but also for signals whose fundamental frequency changes with time. In the simulation, comparisons with other high-accuracy methods are also shown. Due to its accuracy and simplicity, the proposed method is practically useful.
We present a result on the robust stabilization of uncertain nonlinear systems via applying feedback linearization. The allowable size of uncertainty is derived for stability. Based on that, we propose a technique that allows us to handle nonlinear systems which are not input-state linearizable. The usefulness of the technique is illustrated by numerical examples.
Xiaojing SHI Hiroki MATSUMOTO Kenji MURAO
A novel SC (Switched-Capacitor) offset- and gain-compensated sample/hold circuit is presented. It is implemented with a new topology that reduces the effects of op-amp imperfections. Simulation results indicate that the circuit achieves high accuracy without requiring high-quality components.
Recently, an efficient algorithm has been proposed for finding all solutions of systems of nonlinear equations using linear programming. In this algorithm, linear programming problems are formulated by surrounding component nonlinear functions by rectangles. In this letter, it is shown that weakly nonlinear functions can be surrounded by smaller rectangles, which makes the algorithm very efficient.
Hung-Yu CHIEN Jinn-Ke JAN Yuh-Min TSENG
Based on systematic block codes, we propose a (t,n) multi-secret sharing scheme. Compared with previous works, our scheme has the advantages of smaller communication overhead, easy generator matrix construction, and non-disclosure of users' secret shares after multiple secret reconstruction operations. These advantages make the practical implementation of our scheme very attractive.
This letter points out some flaws in the previous works on UKS (unknown key-share) attacks. We show that Blake-Wilson and Menezes' revised STS-MAC (Station-to-Station Message Authentication Code) protocol, which was proposed to prevent UKS attacks, is still vulnerable to a new UKS attack. Also, Hirose and Yoshida's key agreement protocol presented at PKC'98 is shown to be insecure against public key substitution UKS attacks. Finally, we discuss countermeasures for such UKS attacks.
Michiharu MAEDA Hiromi MIYAJIMA
This paper describes two competitive learning algorithms from the viewpoint of mechanisms for deleting weight (reference) vectors. The techniques are termed adaptivity deletion and sensitivity deletion, and they are based on the criteria of partition error and distortion error, respectively. Experimental results show the effectiveness of the proposed approaches in terms of average distortion.