Yoko KAMIDOI Shin'ichi WAKABAYASHI Noriyoshi YOSHIDA
This paper presents an efficient heuristic algorithm for min-cut bisection of weighted hypergraphs. The proposed algorithm is based on a heuristic algorithm proposed by Kahng, which was devised for non-weighted hypergraph bisection, adopting a non-weighted graph called intersection graph to represent a given hypergraph. In the proposed algorithm, instead of an intersection graph, a bipartite graph called netgraph is newly introduced to explicitly represent the weights of nodes of a hypergraph. Using the netgraph, it is easy to partition a weighted hypergraph into two hypergraphs with same size. Computation time of the proposed method is O(m2), where m is the number of nodes of a given hypergraph. Experimental results with real circuit data show that the proposed method produces better solutions in shorter computation time compared with existing methods.
Ze Cang GU Shoichiro YAMADA Kunio FUKUNAGA Shojiro YONEDA
A new algorithm for timing driven placement based on the fuzzy theory is proposed. In this method, the signal delay on the longest path, the chip area and the total wire length can be simultaneously minimized. Introducing the probability measures of fuzzy events, falling down into the local optimal solutions can be avoided. At first, we define the fuzzy placement relation using the graph distance matrix and fuzzy distance relation matrix, and we give a new placement method based on the fuzzy placement relation and the probability measures of fuzzy events. Secondly, we extend this placement method so as to apply to the timing driven placement problem by introducing a fuzzy membership functions which represent the signal delay on the longest path and the chip area. Finally, experimental results are shown to compare our method with one of the previous methods.
This paper deals with the sub-problems of generating a mask pattern from the logical description of a large-scale CMOS circuit. The large-scale layout can be generated in divide-and-conquer style: divide a given circuit into a set of sub-circuits, generate the layout of each sub-circuit, and merge the resulting layouts to create the whole layout. This paper proposes a layout synthesis algorithm for a sub-circuit with physical constraints for the synthesis scheme above. The physical constraints considered here are the relative placement of logic cells (sets of logic gates) and the routing constraint based on the costs of wiring layers and vias. These constraints will be given by the global optimizer in a two-dimensional layout synthesis routine, and they should be kept at the subsequent one-dimensional layout synthesis for a sub-circuit. The latter is also given for enhancing the circuit performance by limiting the usage of wiring layers and vias for special net such as a clock net. The placement constraint is maintained using PQ-tree, a tree structure representing a set of restricted permutations of elements. One-dimensional layout synthesis determines the placement of transistors by the enhanced pairwise exchanging method under the PQ-tree representation. The routing constraints is considered in the newly developed line-search routing method using a cost-based searching. Experimental results for practical standard cells, including up to 200 transistors, prove that the algorithms can produce the layouts comparable to handcrafted cells. Also on a two-dimensional layout synthesis using the algorithms, the results for benchmark circuits of Physical Design Workshop 1989, i.e., MCNC benchmark circuits, are superior to the best results exhibited at Design Automation Conference 1990.
Masayuki HAYASHI Hiroyoshi YAMAZAKI Shuji TSUKIYAMA Nobuyuki NISHIGUCHI
We propose a hierarchical multi-layer global router for Sea-Of-Gates VLSI's, which is different from the conventional global routers, in that routing and layering are executed simultaneously. The main problems to be solved in the global routing for a multi-layer VLSI are which wire segments are laid out on upper layers and how they are connected to terminals located on lower layers. The main objective is to minimize the maximum of local congestions of all layers. We solve these problems in a hierarchical manner by routing from upper layers to lower layers.
Yoshio HIROSE Hideaki ANBUTSU Koichi YAMASHITA Gensuke GOTO
This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach where a neural network simulation is suitable for parallel processing. By constructing a ring network using several processors, the simulation speed is multiplied by the number of the processors. The second technique is internal parallel processing. Each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections of eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers and data is processed in a pipeline manner. The data is in 24-bit floating point format (18-bit mantissa and 6-bit oxponent). The chip has about 88,000 gates, including microcode ROM for processor control, the processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.
Takao KANEKO Yukio AKAZAWA Mitsuyoshi NAGATANI
An automatic macrocell generator has been developed and applied to the analog layout of SC and active-RC filters. The generator consists of a process independent generation procedure, a leafcell library, and a circuit description of the leafcells. The unit element arrays of the whole filter are generated together to minimize the array height of the entire filter macrocell, so that the area of the generated filter is as small as that of a manually laid out filter. Three SC filters and one active-RC filter were designed and fabricated by 1.5-µm CMOS technology, that successfully yielded an S/N ratio of more than 70 dB with a quick turn around time.
Takao ONOYE Akihisa YAMADA Itthichai ARUNGSRISANGCHAI Masakazu TANAKA Isao SHIRAKAWA
An autonatic layout scheme dedicated to bipolar analog modules is described. A layout model is settled in such a way that the VCC/GND line is laid out on top/bottom edge of a rectangular region, within which the whole elements are placed and interconnected. According to this simple modeling, a layout scheme can be constructed of a series of the following algorithms: First clustering is executed for partitioning a given circuit into clusters, each having connections with VCC and GND lines, and then linear ordering is applied to clusters so as to be placed in a one-dimensional array. After a relative placement of circuits elements in each cluster, a block compactor is implemented by means of packing blocks in each cluster into an idle space, and then a detailed router is conducted to attain 100% interconnection. Finally a layout compactor is invoked to pack all layout patterns into a rectangle of the minimum possible area. A number of implementation results are also shown to reveal the practicability of the proposed analog module generator.
Shinichi HONIDEN Naoshi UCHIHIRA
Net-Oriented Analysis and Design (NOAD) is defined as three items: (1) Various nets are utilized as an effective modeling method. (2) Inter-relationships among verious nets are determined. (3) Verification or analysis methods for nets are provided and they are implemented based on the mathematical theory, that is Net theory. Very few methods have been presented to satisfy these three items. For example, the Real-Time SA method covers item (1) only. The Object-Oriented Analysis and Design method (OOA/OOD) covers items (1) and (2). NOAD can be regarded as an extension to OOA/OOD. This paper discusses how effectively various nets have been used in actual software development support metnods and tools and evaluates such several methods and tools from the NOAD viewpoint.
Frederico Buchholz MACIEL Yoshikazu MIYANAGA Koji TOCHINAI
The throughput of a parallel execution of a Digital Signal Processing (DSP) algorithm is limited by the iteration bound, which is the minimum period between the start of consecutive iterations. It is given by T=max (Ti/Di), where Ti and Di are the total time of operations and the number of delays in loop i, respectively. A schedule is said rate-optimal if its iteration period is T. The throughput of a DSP algorithm execution can be increased by reducing the Ti's, which can be done by taking as many operations as possible out of loops without changing the semantic of the calculation. This paper presents an optimization technique, called Loop Shrinking, which reduces the iteration bound this way by using commutativity, associativity and distributivity. Also, this paper presents a scheduling method, called Period-Driven Scheduling, which gives rate-optimal schedules more efficiently than existing approaches. An implementation of both is then presented for a system in development by the authors. The system shows reduction in the iteration bound near or equal to careful hand-tunning, and hardware-optimal designs in most of the cases.
Satoshi MIKI Hiroshi MIYANAGA Hironori YAMAUCHI
This paper presents a method for LSI implementation of a long-tap acoustic echo canceller algorithm using the residue number system (RNS) and the mixed-radix number system (MRS). It also presents a quantitative comparison of echo canceller architectures, one using the RNS and the other using the binary number system (BNS). In the RNS, addition, subtraction, and multiplication are executed quickly but scaling, overflow detection, and division are difficult. For this reason, no echo canceller using the RNS has been implemented. We therefore try to design an echo canceller architecture using the RNS and the NLMS algorithm. It is shown that the echo canceller algorithm can be effectively implemented using the RNS by introducing the MRS. The quantitative comparison of echo canceller architectures shows that a long-tap acoustic echo canceller can be implemented more effectively in terms of chip size and power dissipation by the architecture using the RNS.
This paper describes a modeling of the cryptography based on a concept of Petri nets. Movement of tokens in the net model shows a dynamic behavior of systems. On the other hand, the cryptography is considered as a bit operation, so that we can point out a common property between the net structure and the cryptography, which provides our idea that movement of tokens of the net model corresponds to a bitoperation of the cryptography. Some effective keys in the net model are considered by means of the net elements, which are based on T-invariant and net structures. It is shown that the keys of the net structured cryptography provide reasonable strength comparing with the data encryption standard (DES).
Boolean unification is an algorithm to obtain the general solution of a given Boolean equation. Since a general solution provides a way to represent a complete don't care set, Boolean unification can be a powerful technique when applied to logic synthesis. In this paper we present various applications of Boolean unification to combinational logic synthesis. Three topics of combinational logic synthesis: redesign, multi-level logic minimization and minimization of Boolean relations are discussed. All these problems can be uniformly formalized as Boolean equations. Experimental results are also reported.
Takenobu TANIDA Toshimasa WATANABE Masahiro YAMAUCHI Kinji ONAGA
The subject of the paper is to propose two approximation algorithms FM_SPLA, FM_DPLA for priority-list scheduling in timed Petri nets. Their capability is compared with that of existing algorithms SPLA, DPLA through experimental results, where SPLA and DPLA have previously been proposed by the authors.
Noriyasu ARAKAWA Terunao SONEOKA
This paper proposes a test case generation method for testing concurrent programs as a black box. Typical applications are system testing for switching systems and inter-operability testing for OSI products. We adopt a two-step approach: first generate the control flow graph which represents global behaviors of a given concurrent program, and then apply conventional test case generation methods for the control flow graph. To generate a control flow graph without state space explosion, the black-box equivalence between system behaviors is introduced. The proposed algorithm generates a minimal control flow graph which consists of representatives of equivalence classes. Two practical techniques for the second step are discussed for a case study using a commercial digital PBX. The results show the feasibility of the proposed method.
Minoru NODA Hiroshi MATSUOKA Norio HIGASHISAKA Masaaki SHIMADA Hiroshi MAKINO Shuichi MATSUE Yasuo MITSUI Kazuo NISHITANI Akiharu TADA
Air-bridge metal interconnection technology is used for upper level power supply line interconnections in GaAs LSI's to reduce the signal propagation delay time. This technology reduces both parasitic capacitance between the signal line and the power supply line, and propagation delay in the signal line to about 10% and about 50%, respectively, compared to conventional 3-level interconnections without air-bridges. Under standard load conditions (FI=FO=2, length of load line=2 mm), the air-bridge technique leads to gate propagation delays which are about 60% of those in conventional interconnections. We fabricated 2.1-k gate Gate Arrays and 4-kb SRAM's using the air-bridge structure to interconnect power supply lines. For a Gate Array with 0.7 µm gate Buried P-layer Lightly Doped Drain (BPLDD) FET's, the typical gate propagation delay under standard load conditions was about 110 ps with a dissipation power of 1.4 mW/gate. SRAM's with 05 µm gate BPLDD's had typical access time (tacc) of 1.5 ns with a dissipation power of 700 mW/chip.
Masahiro HIGUCHI Hiroyuki SEKI Tadao KASAMI
Many practical communication protocols provide priority service as well as ordinary service. In such a protocol, the protocol machines can initiate a priority service at most of the states. This characteristic leads an extreme increment of the number of state transitions on the protocol machines and causes state space explosion in verification of safety property of the protocol. This paper describes a method of constructing a communication protocol from composition of a subprotocol for ordinary service and that for priority service. This paper also presents a sufficient condition for a composed protocol to inherit safety property from the subprotocols. By using the composition method and the sufficient condition, the decision problem for safety property of the composed protocol can be reduced to those of the subprotocols. An experimental result of verification of a part of OSI session protocol is also described. The result shows that the method can reduce the computation time for verifying safety property to about 3% against the naive way.
An integrated platform INTEGRAL has been developed for developing large complex communication software systems. At the heart of INTEGRAL, a pair of graphical and textual specification languages, DISCOL (DIStributed Communication-Oriented Language), has been developed based on Petri nets. Around DISCOL, a wide variety of design and analysis tools have been integrated in coherent manner so that a seamless support from design to verification and testing are made available along with software life-cycle. The platform has been applied to the development of a PBX simulator named UICPBX. In the development, some real communication services have been fully specified with DISCOL. Such experiences have revealed the effectiveness of the proposed techniques.
Shinji KIMURA Tsutomu IGAKI Hiromasa HANEDA
The paper describes a parallel algorithm for the manipulation of binary decision diagrams on a shared memory multi-processor system. Binary decision diagrams are very efficient representations of logic functions, and are widely used in computer aided design of logic circuits. Logic operations on logic functions such as AND and OR are reduced to operations on binary decision diagrams representing these functions. Operations on binary decision diagrams are time-consuming in some cases, and a fast manipulation method is needed. As with the manipulation, we focus on the construction of a binary decision diagram from a logic formula, and devised a parallel algorithm for the construction. In the construction, there are many logic operations to be processed, and some of them can be processed in parallel. At first, we introduce an extraction method and a parallel-execution method for such parallelizable operations. This is the parallel execution method for an operation sequence (or a set of operations). To extract more parallelism, we introduce a dynamic expansion method of a logic operation. The dynamic expansion is a method to obtain sub-operations from a logic operation using the modified Shannon's expansion. These sub-operations are executed in parallel and the results of these sub-operations are merged to obtain the result of the original operation. Our parallel algorithm, which is based on the construction of shared binary decision diagrams with the negative edge and the operation cache, is implemented in C on a shared memory multi-processor system Sequent S-81 (CPU 80386 (16 MHz)28, 86.75MB), and applied to multiplier examples and ISCAS benchmarks. The speed-up ratio becomes 14 for multipliers, and becomes 11 for c1908 in ISCAS benchmarks.
Alberto Palacios PAWLOVSKY Makoto HANAWA Osamu NISHII Tadahiko NISHIMUKAI
Advances in semiconductor technology have made it possible to develop an experimental 1000 MIPS superscalar RISC processor. The high performance of this processor was obtained using architectural concepts such as multiple CPU configuration, superscalar microarchitecture, and high-speed device technology. This paper focuses on the novel features of this RISC processor, its device technology, architectural characteristics and one technology that has been devised to make its integer CPU cores fault-tolerant.
Naoki HARADA Shigeru KURODA Kohki HIKOSAKA
A Pt-based gate and photochemical dry etching were developed to fabricate N-InAlAs/InGaAs HEMT ICs. The N-InAlAs/Pt contact showed a Schottky barrier at 0.82 eV, about 0.3 eV larger than ΔEc, and nearly ideal I-V characteristics. Its main disadvantage was the excess penetration of Pt into InAlAs. We proposed a thin-Pt/Ti/Au multilayer gate, more thermally stable than the thick-Pt gate, where Ti layer suppresses the above problem with Pt. The multilayer gate also showed a Schottky barrier (φ) of 0.83 eV and an edeality dactor of 1.1. The high φ value makes it possible to fabricate an E-mode N-InAlAs/InGaAs HEMT. We also developed photochemical selective dry etching using CH3Br gas and a low-pressure mercury lamp. The etching selectivity was 25 at an etch rate of 17 nm/min for InGaAs and 0.7 nm/min for InAlAs. The 1.2-µm-gate E-mode HEMT fabricated using the Pt-based gate and photochemical etching had an excellent peak transconductance of 620 mS/mm with a threshold voltage of +0.03 V. The standard deviation of the threshold voltage of E-mode HEMTs on a 2-inch wafer was 20 mV at an average of +0.088 V. These results indicate the effectiveness of the Pt-based gate and photochemical etching for fabricating N-InAlAs/InGaAs HEMT ICs.