Nozomu TOGAWA Masao SATO Tatsuo OHTSUKI
In this paper, we extend the circuit partitioning algorithm which we have proposed for multi-FPGA systems and present a new algorithm in which the delay of each critical signal path is within a specified upper bound imposed on it. The core of the presented algorithm is recursive bipartitioning of a circuit. The bipartitioning procedure consists of three stages: 0) detection of critical paths; 1) bipartitioning of a set of primary inputs and outputs; and 2) bipartitioning of a set of logic-blocks. In 0), the algorithm computes the lower bounds of delays for paths with path delay constraints and detects the critical paths based on the difference between the lower and upper bound dynamically in every bipartitioning procedure. The delays of the critical paths are reduced with higher priority. In 1), the algorithm attempts to assign the primary inputs and outputs on each critical path to one chip so that the critical path does not cross between chips. Finally in 2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it resolves almost all path delay constraints with maintaining the maximum number of required I/O blocks per chip small compared with conventional alogorithms.
Kwang-Su SEONG Chong-Min KYUNG
In this paper, we propose a clustering based linear ordering algorithm which consists of global ordering and local ordering. In the global ordering, the algorithm forms clusters from n given vertices and orders the clusters. In the local ordering, the elements in each cluster are linearly ordered. The linear order, thus produced, is used to obtain optimal κ-way partitioning based on scaled cost objective function. When the number of cluster is one, the proposed algorithm is exactly the same as MELO [2]. But the proposed algorithm has more global partitioning information than MELO by clustering. Experiment with 11 benchmark circuits for κ-way (2 κ 10) partitioning shows that the proposed algorithm yields an average of 10.6% improvement over MELO [2] for the κ-way scaled cost partitioning.
This paper describes a waveform relaxationbased coupled lossy transmission line circuit simulator DESIRE3T+. First, the generalized method of characteristics (GMC) is reviewed, which replaces a lossy transmission line with an equivalent disjoint network. Next, the generalized line delay window (GLDW) partitioning technique is proposed, which accelerates the transient analysis of the circuits including transmission lines replaced by GMC model. Finally GMC model and GLDW technique are implemented in hte relaxation-based circuit simulator DESIRE3T+ which can analyze bipolar transistor circuits by using the dynamic decomposition technique, and the performance is estimated.
The estimation of model parameter is essentially important for an MRF image model to work well. Because the maximum likelihood estimate (MLE), which is statistically optimal, is too difficult to implement, the conventional estimates such as the maximum pseudo-likelihood estimate (MPLE), the coding method estimate (CME), and the least-squares estimate (LSE) are all based on the (conditional) pixel probabilities for simplicity. However, the conventional pixel-based estimators are not very satisfactorily accurate, especially when the interactions of pixels are strong. We therefore propose two window-based estimators to improve the estimation accuracy: the adjoining-conditional-window (ACW) scheme and the separated-conditional-window (SCW) scheme. The replacement of the pixel probabilities by the joint probabilities of window pixels was inspired by the fact that the pixels in an image present information in a joint way and hence the more pixels we deal with the joint probabilities of, the more accurate the estimate should be. The window-based estimators include the pixel-based ones as special cases. We present respectively the relationship between the MLE and each of the two window-based estimates. Through the relationships we provide a unified view that the conventional pixel-based estimates and our window-based estimates all approximate the MLE. The accuracy of all the estimates can be described by two types of superiority: the cross-scheme superiority that an ACW estimate is more accurate than the SCW estimate with the same window size, and the in-scheme superiority that an ACW (or SCW) estimate more accurate than another ACW (or SCW) estimate which uses smaller window size. The experimental results showed the two types of superiority and particularly the significant improvement in estimation accuracy due to using window probabilities instead of pixel probabilities.
Takashi SUGIMOTO Yoshifumi NISHIO Akiko USHIDA
In this paper, we propose a novel SPICE oriented steady-state analysis of nonlinear circuits based on the circuit partition technique. Namely, a given circuit is partitioned into the linear and nonlinear subnetworks by the application of the substitution theorem. Each subnetwork is solved using SPICE simulator by the different techniques of AC analysis and transient analysis, respectively, whose steady-state reponse is found by an iteration method. The novel points of our algorithm are as follows: Once the linear subnetworks are solved by AC analysis, each subnetwork is replaced by a simple equivalent RL or RC circuit at each frequency component. On the other hand, the reponse of nonlinear subnetworks are solved by transient analysis. If we assume that the sensitivity circuit is approximated at the DC operational point, the variational value will be also calculated from a simple RL ro RC circuit. Thus, our method is very simple and can be also applied to large scale circuits, effciently. To improve the convergency, we introduce a compensation technique which is usefully applied to stiff circuits containing components such as diodes and transistors.
Efficient parallel algorithms for several problems on proper circular arc graphs are presented in this paper. These problems include finding a maximum matching, partitioning into a minimum number of induced subgraphs each of which has a Hamiltonian cycle (path), partitioning into induced subgraphs each of which has a Hamiltonian cycle (path) with at least k vertices for a given k, and adding a minimum number of edges to make the graph contain a Hamiltonian cycle (path). It is shown here that the above problems can all be solved in logarithmic time with a linear number of EREW PRAM processors, or in constant time with a linear number of BSR processors. A more important part of this work is perhaps the extension of basic BSR to allow simultaneous multiple BROADCAST instructions.
Keisuke NAKANO Naoyuki KARASAWA Masakazu SENGOKU Shoji SHINODA Takeo ABE
This paper describes communication traffic characteristics in cellular systems employing the concept of reuse partitioning and Dynamic Channel Assignment. Such systems hava a problem of the spatial unbalance of blocking probability. The objective of this paper is overcoming this problem. To accomplish this objective, we use a method for analyzing communication traffic characteristics. We also show results on traffic characteristics in the systems.
We define two restricted classes of Boolean circuits by assuming the following conditions on underlying graphs of circuits, and prove, for each class, nonlinear lower bounds on size of circuits computing cyclic shifts:
We propose a partitioned-bus architecture with a variable-width-bus scheme to reduce power consumption in bus lines in VLSIs. The partitioned-bus architecture (horizontal partitioning) restricts bus driving range to minimal, while the variable-width-bus scheme (vertical partitioning) uses additional tag lines so as to automatically indicate effective bus width with-out driving unnecessary bus lines depending on data to be transferred. Applying this method to a general purpose microprocessor, we demonstrate 30% and 35% power consumption reduction in bus lines, respectively for the partitioned-bus architecture and the variable-width-bus scheme, compared with the conventional bus architecture. Combining the both together, we show about 55% power consumption reduction in bus lines for typical applications. Increase in chip area for this architecture is about 30% compared with the conventional bus architecture.
Kojiro HAMABE Yukitsuna FURUYA
This paper reviews Dynamic Channel Allocation (DCA) in TDMA cellular systems. The emphasis is on distributed DCA, which features decentralized control and adaptability to interference. Performance measures are discussed not only from a theoretical viewpoint but also from a practical viewpoint. Major techniques to enhance the capacity of cellular systems are channel segregation, reuse-partitioning, and transmitter power control. In addition to the performance of conventional cellular systems, differing performance in microcellular systems and multi-layer cellular systems is also discussed.
Masahiro SANO Shintaro SHIMOGORI Fumiyasu HIROSE
This paper presents a new approach of data pipelining for mincut partitioning acceleration using a parallel computer. When using a parallel computer, it is important to have many processors always active, also the quality of the partitioning must not be sacrificed. Out approach covers both speed and quality. We choose the hardware CAD accelerator TP5000 to implement our approach, which consists of dedicated Very Long Instruction Word (VLIW) processors with high-speed interconnections. The TP5000 allows its connections to be reconfigured to optimize the data pipelines. We estimate that the speed of our approach using 10 processors on the TP5000 is 30 times faster than a SPARCStation-10.
Nozomu TOGAWA Masao SATO Tatsuo OHTSUKI
In circuit partitioning for FPGAs, partitioned signal nets are connected using I/O blocks, through which signals are coming from or going to external pins. However, the number of I/O blocks per chip is relatively small compared with the number of logic-blocks, which realize logic functions, accommodated in the FPGA chip. Because of the I/O block limitation, the size of a circuit implemented on each FPGA chip is usually small, which leads to a serious decrease of logic-block utilization. It is required to utilize unused logic-blocks in terms of reducing the number of I/O blocks and realize circuits on given FPGA chips. In this paper, we propose an algorithm which partitions an initial circuit into multi-FPGA chips. The algorithm is based on recursive bi-partitioning of a circuit. In each bi-partitioning, it searches a partitioning position of a circuit such that each of partitioned subcircuits is accommodated in each FPGA chip with making the number of signal nets between chips as small as possible. Such bi-partitioning is achieved by computing a minimum cut repeatedly applying a network flow technique, and replicating logic-blocks appropriately. Since a set of logic-blocks assigned to each chip is computed separately, logic-blocks to be replicated are naturally determined. This means that the algorithm makes good use of unused logic-blocks from the viewpoint of reducing the number of signal nets between chips, i.e. the number of required I/O blocks. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it decreases the maximum number of I/O blocks per chip by a maximum of 49% compared with conventional algorithms.
Nguyen Ngoc BINH Masaharu IMAI Akichika SHIOMI Nobuyuki HIKICHI
This paper proposes a new method to design an optimal pipelined instruction set processor using formal HW/SW codesign methodology. A HW/SW partitioning algorithm for selecting an optimal pipelined architecture is introduced. The codesign task addressed in this paper is to find a set of hardware implemented operations to achieve the highest performance of an ASIP with pipelined architecture under given gate count and power consumption constraints. The problem formalization as well as the proposed algorithm can be considered as an extension of our previous work toward a pipelined architecture. The experimental results show that the proposed method is quite effective and efficient.
Yoshitsugu TSUCHIYA Yoshihiro KANEKO Kazuo HORIUCHI
A 2-switch node network is one of the most fundamental structure among communication nets such as telephone networks and local area networks etc. In this letter, we prove that a problem of designing a 2-switch node network satisfying capacity conditions of switch nodes and their link, which we call 2-switch node network problem, is NP-complete.
Michinari SHIMODA Tokuya ITAKURA Yuko YAMADA
The two-dimensional scattering problem of electromagnetic waves by a perfectly conducting wedge is analyzed by means of the Wiener-Hopf technique together with the formulation using the partition of scatterers. The Wiener-Hopf equations are derived on two complex planes. Investigating the mapping between these complex planes and introducing the appropriate functions which satisfy the edge condition of the wedge, the solutions of these equations are obtained by the decomposition procedure of functions. By deforming the integration path of the Fourier inverse transform, it is found that the representation of the scattered wave is in agreement with the integral representation using the Sommerfeld contours.
Kazuhiko SHIMADA Takeshi WATANABE Masakazu SENGOKU Takeo ABE
The applicability of Dynamic Channel Assignment methods to a Reuse Partitioning system in cellular radio systems is investigated in this paper. The investigations indicate that such a system has a tendency to increase the difference between blocking probability for the partitioning two coverage areas in comparison with the conventional Reuse Partitioning system employing Fixed Channel Assignment method. Two schemes using new Channel Rearrangement algorithms are also proposed in order to alleviate the difference as a disadvantage which gives unequal service to the system. The simulation results show that the proposed schemes are able to reduce the difference significantly while increasing the carried traffic by 10% as compared with the conventional system.
Paulo LORENZO Munehiro GOTO Arthur J. CATTO
The Manchester Dataflow Machine (MDFM) works with tasks of size equal to one single instruction. This fine granularity aims at exploring all parallelism at the instruction level. However, this project decision increases the instruction communication cost, which ends up to jam the interconnection network and reduces the system performance. One way to skirt this problem is to adopt variable size tasks instead of working with such small task size. In this paper, in order to study whether or not the usage of such variable size tasks in the MDFM architecture contributes to the improvement of the performance, some simulations by toy programs take place. In the simulation, variable size tasks are realized by packing the sequential instruction stretches into one task. To manage this packing, the Sequential Block (SB) technique is developed. The simulation of those packed and unpacked programs give an outline of advantages and disadvantages of working with variable size tasks, and how the SB technique should be implemented in the system.
Nguyen Ngoc BINH Masaharu IMAI Akichika SHIOMI Nobuyuki HIKICHI Yoshimichi HONMA Jun SATO
In this paper we describe the formal conditions to detect and resolve all kinds of pipeline data hazards and propose a scheduling algorithm for pipelined instruction set processor synthesis. The algorithm deals with multi cycle operations and tries to minimize the pipeline execution cycles under a given hardware configuration with/without hardware interlock. The main feature that makes the proposed algorithm different from existing ones is the algorithm is for estimating the performance in HW/SW partitioning, with capability of handling a module library of different FUs and dealing with multi cycle operations to be implemented in software. Experimental results of application to ASIP HW/SW codesign show that the proposed algorithm is effective and considerable pipeline execution cycle reduction rates can be achieved. The time complexity of the scheduing algorithm is of O(n2) in the worst case, where n is the number of instructions in a given basic block.
We developed a parallel bordered-block-diagonal (BBD) matrix solution for parallel circuit simulation. In parallel circuit sumulation on a MIMD parallel computer, a circuit is partitioned into as many subcircuits as the processors of a parallel computer. Circuit partition produce a BBD matrix. In parallel BBD matrix solution, diagonal blocks are easily solved separately in each processor. It is difficult, however, to solve the interconnection (IC) submatrix of a BBD matrix effectively in parallel. To make matters worse, the more a circuit is partitioned into subcircuits for highly parallel circuit simulation, the larger the size of an IC submatrix becomes. From an examination, we found that an IC submatrix is more dense (about 30% of all entries are non-zeros) than a normal circuit matrix, and the non-zeros per row in an IC submatrix are almost constant with the number of subcircuits. To attain high-speed circuit simulation, we devised a data structure for BBD matrix processing and an approach to parallel BBD matrix solution. Our approach solves the IC submatrix in a BBD matrix as well as the diagonal blocks in parallel using all processors. In this approach, we allocate an IC submatrix in block-wise order rather than in dot-wise order onto all processors. Thus, we balance the processor perfomance with the communication capacity of a parallel computer system. When we changed the block size of IC submatrix allocation from dot-wise order to 88 block-wise order, the 88 block-wise order allocation almost halved the matrix solution time. The parallel simulation of a sample circuit with 3277 transistors was 16.6 times faster than a single processor when we used 49 processors.
Masahiko TOYONAGA Shih-Tsung YANG Isao SHIRAKAWA Toshiro AKINO
This paper describes a new clustering approach for VLSI placement, which is based on a fractal dimension analysis for the topological structure of modules in a logic diagram. A distinctive feature of this approach is that a measure of the 'fractal dimension' has been introduced into a logic diagram in such a way that the clustering of modules is iterated while the fractal dimension among clustered modules is retained in a prescribed range. A part of experimental results is also shown, which demonstrates that our clustering approach raises the placement performance much higher than the conventional clustering methods.