We developed a parallel bordered-block-diagonal (BBD) matrix solution for parallel circuit simulation. In parallel circuit simulation on a MIMD parallel computer, a circuit is partitioned into as many subcircuits as there are processors. Circuit partitioning produces a BBD matrix. In a parallel BBD matrix solution, the diagonal blocks are easily solved separately, one in each processor. It is difficult, however, to solve the interconnection (IC) submatrix of a BBD matrix efficiently in parallel. To make matters worse, the more subcircuits a circuit is partitioned into for highly parallel circuit simulation, the larger the IC submatrix becomes. On examination, we found that an IC submatrix is denser than a normal circuit matrix (about 30% of all entries are non-zero), and that the number of non-zeros per row in an IC submatrix is almost constant with respect to the number of subcircuits. To attain high-speed circuit simulation, we devised a data structure for BBD matrix processing and an approach to parallel BBD matrix solution. Our approach solves the IC submatrix of a BBD matrix, as well as the diagonal blocks, in parallel using all processors. In this approach, we allocate the IC submatrix onto all processors in block-wise order rather than in dot-wise order. In this way, we balance processor performance against the communication capacity of the parallel computer system. When we changed the IC submatrix allocation from dot-wise order to 8×8 block-wise order, the matrix solution time was almost halved. The parallel simulation of a sample circuit with 3277 transistors ran 16.6 times faster on 49 processors than on a single processor.
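The difference between dot-wise and block-wise allocation can be illustrated with a minimal sketch. The abstract does not give the actual mapping, so the cyclic mappings below, the function names, the 2 x 2 processor grid, and the tile size of 8 are all assumptions chosen for illustration. The sketch counts how often two horizontally adjacent matrix entries sit on different processors, as a rough proxy for how much data must cross processor boundaries.

```python
def owner_dotwise(i, j, p_rows, p_cols):
    """Assumed dot-wise (cyclic) mapping: adjacent entries go to different processors."""
    return (i % p_rows, j % p_cols)


def owner_blockwise(i, j, p_rows, p_cols, block=8):
    """Assumed block-wise (block-cyclic) mapping: each block x block tile has one owner."""
    return ((i // block) % p_rows, (j // block) % p_cols)


def boundary_crossings(n, owner):
    """Count horizontally adjacent entry pairs of an n x n matrix with different owners."""
    count = 0
    for i in range(n):
        for j in range(n - 1):
            if owner(i, j) != owner(i, j + 1):
                count += 1
    return count


if __name__ == "__main__":
    n = 16  # hypothetical IC submatrix size
    dot = boundary_crossings(n, lambda i, j: owner_dotwise(i, j, 2, 2))
    blk = boundary_crossings(n, lambda i, j: owner_blockwise(i, j, 2, 2, 8))
    print("dot-wise crossings:  ", dot)
    print("block-wise crossings:", blk)
```

Under these assumptions, the block-wise mapping keeps most neighbouring entries on the same processor, which is consistent with the trade-off between computation and communication described in the abstract. It is not a model of the measured solution times.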
Tetsuro KAGE Fumiyo KAWAFUJI Junichi NIITSUMA
We have studied a circuit partitioning approach from the viewpoint of parallel circuit simulation on a MIMD parallel computer. In parallel circuit simulation, a circuit is partitioned into equally sized subcircuits while minimizing the number of interconnection nodes. In addition, the circuit partitioning time should be short compared with the total simulation time. From a breakdown of the circuit simulation time, we found that balancing the subcircuits is critical when the degree of parallelism is low, whereas minimizing the interconnection nodes is critical when it is high. Our circuit partitioning approach consists of four steps: grouping transistors, initial partitioning of the transistor groups, minimizing the number of interconnection nodes, and balancing the subcircuits. It is algorithmic, and can directly control the tradeoff between balancing the subcircuits and minimizing the interconnection nodes by adjusting parameters. We partitioned a test circuit with 3277 transistors into 4, 9, ... , 64 subcircuits, and ran parallel simulations using PARACS, our parallel circuit simulator, on an AP1000 parallel computer. The circuit partitioning time was short enough: less than 3 percent of the total simulation time. With 49 processors, the highest performance of the parallel analysis was 16 times that of a single processor, and that of the total simulation was 9 times.
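The two quantities traded off above, subcircuit balance and the number of interconnection nodes, can be computed for any candidate partition. The following is a minimal sketch, not the authors' partitioning algorithm. The netlist representation, the function name `partition_metrics`, and the four-transistor example are hypothetical, and supply and ground nodes are assumed to have been left out of the netlist.

```python
from collections import defaultdict


def partition_metrics(transistor_nodes, assignment, n_parts):
    """Return (sizes, interconnection nodes, imbalance) for a partition.

    transistor_nodes: dict mapping transistor name -> iterable of node names
    assignment:       dict mapping transistor name -> subcircuit index
    """
    sizes = [0] * n_parts
    parts_of_node = defaultdict(set)
    for transistor, nodes in transistor_nodes.items():
        part = assignment[transistor]
        sizes[part] += 1
        for node in nodes:
            parts_of_node[node].add(part)
    # A node is an interconnection node if more than one subcircuit touches it.
    ic_nodes = sorted(n for n, parts in parts_of_node.items() if len(parts) > 1)
    # Imbalance of 1.0 means perfectly equal subcircuit sizes.
    imbalance = max(sizes) / (sum(sizes) / n_parts)
    return sizes, ic_nodes, imbalance


if __name__ == "__main__":
    # Hypothetical chain of four transistors connected in series.
    netlist = {
        "M1": ("in", "a"),
        "M2": ("a", "b"),
        "M3": ("b", "c"),
        "M4": ("c", "out"),
    }
    contiguous = {"M1": 0, "M2": 0, "M3": 1, "M4": 1}
    interleaved = {"M1": 0, "M2": 1, "M3": 0, "M4": 1}
    print(partition_metrics(netlist, contiguous, 2))
    print(partition_metrics(netlist, interleaved, 2))
```

Both example partitions are perfectly balanced, but the contiguous one has a single interconnection node while the interleaved one has three. This shows why balance alone is not a sufficient objective.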
Atsushi KUROKAWA Toshiki KANAMOTO Tetsuya IBE Akira KASEBE Wei Fong CHANG Tetsuro KAGE Yasuaki INOUE Hiroo MASUDA
Floating dummy metal fills inserted for planarization of multi-dielectric layers have created serious problems because of increased interconnect capacitance and the enormous number of fills. We present new dummy filling methods to reduce the interconnect capacitance and the number of dummy metal fills needed. These techniques include three ways of filling: 1) improved floating square fills, 2) floating parallel lines, and 3) floating perpendicular lines (with spacing between dummy metal fills above and below signal lines). We also present efficient formulas for estimating the appropriate spacing and number of fills. In our experiments, the capacitance increase with the conventional regular square method was 13.1%, while the increases with improved square fills, extended parallel lines, and perpendicular lines were 2.7%, 2.4%, and 1.0%, respectively. Moreover, the number of necessary dummy metal fills can be reduced by two orders of magnitude by using the parallel line method.
Tetsuro KAGE Hisanori FUJISAWA Fumiyo KAWAFUJI Tomoyasu KITAURA
Circuit simulators are used to verify circuit functionality and to obtain detailed timing information before the expensive fabrication process takes place. They have become an essential CAD tool in an era of sub-micron technology. We have developed a new event-driven MOS circuit simulator to replace a direct method circuit simulator. In our simulator, partitioned subcircuits are analyzed by a direct method matrix solver, and these analyses are controlled by an event-driven scheme to maintain accuracy. The key to this approach is how events are managed for circuit simulation. We introduced two types of events: self-control events for a subcircuit and prediction-correcting events between subcircuits. They control simulation accuracy and improve simulation efficiency by exploiting the multi-rate behavior of a large-scale circuit. The event-driven scheme also provides some useful functions that are not available in a direct method circuit simulator, such as a selected block simulation function and a batch simulation function for load variation. We simulated logic modules (buffer, adder, and counter) with about 1000 MOSFETs using our event-driven MOS circuit simulator. Our simulator was 5-7 times faster than a SPICE-like circuit simulator while keeping the error below 1%. The selected block simulation function shortens simulation time without losing any accuracy by selecting only the blocks in a circuit that are needed to simulate the specified node waveforms. Using this function, the logic modules were simulated 13-28 times faster than with the SPICE-like circuit simulator while maintaining the same accuracy.
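The efficiency gain from multi-rate behavior can be sketched with a simple event queue. This is a rough illustration under stated assumptions, not the simulator described above. Each subcircuit only reschedules itself at its own fixed local time step, loosely in the spirit of a self-control event. Prediction-correcting events between subcircuits and the matrix solve itself are not modelled. The function name, the subcircuit names, and the step sizes are hypothetical, and time is kept in integer units to avoid rounding drift.

```python
import heapq


def count_solves(step_sizes, t_end):
    """Count per-subcircuit solves when each subcircuit advances at its own rate.

    step_sizes: dict mapping subcircuit name -> local time step (integer units)
    t_end:      end of the simulated interval (same integer units)
    """
    # Each queue entry is (time of next event, subcircuit name).
    queue = [(dt, name) for name, dt in step_sizes.items()]
    heapq.heapify(queue)
    solves = {name: 0 for name in step_sizes}
    while queue:
        t, name = heapq.heappop(queue)
        if t > t_end:
            break  # the earliest pending event is already past the end
        solves[name] += 1  # placeholder for solving this subcircuit at time t
        heapq.heappush(queue, (t + step_sizes[name], name))
    return solves


if __name__ == "__main__":
    # Hypothetical activity levels: a busy buffer, a slower adder, a quiet counter.
    steps = {"buffer": 1, "adder": 2, "counter": 10}
    solves = count_solves(steps, t_end=20)
    print(solves, "total:", sum(solves.values()))
    # A single global step of 1 would need 3 * 20 = 60 subcircuit solves.
```

In this toy setting the quiet subcircuit is solved only twice while the busy one is solved at every step, so the total work drops well below that of a single global time step.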