Tomonori IZUMI Atsushi TAKAHASHI Yoji KAJITANI
A floorplan is a partition of a rectangle into subrectangles, each of which is associated with a module. Zero-wasted-area layouts are known to exist when the height and width of modules are constrained only by the area, and several methods have been proposed for deriving such layouts. However, because these methods are global and indirect, they are inherently slow. We propose a new algorithm which simulates the air-pressure mechanics. It begins with a layout, which is not necessarily feasible, and iterates the movement of one wall at a time to the force-balancing position. The key issue is that it is guaranteed that every movement makes a current layout approach a zero-wasted-area layout by the measure of energy which is defined here. Experimental results on the example in several literatures and artificially made complex examples showed very fast convergence. The algorithm is evolved to methods which move all the walls simultaneously, resulting in a further speed enhancement.
Tomonori IZUMI Toshihiko YOKOMARU Atsushi TAKAHASHI Yoji KAJITANI
The packing problem is to pack given items into given containers as efficiently as possible under various constraints. It is fundamental and significant with variations and applications. The Set-Bin-Packing (SBP) is a class of packing problems: Pack given items into as few bins which have the same capacity where every item is a set and a bin can contain items as long as the number of distinct elements in the union of the items equals to or less than the capacity. One of applications is in FPGA technology mapping, which is our initial motivation. In this paper, the computational complexity of SBP is studied with respect to three parameters α, γ, and δ which are the capacity, the upper bound of the number of elements in an item, and the upper bound of the number of items having an element, respectively. In contrast that the well known Integer-Bin-Packing (IBP) is NP-hard but is proved that even a simplest heuristics First-Fit-Decreasing (FFD) outputs exact solutions as long as α 6, our result reveals that SBP remains NP-hard for a small values of these parameters. The results are summarized on a 3D map of computational complexities with respect to these three parameters.
Kengo R. AZEGAMI Atsushi TAKAHASHI Yoji KAJITANI
We improve the algorithm to obtain the min-cut graph of a hyper-graph and show an application to the sub-network extraction problem. The min-cut graph is a directed acyclic graph whose directed cuts correspond one-to-one to the min-cuts of the hyper-graph. While the known approach trades the exactness of the min-cut graph for some speed improvement, our proposed algorithm gives an exact one without substantial computation overhead. By using the exact min-cut graph, an exhaustive algorithm finds an optimal sub-circuit that is extracted by a min-cut from the circuit. By experiments with the industrial data, the proposing method showed a performance enough for practical use.
Ahmed AWAD Atsushi TAKAHASHI Chikaaki KODAMA
With being pushed into sub-16nm regime, advanced technology nodes printing in optical micro-lithography relies heavily on aggressive Optical Proximity Correction (OPC) in the foreseeable future. Although acceptable pattern fidelity is utilized under process variations, mask design time and mask manufacturability form crucial parameters whose tackling in the OPC recipe is highly demanded by the industry. In this paper, we propose an intensity based OPC algorithm to find a highly manufacturable mask solution for a target pattern with acceptable pattern fidelity under process variations within a short computation time. This is achieved through utilizing a fast intensity estimation model in which intensity is numerically correlated with local mask density and kernel type to estimate the intensity in a short time and with acceptable estimation accuracy. This estimated intensity is used to guide feature shifting, alignment, and concatenation following linearly interpolated variational intensity error model to achieve high mask manufacturability with preserving acceptable pattern fidelity under process variations. Experimental results show the effectiveness of our proposed algorithm on the public benchmarks.
Yukihide KOHIRA Atsushi TAKAHASHI
Under the assumption that the clock can be inputted to each register at an arbitrary timing, the minimum feasible clock period might be reduced by register relocation while maintaining the circuit behavior and topology. However, if the minimum feasible clock period is reduced, then the number of registers tends to be increased. In this paper, we propose a gate-level register relocation method that reduces the number of registers while keeping the target clock period. In experiments, the proposed method reduces the number of registers in the practical time in most circuits.
Bakhtiar Affendi ROSDI Atsushi TAKAHASHI
A new algorithm is proposed to reduce the number of intermediate registers of a pipelined circuit using a combination of multi-clock cycle paths and clock scheduling. The algorithm analyzes the pipelined circuit and determines the intermediate registers that can be removed. An efficient subsidiary algorithm is presented that computes the minimum feasible clock period of a circuit containing multi-clock cycle paths. Experiments with a pipelined adder and multiplier verify that the proposed algorithm can reduce the number of intermediate registers without degrading performance, even when delay variations exist.
In this paper, we propose a global routing method for 2-layer BGA packages. In our routing model, the global routing for each net is uniquely determined by a via assignment of each net. Our global routing method starts from an initial monotonic via assignment and incrementally improves the via assignment to optimize the total wire length and the wire congestion. Experimental results show that our proposed method generates a better global routing efficiently.
Yukihide KOHIRA Atsushi TAKAHASHI
Due to the increase of operation frequency in recent LSI systems, signal propagation delays are required to achieve specifications with very high accuracy. In order to achieve the severe requirements, signal propagation delay is taken into account in the routing design of PCB (Printed Circuit Board). In the routing design of PCB, the controllability of wire length is often focused on since it enables us to control the routing delay. In this paper, we propose CAFE router which obtains routes of multiple nets with target wire lengths for single layer routing grid with obstacles. CAFE router extends the route of each net from a terminal to the other terminal greedily so that the wire length of the net approaches its target wire length. Experiments show that CAFE router obtains the routes of nets with small length error in short time.
Yiqiang SHENG Atsushi TAKAHASHI
In this paper, a novel high-performance heuristic algorithm, named relay-race algorithm (RRA), which was proposed to approach a global optimal solution by exploring similar local optimal solutions more efficiently within shorter runtime for NP-hard problem is investigated. RRA includes three basic parts: rough search, focusing search and relay. The rough search is designed to get over small hills on the solution space and to approach a local optimal solution as fast as possible. The focusing search is designed to reach the local optimal solution as close as possible. The relay is to escape from the local optimal solution in only one step and to maintain search continuity simultaneously. As one of typical applications, multi-objective placement problem in physical design optimization is solved by the proposed RRA. In experiments, it is confirmed that the computational performance is considerably improved. RRA achieves overall Pareto improvement of two conflicting objectives: power consumption and maximal delay. RRA has its potential applications to improve the existing search methods for more hard problems.
Yukihide KOHIRA Atsushi TAKAHASHI
Multi-domain clock skew scheduling in general-synchronous framework is an effective technique to improve the performance of sequential circuits by using practical clock distribution network. Although the upper bound of performance of a circuit increases as the number of clock domains increases in multi-domain clock skew scheduling, the improvement of the performance becomes smaller while the cost of clock distribution network increases much. In this paper, a linear time algorithm that finds an optimum two-domain clock skew schedule in general-synchronous framework is proposed. Experimental results on ISCAS89 benchmark circuits and artificial data show that optimum circuits are efficiently obtained by our method in short time.
Yasuhiro TAKASHIMA Atsushi TAKAHASHI Yoji KAJITANI
The most basic cross-talk minimization problem is to assign given n intervals to n parallel tracks where the cross-talk is defined between two intervals assigned to the adjacent tracks with the amount linear to parallel running length. This paper solves the problem for the case when any pair of intervals intersects and the objective is to minimize the sum of cross-talks. We begin the discussion with the fact that twice the sum of lengths of n/2 shortest intervals is a lower bound. Then an interval set that attains this lower bound is characterized with a simple assignment algorithm. Some additional considerations provide the minimum cross-talk for the other interval sets. The main procedure is to sort the intervals twice with respect to the length of left and right halves of intervals.
Atsushi TAKAHASHI Shuichi UENO Yoji KAJITANI
The family Pk of graphs with proper-path-width at most k is minor-closed. It is known that the number of minimal forbidden minors for a minor-closed family of graphs is finite, but we have few such families for which all the minimal forbidden minors are listed. Although the minimal acyclic forbidden minors are characterized for Pk, all the minimal forbidden minors are known only for P1. This paper lists 36 minimal forbidden minors for P2, and shows that there exist no other minimal forbidden minors for P2.
Yukihide KOHIRA Shuhei TANI Atsushi TAKAHASHI
In general-synchronous framework, in which the clock is distributed periodically to each register but not necessarily simultaneously, the circuit performance such as the clock period is expected to be improved by delay insertion. However, if the amount of inserted delays is too much, then the circuit is changed too much and the circuit performance might not be improved. In this paper, we propose an efficient delay insertion method that minimizes the amount of inserted delays in the clock period improvement in general-synchronous framework. In the proposed method, the amount of inserted delays is minimized by using an appropriate clock schedule and by inserting delays into appropriate places in the circuit. Experiments show that the proposed method can obtain optimum solutions in short time in many cases.
Tomoyuki YODA Atsushi TAKAHASHI
A semi-synchronous circuit is a circuit in which the clock is assumed to be distributed periodically to each individual register, though not necessarily to all registers simultaneously. In this paper, we propose an algorithm to achieve the target clock period by modifying a given target clock schedule as small as possible, where the realization cost of the target clock schedule is assumed to be minimum. The proposed algorithm iteratively improves a feasible clock schedule. The algorithm finds a set of registers that can reduce the cost by changing their clock timings with same amount, and changes the clock timing with optimal amount. Experiments show that the algorithm achieves the target clock period with fewer modifications.
Shimpei SATO Eijiro SASSA Yuta UKON Atsushi TAKAHASHI
In order to obtain high-performance circuits in advanced technology nodes, design methodology has to take the existence of large delay variations into account. Clock scheduling and speculative execution have overheads to realize them, but have potential to improve the performance by averaging the imbalance of maximum delay among paths and by utilizing valid data available earlier than worst-case scenarios, respectively. In this paper, we propose a high-performance digital circuit design method with speculative executions with less overhead by utilizing clock scheduling with delay insertions effectively. The necessity of speculations that cause overheads is effectively reduced by clock scheduling with delay insertion. Experiments show that a generated circuit achieves 26% performance improvement with 1.3% area overhead compared to a circuit without clock scheduling and without speculative execution.
Atsushi TAKAHASHI Shuichi UENO Yoji KAJITANI
A graph G is said to be universal for a family F of graphs if G contains every graph in F as a subgraph. A minimum universal graph for F is a universal graph for F with the minimum number of edges. This paper considers a minimum universal graph for the family Fkn of graphs on n vertices with path-width at most k. We first show that the number of edges in a universal graph Fkn is at least Ω(kn log(n/k)). Next, we construct a universal graph for Fkn with O(kn log(n/k)) edges, and show that the number of edges in a minimum universal graph for Fkn is Θ(kn log(n/k)) .
Kazunori INOUE Wataru TAKAHASHI Atsushi TAKAHASHI Yoji KAJITANI
It is known that the clock-period can be shorter than the maximum of signal-delays between registers if the clock arrival time to each register is properly scheduled. The algorithm to design an optimal clock-schedule was given. In this paper, we propose a clock-tree routing algorithm that realizes a given clock-schedule using the Elmore-delay model. Following the deferred-merge-embedding (DME) framework, the algorithm generates a topology of the clock-tree and simultaneously determines the locations and sizes of intermediate buffers. The experimental results showed that this method constructs a clock-tree with moderate wire length for a random layout of scheduled registers. Notably, the required wire length for a gentle layout of scheduled registers was shown to be almost equal to that of zero-skew clock-trees.
In this paper, a practical clock-scheduling engine is introduced. The minimum feasible clock-period is obtained by using a modified Bellman-Ford shortest path algorithm. Then an optimum cost clock-schedule is obtained by using a bipartite matching algorithm. It also provides useful information to circuit synthesis tools. The experiment to a circuit with about 10000 registers and 100000 signal paths shows that a result is obtained within a few minutes. The computation time is almost linear to the circuit size in practice.
Tomoyuki YODA Atsushi TAKAHASHI
A semi-synchronous circuit is a circuit in which every register is ticked by a clock periodically, but not necessarily simultaneously. In a semi-synchronous circuit, the minimum delay between registers may be critical with respect to the clock period of the circuit, while it does not affect the clock period of an ordinary synchronous circuit. In this paper, we discuss a delay insertion method which makes such a semi-synchronous circuit faster. The maximum delay-to-register ratio over the cycles in the circuit gives a lower bound of the clock period. We show that this bound is achieved in the semi-synchronous framework by the proposing gate-level delay insertion method.