Yasuhiro TAKASHIMA Atsushi TAKAHASHI Yoji KAJITANI
The switch-block architecture of FPGAs is discussed to see a good balance between programmable-switch resources and routability. For the purpose, FPGAs are assumed to have certain extremal structures, whose switch-blocks consist of parallel or complete switch-sets where a switch-set is a set of switches between two sides of the switch-block. A polynomial time detailed-routing algorithm for a given global-routing is presented if the switch-block consists of two or less parallel switch-sets or three that form a cycle. For other FPGAs, the corresponding decision problem is proved to be -complete. A best compromise between switch resources and routability is offered.
Atsushi TAKAHASHI Nobuyoshi KOMURO Shiro SAKATA Shigeo SHIODA Tutomu MURASE
In wireless single-hop networks, IEEE 802.11e Enhanced Distributed Channel Access (EDCA) is the standard for Quality of Service (QoS) control. However, it is necessary for controlling QoS to modify the currently used IEEE 802.11 Distributed Coordination Function (DCF)-compliant terminals as well as Access Points (APs). In addition, it is necessary to modify the parameter of IEEE 802.11e EDCA when the traffic is heavy. This paper proposes a novel scheme to guarantee QoS of high-priority flow with Receiving Opportunity Control in MAC Frame (ROC) employed adaptive flow control in wireless multi-hop network. In the proposed scheme, the edge APs which are directly connected to user terminals estimate the network capacity, and calculate appropriate ACK prevention probability against low-priority flow according to traffic load. Simulation evaluation results show that the proposed scheme guarantees QoS.
Bakhtiar Affendi ROSDI Atsushi TAKAHASHI
A new algorithm is proposed to reduce the area of a pipelined circuit using a combination of multi-clock cycle paths, clock scheduling and delay balancing. The algorithm analyzes the circuit and replaces intermediate registers with delay elements under the condition that the circuit works correctly at given target clock-period range with the smaller area. Experiments with pipelined multipliers verify that the proposed algorithm can reduce the area of a pipelined circuit without degrading performance.
Atsushi TAKAHASHI Shuichi UENO Yoji KAJITANI
We introduce the interval set of a graph G which is a representation of the proper-path-decomposition of G, and show a linear time algorithm to construct an optimal interval set for any tree T. It is shown that a proper-path-decomposition of T with optimal width can be obtained from an optimal interval set of T in O(n log n) time.
Atsushi TAKAHASHI Yoji KAJITANI
Given a switch-box, let C be a connection requirement. If there is a polynomial time algorithm (router) to complete C, C is said to be tractable by the algorithm. There have been proposed a number of switch-box routers but none that makes clear its tractable problems. We propose a switch-box router, or rather a principle, BOX-PEELER with a simple characterization of a class of tractable problems. BOX-PEELER is developed to be an underlying concept in switch-box routing as LEFT-EDGE method has been in 2-side channel routing.
Keiichi KUROKAWA Takuya YASUI Masahiko TOYONAGA Atsushi TAKAHASHI
In this paper, we propose a new clock tree synthesis method for semi-synchronous circuits. A clock tree obtained by the proposed method is a multi-level multi-way clock tree such that a clock-input timing of each register is a multiple of a predefined unit delay and the wire length from a clock buffer to an element driven by it is bounded. The clock trees are constructed for several practical circuits. The size of constructed clock tree is comparable to a zero skew clock tree. In order to assure the practical quality of the clock trees, they are examined under the five delay conditions, which cover various environmental and manufacturing conditions. As a result, they are proved stable under each condition and improve the clock speed up to 17.3% against the zero skew clock trees.
Yoichi TOMIOKA Atsushi TAKAHASHI
Ball Grid Array packages in which I/O pins are arranged in a grid array pattern realize a number of connections between chips and a printed circuit board, but it takes much time in manual routing. We propose a fast routing method for 2-layer Ball Grid Array packages that iteratively modifies via assignment. In experiments, in most cases, via assignment and global routing on both of layers in which all nets are realized and the violation of wire congestion on layer 1 is small are speedily obtained.
Yosuke TAKAHASHI Yukihide KOHIRA Atsushi TAKAHASHI
The reduction of the peak power consumption of LSI is required to reduce the instability of gate operation, the delay increase, the noise, and etc. It is possible to reduce the peak power consumption by clock scheduling because it controls the switching timings of registers and combinational logic elements. In this paper, we propose a fast peak power wave estimation method for clock scheduling and fast clock scheduling methods for the peak power reduction. In experiments, it is shown that the peak power wave estimated by the proposed method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. By using the proposed peak power wave estimation method, proposed clock scheduling methods find clock schedules that greatly reduce the peak power consumption in a few minutes.
Kengo R. AZEGAMI Masato INAGI Atsushi TAKAHASHI Yoji KAJITANI
In this paper, we propose an improved network-flow based multi-way circuit partitioning algorithm whose objective is to minimize the number of sub-circuits. It iteratively extracts a size-maximal feasible sub-circuit one at a time. In our approach, two devices are applied. One is in the use of an exact min-cut graph, and the other is in the idea of keeping the number of I/O pins of the residual circuit as small as possible after one-time extraction. We implemented our algorithm in C for experiments, and tested it with several industrial cases and MCNC benchmarks. Compared to the known approach, we observed more than 10% reduction in average of the sub-circuit number.
Seiichiro ISHIJIMA Tetsuaki UTSUMI Tomohiro OTO Atsushi TAKAHASHI
A circuit in which the clock is assumed to be distributed periodically to each individual register though not necessarily to all registers simultaneously, called a semi-synchronous circuit, is expected to achieve higher frequency or a smaller clock tree compared with an ordinary synchronous circuit, called a complete-synchronous circuit. In this paper, we propose a circuit design method that realizes a semi-synchronous circuit with higher frequency by modifying the clock tree of a complete-synchronous circuit. We confirm that the proposed method is easy to incorporate with current practical design environment by designing a four stage pipelined processor compatible with MIPS operation code. The obtained processor circuit is the first semi-synchronous circuit designed systematically with theoretical background.
Keiichi KUROKAWA Takuya YASUI Yoichi MATSUMURA Masahiko TOYONAGA Atsushi TAKAHASHI
In several researches in recent years, it is shown that the circuit of a higher clock frequency can be obtained by controlling the clock-input timing of each register. However, the power consumption of the clock-tree obtained by them tends to be larger since the locations of registers are not well taken into account in clock scheduling. In this paper, we propose a novel clock tree synthesis that attains both the higher clock frequency and the lower power consumption. Our proposed algorithm determines the clock-input timings of registers step by step in constructing a clock tree structure. First, the clock period of a circuit is improved by controlling the clock-input timing of each register, and second, the clock-input timings are modified to construct a low power clock tree without deteriorating the obtained clock period. According to our experiments using several benchmark circuits, the power consumption of our clock trees attain about 9.5% smaller than previous methods.
Makoto SAITOH Masaaki AZUMA Atsushi TAKAHASHI
We introduce a clock schedule algorithm to obtain a clock schedule that achieves a shorter clock period and that can be realized by a light clock tree. A shorter clock period can be achieved by controlling the clock input timing of each register, but the required wire length and power consumption of a clock tree tends to be large if clock input timings are determined without considering the locations of registers. To overcome the drawback, our algorithm constructs a cluster that consists of registers with the same clock input timing located in a close area. The registers in each cluster are driven by a buffer and a shorter wire length can be achieved. In our algorithm, first registers are partitioned into clusters by their locations, and clusters are modified to improve the clock period while maintaining the radius of each cluster small. In our experiments, the clock period achieved in average is about 13% shorter than that achieved by a zero-skew clock tree, and about 4% longer than the theoretical minimum. The wire length and power consumption of a clock tree according to an obtained clock schedule is comparable to these of a zero skew tree.
Yukihide KOHIRA Suguru SUEHIRO Atsushi TAKAHASHI
In recent VLSI systems, signal propagation delays are requested to achieve the specifications with very high accuracy. In order to meet the specifications, the routing of a net often needs to be detoured in order to increase the routing delay. A routing method should utilize a routing area with obstacles as much as possible in order to realize the specifications of nets simultaneously. In this paper, a fast longer path algorithm that generates a path of a net in routing grid so that the length is increased as much as possible is proposed. In the proposed algorithm, an upper bound for the length in which the structure of a routing area is taken into account is used. Experiments show that our algorithm utilizes a routing area with obstacles efficiently.
Yoichi TOMIOKA Yoshiaki KURATA Yukihide KOHIRA Atsushi TAKAHASHI
In this paper, we propose a routing method for 2-layer ball grid array packages that generates a routing pattern satisfying a design rule. In our proposed method, the routing structure on each layer is restricted while keeping most of feasible patterns to efficiently obtain a feasible routing pattern. A routing pattern that satisfies the design rule is formulated as a mixed integer linear programming. In experiments with seven data, we obtain a routing pattern such that satisfies the design rule within a practical time by using a mixed integer linear programming solver.
Shimpei SATO Kano AKAGI Atsushi TAKAHASHI
Routing problems derived from silicon-interposer and etc. are often formulated as a set-pair routing problem where the combination of pin-pairs to be connected is flexible. In this routing problem, a length matching routing pattern is often required due to the requirement of the signal propagation delays be the same. We propose a fast length matching routing method for the set-pair routing problem. The existing algorithm generates a good length matching routing pattern in practical time. However, due to the limited searching range, there are length matching routing patterns that cannot find due to the limited searching range of the algorithm. Also, it needs heavy iterative steps to improve a solution, and the computation time is practical but not fast. In the set-pair routing, although pin-pairs to be connected is flexible, it is expected that combinations of pin-pairs which realize length matching are restricted. In our method, such a combination of pin-pairs is selected in advance, then routing is performed to realize the connection of the selected pin-pairs. Heavy iterative steps are not used for both the selection and the routing, then a routing pattern is generated in a short time. In the experiments, we confirm that the quality of routing patterns generated by our method is almost equivalent to the existing algorithm. Furthermore, our method finds length matching routing patterns that the existing algorithm cannot find. The computation time is about 360 times faster than the existing algorithm.
Yoichi TOMIOKA Atsushi TAKAHASHI
Ball Grid Array packages in which I/O pins are arranged in a grid array pattern realize a number of connections between chips and PCB, but it takes much time in manual routing. So the demand for automation of package routing is increasing. In this paper, we give the necessary and sufficient condition that all nets can be connected by monotonic routes when a net consists of a finger and a ball and fingers are on the two parallel boundaries of the Ball Grid Array package, and propose a monotonic routing method based on this condition. Moreover, we give a necessary condition and a sufficient condition when fingers are on the two orthogonal boundaries, and propose a monotonic routing method based on the necessary condition.
Yukihide KOHIRA Atsushi TAKAHASHI
Under the assumption that clock can be inputted to each register at an arbitrary timing, the minimum feasible clock period can be determined if delays between registers are given. This minimum feasible clock period might be reduced by register relocation maintaining the circuit behavior and topology. In this paper, we propose a gate-level register relocation method to reduce the minimum feasible clock period. The proposed method is a greedy local circuit modification method. We prove that the proposed method achieves the clock period achieved by retiming with delay decomposition, if the delay of each element in the circuit is unique. Experiments show that the computation time of the proposed method and the number of registers of a circuit obtained by the proposed method are smaller than those obtained by the retiming method in the conventional synchronous framework.
Hideki MITSUBAYASHI Atsushi TAKAHASHI Yoji KAJITANI
The most crucial factor that degrades a high speed VLSI is the signal propagation delay in a routing tree. It is estimated by the sum of the delay caused by the source-to-sink path length and by the total length. To design a routing tree in which these two are both small and balanced, we propose an algorithm to construct such a spanning tree, based on the idea of constructing a tree combining the minimum-spanning-tree and shortest-path-tree algorithms. This idea is extended to finding a rectilinear Steiner tree. Experiments are presented to illustrate how the source-to-sink path length and total length can be ballanced and small.
Yuta UKON Shimpei SATO Atsushi TAKAHASHI
Advanced information-processing services such as computer vision require a high-performance digital circuit to perform high-load processing at high speed. To achieve high-speed processing, several image-processing applications use an approximate computing technique to reduce idle time of the circuit. However, it is difficult to design the high-speed image-processing circuit while controlling the error rate so as not to degrade service quality, and this technique is used for only a few applications. In this paper, we propose a method that achieves high-speed processing effectively in which processing time for each task is changed by roughly detecting its completion. Using this method, a high-speed processing circuit with a low error rate can be designed. The error rate is controllable, and a circuit design method to minimize the error rate is also presented in this paper. To confirm the effectiveness of our proposal, a ripple-carry adder (RCA), 2-dimensional discrete cosine transform (2D-DCT) circuit, and histogram of oriented gradients (HOG) feature calculation circuit are evaluated. Effective clock periods of these circuits obtained by our method with around 1% error rate are improved about 64%, 6%, and 12%, respectively, compared with circuits without error. Furthermore, the impact of the miscalculation on a video monitoring service using an object detection application is investigated. As a result, more than 99% of detection points required to be obtained are detected, and it is confirmed the miscalculation hardly degrades the service quality.
Yuta NAKATANI Atsushi TAKAHASHI
In the routing design of interposer and etc., the combination of a pin pair to be connected by wire is often flexible, and the reductions of the total wire length and the length difference are pursued to keep the circuit performance. Even though the total wire length can be minimized by finding a minimum cost maximum flow in set pair routing problems, the length difference is often large, and the reduction of it is not easy. In this paper, an algorithm that reduces the length difference while keeping the total wire length small is proposed. In the proposed algorithm, an initial routing first obtained by a minimum cost maximum flow. Then it is modified to reduce the maximum length while keeping the minimum total wire length, and a connection of the minimum length is detoured to reduce the length difference. The effectiveness of the proposed algorithm is confirmed by experiments.