In this paper, we propose a design method to design asynchronous circuits with bundled-data implementation on commercial Field Programmable Gate Arrays using placement constraints. The proposed method uses two types of placement constraints to reduce the number of delay adjustments to fix timing violations and to improve the performance of the bundled-data implementation. We also propose a floorplan algorithm to reduce the control-path delays specific to the bundled-data implementation. Using the proposed method, we could design the asynchronous circuits whose performance is close to and energy consumption is small compared to the synchronous counterparts with less delay adjustment.
Katsuhisa YAMANAKA Md. Saidur RAHMAN Shin-ichi NAKANO
Given an axis-aligned rectangle R and a set P of n points in the proper inside of R we wish to partition R into a set S of n+1 rectangles so that each point in P is on the common boundary between two rectangles in S. We call such a partition of R a feasible floorplan of R with respect to P. Intuitively, P is the locations of columns and a feasible floorplan is a floorplan in which no column is in the proper inside of a room, i.e., columns are allowed to be placed only on the partition walls between rooms. In this paper we give an efficient algorithm to enumerate all feasible floorplans of R with respect to P. The algorithm is based on the reverse search method, and enumerates all feasible floorplans in O(|SP|) time using O(n) space, where SP is the set of the feasible floorplans of R with respect to P, while the known algorithms need either O(n|SP|) time and O(n) space or O(log n|SP|) time and O(n3) space.
Chien-Hui LIAO Charles H.-P. WEN
Hotspots occur frequently in 3D multi-core processors (3D-MCPs), and they may adversely impact both the reliability and lifetime of a system. We present a new thermally constrained task scheduler based on a thermal-pattern-aware voltage assignment (TPAVA) to reduce hotspots in and optimize the performance of 3D-MCPs. By analyzing temperature profiles of different voltage assignments, TPAVA pre-emptively assigns different initial operating-voltage levels to cores for reducing temperature increase in 3D-MCPs. The proposed task scheduler consists of an on-line allocation strategy and a new voltage-scaling strategy. In particular, the proposed on-line allocation strategy uses the temperature-variation rates of the cores and takes into two important thermal behaviors of 3D-MCPs that can effectively minimize occurrences of hotspots in both thermally homogeneous and heterogeneous 3D-MCPs. Furthermore, a new vertical-grouping voltage scaling (VGVS) strategy that considers thermal correlation in 3D-MCPs is used to handle thermal emergencies. Experimental results indicate that, when compared to a previous online thermally constrained task scheduler, the proposed task scheduler can reduce hotspot occurrences by approximately 66% (71%) and improve throughput by approximately 8% (2%) in thermally homogeneous (heterogeneous) 3D-MCPs. These results indicate that the proposed task scheduler is an effective technique for suppressing hotspot occurrences and optimizing throughput for 3D-MCPs subject to thermal constraints.
Kotaro TERADA Masao YANAGISAWA Nozomu TOGAWA
As application hardware designs and implementations in a short term are required, high-level synthesis is more and more essential EDA technique nowadays. In deep-submicron era, interconnection delays are not negligible even in high-level synthesis thus distributed-register and -controller architectures (DR architectures) have been proposed in order to cope with this problem. It is also profitable to take data-bitwidth into account in high-level synthesis. In this paper, we propose a bitwidth-aware high-level synthesis algorithm using operation chainings targeting Tiled-DR architectures. Our proposed algorithm optimizes bitwidths of functional units and utilizes the vacant tiles by adding some extra functional units to realize effective operation chainings to generate high performance circuits without increasing the total area. Experimental results show that our proposed algorithm reduces the overall latency by up to 47% compared to the conventional approach without area overheads by eliminating unnecessary bitwidths and adding efficient extra FUs for Tiled-DR architectures.
Koki IGAWA Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose a floorplan aware high-level synthesis algorithm with body biasing for delay variation compensation, which minimizes the average leakage energy of manufactured chips. In order to realize floorplan-aware high-level synthesis, we utilize huddle-based distributed register architecture (HDR architecture). HDR architecture divides the chip area into small partitions called a huddle and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain expected leakage energy for every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing expected leakage energy of the entire circuit more but can increase the latency. We assign control-data flow graph (CDFG) nodes in non-critical paths to the huddles with larger expected leakage energy and those in critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 39.7% without latency and yield degradation compared with typical-case design with body biasing.
Koichi FUJIWARA Kazushi KAWAMURA Masao YANAGISAWA Nozomu TOGAWA
Recently, high-level synthesis techniques for FPGA designs (FPGA-HLS techniques) are strongly required in various applications. Both interconnection delays and clock skews have a large impact on circuit performance implemented onto FPGA, which indicates the need for floorplan-driven FPGA-HLS algorithms considering them. To appropriately estimate interconnection delays and clock skews at HLS phase, a reasonable model to estimate them becomes essential. In this paper, we demonstrate several experiments to characterize interconnection delays and clock skews in FPGA and propose novel estimate models called “IDEF” and “CSEF”. In order to evaluate our models, we integrate them into a conventional floorplan-driven FPGA-HLS algorithm. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the latency by up to 22% compared with conventional approaches.
Katsuhisa YAMANAKA Shin-ichi NAKANO
In this paper, we consider the problem of generating uniformly random mosaic floorplans. We propose a polynomial-time algorithm that generates such floorplans with f faces. Two modified algorithms are created to meet additional criteria.
Kotaro TERADA Masao YANAGISAWA Nozomu TOGAWA
In deep-submicron era, interconnection delays are not negligible even in high-level synthesis and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm using operation chainings which reduces the overall latency targeting RDR architectures. Our algorithm consists of three steps: The first step enumerates candidate operations for chaining. The second step introduces maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. Our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches in some cases.
Koichi FUJIWARA Kazushi KAWAMURA Shin-ya ABE Masao YANAGISAWA Nozomu TOGAWA
Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications such as computerized stock tradings and reconfigurable network processings. In HLS for FPGA designs, we need to consider module floorplan and reduce multiplexer's cost concurrently. In this paper, we propose a floorplan-driven HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider module floorplan in HLS. In order to reduce multiplexer's cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and latency by up to 22% compared with conventional approaches while the number of required control steps is almost the same.
Katsuhisa YAMANAKA Shin-ichi NAKANO
Recently a compact code of mosaic floorplans with ƒ inner face was proposed by He. The length of the code is 3ƒ-3 bits and asymptotically optimal. In this paper, we propose a new code of mosaic floorplans with ƒ inner faces including k boundary faces. The length of our code is at most $3f - rac{k}{2} - 1$ bits. Hence our code is shorter than or equal to the code by He, except for few small floorplans with k=ƒ≤3. Coding and decoding can be done in O(ƒ) time.
Katherine Shu-Min LI Yingchieh HO Liang-Bi CHEN
Crosstalk-induced noise has become a key problem in interconnect optimization when technology improves, spacing diminishes, and coupling capacitance/inductance increases. Buffer insertion/sizing is one of the most effective and popular techniques to reduce interconnect delay and decouple coupling effects. It is traditionally applied to post-layout optimization. However, it is obviously infeasible to insert/size hundreds of thousands buffers during the post-layout stage when most routing regions are occupied. Therefore, it is desirable to incorporate buffer planning into floorplanning to ensure timing closure and design convergence. In this paper, we first derive formulae of buffer insertion for timing and noise optimization, and then apply the formulae to compute the feasible regions for inserting buffers to meet both timing and noise constraints. Experimental results show that our approach achieves an average success rate of 80.9% (78.2%) of nets meeting timing constraints alone (both timing and noise constraints) and consumes an average extra area of only 0.49% (0.66%) over the given floorplan, compared with the average success rate of 75.6% of nets meeting timing constraints alone and an extra area of 1.33% by the BBP method proposed previously.
Wei ZHONG Song CHEN Bo HUANG Takeshi YOSHIMURA Satoshi GOTO
Application-Specific Network-on-Chips (ASNoCs) have been proposed as a more promising solution than regular NoCs to the global communication challenges for particular applications in nanoscale System-on-Chip (SoC) designs. In ASNoC Design, one of the key challenges is to generate the most suitable and power efficient NoC topology under the constraints of the application specification. In this work, we present a two-step floorplanning (TSF) algorithm, integrating topology synthesis into floorplanning phase, to automate the synthesis of such ASNoC topologies. At the first-step floorplanning, during the simulated annealing, we explore the optimal positions and clustering of cores and implement an incremental path allocation algorithm to predictively evaluate the power consumption of the generated NoC topology. At the second-step floorplanning, we explore the optimal positions of switches and network interfaces on the floorplan. A power and timing aware path allocation algorithm is also integrated into this step to determine the connectivity across different switches. Experimental results on a variety of benchmarks show that our algorithm can produce greatly improved solutions over the latest works.
Nan LIU Song CHEN Takeshi YOSHIMURA
Heterogeneous resources such as configurable logic blocks (CLBs), multiplier blocks (MULs) and RAM blocks (RAMs) where millions of logic gates are included have been added to field programmable gate arrays (FPGAs). The fixed-outline floorplanning used by the existing methods always has a big penalty item in the objective function to ensure all the modules are placed in the specified chip region, which maybe greatly degrade the wirelength. This paper presents a three-phase floorplanning method for heterogeneous FPGAs. First, a non-slicing free-outline floorplanning method is used to optimize the wirelength, however, in this phase, the satisfaction of resource requirements from functional modules might fail. Second, a min-cost-max-flow algorithm is used to tune the assignment of CLBs to functional modules, and assign contiguous regions to each module so that all the functional modules satisfy CLB requirements. Finally, the MULs and RAMs are allocated to modules by a network flow model. CLBs hold the maximum quantity among all the resources. Therefore, making a high utilization of them means an enhancement of the FPGA densities. The proposed method can improve the utilization of CLBs, hence, much larger circuits could be mapped to the same FPGA chip. The results show that about 7–85% wirelength reduction is obtained, and CLB utilization is improved by about 25%.
In this paper, we propose a synthesis method for asynchronous circuits with bundled-data implementation. The proposed method iteratively applies behavioral synthesis and floorplanning to obtain a near optimum circuit in the term of latency under given design constraints. To improve latency, behavioral synthesis and floorplanning are carried out so that the delay of the control circuit is minimized and the addition of delay elements to satisfy timing constraints is minimized. We evaluate the effectiveness of the proposed method in terms of latency, area, and the number of timing violations while synthesizing several benchmarks. Experimental results show that the proposed method synthesizes faster circuits compared to the circuit synthesized without the proposed method. Also, the proposed method is effective to reduce the number of timing violations.
Yuchun MA Xin LI Yu WANG Xianlong HONG
In 3D IC design, thermal issue is a critical challenge. To eliminate hotspots, physical layouts are always adjusted by some incremental changes, such as shifting or duplicating hot blocks. In this paper, we distinguish the thermal-aware incremental changes in three different categories: migrating computation, growing unit and moving hotspot blocks. However, these modifications may degrade the packing area as well as interconnect distribution greatly. In this paper, mixed integer linear programming (MILP) models are devised according to these different incremental changes so that multiple objectives can be optimized simultaneously. Furthermore, to avoid random incremental modification, which may be inefficient and need long runtime to converge, here potential gain is modeled for each candidate incremental change. Based on the potential gain, a novel thermal optimization flow to intelligently choose the best incremental operation is presented. Experimental results show that migrating computation, growing unit and moving hotspot can reduce max on-chip temperature by 7%, 13% and 15% respectively on MCNC/GSRC benchmarks. Still, experimental results also show that the thermal optimization flow can reduce max on-chip temperature by 14% to the initial packings generated by an existing 3D floorplanning tool CBA, and achieve better area and total wirelength improvement than individual operations do. The results with the initial packings from CBA_T (Thermal-aware CBA floorplanner) show that 13.5% temperature reduction can be obtained by our incremental optimization flow.
Akira OHCHI Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
As device feature size decreases, interconnection delay becomes the dominating factor of circuit total delay. Distributed-register architectures can reduce the influence of interconnection delay. They may, however, increase circuit area because they require many local registers. Moreover original distributed-register architectures do not consider control signal delay, which may be the bottleneck in a circuit. In this paper, we propose a high-level synthesis method targeting generalized distributed-register architecture in which we introduce shared/local registers and global/local controllers. Our method is based on iterative improvement of scheduling/binding and floorplanning. First, we prepare shared-register groups with global controllers, each of which corresponds to a single functional unit. As iterations proceed, we use local registers and local controllers for functional units on a critical path. Shared-register groups physically located close to each other are merged into a single group. Accordingly, global controllers are merged. Finally, our method obtains a generalized distributed-register architecture where its scheduling/binding as well as floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 4.7% while maintaining the performance of the circuit equal with that using original distributed-register architectures.
Qing DONG Bo YANG Jing LI Shigetoshi NAKATAKE
This paper presents an efficient algorithm for incremental buffer insertion and module resizing for a full-placed floorplan. Our algorithm offers a method to use the white space in a given floorplan to resize modules and insert buffers, and at the same time keeps the resultant floorplan as close to the original one as possible. Both the buffer insertion and module resizing are modeled as geometric programming problems, and can be solved extremely efficiently using new developed solution methods. The experimental results suggest that the the wire length difference between the initial floorplan and result are quite small (less than 5%), and the global structure of the initial floorplan are preserved very well.
In modern VLSI physical design, huge integration scale necessitates hierarchical design and IP reuse to cope with design complexity. Besides, interconnect delay becomes dominant to overall circuit performance. These critical factors require some modules to be placed along designated boundaries to effectively facilitate hierarchical design and interconnection optimization related problems. In this paper, boundary constraints of general floorplan are solved smoothly based on the novel representation Single-Sequence (SS). Necessary and sufficient conditions of rooms along specified boundaries of a floorplan are proposed and proved. By assigning constrained modules to proper boundary rooms, our proposed algorithm always guarantees a feasible SS code with appropriate boundary constraints in each perturbation. Time complexity of the proposed algorithm is O(n). Experimental results on MCNC benchmarks show effectiveness and efficiency of the proposed method.
Youhei INOUE Toshihiko TAKAHASHI Ryo FUJIMAKI
A subdivision of a rectangle into rectangular faces with horizontal and vertical line segments is called a rectangular drawing or floorplan. It has been an open problem to determine whether there exist a polynomial time algorithm for computing R(n). We affirmatively solve the problem, that is, we introduce an O(n4)-time and O(n3)-space algorithm for R(n). The algorithm is based on a recurrence for R(n), which is the main result of the paper. We also implement our algorithm and computed R(n) for n 3000.
Xiaoyi WANG Jin SHI Yici CAI Xianlong HONG
It's a trend to consider the power supply integrity at early stage to improve the design quality. Specifically, floorplanning process is modified to improve the power supply as well. In the modified floorplanning process, both the floorplan and power/ground (P/G) network are adjusted to search for optimal floorplan as well as the most robust power supply. In this paper, we propose a novel algorithm to carry out this modified floorplanning. A new analytical method is proposed to estimate the voltage drop while the floorplan is varying constantly. This fast analytical voltage drop estimating method is plugged into the modified floorplanner to speed up the whole floorplanning process. Compared with previous methods, our algorithm can search for the optimal floorplan with consideration of power supply integrity more efficiently and therefore leads to better results. Furthermore, this paper also proposes a novel heuristic method to optimize the topology of P/G network. This optimization algorithm could construct a more robust power supply system. Experimental results show the method can speedup the IR-drop aware floorplanning process by about 10 times and reduce the routing area of P/G network while maintaining the floorplan quality and power supply integrity.