In this paper, we propose an efficient linear ordering algorithm for netlist partitioning. The proposed algorithm repeatedly merges the two segments selected by the proposed cost function until only one segment remains, and this final segment gives the linear order. Compared with the earlier work, the proposed algorithm yields an average improvement of 11.4% for ten-way scaled-cost partitioning.
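The merge loop described above can be sketched as follows. The abstract does not give the cost function, so this sketch uses a hypothetical stand-in: the number of nets shared by two segments, with the most strongly connected pair merged first. It also always concatenates segments as `a + b`, whereas a real ordering algorithm would also choose the orientation of each merge. The name `linear_order` is illustrative.

```python
from itertools import combinations

def linear_order(cells, nets):
    """Greedy segment merging into one linear order (hypothetical cost function)."""
    # Every cell starts as its own one-cell segment.
    segments = [[c] for c in cells]
    net_sets = [set(net) for net in nets]

    def connectivity(a, b):
        # Hypothetical cost: number of nets touching both segments (higher merges first).
        sa, sb = set(a), set(b)
        return sum(1 for net in net_sets if net & sa and net & sb)

    while len(segments) > 1:
        # combinations() yields i < j, so deleting j keeps index i valid.
        i, j = max(combinations(range(len(segments)), 2),
                   key=lambda p: connectivity(segments[p[0]], segments[p[1]]))
        merged = segments[i] + segments[j]
        del segments[j]
        segments[i] = merged
    return segments[0]
```

For example, `linear_order(["a", "b", "c", "d"], [["a", "b"], ["c", "d"], ["b", "c"]])` returns `["a", "b", "c", "d"]`.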
Hiro ITO Hideyuki UEHARA Mitsuo YOKOYAMA
Let $m \ge 2$, $n \ge 2$, and $q \ge 2$ be positive integers. Let $S_r$ and $S_b$ be two disjoint sets of points in the plane such that no three points of $S_r \cup S_b$ are collinear, $|S_r| = nq$, and $|S_b| = mq$. This paper shows that Kaneko and Kano's conjecture is true, i.e., there are $q$ disjoint convex regions of the plane such that each region includes $n$ points of $S_r$ and $m$ points of $S_b$. This is a generalization of the 2-dimensional Ham Sandwich Theorem.
Kengo R. AZEGAMI Atsushi TAKAHASHI Yoji KAJITANI
We improve the algorithm for obtaining the min-cut graph of a hypergraph and show an application to the sub-network extraction problem. The min-cut graph is a directed acyclic graph whose directed cuts correspond one-to-one to the min-cuts of the hypergraph. The known approach trades the exactness of the min-cut graph for some speed improvement. Our algorithm instead yields the exact min-cut graph without substantial computational overhead. Using the exact min-cut graph, an exhaustive algorithm finds an optimal sub-circuit that is extracted from the circuit by a min-cut. Experiments with industrial data show that the proposed method performs well enough for practical use.
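The exhaustive search over the cuts represented by a DAG can be sketched as follows. The sketch assumes one common convention: a cut is a vertex set S that contains the source, excludes the sink, and has no edge leaving it (if u is in S and u -> v, then v is in S). It enumerates every such set by brute force, so it is only practical for small DAGs. The function name and the example graph are hypothetical.

```python
from itertools import combinations

def closed_sets(nodes, edges, source, sink):
    """Enumerate vertex sets S with source in S, sink not in S,
    and no DAG edge u -> v with u in S and v outside S."""
    others = [n for n in nodes if n not in (source, sink)]
    result = []
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            s = {source, *subset}
            # Closure check: no edge may leave S.
            if all(not (u in s and v not in s) for u, v in edges):
                result.append(frozenset(s))
    return result
```

An exhaustive extraction step could then apply `min(closed_sets(...), key=objective)` with a problem-specific (here hypothetical) `objective`.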
Trong-Yen LEE Pao-Ann HSIUNG Sao-Jie CHEN
The hardware-software codesign of distributed embedded systems is a particularly challenging task, because each phase of codesign, such as copartitioning, cosynthesis, cosimulation, and coverification, must consider the physical restrictions imposed by the distributed nature of such systems. Distributed systems often contain several similar parts to which design reuse techniques can be applied. Our newly proposed Distributed Embedded System Codesign (DESC) methodology adopts an object-oriented (OO) codesign approach, which accommodates physical restrictions and object design reuse. The DESC methodology uses three types of models: Object Modeling Technique (OMT) models for system description and input, Linear Hybrid Automata (LHA) models for internal modeling and verification, and SES/workbench simulation models for performance evaluation. A two-level partitioning algorithm is proposed specifically for distributed systems. Software is synthesized by task scheduling, and hardware is synthesized by system-level and object-oriented techniques. Design alternatives for the synthesized hardware-software systems are then checked for feasibility through rapid prototyping using hardware-software emulators. Through a case study on a Vehicle Parking Management System (VPMS), we illustrate each design phase of the DESC methodology to show the benefits of OO codesign and the necessity of a two-level partitioning algorithm.
Trong-Yen LEE Pao-Ann HSIUNG Sao-Jie CHEN
A novel Multi-Level Partitioning (MLP) technique is proposed for hardware-software partitioning in Distributed Embedded Multiprocessor Systems (DEMS), taking real-world constraints into account. The MLP algorithm uses a gradient metric based on hardware-software cost and performance as the core metric for selecting optimal partitions, and it consists of three nested levels. The innermost level is a simple binary search that allows quick evaluation of a large number of possible partitions. The middle level iterates over different possible allocations of processors (which execute software) to subsystems. The outermost level iterates over the number of processors and the hardware cost range. Heuristics are applied at each level to avoid an expensive exhaustive search. Applying MLP within the recently proposed Distributed Embedded System Codesign (DESC) methodology shows its feasibility. Comparisons between real-world examples partitioned with MLP and with other existing techniques highlight the strengths of MLP. Sharing, clustering, and a hierarchical system model are important features of MLP that contribute to better partitioning results.
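The innermost binary-search level can be illustrated with a small sketch. The sketch makes several assumptions that the abstract does not state:

- Each task has a software time, a hardware time, and a hardware cost.
- The gradient is the time saved per unit of hardware cost.
- Tasks run sequentially.
- Moving a task to hardware never slows it down.

Under these assumptions, the execution time of "the first k tasks in hardware" never increases as k grows, so a binary search finds the smallest such prefix that meets a deadline. This is not MLP's actual metric, and `partition_by_gradient` is a hypothetical name.

```python
def partition_by_gradient(tasks, deadline):
    """tasks: list of (name, sw_time, hw_time, hw_cost), assuming hw_time <= sw_time
    and hw_cost > 0. Returns the names moved to hardware, or None if infeasible."""
    # Hypothetical gradient metric: time saved per unit of hardware cost.
    order = sorted(tasks, key=lambda t: (t[1] - t[2]) / t[3], reverse=True)

    def exec_time(k):
        # First k tasks in hardware, the rest in software (sequential execution assumed).
        return sum(t[2] for t in order[:k]) + sum(t[1] for t in order[k:])

    lo, hi = 0, len(order)
    if exec_time(hi) > deadline:
        return None  # even an all-hardware mapping misses the deadline
    while lo < hi:   # exec_time(k) is non-increasing in k
        mid = (lo + hi) // 2
        if exec_time(mid) <= deadline:
            hi = mid
        else:
            lo = mid + 1
    return [t[0] for t in order[:lo]]
```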
Jun'ichiro MINAMI Tetsushi KOIDE Shin'ichi WAKABAYASHI
This paper presents a timing-driven iterative-improvement circuit partitioning algorithm under path delay constraints for the general delay model. The proposed algorithm extends the Fiduccia-Mattheyses (FM) method to handle path delay constraints. It consists of a clustering phase and an iterative improvement phase. In the first phase, a new clustering algorithm reduces the size of the given circuit so that a partition can be obtained in a short computation time. In the second phase, iterative improvement based on the FM method is applied. A new path-based timing violation removal algorithm then attempts to remove all timing violations. Experimental results on ISCAS89 benchmarks show that the proposed algorithm produces partitions that satisfy the timing constraints in most cases.
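The FM method that the algorithm extends rests on the move gain of each cell. The sketch below shows only the standard cut-size gain for a bipartition; the paper's timing-aware extension is not reproduced. A cell gains +1 for each net on which it is the only cell on its side, because moving it uncuts that net. It gains -1 for each net lying entirely on its side, because moving it cuts that net. The function name `fm_gains` is illustrative.

```python
def fm_gains(nets, side):
    """Standard FM gains for a bipartition.
    nets: list of cell lists; side: dict cell -> 0 or 1.
    Returns dict cell -> reduction in cut size if that cell alone is moved."""
    gains = {c: 0 for c in side}
    for net in nets:
        count = [0, 0]
        for c in net:
            count[side[c]] += 1
        for c in net:
            frm = side[c]
            to = 1 - frm
            if count[frm] == 1:  # only cell on its side: moving it uncuts the net
                gains[c] += 1
            if count[to] == 0:   # net entirely on c's side: moving c cuts it
                gains[c] -= 1
    return gains
```

For a triangle of nets over `a`, `b`, `c` with `c` alone on side 1, moving `c` removes both cut nets, so its gain is 2.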
Mizuki TAKAHASHI Nagisa ISHIURA Akihisa YAMADA Takashi KAMBE
This paper presents a method of thread composition in the hardware compiler Bach. Bach synthesizes RT-level circuits from a system description written in the Bach-C language, in which a system is modeled as communicating processes running in parallel. The system description is decomposed into threads, i.e., strings of sequential processes, by grouping processes that are not executed in parallel. The set of threads is then converted into behavioral VHDL models and passed to a behavioral synthesizer. The proposed method attempts to find a thread configuration that maximizes resource sharing among the processes in the threads. Experiments on two real designs show that the method reduced the circuit sizes by 3.7% and 14.7%, respectively. We also present detailed statistics and an analysis of the sizes of the resulting gate-level circuits.
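The grouping step, which places processes that never run in parallel in the same thread, can be sketched as a greedy first-fit assignment. The sketch shows only how a valid thread configuration is formed. It does not model the resource-sharing objective that the proposed method optimizes. The names `compose_threads` and `parallel_pairs` are hypothetical.

```python
def compose_threads(processes, parallel_pairs):
    """Greedy first-fit: each process joins the first thread that contains
    no process which may run in parallel with it."""
    conflicts = {p: set() for p in processes}
    for a, b in parallel_pairs:
        conflicts[a].add(b)
        conflicts[b].add(a)
    threads = []
    for p in processes:
        for thread in threads:
            if not (conflicts[p] & set(thread)):
                thread.append(p)  # runs sequentially with everything already here
                break
        else:
            threads.append([p])   # conflicts with every thread: open a new one
    return threads
```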
Hiromasa FUJII Kouhei MIZUNO Takahiko SABA Iwao SASASE
In cellular systems, autonomous reuse partitioning (ARP) is a channel assignment strategy that attains high spectral efficiency. Under this strategy, however, the movement of mobile stations (MSs) disturbs the reuse partition. Furthermore, smaller cell sizes degrade the spectral efficiency. In this paper, we propose a new ARP strategy with reuse-partition reconstruction, named the RP-reconstructing ARP strategy, for microcellular systems. We evaluate the performance of the proposed strategy in terms of blocking rate and forced call termination rate by computer simulation. The results show that a system with the proposed strategy accommodates 1.5 times as many users as a system with conventional ARP.
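The basic ARP channel search can be sketched as follows. Every base station scans the channels in the same fixed order and takes the first idle channel whose carrier-to-interference ratio (CIR) meets a threshold. As a result, MSs with strong signals settle on the early, heavily reused channels. The sketch takes measured signal and interference powers as inputs and does not model the proposed reconstruction step. The function and parameter names are hypothetical.

```python
import math

def arp_assign(signal_mw, interference_mw, busy, cir_threshold_db):
    """Return the first acceptable channel in the common scan order, or None if blocked.
    interference_mw[ch]: measured interference on channel ch; busy[ch]: channel in use."""
    for ch, interf in enumerate(interference_mw):
        if busy[ch]:
            continue
        cir_db = math.inf if interf == 0 else 10 * math.log10(signal_mw / interf)
        if cir_db >= cir_threshold_db:
            return ch
    return None  # no channel meets the CIR threshold: the call is blocked
```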
We consider cluster systems consisting of multiple nodes and shared servers, where the shared servers comprise an on-line server and a backup server. For such systems, we propose a hot-standby scheme for the shared servers. In this scheme, the shared servers hold user data and control data. When the on-line shared server receives an update command, it sends only the control data to the backup server. When the on-line shared server fails, the backup shared server reconstructs the shared data. It uses the latest control data sent from the on-line server and the user data sent from each node. We evaluated the system recovery time and the performance overhead of the hot-standby scheme. The scheme shortens the system recovery time to 30 seconds and reduces the performance overhead to 2%.
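The division of labor in the hot-standby scheme can be sketched as follows. On each update, only control data (modeled here, as an assumption, as a version number per key) is forwarded to the backup. On failover, the backup rebuilds its user data from the copies held by the nodes, accepting only copies whose version matches the latest control data. The class and method names are hypothetical. In this sketch the new backup starts empty, which a real system would resynchronize.

```python
class SharedServer:
    """One shared server: user data plus control data (a version number per key)."""
    def __init__(self):
        self.user_data = {}
        self.control_data = {}

class HotStandbyPair:
    def __init__(self):
        self.online = SharedServer()
        self.backup = SharedServer()

    def update(self, key, value):
        """Apply an update on the on-line server and forward only the control data."""
        version = self.online.control_data.get(key, 0) + 1
        self.online.user_data[key] = value
        self.online.control_data[key] = version
        self.backup.control_data[key] = version  # user data is NOT copied
        return version  # the issuing node keeps (version, value) locally

    def fail_over(self, node_copies):
        """Rebuild shared data on the backup from node-held user data.
        node_copies: iterable of {key: (version, value)} dicts, one per node."""
        for copies in node_copies:
            for key, (version, value) in copies.items():
                # Accept only the copy matching the latest control data.
                if self.backup.control_data.get(key) == version:
                    self.backup.user_data[key] = value
        self.online, self.backup = self.backup, SharedServer()
        return self.online
```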
Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
This paper proposes a hardware/software cosynthesis system for digital signal processor cores. It also proposes a hardware/software partitioning algorithm, which is one of the key issues for such a system. The target processor has a VLIW-type core that can be composed of a processor kernel, multiple data memory buses (X-bus and Y-bus), hardware loop units, addressing units, and multiple functional units. The processor kernel has either five pipeline stages (RISC-type kernel) or three pipeline stages (DSP-type kernel). Given an application program written in C and a set of application data, the system synthesizes a processor core by selecting an appropriate kernel (RISC-type or DSP-type) and the required hardware units according to the application program/data and the hardware costs. The system also generates the object code for the application program and a software environment (compiler and simulator) for the processor core. Experimental results demonstrate that the system synthesizes processor cores effectively according to the features of an application program. They also show that the synthesized cores execute most application programs in fewer clock cycles than several existing processors.
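The kernel-and-unit selection can be illustrated with an exhaustive search under a deliberately simple, hypothetical cost model. Each kernel has an area and a base cycle count for the application. Each optional unit adds a fixed area and saves a fixed number of cycles, regardless of the kernel. The search keeps the lowest-cycle configuration that fits an area budget. The paper's actual cost model and algorithm are not reproduced, and all names and numbers below are illustrative.

```python
from itertools import combinations

def select_core(kernels, units, area_budget):
    """kernels: name -> (area, base_cycles); units: name -> (area, cycles_saved).
    Returns (cycles, kernel, chosen_units) with the fewest cycles within budget."""
    best = None
    for kname, (karea, kcycles) in kernels.items():
        for r in range(len(units) + 1):
            for chosen in combinations(sorted(units), r):
                area = karea + sum(units[u][0] for u in chosen)
                if area > area_budget:
                    continue  # configuration does not fit on the chip
                cycles = kcycles - sum(units[u][1] for u in chosen)
                if best is None or cycles < best[0]:
                    best = (cycles, kname, chosen)
    return best
```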
Seung-June KYOUNG Kwang-Su SEONG In-Cheol PARK Chong-Min KYUNG
Clustering is almost essential for improving the performance of iterative partitioning algorithms. In this paper, we present a clustering algorithm based on the following observation: if a group of cells is assigned to the same partition in numerous local optimum solutions, it is desirable to merge the group into a cluster. The proposed algorithm finds such groups of cells in randomly generated local optimum solutions and merges each group into a cluster. We implemented a multilevel bipartitioning algorithm (MBP) based on the proposed clustering algorithm. For MCNC benchmark netlists, MBP improves the total average cut size by 9% and the total best cut size by 3-4% compared with previous state-of-the-art partitioners.
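The observation behind the clustering algorithm can be sketched with a signature per cell: the tuple of blocks the cell occupies across several local optimum solutions. Cells with identical signatures were placed together in every solution. This is a stricter variant of "in numerous solutions" than the paper may use. Block labels need not agree across solutions, because only equality within each solution matters. The function name is hypothetical.

```python
from collections import defaultdict

def cluster_by_agreement(solutions):
    """solutions: list of dicts cell -> block id, one per local optimum.
    Cells placed in the same block in every solution form one cluster."""
    groups = defaultdict(list)
    for cell in solutions[0]:
        signature = tuple(sol[cell] for sol in solutions)
        groups[signature].append(cell)
    return list(groups.values())
```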
This paper analyzes a variable partition duplex scheme on the packet reservation multiple access protocol (VPD-PRMA). We assume a four-state speech model for a conversational pair and obtain performance measures by approximate Markovian analysis. The analytical results agree well with simulation results. They also show that VPD-PRMA achieves a higher statistical multiplexing gain than fixed partition duplex (FPD)-PRMA because of the trunking effect. We further investigate, by computer simulation, how two design parameters, the permission probability and the enlarged reservation duration, affect system performance. Simulation results show that appropriate values of these two parameters exist that minimize the packet dropping probability. Adjusting the permission probability can greatly improve the performance of uplink traffic with only slight deterioration in the performance of downlink traffic. Providing the enlarged reservation duration scheme can also enhance system performance.
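A Markovian analysis of a four-state conversational speech model starts from the stationary distribution of the state chain. The sketch computes it by power iteration. The state order (both silent, A talks, B talks, both talk) and the transition probabilities in the usage example are hypothetical placeholders, not values from the paper.

```python
def stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution of an irreducible, aperiodic Markov chain
    with row-stochastic transition matrix P, by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical per-slot transition probabilities; state order (silent, A, B, both).
P = [[0.90, 0.04, 0.04, 0.02],
     [0.05, 0.90, 0.02, 0.03],
     [0.05, 0.02, 0.90, 0.03],
     [0.04, 0.08, 0.08, 0.80]]
pi = stationary(P)
uplink_activity = pi[1] + pi[3]  # fraction of time speaker A is talking
```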
Shih-Chang WANG Jeng-Ping LIN Sy-Yen KUO
In this paper, we propose a novel fault-tolerant multicast algorithm for n-dimensional wormhole-routed hypercubes. The algorithm remains functional as long as the number of faulty nodes in an n-dimensional hypercube is less than n. Multicast is the delivery of the same message from one source node to an arbitrary number of destination nodes. Recently, wormhole routing has become one of the most popular switching techniques in new-generation multicomputers. Previous research has focused on fault-tolerant one-to-one routing algorithms for n-dimensional meshes. However, little research has been done on fault-tolerant one-to-many (multicast) routing algorithms because deadlock-free routing is difficult to achieve on faulty networks. We develop such an algorithm for faulty hypercubes. Our approach does not rely on adding physical or virtual channels to the network topology. Instead, we integrate several techniques, such as partitioning of nodes, partitioning of channels, node label assignment, and dual-path multicast, to achieve fault tolerance. Both theoretical analysis and simulation demonstrate the effectiveness of the proposed algorithm.
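The node labeling and dual-path multicast techniques can be illustrated by the basic, fault-free form of dual-path routing. Each node is labeled by its position on the reflected-Gray-code Hamiltonian path (the inverse Gray code). Destinations with labels above the source are visited in ascending order along one path, and those below are visited in descending order along the other. The sketch omits the paper's fault-tolerance mechanisms, and its function names are illustrative.

```python
def gray_label(node):
    """Position of `node` on the reflected-Gray-code Hamiltonian path (inverse Gray code)."""
    label = 0
    while node:
        label ^= node
        node >>= 1
    return label

def dual_path_split(source, destinations):
    """Split destinations into a high path (ascending labels) and a low path
    (descending labels) relative to the source's label."""
    s = gray_label(source)
    high = sorted((d for d in destinations if gray_label(d) > s), key=gray_label)
    low = sorted((d for d in destinations if gray_label(d) < s), key=gray_label, reverse=True)
    return high, low
```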
Nguyen Ngoc BINH Masaharu IMAI Yoshinori TAKEUCHI
In designing ASIPs (Application Specific Integrated Processors), previous work has focused almost exclusively on optimizing the CPU core. It has paid little attention to optimizing the RAM and ROM sizes together with the core. This paper overcomes this limitation and proposes an optimization algorithm that determines the best ratio among the CPU core, RAM, and ROM of an ASIP chip. The goal is the highest performance while satisfying design constraints on chip area. The partitioning problem is formalized as a combinatorial optimization problem that partitions the operations into hardware and software. The partition maximizes the performance of the designed ASIP for a given application program and associated input data set, under a given chip area constraint in which the chip area includes the hardware cost of the register file. The optimization problem is parameterized so that it can be applied to different technologies for synthesizing CPU cores, RAMs, or ROMs. Experimental results show that the proposed algorithm is effective and efficient.
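The hardware/software operation partitioning under an area constraint can be illustrated as a 0/1 knapsack. Each operation implemented in hardware costs some integer area and saves some cycles, and the total saving is maximized within the area left after RAM and ROM are accounted for. The additive cost model and the name `partition_ops` are assumptions for illustration, not the paper's formulation.

```python
def partition_ops(ops, area_budget):
    """ops: list of (name, hw_area, cycles_saved) with non-negative integer areas.
    0/1 knapsack: best[a] = (max cycles saved, ops in hardware) using area <= a."""
    best = [(0, ())] * (area_budget + 1)
    for name, area, saved in ops:
        # Iterate area downward so each operation is placed in hardware at most once.
        for a in range(area_budget, area - 1, -1):
            cand = best[a - area][0] + saved
            if cand > best[a][0]:
                best[a] = (cand, best[a - area][1] + (name,))
    return best[area_budget]
```

A caller would pass something like `chip_area - ram_area - rom_area` (hypothetical names) as `area_budget` for each candidate RAM/ROM sizing.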
Motohiko ISAKA Robert H. MORELOS-ZARAGOZA Marc P. C. FOSSORIER Shu LIN Hideki IMAI
Unequal error protection (UEP) is a very promising coding technique for satellite broadcasting, as it gradually reduces the transmission rate. From the viewpoint of bandwidth efficiency, UEP should be achieved in the context of multilevel coded modulation. However, the conventional mapping between encoded bits and modulation signals is poorly suited to UEP coding. This mapping is the one usually used with multilevel block modulation codes and multistage decoding, and it results in a large number of nearest-neighbor codewords. In this paper, new coded modulation schemes for UEP based on unconventional partitioning are proposed. A linear operation referred to as interlevel combination is introduced. This operation generalizes previous partitioning proposed for UEP applications and provides additional flexibility in UEP capabilities. The error performance of the proposed codes is evaluated both by computer simulation and by theoretical analysis. The results show that the proposed codes achieve a good tradeoff between the proportion and the error performance of each error protection level.
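For reference, the conventional partitioning that the abstract contrasts with can be checked numerically. Under the natural mapping k = b0 + 2*b1 + 4*b2 onto unit-energy 8-PSK, fixing the lowest label bits splits the constellation into subsets whose minimum squared Euclidean distance grows level by level. The distances are 2 - sqrt(2), 2, and 4. The sketch covers only this standard set partitioning, not the proposed interlevel combination.

```python
import cmath
import math
from itertools import combinations

def subset_min_sq_distances(bits=3):
    """Natural mapping k = b0 + 2*b1 + 4*b2 (+ ...) onto unit-energy 2**bits-PSK.
    Entry l: minimum squared distance within subsets that fix the l lowest label bits."""
    m = 2 ** bits
    points = [cmath.exp(2j * math.pi * k / m) for k in range(m)]
    dists = []
    for level in range(bits):
        step = 2 ** level
        best = math.inf
        for coset in range(step):  # subsets: labels congruent to coset mod 2**level
            members = [points[k] for k in range(coset, m, step)]
            for p, q in combinations(members, 2):
                best = min(best, abs(p - q) ** 2)
        dists.append(best)
    return dists
```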
Kazuhiko IWASAKI Hiroyuki GOTO
The exact expected test lengths of pseudo-random patterns generated by LFSRs are analyzed theoretically for a circuit under test (CUT) containing hard, random-pattern-resistant faults. The exact expected test lengths are also analyzed for the case in which more than one primitive polynomial is selected.
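The quantity being analyzed can be computed exactly by enumeration for a small LFSR. The sketch makes two assumptions. First, a fault is modeled by a predicate that says which patterns detect it. Second, the seed is uniformly random over the nonzero states. Under these assumptions, the expected test length is the average, over all starting points of the LFSR cycle, of the number of patterns applied until the first detecting one. The 4-bit Galois LFSR uses the primitive polynomial x^4 + x + 1 (feedback mask `0b1001`). This brute-force check is not the paper's closed-form analysis.

```python
def lfsr_states(taps, seed=1):
    """All states of a right-shifting Galois LFSR starting from `seed`,
    e.g. taps=0b1001 for x^4 + x + 1 (period 15)."""
    states, s = [], seed
    while True:
        states.append(s)
        lsb = s & 1
        s >>= 1
        if lsb:
            s ^= taps
        if s == seed:
            return states

def expected_test_length(sequence, detects):
    """Exact expected number of patterns until first detection, averaged over
    every starting point of the cyclic sequence (uniform random seed)."""
    n = len(sequence)
    total = 0
    for start in range(n):
        for length in range(1, n + 1):
            if detects(sequence[(start + length - 1) % n]):
                total += length
                break
        else:
            raise ValueError("the fault is never detected by this sequence")
    return total / n
```

With a single detecting pattern among 15, every test length from 1 to 15 is equally likely, so the expectation is 8.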
Shyh-Jong CHEN Rung-Ji SHANG Xian-June HUANG Shang-Jang RUAN Feipei LAI
By treating each distinct output pattern as a state, we propose a low-power architecture for pipelined circuits based on bipartitioning. The output of a pipelined circuit may transition mainly among only a few of its states. If a few states dominate most of the time, we can partition the combinational portion of the pipelined circuit into two blocks. One is a small block containing the few high-activity states, and the other is a large block containing the remaining low-activity states. The original pipelined circuit is thus bipartitioned into two individual pipelined circuits. An additional combinational logic block is introduced to select which of the two partitioned blocks operates. The power reduction is based on the observation that most of the time the small block is active and the large block is idle. To minimize the power consumption of this architecture, we present an algorithm that improves the efficiency of this additional control block. Experiments with MCNC benchmarks show a high percentage of power savings when the new architecture is used for low-power pipelined circuit design.
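Identifying the "few states that dominate most of the time" can be sketched from a simulated output trace. The sketch picks the most frequent output patterns until they cover a chosen fraction of cycles, and taking patterns in frequency order gives the smallest such set. The coverage threshold and the name `dominant_states` are hypothetical. The paper's control-block optimization is not shown.

```python
from collections import Counter

def dominant_states(output_trace, coverage=0.9):
    """Smallest set of most frequent output patterns covering at least
    `coverage` of the cycles in the trace."""
    counts = Counter(output_trace)
    total = len(output_trace)
    chosen, covered = [], 0
    for state, c in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(state)
        covered += c
    return chosen, covered / total
```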
Hiroyoshi WATANABE Masayuki ARAI Kenzo OKUDA
In this paper, we propose an algorithm for classification by feature partitioning (CFP) that learns concepts in batch mode. The proposed algorithm achieves almost the same predictive accuracy as the best results of the CFP algorithm presented by Guvenir and Sirin. However, our algorithm is not affected by parameter settings or by the order of the training examples.
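The idea of classification by feature partitioning can be illustrated with a simple batch learner. For each feature, it sorts the observed values and merges consecutive values with the same majority class into labeled segments. A new instance is then classified by a vote of the features whose segments cover its values. Ties between classes are broken by sorted label order, so the result does not depend on the order of the examples. This is a hypothetical sketch, not the proposed algorithm or Guvenir and Sirin's CFP.

```python
from collections import Counter, defaultdict

def train_segments(examples):
    """examples: list of (feature_vector, label). Per feature, merge consecutive
    sorted values with the same majority class into [low, high, label] segments."""
    model = []
    for f in range(len(examples[0][0])):
        by_value = defaultdict(Counter)
        for x, y in examples:
            by_value[x[f]][y] += 1
        segments = []
        for v in sorted(by_value):
            counts = by_value[v]
            label = max(sorted(counts), key=counts.__getitem__)  # order-independent ties
            if segments and segments[-1][2] == label:
                segments[-1][1] = v  # extend the current segment
            else:
                segments.append([v, v, label])
        model.append(segments)
    return model

def classify(model, x):
    """Each feature whose segment covers x[f] votes for that segment's class."""
    votes = Counter()
    for f, segments in enumerate(model):
        for low, high, label in segments:
            if low <= x[f] <= high:
                votes[label] += 1
                break
    return votes.most_common(1)[0][0] if votes else None
```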
Takayuki SAITO Yoshiyasu TAKEFUJI
The graph partitioning problem is a well-known combinatorial problem with many applications, including VLSI circuit design and task allocation in distributed computer systems. In this paper, a novel neural network based on the maximum neuron model is proposed for the m-way graph partitioning problem. An undirected graph with weighted nodes and weighted edges is partitioned into several subsets. The objective is to minimize the sum of the weights of cut edges while keeping the sizes of the subsets balanced. The proposed algorithm was compared with a genetic algorithm. The experimental results show that the proposed neural network is better than or comparable to the other existing methods for the m-way graph partitioning problem in terms of computation time and solution quality.
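The maximum neuron model can be sketched for m-way partitioning as follows. Each node has m neurons, and only the neuron with the largest input fires, so every output is automatically a valid assignment. The inputs are updated by the negative gradient of a hypothetical energy E = A * cut + B * sum_k (size_k - target)^2. The coefficients, the synchronous update schedule, and the function name are illustrative assumptions, not the paper's motion equation. The sketch returns the lowest-energy assignment it encounters.

```python
import random

def max_neuron_partition(n, edges, m, node_w=None, A=1.0, B=1.0, iters=200, seed=0):
    """edges: dict {(i, j): weight}; node_w: node weights (default 1.0 each).
    Returns (assignment, energy) for the best assignment seen."""
    rng = random.Random(seed)
    w = node_w if node_w is not None else [1.0] * n
    target = sum(w) / m
    adj = [[] for _ in range(n)]
    for (i, j), wij in edges.items():
        adj[i].append((j, wij))
        adj[j].append((i, wij))
    # U[i][k]: input of neuron k in node i's group; random start breaks symmetry.
    U = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]

    def fire():
        # Maximum neuron rule: exactly one neuron per node fires.
        return [max(range(m), key=lambda k: U[i][k]) for i in range(n)]

    def energy(assign):
        cut = sum(wij for (i, j), wij in edges.items() if assign[i] != assign[j])
        sizes = [0.0] * m
        for i, k in enumerate(assign):
            sizes[k] += w[i]
        return A * cut + B * sum((s - target) ** 2 for s in sizes), sizes

    best = fire()
    best_e, _ = energy(best)
    for _ in range(iters):
        assign = fire()
        e, sizes = energy(assign)
        if e < best_e:
            best, best_e = assign, e
        # Synchronous update dU_ik = -dE/dV_ik: neighbors in block k attract,
        # an oversized block k repels.
        for i in range(n):
            for k in range(m):
                pull = sum(wij for j, wij in adj[i] if assign[j] == k)
                U[i][k] += A * pull - 2.0 * B * w[i] * (sizes[k] - target)
    return best, best_e
```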
Hidenori SATO Hiroaki MATSUDA Akira ONOZAWA
This paper presents a clock routing technique called the Balanced-Mesh Method (BMM), which combines the advantages of two well-known conventional clock routing techniques. One is the balanced-tree method (BTM), in which the clock net is routed as a tree so that the clock signal delays are balanced. The other is the fixed-mesh method (FMM), in which the clock net is routed as a fixed mesh driven by a large buffer. In BMM, the clock net is routed as a set of relatively small interconnect meshes driven by relatively small buffers. Each mesh covers an area called a Mesh-Routing Region (MR), within which the delay and skew can be kept within a certain range. These small meshes are connected by a balanced tree rooted at the chip clock source. To implement BMM, we developed two programs. An MR-partitioning program partitions the circuit into MRs according to a set of predetermined constraints on the number of flip-flops and the area of each MR. A clock global routing program routes each mesh and the tree connecting the meshes. We applied BMM to the design of an MPEG2 encoder LSI and achieved a skew of 210 ps. In addition, the experimental results show that BMM yields the lowest power dissipation among the compared conventional methods.
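The MR-partitioning step, which splits the chip into regions that each satisfy limits on flip-flop count and area, can be sketched with recursive bisection of the longer side. The splitting rule, the half-open region boundaries, and the depth cap are assumptions for illustration. The depth cap prevents endless splitting when too many flip-flops share one location. The actual MR-partitioning program is not described in enough detail to reproduce.

```python
def partition_mr(ffs, region, max_ffs, max_area, depth=32):
    """ffs: list of (x, y) flip-flop locations; region: (x0, y0, x1, y1), half-open.
    Returns a list of (region, flip_flops_inside) satisfying both limits
    (or reaching the depth cap)."""
    x0, y0, x1, y1 = region
    inside = [(x, y) for (x, y) in ffs if x0 <= x < x1 and y0 <= y < y1]
    area = (x1 - x0) * (y1 - y0)
    if (len(inside) <= max_ffs and area <= max_area) or depth == 0:
        return [(region, inside)]
    if x1 - x0 >= y1 - y0:  # bisect the longer side
        xm = (x0 + x1) / 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    else:
        ym = (y0 + y1) / 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    return [mr for h in halves
            for mr in partition_mr(inside, h, max_ffs, max_area, depth - 1)]
```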