Author Search Result

[Author] Masao SATO(10hit)

  • A CAM-Based Parallel Fault Simulation Algorithm with Minimal Storage Size

    Shinsuke OHNO  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E78-A  No: 12  Page(s): 1755-1764

    CAMs (Content Addressable Memories) are functional memories which have functions such as word-parallel equivalence search, bilateral 1-bit data shifting between consecutive words, and word-parallel writing. Since the regular structure of CAMs makes them easy to integrate, massively parallel CAM functions can be executed. Taking advantage of CAMs, Ishiura and Yajima have proposed a parallel fault simulation algorithm using a CAM. This algorithm, however, requires a large amount of CAM storage to simulate large-scale circuits. In this paper, we propose a new massively parallel fault simulation algorithm requiring less CAM storage, and compare it with Ishiura and Yajima's algorithm. Experimental results of the algorithm on CHARGE, the CAM-based hardware engine developed in our laboratory, are also reported.
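
    The three CAM functions named in this abstract can be pictured with a small software model. The sketch below is only an illustration: the class name `SoftwareCAM`, its methods, and the choice to model the 1-bit shift as a shift of per-word hit flags are all assumptions, not the interface of CHARGE or of any real CAM. A hardware CAM performs each operation on all words at once, whereas this model loops over them.

```python
class SoftwareCAM:
    """Toy software model of a content addressable memory (hypothetical API)."""

    def __init__(self, num_words, word_bits):
        self.word_bits = word_bits
        self.words = [0] * num_words
        self.hit = [False] * num_words  # one 1-bit response flag per word

    def search(self, key, mask):
        """Equivalence search: flag every word equal to key on the masked bits."""
        self.hit = [(w & mask) == (key & mask) for w in self.words]
        return self.hit

    def shift_hits(self, down=True):
        """Shift the 1-bit flags by one word, in either direction."""
        if down:
            self.hit = [False] + self.hit[:-1]
        else:
            self.hit = self.hit[1:] + [False]
        return self.hit

    def write_hits(self, value, mask):
        """Parallel write: update the masked bits of every flagged word."""
        full = (1 << self.word_bits) - 1
        self.words = [
            ((w & ~mask) | (value & mask)) & full if h else w
            for w, h in zip(self.words, self.hit)
        ]


cam = SoftwareCAM(num_words=4, word_bits=8)
cam.words = [0b1010, 0b0110, 0b1010, 0b0001]
print(cam.search(key=0b1010, mask=0b1111))  # [True, False, True, False]
cam.write_hits(value=0b10000, mask=0b10000)
print(cam.words)                            # [26, 6, 26, 1]
print(cam.shift_hits(down=True))            # [False, True, False, True]
```

    In this model a search followed by a masked write updates every matching word with one call, which is the property that CAM-based algorithms rely on.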

  • Simultaneous Placement and Global Routing for Transport-Processing FPGA Layout

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E79-A  No: 12  Page(s): 2140-2150

    Transport-processing FPGAs have been proposed for flexible telecommunication systems. Since those FPGAs have finer granularity of logic functions to implement circuits on them, the amount of routing resources tends to increase. In order to keep routing congestion small, it is necessary to execute placement and routing simultaneously. This paper proposes a simultaneous placement and global routing algorithm for transport-processing FPGAs whose primary objective is minimizing routing congestion. The algorithm is based on hierarchical bipartition of layout regions and sets of LUTs (Look Up Tables) to be placed. It achieves bipartitioning which leads to small routing congestion by applying a network flow technique to it and computing a maximum flow and a minimum cut. If there exist connections between bipartitioned LUT sets, pairs of pseudo-terminals are introduced to preserve the connections. A sequence of pseudo-terminals represents a global route of each net. As a result, both placement of LUTs and global routing are determined when hierarchical bipartitioning procedures are finished. The proposed algorithm has been implemented and applied to practical transport-processing circuits. The experimental results demonstrate that it decreases routing congestion by an average of 37% compared with a conventional algorithm and achieves 100% routing for the circuits for which the conventional algorithm causes unrouted nets.
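
    The abstract obtains each bipartition from a maximum flow and a minimum cut. As a minimal sketch of that general technique, the code below runs a textbook Edmonds-Karp maximum flow and reads off the source side of a minimum cut. The function name, the graph encoding, and the four-LUT example are assumptions made for illustration; the abstract does not say how the paper builds its flow network or how it balances the two sides.

```python
from collections import deque


def min_cut_bipartition(capacity, source, sink):
    """Return (max_flow, source_side) for a graph given as {u: {v: cap}}."""
    # Residual capacities, with a zero-capacity reverse edge for every edge.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # Nodes still reachable from the source form one side of a min cut.
            return flow, set(parent)
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


# Hypothetical LUT graph: edge weights count the nets between two LUTs.
luts = {"a": {"b": 3}, "b": {"a": 3, "c": 1}, "c": {"b": 1, "d": 3}, "d": {"c": 3}}
flow, side = min_cut_bipartition(luts, "a", "d")
print(flow, sorted(side))  # 1 ['a', 'b']
```

    Here the cut separates {a, b} from {c, d} across the single net between b and c, the cheapest place to split this small example.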

  • Optimal Constraint Graph Generation Algorithm for Layout Compaction Using Enhanced Plane-Sweep Method

    Toru AWASHIMA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E76-A  No: 4  Page(s): 507-512

    This paper presents an optimal constraint graph generation algorithm for graph-based one-dimensional layout compaction. The first published algorithm for this problem was the shadow-propagation algorithm. However, without a sophisticated implementation of the shadow-front, the complexity of that algorithm can degrade to O(n^2), where n is the number of layout objects. Although our algorithm, called the enhanced plane-sweep based graph generation algorithm, is an extension of the shadow-propagation algorithm, this drawback is resolved by introducing an enhanced plane-sweep technique. The algorithm maintains multiple shadow-fronts simultaneously by storing them in a work-list called previous-boundary. Since a balanced search tree is selected for implementation of the work-list, the total complexity of the algorithm is O(n log n), which is optimal. Experimental results show that the enhanced plane-sweep based graph generation algorithm runs in almost linear time with respect to the number of layout objects and is faster than the perpendicular plane-sweep algorithm, which is also optimal in terms of time complexity.
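
    To make the role of the constraint graph concrete, here is a deliberately naive sketch of graph-based one-dimensional compaction. It is not the enhanced plane-sweep algorithm of the paper: constraints are generated by an O(n^2) pairwise comparison, and the rectangle encoding, function names, and spacing rule are assumptions chosen for illustration. It also assumes a legal input layout in which no two rectangles overlap.

```python
def build_constraints(rects, spacing):
    """Naive pairwise generation; rects maps name -> (x, y_lo, y_hi, width)."""
    edges = []
    for a, (ax, ay0, ay1, aw) in rects.items():
        for b, (bx, by0, by1, _) in rects.items():
            overlaps_in_y = ay0 < by1 and by0 < ay1
            if a != b and ax < bx and overlaps_in_y:
                # b must stay to the right of a: x_b >= x_a + width_a + spacing
                edges.append((a, b, aw + spacing))
    return edges


def compact_x(rects, spacing):
    """Longest-path compaction: push every rectangle as far left as allowed."""
    edges = build_constraints(rects, spacing)
    new_x = {name: 0 for name in rects}
    # Every edge points from a smaller original x to a larger one, so
    # sorting by original x gives a topological order of the graph.
    for a in sorted(rects, key=lambda name: rects[name][0]):
        for u, v, weight in edges:
            if u == a:
                new_x[v] = max(new_x[v], new_x[a] + weight)
    return new_x


layout = {
    "A": (0, 0, 4, 2),
    "B": (5, 2, 6, 3),
    "C": (12, 0, 3, 2),
    "D": (6, 7, 9, 1),
}
print(compact_x(layout, spacing=1))  # {'A': 0, 'B': 3, 'C': 7, 'D': 0}
```

    The pairwise loop is what a plane-sweep approach avoids: it compares every pair of objects, while a sweep only examines objects that are neighbours along the sweep front.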

  • A VLSI Geometrical Design Rule Verification Accelerated by CAM-Based Hardware Engine

    Tetsuro TAKIZAWA  Kazuto KUBOTA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER-VLSI Design Technology

    Vol: E74-A  No: 10  Page(s): 3072-3077

    VLSI technology has matured to the extent that hundreds of thousands or even millions of transistors can be integrated in a single chip. Layout data for VLSIs consist of around ten mask layers, and the total number of polygons on the layers is about ten times larger than the number of circuit elements. In order to deal with such a large number of polygons, algorithms for mask pattern processing are usually based on the underlying assumption that the whole data is accommodated in the secondary storage and that only those patterns within a few consecutive thin slits of the plane can be processed in the main memory (work list). This is called the work-list method. A linear time geometrical design rule verification algorithm is presented in this paper. This algorithm is based on the work-list method. Content Addressable Memory (CAM) is introduced to implement the work list, so as to make the algorithm run in linear time with linear memory space. Data in RAM can be accessed only by its address, whereas data in CAM is accessed not only by address but also by a content which matches a given referential data. This function is called equivalence search. An equivalence search is executed in constant time independent of the amount of data. The advantages of CAM for our algorithm are summarized as follows. (1) It provides the flexibility to deal with a variety of geometrical search problems for VLSI design. (2) Each geometrical search is done in constant time. (3) Complicated coding for sophisticated data structures depending on problems is not necessary, unlike software implementation.

  • Maple: A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for Field-Programmable Gate Arrays

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E77-A  No: 12  Page(s): 2028-2038

    Technology mapping algorithms for LUT (Look Up Table) based FPGAs have been proposed to transfer a Boolean network into logic-blocks. However, since those algorithms take no layout information into account, they do not always lead to excellent results. In this paper, a simultaneous technology mapping, placement and global routing algorithm for FPGAs, Maple, is presented. Maple is an extended version of a simultaneous placement and global routing algorithm for FPGAs, which is based on recursive partition of layout regions and block sets. Maple inherits its basic process and executes the technology mapping simultaneously in each recursive process. Therefore, the mapping can be done with the placement and global routing information. Experimental results for some benchmark circuits demonstrate its efficiency and effectiveness.

  • A Performance-Oriented Simultaneous Placement and Global Routing Algorithm for Transport-Processing FPGAs

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E80-A  No: 10  Page(s): 1795-1806

    In layout design of transport-processing FPGAs, it is required not only that routing congestion be kept small but also that the circuits implemented on them operate at a higher frequency. This paper extends the previously proposed simultaneous placement and global routing algorithm for transport-processing FPGAs, whose objective is to minimize routing congestion, and proposes a new algorithm in which the length of each critical signal path (path length) is limited within a specified upper bound imposed on it (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and LUT (Look Up Table) sets to be placed. In each bipartitioning, the algorithm first searches for the paths with tighter path length constraints by estimating their path lengths. Second, the algorithm proceeds with the bipartitioning so that the path lengths of critical paths can be reduced. The algorithm is applied to transport-processing circuits and compared with conventional approaches. The results demonstrate that the algorithm satisfies the path length constraints for 11 out of 13 circuits, though it increases routing congestion by an average of 20%. After detailed routing, it achieves 100% routing for all the circuits and decreases circuit delay by an average of 23%.

  • Fast Scheduling and Allocation Algorithms for Entropy CODEC

    Katsuharu SUZUKI  Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER-High Level Synthesis

    Vol: E80-D  No: 10  Page(s): 982-992

    Entropy coding/decoding are implemented on FPGAs as a fast and flexible system in which high-level synthesis technologies are key issues. In this paper, we propose scheduling and allocation algorithms for behavioral descriptions of entropy CODEC. The scheduling algorithm employs a control-flow graph as input and finds a solution with minimal hardware cost and execution time by merging nodes in the control-flow graph. The allocation algorithm assigns operations to operators with various bit lengths. As a result, register-transfer level descriptions are efficiently obtained from behavioral descriptions of entropy CODEC with complicated control flow and variable bit lengths. Experimental results demonstrate that our algorithms synthesize, within one second, the same circuits as those designed manually.

  • A Circuit Partitioning Algorithm with Replication Capability for Multi-FPGA Systems

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E78-A  No: 12  Page(s): 1765-1776

    In circuit partitioning for FPGAs, partitioned signal nets are connected using I/O blocks, through which signals are coming from or going to external pins. However, the number of I/O blocks per chip is relatively small compared with the number of logic-blocks, which realize logic functions, accommodated in the FPGA chip. Because of the I/O block limitation, the size of a circuit implemented on each FPGA chip is usually small, which leads to a serious decrease of logic-block utilization. It is therefore desirable to use the otherwise unused logic-blocks to reduce the number of I/O blocks and to realize circuits on given FPGA chips. In this paper, we propose an algorithm which partitions an initial circuit into multi-FPGA chips. The algorithm is based on recursive bi-partitioning of a circuit. In each bi-partitioning, it searches for a partitioning position of a circuit such that each of the partitioned subcircuits is accommodated in an FPGA chip while making the number of signal nets between chips as small as possible. Such bi-partitioning is achieved by computing a minimum cut repeatedly applying a network flow technique, and replicating logic-blocks appropriately. Since a set of logic-blocks assigned to each chip is computed separately, logic-blocks to be replicated are naturally determined. This means that the algorithm makes good use of unused logic-blocks from the viewpoint of reducing the number of signal nets between chips, i.e. the number of required I/O blocks. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it decreases the maximum number of I/O blocks per chip by a maximum of 49% compared with conventional algorithms.
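
    The link between replication and I/O block usage can be illustrated with a small counting sketch. Everything in it is an assumption made for illustration: the net encoding `(driver, sinks)`, the function name, and the cost model in which a net crossing chips uses one I/O block on the exporting chip and one on each importing chip. It only evaluates a given placement; it does not reproduce how the paper chooses which logic-blocks to replicate.

```python
from collections import Counter


def io_blocks_per_chip(nets, placement):
    """Count I/O blocks per chip under a simple one-block-per-cut-net model.

    nets:      {net_name: (driver_block, [sink_blocks])}
    placement: {block: set of chips holding a copy of that block}
    """
    usage = Counter()
    for driver, sinks in nets.values():
        source_chips = placement[driver]
        # Chips that hold a sink but no copy of the driver must import the signal.
        importing = {
            chip
            for sink in sinks
            for chip in placement[sink]
            if chip not in source_chips
        }
        if importing:
            usage[min(source_chips)] += 1  # one driver copy exports the signal
            for chip in importing:
                usage[chip] += 1
    return dict(usage)


nets = {
    "na": ("a", ["g1", "h1"]),
    "nb": ("b", ["g1", "h2"]),
    "n1": ("g1", ["g2", "g3"]),
}
placement = {"a": {0}, "b": {0}, "g1": {0}, "g2": {0},
             "g3": {1}, "h1": {1}, "h2": {1}}
print(io_blocks_per_chip(nets, placement))  # {0: 3, 1: 3}

# Replicate g1 on chip 1: its inputs already cross, so net n1 stops crossing.
replicated = dict(placement, g1={0, 1})
print(io_blocks_per_chip(nets, replicated))  # {0: 2, 1: 2}
```

    In this example the replica costs nothing extra because the inputs of g1 were already needed on chip 1; replicating a block whose inputs are not yet available on the other chip could instead add crossings.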

  • A Circuit Partitioning Algorithm with Path Delay Constraints for Multi-FPGA Systems

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E80-A  No: 3  Page(s): 494-505

    In this paper, we extend the circuit partitioning algorithm which we have proposed for multi-FPGA systems and present a new algorithm in which the delay of each critical signal path is within a specified upper bound imposed on it. The core of the presented algorithm is recursive bipartitioning of a circuit. The bipartitioning procedure consists of three stages: 0) detection of critical paths; 1) bipartitioning of a set of primary inputs and outputs; and 2) bipartitioning of a set of logic-blocks. In 0), the algorithm computes the lower bounds of delays for paths with path delay constraints and detects the critical paths based on the difference between the lower and upper bound dynamically in every bipartitioning procedure. The delays of the critical paths are reduced with higher priority. In 1), the algorithm attempts to assign the primary inputs and outputs on each critical path to one chip so that the critical path does not cross between chips. Finally in 2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it resolves almost all path delay constraints while keeping the maximum number of required I/O blocks per chip small compared with conventional algorithms.
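
    Stage 0) can be pictured as a slack computation: the difference between each path's upper bound (its constraint) and the current lower bound of its delay. The sketch below is a guess at the simplest form of that idea. The function name, the data layout, and in particular the `margin` threshold are assumptions; the abstract says only that critical paths are detected from the difference between the two bounds.

```python
def detect_critical_paths(paths, margin):
    """Rank constrained paths by slack and return the tightest ones.

    paths:  {path_name: (delay_lower_bound, delay_upper_bound)}
    margin: hypothetical threshold; a path is treated as critical when
            upper bound minus lower bound is at most this value.
    """
    slack = {name: upper - lower for name, (lower, upper) in paths.items()}
    critical = [name for name in sorted(slack, key=slack.get)
                if slack[name] <= margin]
    return critical, slack


paths = {"p1": (8, 10), "p2": (4, 12), "p3": (9, 10)}
critical, slack = detect_critical_paths(paths, margin=2)
print(critical)  # ['p3', 'p1']
print(slack)     # {'p1': 2, 'p2': 8, 'p3': 1}
```

    Because the lower bounds change as the circuit is split further, a check of this kind would be repeated in every bipartitioning step, which matches the abstract's description of the detection as dynamic.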

  • A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for FPGAs with Path Delay Constraints

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

    Vol: E79-A  No: 3  Page(s): 321-329

    In this paper, we propose a new FPGA design algorithm, Maple-opt, in which technology mapping, placement, and global routing are executed so that the delay of each critical signal path in an input circuit is within a specified upper bound imposed on it. The basic algorithm of Maple-opt is top-down hierarchical bi-partitioning of regions. Technology mapping onto logic-blocks of FPGAs, their placement, and global routing are determined simultaneously in each hierarchical process. This simultaneity leads to a less congested layout for routing. In addition, Maple-opt computes a lower bound of delay for each path with a constraint value and determines critical paths based on the difference between the lower bound and the constraint value dynamically in each hierarchical process. Two delay reduction processes are executed for the critical paths; one is routing delay reduction and the other is logic-block delay reduction. Routing delay reduction is realized such that, when bi-partitioning a region, each constrained path is assigned to one subregion. Logic-block delay reduction is realized such that each constrained path is mapped onto fewer logic-blocks. Experimental results for some benchmark circuits show its efficiency and effectiveness.