The search functionality is under construction.
The search functionality is under construction.

Author Search Result

[Author] Nobuyuki GOTO(4hit)

1-4hit
  • Power and Area Minimization by Reorganizing CMOS Complex-Gates

    Masayoshi TACHIBANA  Sachiko KUROSAWA  Reiko NOJIMA  Naohito KOJIMA  Masaaki YAMADA  Takashi MITSUHASHI  Nobuyuki GOTO  

     
    PAPER

      Vol:
    E79-A No:3
      Page(s):
    312-320

    This paper proposes a method for achieving low-power control-logic modules using a combination of CMOS complex gate reorganization, transistor size optimization, and transistor layout. Complex gate reorganization minimizes transistor count and net count without changing the functionality of the circuit. Transistor sizing and layout are interdependent, the optimization of one results in the optimization of the other. The authors applied the reorganization method to a 10,846-transistor circuit, and succeeded in reducing the transistor count by 10%, and the net count by 9%. Transistor sizing and layout compaction reduced the average transistor size by one tenth, while the same delay was maintained. Total circuit capacitance, which is strongly related to power dissipation, was cut to 36%, even when wiring capacitances were dominant.

  • Performance Evaluation of a Processing Element for an On-Chip Multiprocessor

    Masafumi TAKAHASHI  Hiroshige FUJII  Emi KANEKO  Takeshi YOSHIDA  Toshinori SATO  Hiroyuki TAKANO  Haruyuki TAGO  Seigo SUZUKI  Nobuyuki GOTO  

     
    PAPER

      Vol:
    E77-C No:7
      Page(s):
    1092-1100

    A 250-MIPS, 125-MFLOPS peak performance processing element (PE), which is being developed for an on-chip multiprocessor, has been modeled and evaluated. The PE includes the following new architecture components: an FPU shared by several IUs in order to increase the efficiency of the FPU pipelines, an on-chip data cache with a prefetch mechanism to reduce clock cycles waiting for memory, and an interface to high speed DRAM, such as Rambus DRAM and Synchronous DRAM. As a result, a PE model with an FPU shared by four or eight IUs causes only 10% performance reduction compared to a model with an un-shared FPU model while saving the cost of three FPUs. Furthermore, a PE model with prefetch operates 1.2 to 1.8 times faster than a model without prefetch at 250-MHz clock rate when the Rambus DRAM is connected. It becomes clear that this PE architecture can bring a high effective performance at over 250-MHz, and is cost-effective for the on-chip multiprocessor.

  • Synergistic Power/Area Optimization with Transistor Sizing and Wire Length Minimization

    Masaaki YAMADA  Sachiko KUROSAWA  Reiko NOJIMA  Naohito KOJIMA  Takashi MITSUHASHI  Nobuyuki GOTO  

     
    PAPER-DA/Architecture

      Vol:
    E78-C No:4
      Page(s):
    441-446

    The paper ptoposes a method to synthesize low-power control-logic modules by combining transistor-size optimization and transistor layout. Transistor sizing and layout work synergistically to achieve power/area optimization. Transistor size minimization provides more spaces for layout to be compacted. Layout compaction results in shorter wire length (i.e. smaller load capacitance), which allows transistors to become smaller. The details of transistor sizing and layout compaction are also described. When applied to circuits with up to 10,000 transistors, the optimizer reduced the average transistor size to one eighth while maintaining the same delay. The power dissipation is cut to half even when wiring capacitances are dominant.

  • A Hierarchical Global Router for Mscro-Block-Embedded Sea-of-Gates

    Mototaka KURIBAYASHI  Masaaki YAMADA  Takashi MITSUHASHI  Nobuyuki GOTO  

     
    PAPER

      Vol:
    E76-A No:10
      Page(s):
    1694-1704

    A fast and efficient heuristic hierarchical global router for Sea-of-Gates(SOG) with embedded macro-blocks is described. The key point in the method is carry out a new optimal domain decomposition scheduling at every hierarchical level. This scheduling is intended to avoid macro-block-through wirings and to reduce wiring congestion near macro-blocks which may occur at lower levels. The new global router yielded superior results compared with previous hierarchical routers and a non-hierarchical maze router by evaluating with several actual SOG circuits including a 300K gate master chip and benchmark data supplied from MCNC. Overflows were reduced to one-half or one-quarter for macro-block embedded data compared with previous hierarchical routers. Concerning the running time, the router remarkably outperformed the non-hierarchical maze router, which took more than 390 times longer time for the tested large data.