The search functionality is under construction.

Author Search Result

[Author] Hiroshige FUJII(3hit)

1-3hit
  • Hiding Data Cache Latency with Load Address Prediction

    Toshinori SATO  Hiroshige FUJII  Seigo SUZUKI  

     
    PAPER-Computer Systems

      Vol:
    E79-D No:11
      Page(s):
    1523-1532

    A new prediction method for the effective address is presented. This method works with the buffer named the address prediction buffer, and allows the data cache to be accessed speculatively. As a consequence of the trend toward increasing clock frequency, the internal cache is no longer able to fill the speed gap between the processor and the external memory, and the data cache latency degrades the processor performance. In order to hide this latency, the prediction method is proposed. By this method, the load address is predicted, and the data is fetched earlier than the memory access stage. In the case that the prediction is correct, the latency is hidden. Even if the prediction is incorrect, the performance is not degraded by any miss penalties. We have found that the prediction accuracy is 81.9% on average, and thus the performance is improved by 6.6% on average and a maximum of 12.1% for the integer programs.

  • Performance Evaluation of a Processing Element for an On-Chip Multiprocessor

    Masafumi TAKAHASHI  Hiroshige FUJII  Emi KANEKO  Takeshi YOSHIDA  Toshinori SATO  Hiroyuki TAKANO  Haruyuki TAGO  Seigo SUZUKI  Nobuyuki GOTO  

     
    PAPER

      Vol:
    E77-C No:7
      Page(s):
    1092-1100

    A 250-MIPS, 125-MFLOPS peak performance processing element (PE), which is being developed for an on-chip multiprocessor, has been modeled and evaluated. The PE includes the following new architecture components: an FPU shared by several IUs in order to increase the efficiency of the FPU pipelines, an on-chip data cache with a prefetch mechanism to reduce clock cycles waiting for memory, and an interface to high speed DRAM, such as Rambus DRAM and Synchronous DRAM. As a result, a PE model with an FPU shared by four or eight IUs causes only 10% performance reduction compared to a model with an un-shared FPU model while saving the cost of three FPUs. Furthermore, a PE model with prefetch operates 1.2 to 1.8 times faster than a model without prefetch at 250-MHz clock rate when the Rambus DRAM is connected. It becomes clear that this PE architecture can bring a high effective performance at over 250-MHz, and is cost-effective for the on-chip multiprocessor.

  • Implementation Techniques for Fast OBDD Dynamic Variable Reordering

    Hiroshige FUJII  

     
    PAPER

      Vol:
    E78-A No:12
      Page(s):
    1729-1734

    Ordered binary decision diagrams (OBDDs) have been widely used in many CAD applications as efficient data structures for representing and manipulating Boolean functions. For the efficient use of the OBDD, it is essential to find a good variable order, because the size of the OBDD heavily depends on its variable order. Dynamic variable reordering is a promising solution to the variable ordering problem of the OBDD. Dynamic variable reordering with the sifting algorithm is especially effective in minimizing the size of the OBDD and reduces the need to find a good initial variable order. However, it is very time-consuming for practical use. In this paper, we propose two new implementation techniques for fast dynamic variable reordering. One of the proposed techniques reduces the number of variable swaps by using the lower bound of the OBDD size, and the other accelerates the variable swap itself by recording the node states before the swap and the pivot nodes of the swap. By using these new techniques, we have achieved the speed-up ranging from 2.5 to 9.8 for benchmark circuits. These techniques have reduced the disadvantage of dynamic variable reordering and have made it more attractive for users.