The search functionality is under construction.

Author Search Result

[Author] Hideharu AMANO(66hit)

61-66hit(66hit)

  • MINC: Multistage Interconnection Network with Cache Control Mechanism

    Toshihiro HANAWA  Takayuki KAMEI  Hideki YASUKAWA  Katsunobu NISHIMURA  Hideharu AMANO  

     
    PAPER-Interconnection Networks

      Vol:
    E80-D No:9
      Page(s):
    863-870

    A novel approach to the cache coherent Multistage Interconnection Network (MIN) called the MINC (MIN with Cache control mechanism) is proposed. In the MINC, the directory is located only on the shared memory using the Reduced Hierarchical Bit-map Directory schemes (RHBDs). In the RHBD, the bit-map directory is reduced and carried in the packet header for quick multicasting without accessing the directory in each hierarchy. In order to reduce unnecessary packets caused by compacting the bit map in the RHBD, a small cache called the pruning cache is introduced in the switching element. The simulation reveals the pruning cache works most effectively when it is provided in every switching element of the first stage, and it reduces the congestion more than 50% with only 4 entries. The MINC cache control chip with 16 inputs/outputs is implemented on the LPGA (Laser Programmable Gate Array), and works with a 66 MHz clock.

  • Architecture and Evaluation of a Third-Generation RHiNET Switch for High-Performance Parallel Computing

    Hiroaki NISHI  Shinji NISHIMURA  Katsuyoshi HARASAWA  Tomohiro KUDOH  Hideharu AMANO  

     
    PAPER

      Vol:
    E86-D No:10
      Page(s):
    1987-1995

    RHiNET-3/SW is the third-generation switch used in the RHiNET-3 system. It provides both low-latency processing and flexible connection due to its use of a credit-based flow-control mechanism, topology-free routing, and deadlock-free routing. The aggregate throughput of RHiNET-3/SW is 80 Gbps, and the latency is 140 ns. RHiNET-3/SW also provides a hop-by-hop retransmission mechanism. Simulation demonstrated that the effective throughput at a node in a 64-node torus RHiNET-3 system is equivalent to the effective throughput of a 64-bit 33-MHz PCI bus and that the performance of RHiNET-3/SW almost equals or exceeds the best performance of RHiNET-2/SW, the second-generation switch. Although credit-based flow control requires 26% more gates than rate-based flow control to manage the virtual channels (VCs), it requires less VC memory than rate-based flow control. Moreover, its use in a network system reduces latency and increases the maximum throughput compared to rate-based flow control.

  • Proxy Responses by FPGA-Based Switch for MapReduce Stragglers

    Koya MITSUZUKA  Michihiro KOIBUCHI  Hideharu AMANO  Hiroki MATSUTANI  

     
    PAPER-Computer System

      Pubricized:
    2018/06/15
      Vol:
    E101-D No:9
      Page(s):
    2258-2268

    In parallel processing applications, a few worker nodes called “stragglers”, which execute their tasks significantly slower than other tasks, increase the execution time of the job. In this paper, we propose a network switch based straggler handling system to mitigate the burden of the compute nodes. We also propose how to offload detecting stragglers and computing their results in the network switch with no additional communications between worker nodes. We introduce some approximate techniques for the proxy computation and response at the switch; thus our switch is called “ApproxSW.” As a result of a simulation experiment, the proposed approximation based on task similarity achieves the best accuracy in terms of quality of generated Map outputs. We also analyze how to suppress unnecessary proxy computation by the ApproxSW. We implement ApproxSW on NetFPGA-SUME board that has four 10Gbit Ethernet (10GbE) interfaces and a Virtex-7 FPGA. Experimental results shows that the ApproxSW functions do not degrade the original 10GbE switch performance.

  • Vertical Link On/Off Regulations for Inductive-Coupling Based Wireless 3-D NoCs

    Hao ZHANG  Hiroki MATSUTANI  Yasuhiro TAKE  Tadahiro KURODA  Hideharu AMANO  

     
    PAPER-Computer System

      Vol:
    E96-D No:12
      Page(s):
    2753-2764

    We propose low-power techniques for wireless three-dimensional Network-on-Chips (wireless 3-D NoCs), in which the connections among routers on the same chip are wired while the routers on different chips are connected wirelessly using inductive-coupling. The proposed low-power techniques stop the clock and power supplies to the transmitter of the wireless vertical links only when their utilizations are higher than the threshold. Meanwhile, the whole wireless vertical link will be shut down when the utilization is lower than the threshold in order to reduce the power consumption of wireless 3-D NoCs. This paper uses an on-demand method, in which the dormant data transmitter or the whole vertical link will be activated as long as a flit comes. Full-system many-core simulations using power parameters derived from a real chip implementation show that the proposed low-power techniques reduce the power consumption by 23.4%-29.3%, while the performance overhead is less than 2.4%.

  • Implementation of Data Driven Applications on a Multi-Context Reconfigurable Device

    Masaki UNO  Yuichiro SHIBATA  Hideharu AMANO  

     
    PAPER

      Vol:
    E86-D No:5
      Page(s):
    841-849

    WASMII is a virtual hardware system that executes dataflow algorithms using a dynamically reconfigurable multi-context device with a data driven control mechanism. Although the effectiveness of the system has been evaluated through simulations and using an emulator, implementation of WASMII was infeasible due to the unavailability of such a device. However, the first prototype of a practical dynamically reconfigurable multi-context device called DRL has been developed by NEC, and we developed a reconfigurable test bed using four sample DRL chips. On this board, we have implemented and executed some simple applications of WASMII mechanism. Evaluation results show that the performance of the parallel implementation of WASMII is almost twice as that of a PC with a CPU based on the corresponding technology.

  • Code Compression with Split Echo Instructions

    Iver STUBDAL  Arda KARADUMAN  Hideharu AMANO  

     
    PAPER-Fundamentals of Software and Theory of Programs

      Vol:
    E92-D No:9
      Page(s):
    1650-1656

    Code density is often a critical issue in embedded computers, since the memory size of embedded systems is strictly limited. Echo instructions have been proposed as a method for reducing code size. This paper presents a new type of echo instruction, split echo, and evaluates an implementation of both split echo and traditional echo instructions on a MIPS R3000 based processor. Evaluation results show that memory requirement is reduced by 12% on average with small additional hardware cost.

61-66hit(66hit)