IEICE global.ieice.org Site

Author Search Result

[Author] Hideharu AMANO (66 hits)

Hits 61-66 of 66

MINC: Multistage Interconnection Network with Cache Control Mechanism
Toshihiro HANAWA Takayuki KAMEI Hideki YASUKAWA Katsunobu NISHIMURA Hideharu AMANO

PAPER-Interconnection Networks

Vol: E80-D, No: 9, Page(s): 863-870
A novel approach to the cache-coherent Multistage Interconnection Network (MIN), called MINC (MIN with Cache control mechanism), is proposed. In MINC, the directory is located only in the shared memory and uses the Reduced Hierarchical Bit-map Directory scheme (RHBD). In the RHBD, the bit-map directory is reduced and carried in the packet header, enabling quick multicasting without accessing a directory at each level of the hierarchy. To reduce the unnecessary packets caused by compacting the bit map in the RHBD, a small cache called the pruning cache is introduced in the switching element. Simulation reveals that the pruning cache works most effectively when it is provided in every switching element of the first stage, where it reduces congestion by more than 50% with only 4 entries. The MINC cache control chip with 16 inputs/outputs is implemented on an LPGA (Laser Programmable Gate Array) and operates with a 66 MHz clock.
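To make the source of the "unnecessary packets" concrete, here is a minimal sketch of a reduced hierarchical bit map. It assumes (our simplification, not the paper's exact encoding) that node IDs are split into base-`radix` digits, one per network level, and that the reduction ORs all sharers' per-level bit maps into a single map per level. Expanding the reduced maps then reaches every combination of marked digits, including nodes that never cached the line. The function names and the 16-node, radix-4 example are hypothetical.

```python
from itertools import product

def digits(node, radix, levels):
    """Split a node ID into base-`radix` digits, most significant level first."""
    out = []
    for _ in range(levels):
        out.append(node % radix)
        node //= radix
    return out[::-1]

def reduced_bitmaps(sharers, radix, levels):
    """OR the per-level bit maps of all sharers into one bit map per level."""
    maps = [0] * levels
    for node in sharers:
        for level, d in enumerate(digits(node, radix, levels)):
            maps[level] |= 1 << d
    return maps

def multicast_targets(maps, radix):
    """Expand the reduced maps into every destination the multicast reaches."""
    per_level = [[d for d in range(radix) if (m >> d) & 1] for m in maps]
    targets = set()
    for combo in product(*per_level):
        node = 0
        for d in combo:
            node = node * radix + d
        targets.add(node)
    return targets

sharers = {1, 6}  # hypothetical sharers in a 16-node, 2-level, radix-4 MIN
maps = reduced_bitmaps(sharers, radix=4, levels=2)
targets = multicast_targets(maps, radix=4)
print(sorted(targets))            # [1, 2, 5, 6]
print(sorted(targets - sharers))  # [2, 5]  (unnecessary packets)
```

Nodes 2 and 5 receive packets only because the per-level maps were merged. This is the kind of traffic a pruning cache in the switching elements is meant to cut, although the sketch does not model the cache itself.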
Architecture and Evaluation of a Third-Generation RHiNET Switch for High-Performance Parallel Computing
Hiroaki NISHI Shinji NISHIMURA Katsuyoshi HARASAWA Tomohiro KUDOH Hideharu AMANO

PAPER

Vol: E86-D, No: 10, Page(s): 1987-1995
RHiNET-3/SW is the third-generation switch used in the RHiNET-3 system. It provides both low-latency processing and flexible connection owing to its credit-based flow-control mechanism, topology-free routing, and deadlock-free routing. The aggregate throughput of RHiNET-3/SW is 80 Gbps, and its latency is 140 ns. RHiNET-3/SW also provides a hop-by-hop retransmission mechanism. Simulation demonstrated that the effective throughput at a node in a 64-node torus RHiNET-3 system is equivalent to that of a 64-bit, 33-MHz PCI bus, and that the performance of RHiNET-3/SW almost equals or exceeds the best performance of RHiNET-2/SW, the second-generation switch. Although credit-based flow control requires 26% more gates than rate-based flow control to manage the virtual channels (VCs), it requires less VC memory. Moreover, using it in a network system reduces latency and increases maximum throughput compared with rate-based flow control.
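The general idea behind credit-based flow control can be shown with a sender-side sketch for a single virtual channel. The sender holds one credit per free flit slot in the downstream buffer, spends a credit per flit, and stops when credits run out. The downstream switch returns a credit each time it forwards a flit, so the buffer can never overflow. The class and method names are hypothetical, and this is the textbook mechanism, not RHiNET-3/SW's actual circuit.

```python
from collections import deque

class CreditLink:
    """Sender-side view of one virtual channel under credit-based flow control."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # free downstream slots the sender knows about
        self.receiver = deque()      # models the downstream VC buffer

    def try_send(self, flit):
        if self.credits == 0:
            return False             # no guaranteed space downstream: hold the flit
        self.credits -= 1
        self.receiver.append(flit)
        return True

    def receiver_forwards(self):
        """Downstream switch drains one flit and returns a credit upstream."""
        flit = self.receiver.popleft()
        self.credits += 1
        return flit

link = CreditLink(buffer_slots=2)
sent = [link.try_send(f) for f in ("a", "b", "c")]
print(sent)                 # [True, True, False]
link.receiver_forwards()    # frees one slot, returning one credit
print(link.try_send("c"))   # True
```

Rate-based control, by contrast, throttles the sending rate without per-slot accounting, which helps explain why the abstract reports extra gates for credit-based control but less VC memory.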
Proxy Responses by FPGA-Based Switch for MapReduce Stragglers
Koya MITSUZUKA Michihiro KOIBUCHI Hideharu AMANO Hiroki MATSUTANI

PAPER-Computer System

Publicized: 2018/06/15
Vol: E101-D, No: 9, Page(s): 2258-2268
In parallel processing applications, a few worker nodes called "stragglers", which execute their tasks significantly more slowly than other nodes, increase the execution time of the job. In this paper, we propose a network-switch-based straggler handling system that mitigates the burden on the compute nodes. We also show how to offload straggler detection and the computation of their results to the network switch without additional communication between worker nodes. We introduce several approximate techniques for proxy computation and response at the switch, so our switch is called "ApproxSW." Simulation results show that the proposed approximation based on task similarity achieves the best accuracy in terms of the quality of the generated Map outputs. We also analyze how to suppress unnecessary proxy computation by ApproxSW. We implement ApproxSW on a NetFPGA-SUME board that has four 10-Gbit Ethernet (10GbE) interfaces and a Virtex-7 FPGA. Experimental results show that the ApproxSW functions do not degrade the performance of the original 10GbE switch.
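The following sketch illustrates the two steps the abstract describes: detecting stragglers and then answering for them with an approximation based on task similarity. The detection rule (elapsed time above 1.5 times the median of finished tasks) and the similarity measure (Jaccard similarity of input-key sets) are our own assumptions, chosen for illustration. They are not ApproxSW's actual criteria, and all names and data here are hypothetical.

```python
from statistics import median

def find_stragglers(elapsed, finished, factor=1.5):
    """Flag unfinished tasks whose elapsed time exceeds factor x the median
    elapsed time of tasks that have already finished (hypothetical rule)."""
    done = [elapsed[t] for t in finished]
    if not done:
        return set()
    limit = factor * median(done)
    return {t for t, e in elapsed.items() if t not in finished and e > limit}

def jaccard(a, b):
    """Similarity of two input-key sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def proxy_output(straggler, inputs, outputs):
    """Reply for a straggler with the Map output of its most similar finished task."""
    best = max(outputs, key=lambda t: jaccard(inputs[straggler], inputs[t]))
    return outputs[best]

elapsed = {"m0": 10.0, "m1": 12.0, "m2": 11.0, "m3": 30.0}
finished = {"m0", "m1", "m2"}
inputs = {"m0": {"a", "b"}, "m1": {"c", "d"}, "m2": {"a", "c"}, "m3": {"a", "b", "e"}}
outputs = {"m0": {"a": 1, "b": 1}, "m1": {"c": 2, "d": 1}, "m2": {"a": 1, "c": 1}}

for s in find_stragglers(elapsed, finished):
    print(s, proxy_output(s, inputs, outputs))  # m3 {'a': 1, 'b': 1}
```

Here `m3` shares the most input keys with `m0`, so the switch would answer with `m0`'s output instead of waiting for the slow worker.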
Vertical Link On/Off Regulations for Inductive-Coupling Based Wireless 3-D NoCs
Hao ZHANG Hiroki MATSUTANI Yasuhiro TAKE Tadahiro KURODA Hideharu AMANO

PAPER-Computer System

Vol: E96-D, No: 12, Page(s): 2753-2764
We propose low-power techniques for wireless three-dimensional Networks-on-Chip (wireless 3-D NoCs), in which routers on the same chip are connected by wires while routers on different chips are connected wirelessly using inductive coupling. The proposed techniques stop the clock and power supplies to the transmitter of a wireless vertical link only when its utilization is higher than the threshold. Meanwhile, the whole wireless vertical link is shut down when its utilization is lower than the threshold, to reduce the power consumption of wireless 3-D NoCs. This paper uses an on-demand method, in which the dormant data transmitter or the whole vertical link is activated as soon as a flit arrives. Full-system many-core simulations using power parameters derived from a real chip implementation show that the proposed techniques reduce power consumption by 23.4%-29.3%, while the performance overhead is less than 2.4%.
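One way to read the regulation policy is as a per-link state choice made while the link has nothing to send. The utilization threshold selects between gating only the transmitter and shutting down the whole link, and any arriving flit wakes the link on demand. This is our interpretation of the abstract, not the paper's controller. The threshold value, the state names, and the function are hypothetical, and the sketch models neither wake-up latency nor power.

```python
from enum import Enum

class LinkState(Enum):
    ON = "on"              # transmitter and receiver powered
    TX_GATED = "tx_gated"  # transmitter clock/power stopped, rest of link kept up
    OFF = "off"            # whole vertical link shut down

def next_state(utilization, threshold, flit_waiting):
    """Hypothetical per-interval policy for one wireless vertical link."""
    if flit_waiting:
        return LinkState.ON            # on-demand: wake as soon as a flit arrives
    if utilization > threshold:
        return LinkState.TX_GATED      # busy link: gate only the transmitter
    return LinkState.OFF               # lightly used link: shut it down entirely

trace = [(0.8, False), (0.8, True), (0.1, False), (0.1, True)]
print([next_state(u, 0.5, f).value for u, f in trace])
# ['tx_gated', 'on', 'off', 'on']
```

The split reflects a plausible trade-off. A busy link keeps more of its circuitry alive so it can resume quickly, while a rarely used link saves more power by turning off completely.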
Implementation of Data Driven Applications on a Multi-Context Reconfigurable Device
Masaki UNO Yuichiro SHIBATA Hideharu AMANO

PAPER

Vol: E86-D, No: 5, Page(s): 841-849
WASMII is a virtual hardware system that executes dataflow algorithms using a dynamically reconfigurable multi-context device with a data-driven control mechanism. Although the effectiveness of the system has been evaluated through simulations and an emulator, implementing WASMII was infeasible because no such device was available. However, NEC has developed DRL, the first prototype of a practical dynamically reconfigurable multi-context device, and we built a reconfigurable test bed using four sample DRL chips. On this board, we implemented and executed several simple applications of the WASMII mechanism. Evaluation results show that the performance of the parallel implementation of WASMII is almost twice that of a PC with a CPU based on the corresponding technology.
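The data-driven control that WASMII relies on follows the classic dataflow firing rule: a node executes as soon as all of its operand tokens are available, with no program counter. The sketch below shows only that rule on a tiny graph computing (a + b) * (a - b). It does not model how WASMII maps subgraphs onto device contexts or switches between them, and the graph encoding and function name are hypothetical.

```python
import operator

def run_dataflow(graph, ops, inputs):
    """Fire every node whose operand tokens are all present, until none remain.
    graph: node -> list of operand names, in argument order."""
    values = dict(inputs)   # tokens produced so far
    waiting = set(graph)
    fired = []
    while waiting:
        ready = sorted(n for n in waiting if all(p in values for p in graph[n]))
        if not ready:
            raise ValueError("deadlock: some operand tokens never arrive")
        for n in ready:
            values[n] = ops[n](*(values[p] for p in graph[n]))
            waiting.discard(n)
            fired.append(n)
    return values, fired

graph = {"add": ["a", "b"], "sub": ["a", "b"], "mul": ["add", "sub"]}
ops = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
values, fired = run_dataflow(graph, ops, {"a": 5, "b": 3})
print(values["mul"], fired)  # 16 ['add', 'sub', 'mul']
```

`add` and `sub` become ready together and could run in parallel, while `mul` fires only after both tokens arrive. In a multi-context device, this readiness information is what would decide which configuration to load next.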
Code Compression with Split Echo Instructions
Iver STUBDAL Arda KARADUMAN Hideharu AMANO

PAPER-Fundamentals of Software and Theory of Programs

Vol: E92-D, No: 9, Page(s): 1650-1656
Code density is often a critical issue in embedded computers, since the memory size of embedded systems is strictly limited. Echo instructions have been proposed as a method for reducing code size. This paper presents a new type of echo instruction, split echo, and evaluates an implementation of both split echo and traditional echo instructions on a MIPS R3000-based processor. Evaluation results show that the memory requirement is reduced by 12% on average at a small additional hardware cost.
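The idea behind a traditional echo instruction is similar to dictionary compression. A repeated instruction sequence is replaced by a single instruction that names an earlier copy by address and length, and the processor executes that copy before continuing. The sketch below assumes a hypothetical encoding, `("ECHO", address, length)`, with addresses into the compressed program, and uses a greedy longest-match search over earlier literal instructions. Split echo is not modeled, and instructions are plain strings for illustration.

```python
def is_echo(entry):
    return isinstance(entry, tuple) and entry[0] == "ECHO"

def compress(code, min_len=2):
    """Greedily replace repeats of already-emitted literal instructions
    with ("ECHO", address_in_compressed_program, length)."""
    out, i = [], 0
    while i < len(code):
        best_len, best_addr = 0, 0
        for addr in range(len(out)):
            length = 0
            while (i + length < len(code) and addr + length < len(out)
                   and not is_echo(out[addr + length])
                   and out[addr + length] == code[i + length]):
                length += 1
            if length > best_len:
                best_len, best_addr = length, addr
        if best_len >= min_len:
            out.append(("ECHO", best_addr, best_len))
            i += best_len
        else:
            out.append(code[i])
            i += 1
    return out

def expand(program):
    """Reconstruct the executed instruction stream from a compressed program."""
    trace = []
    for entry in program:
        if is_echo(entry):
            _, addr, length = entry
            trace.extend(program[addr:addr + length])
        else:
            trace.append(entry)
    return trace

code = ["lw", "add", "sw", "beq", "lw", "add", "sw", "jr"]
packed = compress(code)
print(packed)  # ['lw', 'add', 'sw', 'beq', ('ECHO', 0, 3), 'jr']
```

Here the program shrinks from 8 to 6 entries, and `expand(packed)` restores the original stream. On real hardware, the saving depends on how many bits the echo encoding needs and on how often sequences repeat exactly.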
