The search functionality is under construction.

Keyword Search Result

[Keyword] network interface(3hit)

1-3hit
  • AXI-NoC: High-Performance Adaptation Unit for ARM Processors in Network-on-Chip Architectures

    Xuan-Tu TRAN  Tung NGUYEN  Hai-Phong PHAN  Duy-Hieu BUI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E100-A No:8
      Page(s):
    1650-1660

    The increasing demand on scalability and reusability of system-on-chip design as well as the decoupling between computation and communication has motivated the growth of the Network-on-Chip (NoC) paradigm in the last decade. In NoC-based systems, the computational resources (i.e. IPs) communicate with each other using a network infrastructure. Many works have focused on the development of NoC architectures and routing mechanisms, while the interfacing between network and associated IPs also needs to be considered. In this paper, we present a novel efficient AXI (AMBA eXtensible Interface) compliant network adapter for NoC architectures, which is named an AXI-NoC adapter. The proposed network adapter achieves high communication throughput of 20.8Gbits/s and consumes 4.14mW at the operating frequency of 650MHz. It has a low area footprint (952 gates, approximate to 2,793µm2 with CMOS 45nm technology) thanks to its effective hybrid micro-architectures and with zero latency thanks to the proposed mux-selection method.

  • Network Interface Architecture with Scalable Low-Latency Message Receiving Mechanism

    Noboru TANABE  Atsushi OHTA  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2536-2544

    Most of scientists except computer scientists do not want to make efforts for performance tuning with rewriting their MPI applications. In addition, the number of processing elements which can be used by them is increasing year by year. On large-scale parallel systems, the number of accumulated messages on a message buffer tends to increase in some of their applications. Since searching message queue in MPI is time-consuming, system side scalable acceleration is needed for those systems. In this paper, a support function named LHS (Limited-length Head Separation) is proposed. Its performance in searching message buffer and hardware cost are evaluated. LHS accelerates searching message buffer by means of switching location to store limited-length heads of messages. It uses the effects such as increasing hit rate of cache on host with partial off-loading to hardware. Searching speed of message buffer when the order of message reception is different from the receiver's expectation is accelerated 14.3 times with LHS on FPGA-based network interface card (NIC) named DIMMnet-2. This absolute performance is 38.5 times higher than that of IBM BlueGene/P although the frequency is 8.5times slower than BlueGene/P. LHS has higher scalability than ALPU in the performance per frequency. Since these results are obtained with partially on loaded linear searching on old Pentium®4, performance gap will increase using state of art CPU. Therefore, LHS is more suitable for larger parallel systems. The discussions for adopting proposed method to state of art processors and systems are also presented.

  • Design and Implementation of RHiNET-2/NI0: A Reconfigurable Network Interface for Cluster Computing

    Tomonori YOKOYAMA  Naoyuki IZU  Jun-ichiro TSUCHIYA  Konosuke WATANABE  Hideharu AMANO  Tomohiro KUDOH  

     
    PAPER

      Vol:
    E86-D No:5
      Page(s):
    789-795

    A reconfigurable network interface called RHiNET-2/NI0 is developed for parallel processing of PCs distributed within one or more floors of a building. Two configurations: the HS (High Speed) configuration with only a high-speed primitive and the DSM (Distributed Shared Memory) configuration which supports sophisticated primitives can be selected by the network requirement. From the empirical evaluation, it appears that the HS configuration markedly improves the latency of data transfer compared with traditional network interfaces. On the other hand, the DSM configuration executes sophisticated primitives for distributed shared memory more than twice as fast as that of software implementation.