The search functionality is under construction.

Keyword Search Result

[Keyword] interconnect(320hit)

1-20hit(320hit)

  • Testing and Delay-Monitoring for the High Reliability of Memory-Based Programmable Logic Device

    Xihong ZHOU  Senling WANG  Yoshinobu HIGAMI  Hiroshi TAKAHASHI  

     
    PAPER-Dependable Computing

      Pubricized:
    2023/10/03
      Vol:
    E107-D No:1
      Page(s):
    60-71

    Memory-based Programmable Logic Device (MPLD) is a new type of reconfigurable device constructed using a general SRAM array in a unique interconnect configuration. This research aims to propose approaches to guarantee the long-term reliability of MPLDs, including a test method to identify interconnect defects in the SRAM array during the production phase and a delay monitoring technique to detect aging-caused failures. The proposed test method configures pre-generated test configuration data into SRAMs to create fault propagation paths, applies an external walking-zero/one vector to excite faults, and identifies faults at the external output ports. The proposed delay monitoring method configures a novel ring oscillator logic design into MPLD to measure delay variations when the device is in practical use. The logic simulation results with fault injection confirm the effectiveness of the proposed methods.

  • Node-to-Set Disjoint Paths Problem in Cross-Cubes

    Rikuya SASAKI  Hiroyuki ICHIDA  Htoo Htoo Sandi KYAW  Keiichi KANEKO  

     
    PAPER-Fundamentals of Information Systems

      Pubricized:
    2023/10/06
      Vol:
    E107-D No:1
      Page(s):
    53-59

    The increasing demand for high-performance computing in recent years has led to active research on massively parallel systems. The interconnection network in a massively parallel system interconnects hundreds of thousands of processing elements so that they can process large tasks while communicating among others. By regarding the processing elements as nodes and the links between processing elements as edges, respectively, we can discuss various problems of interconnection networks in the framework of the graph theory. Many topologies have been proposed for interconnection networks of massively parallel systems. The hypercube is a very popular topology and it has many variants. The cross-cube is such a topology, which can be obtained by adding one extra edge to each node of the hypercube. The cross-cube reduces the diameter of the hypercube, and allows cycles of odd lengths. Therefore, we focus on the cross-cube and propose an algorithm that constructs disjoint paths from a node to a set of nodes. We give a proof of correctness of the algorithm. Also, we show that the time complexity and the maximum path length of the algorithm are O(n3 log n) and 2n - 3, respectively. Moreover, we estimate that the average execution time of the algorithm is O(n2) based on a computer experiment.

  • Enhancing Cup-Stacking Method for Collective Communication

    Takashi YOKOTA  Kanemitsu OOTSU  Shun KOJIMA  

     
    PAPER-Computer System

      Pubricized:
    2023/08/22
      Vol:
    E106-D No:11
      Page(s):
    1808-1821

    An interconnection network is an inevitable component for constructing parallel computers. It connects computation nodes so that the nodes can communicate with each other. As a parallel computation essentially requires inter-node communication according to a parallel algorithm, the interconnection network plays an important role in terms of communication performance. This paper focuses on the collective communication that is frequently performed in parallel computation and this paper addresses the Cup-Stacking method that is proposed in our preceding work. The key issues of the method are splitting a large packet into slices, re-shaping the slice, and stacking the slices, in a genetic algorithm (GA) manner. This paper discusses extending the Cup-Stacking method by introducing additional items (genes) and proposes the extended Cup-Stacking method. Furthermore, this paper places comprehensive discussions on the drawbacks and further optimization of the method. Evaluation results reveal the effectiveness of the extended method, where the proposed method achieves at most seven percent improvement in duration time over the former Cup-Stacking method.

  • On the Crossing Number of a Torus Network

    Antoine BOSSARD  Keiichi KANEKO  Frederick C. HARRIS, JR.  

     
    PAPER-Graphs and Networks

      Pubricized:
    2022/08/05
      Vol:
    E106-A No:1
      Page(s):
    35-44

    Reducing the number of link crossings in a network drawn on the plane such as a wiring board is a well-known problem, and especially the calculation of the minimum number of such crossings: this is the crossing number problem. It has been shown that finding a general solution to the crossing number problem is NP-hard. So, this problem is addressed for particular classes of graphs and this is also our approach in this paper. More precisely, we focus hereinafter on the torus topology. First, we discuss an upper bound on cr(T(2, k)) the number of crossings in a 2-dimensional k-ary torus T(2, k) where k ≥ 2: the result cr(T(2, k)) ≤ k(k - 2) and the given constructive proof lay foundations for the rest of the paper. Second, we extend this discussion to derive an upper bound on the crossing number of a 3-dimensional k-ary torus: cr(T(3, k)) ≤ 2k4 - k3 - 4k2 - 2⌈k/2⌉⌊k/2⌋(k - (k mod 2)) is obtained. Third, an upper bound on the crossing number of an n-dimensional k-ary torus is derived from the previously established results, with the order of this upper bound additionally established for more clarity: cr(T(n, k)) is O(n2k2n-2) when n ≥ k and O(nk2n-1) otherwise.

  • The Implementation of a Hybrid Router and Dynamic Switching Algorithm on a Multi-FPGA System

    Tomoki SHIMIZU  Kohei ITO  Kensuke IIZUKA  Kazuei HIRONAKA  Hideharu AMANO  

     
    PAPER

      Pubricized:
    2022/06/30
      Vol:
    E105-D No:12
      Page(s):
    2008-2018

    The multi-FPGA system known as, the Flow-in-Cloud (FiC) system, is composed of mid-range FPGAs that are directly interconnected by high-speed serial links. FiC is currently being developed as a server for multi-access edge computing (MEC), which is one of the core technologies of 5G. Because the applications of MEC are sometimes timing-critical, a static time division multiplexing (STDM) network has been used on FiC. However, the STDM network exhibits the disadvantage of decreasing link utilization, especially under light traffic. To solve this problem, we propose a hybrid router that combines packet switching for low-priority communication and STDM for high-priority communication. In our hybrid network, the packet switching uses slots that are unused by the STDM; therefore, best-effort communication by packet switching and QoS guarantee communication by the STDM can be used simultaneously. Furthermore, to improve each link utilization under a low network traffic load, we propose a dynamic communication switching algorithm. In our algorithm, each router monitors the network load metrics, and according to the metrics, timing-critical tasks select the STDM according to the metrics only when congestion occurs. This can achieve both QoS guarantee and efficient utilization of each link with a small resource overhead. In our evaluation, the dynamic algorithm was up to 24.6% faster on the execution time with a high network load compared to the packet switching on a real multi-FPGA system with 24 boards.

  • Boosting the Performance of Interconnection Networks by Selective Data Compression

    Naoya NIWA  Hideharu AMANO  Michihiro KOIBUCHI  

     
    PAPER

      Pubricized:
    2022/07/12
      Vol:
    E105-D No:12
      Page(s):
    2057-2065

    This study presents a selective data-compression interconnection network to boost its performance. Data compression virtually increases the effective network bandwidth. One drawback of data compression is a long latency to perform (de-)compression operation at a compute node. In terms of the communication latency, we explore the trade-off between the compression latency overhead and the reduced injection latency by shortening the packet length by compression algorithms. As a result, we present to selectively apply a compression technique to a packet. We perform a compression operation to long packets and it is also taken when network congestion is detected at a source compute node. Through a cycle-accurate network simulation, the selective compression method using the above compression algorithms improves by up to 39% the network throughput with a moderate increase in the communication latency of short packets.

  • Present Status and Prospect of Graphene Interconnect Applications

    Kazuyoshi UENO  

     
    PAPER

      Pubricized:
    2022/04/21
      Vol:
    E105-C No:10
      Page(s):
    572-577

    Graphene has been expected as an alternative material for copper interconnects in which resistance increases and reliability deteriorates in nanoscale. While the principle advantages are verified by simulations and experiments, they have not been put into practical use due to the immaturity of the manufacturing process leading to mass production. On the other hand, recent steady progress in the fabrication process has increased the possibility of practical application. In this paper, I will review the recent advances and the latest prospects for conductor applications of graphene centered on interconnects. The possibility of further application utilizing the unique characteristics of graphene is discussed.

  • Minimal Paths in a Bicube

    Masaaki OKADA  Keiichi KANEKO  

     
    PAPER-Fundamentals of Information Systems

      Pubricized:
    2022/04/22
      Vol:
    E105-D No:8
      Page(s):
    1383-1392

    Nowadays, a rapid increase of demand on high-performance computation causes the enthusiastic research activities regarding massively parallel systems. An interconnection network in a massively parallel system interconnects a huge number of processing elements so that they can cooperate to process tasks by communicating among others. By regarding a processing element and a link between a pair of processing elements as a node and an edge, respectively, many problems with respect to communication and/or routing in an interconnection network are reducible to the problems in the graph theory. For interconnection networks of the massively parallel systems, many topologies have been proposed so far. The hypercube is a very popular topology and it has many variants. The bicube is a such topology and it can interconnect the same number of nodes with the same degree as the hypercube while its diameter is almost half of that of the hypercube. In addition, the bicube keeps the node-symmetric property. Hence, we focus on the bicube and propose an algorithm that gives a minimal or shortest path between an arbitrary pair of nodes. We give a proof of correctness of the algorithm and demonstrate its execution.

  • On a Cup-Stacking Concept in Repetitive Collective Communication

    Takashi YOKOTA  Kanemitsu OOTSU  Shun KOJIMA  

     
    LETTER-Computer System

      Pubricized:
    2022/04/15
      Vol:
    E105-D No:7
      Page(s):
    1325-1329

    Parallel computing essentially consists of computation and communication and, in many cases, communication performance is vital. Many parallel applications use collective communications, which often dominate the performance of the parallel execution. This paper focuses on collective communication performance to speed-up the parallel execution. This paper firstly offers our experimental result that splitting a session of collective communication to small portions (slices) possibly enables efficient communication. Then, based on the results, this paper proposes a new concept cup-stacking with a genetic algorithm based methodology. The preliminary evaluation results reveal the effectiveness of the proposed method.

  • Realization of Multi-Terminal Universal Interconnection Networks Using Contact Switches

    Tsutomu SASAO  Takashi MATSUBARA  Katsufumi TSUJI  Yoshiaki KOGA  

     
    PAPER-Logic Design

      Pubricized:
    2021/04/01
      Vol:
    E104-D No:8
      Page(s):
    1068-1075

    A universal interconnection network implements arbitrary interconnections among n terminals. This paper considers a problem to realize such a network using contact switches. When n=2, it can be implemented with a single switch. The number of different connections among n terminals is given by the Bell number B(n). The Bell number shows the total number of methods to partition n distinct elements. For n=2, 3, 4, 5 and 6, the corresponding Bell numbers are 2, 5, 15, 52, and 203, respectively. This paper shows a method to realize an n terminal universal interconnection network with $ rac {3}{8}(n^2-1)$ contact switches when n=2m+1≥5, and $ rac {n}{8}(3n+2)$ contact switches, when n=2m≥6. Also, it shows that a lower bound on the number of contact switches to realize an n-terminal universal interconnection network is ⌈log 2B(n)⌉, where B(n) is the Bell number.

  • Remote Dynamic Reconfiguration of a Multi-FPGA System FiC (Flow-in-Cloud)

    Kazuei HIRONAKA  Kensuke IIZUKA  Miho YAMAKURA  Akram BEN AHMED  Hideharu AMANO  

     
    PAPER-Computer System

      Pubricized:
    2021/05/12
      Vol:
    E104-D No:8
      Page(s):
    1321-1331

    Multi-FPGA systems have been receiving a lot of attention as a low cost and energy efficient system for Multi-access Edge Computing (MEC). For such purpose, a bare-metal multi-FPGA system called FiC (Flow-in-Cloud) is under development. In this paper, we introduce the FiC multi FPGA cluster which is applied partial reconfiguration (PR) FPGA design flow to support online user defined accelerator replacement while executing FPGA interconnection network and its low-level multiple FPGA management software called remote PR manager. With the remote PR manager, the user can define the FiC FPGA cluster setup by JSON and control the cluster from user application with the cooperation of simple cluster management tool / library called ficmgr on the client host and REST API service provider called ficwww on Raspberry Pi 3 (RPi3) on each node. According to the evaluation results with a prototype FiC FPGA cluster system with 12 nodes, using with online application replacement by PR and on-the-fly FPGA bitstream compression, the time for FPGA bitstream distribution was reduced to 1/17 and the total cluster setup time was reduced by 21∼57% than compared to cluster setup with full configuration FPGA bitstream.

  • DORR: A DOR-Based Non-Blocking Optical Router for 3D Photonic Network-on-Chips

    Meaad FADHEL  Huaxi GU  Wenting WEI  

     
    PAPER-Computer System

      Pubricized:
    2021/01/27
      Vol:
    E104-D No:5
      Page(s):
    688-696

    Recently, researchers paid more attention on designing optical routers, since they are essential building blocks of all photonic interconnection architectures. Thus, improving them could lead to a spontaneous improvement in the overall performance of the network. Optical routers suffer from the dilemma of increased insertion loss and crosstalk, which upraises the power consumed as the network scales. In this paper, we propose a new 7×7 non-blocking optical router based on the Dimension Order Routing (DOR) algorithm. Moreover, we develop a method that can ensure the least number of MicroRing Resonators (MRRs) in an optical router. Therefore, by reducing these optical devices, the optical router proposed can decrease the crosstalk and insertion loss of the network. This optical router is evaluated and compared to Ye's router and the optimized crossbar for 3D Mesh network that uses XYZ routing algorithm. Unlike many other proposed routers, this paper evaluates optical routers not only from router level prospective yet also consider the overall network level condition. The appraisals show that our optical router can reduce the worst-case network insertion loss by almost 8.7%, 46.39%, 39.3%, and 41.4% compared to Ye's router, optimized crossbar, optimized universal OR, and Optimized VOTEX, respectively. Moreover, it decreases the Optical Signal-to-Noise Ratio (OSNR) worst-case by almost 27.92%, 88%, 77%, and 69.6% compared to Ye's router, optimized crossbar, optimized universal OR, and Optimized VOTEX, respectively. It also reduces the power consumption by 3.22%, 23.99%, 19.12%, and 20.18% compared to Ye's router, optimized crossbar, optimized universal OR, and Optimized VOTEX, respectively.

  • Application Mapping and Scheduling of Uncertain Communication Patterns onto Non-Random and Random Network Topologies

    Yao HU  Michihiro KOIBUCHI  

     
    PAPER-Computer System

      Pubricized:
    2020/07/20
      Vol:
    E103-D No:12
      Page(s):
    2480-2493

    Due to recent technology progress based on big-data processing, many applications present irregular or unpredictable communication patterns among compute nodes in high-performance computing (HPC) systems. Traditional communication infrastructures, e.g., torus or fat-tree interconnection networks, may not handle well their matchmaking problems with these newly emerging applications. There are already many communication-efficient application mapping algorithms for these typical non-random network topologies, which use nearby compute nodes to reduce the network distances. However, for the above unpredictable communication patterns, it is difficult to efficiently map their applications onto the non-random network topologies. In this context, we recommend using random network topologies as the communication infrastructures, which have drawn increasing attention for the use of HPC interconnects due to their small diameter and average shortest path length (ASPL). We make a comparative study to analyze the impact of application mapping performance on non-random and random network topologies. We propose using topology embedding metrics, i.e., diameter and ASPL, and list several diameter/ASPL-based application mapping algorithms to compare their job scheduling performances, assuming that the communication pattern of each application is unpredictable to the computing system. Evaluation with a large compound application workload shows that, when compared to non-random topologies, random topologies can reduce the average turnaround time up to 39.3% by a random connected mapping method and up to 72.1% by a diameter/ASPL-based mapping algorithm. Moreover, when compared to the baseline topology mapping method, the proposed diameter/ASPL-based topology mapping strategy can reduce up to 48.0% makespan and up to 78.1% average turnaround time, and improve up to 1.9x system utilization over random topologies.

  • Efficient Two-Opt Collective-Communication Operations on Low-Latency Random Network Topologies

    Ke CUI  Michihiro KOIBUCHI  

     
    PAPER-Fundamentals of Information Systems

      Pubricized:
    2020/07/03
      Vol:
    E103-D No:12
      Page(s):
    2435-2443

    Random network topologies have been proposed as a low-latency network for parallel computers. Although multicast is a common collective-communication operation, multicast algorithms each of which consists of a large number of unicasts are not well optimized for random network topologies. In this study, we firstly apply a two-opt algorithm for building efficient multicast on random network topologies. The two-opt algorithm creates a skilled ordered list of visiting nodes to minimize the total path hops or the total possible contention counts of unicasts that form the target multicast. We secondly extend to apply the two-opt algorithm for the other collective-communication operations, e.g., allreduce and allgather. The SimGrid discrete-event simulation results show that the two-opt multicast outperforms that in typical MPI implementation by up to 22% of the execution time of an MPI program that repeats the MPI_Bcast function. The two-opt allreduce and the two-opt allgather operations also improve by up to 15% and 14% the execution time when compared to those used in typical MPI implementations, respectively.

  • Traffic-Independent Multi-Path Routing for High-Throughput Data Center Networks

    Ryuta KAWANO  Ryota YASUDO  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Computer System

      Pubricized:
    2020/08/06
      Vol:
    E103-D No:12
      Page(s):
    2471-2479

    Network throughput has become an important issue for big-data analysis on Warehouse-Scale Computing (WSC) systems. It has been reported that randomly-connected inter-switch networks can enlarge the network throughput. For irregular networks, a multi-path routing method called k-shortest path routing is conventionally utilized. However, it cannot efficiently exploit longer-than-shortest paths that would be detour paths to avoid bottlenecks. In this work, a novel routing method called k-optimized path routing to achieve high throughput is proposed for irregular networks. We introduce a heuristic to select detour paths that can avoid bottlenecks in the network to improve the average-case network throughput. Experimental results by network simulation show that the proposed k-optimized path routing can improve the saturation throughput by up to 18.2% compared to the conventional k-shortest path routing. Moreover, it can reduce the computation time required for optimization to 1/2760 at a minimum compared to our previously proposed method.

  • Flex-LIONS: A Silicon Photonic Bandwidth-Reconfigurable Optical Switch Fabric Open Access

    Roberto PROIETTI  Xian XIAO  Marjan FARIBORZ  Pouya FOTOUHI  Yu ZHANG  S. J. Ben YOO  

     
    INVITED PAPER

      Pubricized:
    2020/05/14
      Vol:
    E103-B No:11
      Page(s):
    1190-1198

    This paper summarizes our recent studies on architecture, photonic integration, system validation and networking performance analysis of a flexible low-latency interconnect optical network switch (Flex-LIONS) for datacenter and high-performance computing (HPC) applications. Flex-LIONS leverages the all-to-all wavelength routing property in arrayed waveguide grating routers (AWGRs) combined with microring resonator (MRR)-based add/drop filtering and multi-wavelength spatial switching to enable topology and bandwidth reconfigurability to adapt the interconnection to different traffic profiles. By exploiting the multiple free spectral ranges of AWGRs, it is also possible to provide reconfiguration while maintaining minimum-diameter all-to-all interconnectivity. We report experimental results on the design, fabrication, and system testing of 8×8 silicon photonic (SiPh) Flex-LIONS chips demonstrating error-free all-to-all communication and reconfiguration exploiting different free spectral ranges (FSR0 and FSR1, respectively). After reconfiguration in FSR1, the bandwidth between the selected pair of nodes is increased from 50Gb/s to 125Gb/s while an all interconnectivity at 25Gb/s is maintained using FSR0. Finally, we investigate the use of Flex-LIONS in two different networking scenarios. First, networking simulations for a 256-node datacenter inter-rack communication scenario show the potential latency and energy benefits when using Flex-LIONS for optical reconfiguration based on different traffic profiles (a legacy fat-tree architecture is used for comparison). Second, we demonstrate the benefits of leveraging two FSRs in an 8-node 64-core computing system to provide reconfiguration for the hotspot nodes while maintaining minimum-diameter all-to-all interconnectivity.

  • Node-Disjoint Paths Problems in Directed Bijective Connection Graphs

    Keiichi KANEKO  

     
    PAPER-Fundamentals of Information Systems

      Pubricized:
    2019/09/26
      Vol:
    E103-D No:1
      Page(s):
    93-100

    In this paper, we extend the notion of bijective connection graphs to introduce directed bijective connection graphs. We propose algorithms that solve the node-to-set node-disjoint paths problem and the node-to-node node-disjoint paths problem in a directed bijective connection graph. The time complexities of the algorithms are both O(n4), and the maximum path lengths are both 2n-1.

  • Genetic Node-Mapping Methods for Rapid Collective Communications

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    PAPER-Computer System

      Pubricized:
    2019/10/10
      Vol:
    E103-D No:1
      Page(s):
    111-129

    Inter-node communication is essential in parallel computation. The performance of parallel processing depends on the efficiencies in both computation and communication, thus, the communication cost is not negligible. A parallel application program involves a logical communication structure that is determined by the interchange of data between computation nodes. Sometimes the logical communication structure mismatches to that in a real parallel machine. This mismatch results in large communication costs. This paper addresses the node-mapping problem that rearranges logical position of node so that the degree of mismatch is decreased. This paper assumes that parallel programs execute one or more collective communications that follow specific traffic patterns. An appropriate node-mapping achieves high communication performance. This paper proposes a strong heuristic method for solving the node-mapping problem and adapts the method to a genetic algorithm. Evaluation results reveal that the proposed method achieves considerably high performance; it achieves 8.9 (4.9) times speed-up on average in single-(two-)traffic-pattern cases in 32×32 torus networks. Specifically, for some traffic patterns in small-scale networks, the proposed method finds theoretically optimized solutions. Furthermore, this paper discusses in deep about various issues in the proposed method that employs genetic algorithm, such as population of genes, number of generations, and traffic patterns. This paper also discusses applicability to large-scale systems for future practical use.

  • A Generalized Theory Based on the Turn Model for Deadlock-Free Irregular Networks

    Ryuta KAWANO  Ryota YASUDO  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Computer System

      Pubricized:
    2019/10/08
      Vol:
    E103-D No:1
      Page(s):
    101-110

    Recently proposed irregular networks can reduce the latency for both on-chip and off-chip systems with a large number of computing nodes and thus can improve the performance of parallel applications. However, these networks usually suffer from deadlocks in routing packets when using a naive minimal path routing algorithm. To solve this problem, we focus attention on a lately proposed theory that generalizes the turn model to maintain the network performance with deadlock-freedom. The theorems remain a challenge of applying themselves to arbitrary topologies including fully irregular networks. In this paper, we advance the theorems to completely general ones. Moreover, we provide a feasible implementation of a deadlock-free routing method based on our advanced theorem. Experimental results show that the routing method based on our proposed theorem can improve the network throughput by up to 138 % compared to a conventional deterministic minimal routing method. Moreover, when utilized as the escape path in Duato's protocol, it can improve the throughput by up to 26.3 % compared with the conventional up*/down* routing.

  • Constructing Two Completely Independent Spanning Trees in Balanced Hypercubes

    Yi-Xian YANG  Kung-Jui PAI  Ruay-Shiung CHANG  Jou-Ming CHANG  

     
    LETTER-Fundamentals of Information Systems

      Pubricized:
    2019/06/17
      Vol:
    E102-D No:12
      Page(s):
    2409-2412

    A set of spanning trees of a graphs G are called completely independent spanning trees (CISTs for short) if for every pair of vertices x, y∈V(G), the paths joining x and y in any two trees have neither vertex nor edge in common, except x and y. Constructing CISTs has applications on interconnection networks such as fault-tolerant routing and secure message transmission. In this paper, we investigate the problem of constructing two CISTs in the balanced hypercube BHn, which is a hypercube-variant network and is superior to hypercube due to having a smaller diameter. As a result, the diameter of CISTs we constructed equals to 9 for BH2 and 6n-2 for BHn when n≥3.

1-20hit(320hit)