Keyword Search Result

[Keyword] interconnect(320hit)

41-60hit(320hit)

  • Node-to-Node Disjoint Paths Problem in Möbius Cubes

    David KOCIK  Keiichi KANEKO  

     
    PAPER-Dependable Computing

      Publicized:
    2017/04/25
      Vol:
    E100-D No:8
      Page(s):
    1837-1843

    The Möbius cube is a variant of the hypercube. Its advantage is that it can connect the same number of nodes as a hypercube with almost half the hypercube's diameter. We propose an algorithm that solves the node-to-node disjoint paths problem in n-Möbius cubes in time polynomial in n. We prove the correctness of the algorithm and estimate that its time complexity is O(n^2) and its maximum path length is 3n-5.
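
The diameter claim above can be explored with a small sketch. The abstract does not define the Möbius cube, so the adjacency rule below is an assumption: it follows the commonly cited definition in which flipping position i either changes bit i alone (when the bit to its left is 0) or complements bits i down to 0 (when that bit is 1), with the cube's kind (0 or 1) standing in for the missing left bit at the top position. The function names are illustrative and do not come from the paper.

```python
from collections import deque


def mobius_neighbors(x, n, kind):
    """Neighbors of node x in the n-dimensional Möbius cube of the given kind (0 or 1).

    Assumed rule: for each position i, look at the bit to its left
    (the cube's kind is used for the top position).
    """
    result = []
    for i in range(n):
        left = kind if i == n - 1 else (x >> (i + 1)) & 1
        if left == 0:
            mask = 1 << i              # flip bit i only, as in a hypercube
        else:
            mask = (1 << (i + 1)) - 1  # complement bits i down to 0
        result.append(x ^ mask)
    return result


def hypercube_neighbors(x, n):
    """Neighbors of node x in the n-dimensional hypercube."""
    return [x ^ (1 << i) for i in range(n)]


def eccentricity(start, neighbors):
    """Largest breadth-first-search distance from start to any reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(n, neighbors):
    """Maximum eccentricity over all 2**n nodes; neighbors takes (node, n)."""
    return max(eccentricity(s, lambda u: neighbors(u, n)) for s in range(2 ** n))
```

Calling `diameter(n, hypercube_neighbors)` and `diameter(n, lambda u, m: mobius_neighbors(u, m, 1))` for the same n gives a direct comparison of the two topologies under the assumed definition.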

  • A Floorplan Aware High-Level Synthesis Algorithm with Body Biasing for Delay Variation Compensation

    Koki IGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E100-A No:7
      Page(s):
    1439-1451

    In this paper, we propose a floorplan-aware high-level synthesis algorithm with body biasing for delay variation compensation, which minimizes the average leakage energy of manufactured chips. To realize floorplan-aware high-level synthesis, we utilize a huddle-based distributed register architecture (HDR architecture). The HDR architecture divides the chip area into small partitions called huddles, and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain the expected leakage energy of every huddle when a body bias voltage is applied. A huddle with smaller expected leakage energy contributes more to reducing the expected leakage energy of the entire circuit but can increase the latency. We assign control-data flow graph (CDFG) nodes in non-critical paths to the huddles with larger expected leakage energy and those in critical paths to the huddles with smaller expected leakage energy. We expect to minimize the total leakage energy of a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 39.7% without latency or yield degradation compared with typical-case design with body biasing.

  • Comparative Performances of SOI-Based Optical Interconnect vs. Electrical Interconnect in Analog Electronic Applications

    Siti Sarah MD SALLAH  Sawal Hamid MD ALI  P. Susthitha MENON  Nurjuliana JUHARI  Md Shabiul ISLAM  

     
    PAPER-Optoelectronics

      Vol:
    E100-C No:7
      Page(s):
    655-661

    Silicon-on-insulator (SOI) has become one of the most prominent materials in recent years, especially in silicon photonics applications. This paper presents a circuit-level comparison of the high-speed performance of an SOI-based optical interconnect (OI) and an electrical interconnect (EI). The SOI-based optical waveguide was designed using OptiBPM to obtain a single mode condition (SMC). Then, the OI link was simulated in OptiSPICE and was tested as an interconnection in two-stage CS amplifiers. The results showed that the two-stage CS amplifier using the OI offered several advantages in electrical performance, such as voltage gain, frequency bandwidth, slew rate, and propagation delay, which make it superior to the EI.

  • Novel Chip Stacking Methods to Extend Both Horizontally and Vertically for Many-Core Architectures with ThruChip Interface

    Hiroshi NAKAHARA  Tomoya OZAKI  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2871-2880

    Rising non-recurring engineering costs (design, mask, and test costs) have made large Systems-on-Chip (SoCs) difficult to develop, especially with advanced technology. We explore an approach for cheap and flexible chip stacking using the inductive-coupling ThruChip Interface (TCI). To connect a large number of small chips into a large-scale system, novel chip stacking methods called linear stacking and staggered stacking are proposed. They enable the system to be extended in the x and/or y dimensions, not only in the z dimension. Here, a novel chip stacking layout and its deadlock-free routing design are shown for both single-core and multi-core chips. The network with 256 nodes formed by the proposed stacking improves latency by 13.8% and the performance of NAS Parallel Benchmarks by 5.4% on average compared to a 2D mesh.

  • Enhancing Entropy Throttling: New Classes of Injection Control in Interconnection Networks

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    PAPER-Interconnection network

      Publicized:
    2016/08/25
      Vol:
    E99-D No:12
      Page(s):
    2911-2922

    State-of-the-art parallel computers, which are growing in parallelism, place many demands on their interconnection networks. Although a wide spectrum of research and development efforts toward effective and practical interconnection networks has been reported, the problem is still open. One of the largest issues is congestion control, which aims to maximize network performance in terms of throughput and latency. Throttling, or injection limitation, is one of the central ideas of congestion control. We have proposed a new class of throttling method, Entropy Throttling, which is founded on the concept of packet entropy. The throttling method is partly successful; however, its potential has not been sufficiently discussed. This paper aims to exploit the capabilities of the Entropy Throttling method via comprehensive evaluation. The major contributions of this paper are to introduce two ideas, a hysteresis function and guard time, and to clarify a wide range of performance characteristics in steady and unsteady communication situations. By introducing the new ideas, we extend the Entropy Throttling method. The extended methods improve communication performance by up to 3.17 times in the best case and 1.47 times on average compared with non-throttling cases in collective communication, while sustaining steady communication performance.

  • Misalignment Tolerance of Pluggable Ballpoint-Pen Interconnect of Graded-Index Plastic Optical Fiber for 4K/8K UHD Display Open Access

    Azusa INOUE  Yasuhiro KOIKE  

     
    INVITED PAPER

      Vol:
    E99-C No:11
      Page(s):
    1271-1276

    We investigate the influence of launching conditions on misalignment tolerance of pluggable ballpoint-pen interconnects, where graded-index plastic optical fibers (GI POFs) are coupled with ball lenses mounted on their end faces. The lateral-misalignment tolerance of the ballpoint-pen connector decreased with an increase in the driving current of a vertical cavity surface emitting laser (VCSEL) under the center launching condition. This was attributed to the VCSEL multimode oscillation, which increased the connector coupling loss through the higher-order guided mode launching in the GI POF and the resulting output beam expansion in the ballpoint-pen connector. The driving-current dependence of the connector coupling loss could be decreased using offset launchings. For a radial launching offset of 20µm, we could obtain coupling losses below 1dB for lateral coupling offsets of ±50µm with little dependence on the driving current. This suggests that data transmission quality for misaligned connection of the GI POFs can be improved further by optimizing launching systems for the ballpoint-pen interconnects.

  • Job Mapping and Scheduling on Free-Space Optical Networks

    Yao HU  Ikki FUJIWARA  Michihiro KOIBUCHI  

     
    PAPER-Computer System

      Publicized:
    2016/08/16
      Vol:
    E99-D No:11
      Page(s):
    2694-2704

    A number of parallel applications run on a high-performance computing (HPC) system simultaneously. Job mapping and scheduling become crucial to improve system utilization, because fragmentation prevents an incoming job from being assigned even if there are enough compute nodes unused. Wireless supercomputers and datacenters with free-space optical (FSO) terminals have been proposed to replace the conventional wired interconnection so that a diverse application workload can be better supported by changing their network topologies. In this study we firstly present an efficient job mapping by swapping the endpoints of FSO links in a wireless HPC system. Our evaluation shows that an FSO-equipped wireless HPC system can achieve shorter average queuing length and queuing time for all the dispatched user jobs. Secondly, we consider the use of a more complicated and enhanced scheduling algorithm, which can further improve the system utilization over different host networks, as well as the average response time for all the dispatched user jobs. Finally, we present the performance advantages of the proposed wireless HPC system under more practical assumptions such as different cabinet capacities and diverse subtopology packings.

  • A Built-in Test Circuit for Electrical Interconnect Testing of Open Defects in Assembled PCBs

    Widiant  Masaki HASHIZUME  Shohei SUENAGA  Hiroyuki YOTSUYANAGI  Akira ONO  Shyue-Kung LU  Zvi ROTH  

     
    PAPER-Dependable Computing

      Publicized:
    2016/08/16
      Vol:
    E99-D No:11
      Page(s):
    2723-2733

    In this paper, a built-in test circuit for an electrical interconnect test method is proposed to detect an open defect occurring at an interconnect between an IC and a printed circuit board. The test method is based on measuring the supply current of an inverter gate in the test circuit. A time-varying signal is provided to an interconnect as a test signal by the built-in test circuit. In this paper, the test circuit is evaluated by SPICE simulation and by experiments with a prototyping IC. The experimental results reveal that a hard open defect is detectable by the test method in addition to a resistive open defect and a capacitive open one at a test speed of 400 kHz.

  • Interconnection-Delay and Clock-Skew Estimate Modelings for Floorplan-Driven High-Level Synthesis Targeting FPGA Designs

    Koichi FUJIWARA  Kazushi KAWAMURA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1294-1310

    Recently, high-level synthesis techniques for FPGA designs (FPGA-HLS techniques) are strongly required in various applications. Both interconnection delays and clock skews have a large impact on circuit performance implemented onto FPGA, which indicates the need for floorplan-driven FPGA-HLS algorithms considering them. To appropriately estimate interconnection delays and clock skews at HLS phase, a reasonable model to estimate them becomes essential. In this paper, we demonstrate several experiments to characterize interconnection delays and clock skews in FPGA and propose novel estimate models called “IDEF” and “CSEF”. In order to evaluate our models, we integrate them into a conventional floorplan-driven FPGA-HLS algorithm. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the latency by up to 22% compared with conventional approaches.

  • A Multi-Scenario High-Level Synthesis Algorithm for Variation-Tolerant Floorplan-Driven Design

    Koki IGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1278-1293

    In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a multi-scenario high-level synthesis algorithm for variation-tolerant floorplan-driven design targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. At that time, we can explicitly take into account interconnection delays by using distributed-register architectures. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

  • Layout-Conscious Expandable Topology for Low-Degree Interconnection Networks

    Thao-Nguyen TRUONG  Khanh-Van NGUYEN  Ikki FUJIWARA  Michihiro KOIBUCHI  

     
    PAPER-Computer System

      Publicized:
    2016/02/02
      Vol:
    E99-D No:5
      Page(s):
    1275-1284

    System expandability becomes a major concern for highly parallel computers and data centers, because their number of nodes gradually increases year by year. In this context we propose a low-degree topology and its floor layout in which a cabinet or node set can be newly inserted by connecting short cables to a single existing cabinet. Our graph analysis shows that the proposed topology has low diameter, low average shortest path length and short average cable length comparable to existing topologies with the same degree. When incrementally adding nodes and cabinets to the proposed topology, its diameter and average shortest path length increase modestly. Our discrete-event simulation results show that the proposed topology provides a comparable performance to 2-D Torus for some parallel applications. The network cost and power consumption of DSN-F modestly increase when compared to the counterpart non-random topologies.

  • Node-to-Set Disjoint Paths Problem in a Möbius Cube

    David KOCIK  Yuki HIRAI  Keiichi KANEKO  

     
    PAPER-Dependable Computing

      Publicized:
    2015/12/14
      Vol:
    E99-D No:3
      Page(s):
    708-713

    This paper proposes an algorithm that solves the node-to-set disjoint paths problem in an n-Möbius cube in time polynomial in n. It also gives a proof of correctness of the algorithm and estimates the time complexity, O(n^4), and the maximum path length, 2n-1. A computer experiment is conducted for n=1,2,...,31 to measure the average performance of the algorithm. The results show that the average time complexity gradually approaches O(n^3) and that the maximum path lengths cannot be attained easily over the range of n in the experiment.

  • Emergency Optical Network Construction and Control with Multi-Vendor Interconnection for Quick Disaster Recovery

    Sugang XU  Noboru YOSHIKANE  Masaki SHIRAIWA  Takehiro TSURITANI  Hiroaki HARAI  Yoshinari AWAJI  Naoya WADA  

     
    PAPER-Fiber-Optic Transmission for Communications

      Vol:
    E99-B No:2
      Page(s):
    370-384

    Past disasters, e.g., mega-quakes, tsunamis, have taught us that it is difficult to fully repair heavily damaged network systems in a short time. The only method for quickly restoring core communications is to start by fully utilizing the surviving network resources from different networks. However, as these networks might be built using different vendors' products (which are often incompatible with each other), the interconnection and utilization of these surviving resources are not straightforward. In this paper, we consider an all-optical multi-vendor interconnection method as an efficient reactive approach during disaster recovery. First, we introduce a disaster recovery scenario in which we use the multi-vendor interconnection approach. Second, we present two sub-problems and propose solutions: (1) network planning problem for multi-vendor interconnection-based emergency optical network construction and (2) interconnection problem for multi-vendor optical networks including both the data-plane and the control-and-management-plane. To enable the operation of multi-vendor systems, command translation middleware is developed for individual vendor-specific network control-and-management systems. Simulations are conducted to evaluate our proposal for sub-problem (1). The results reveal that multi-vendor interconnection can lead to minimum-cost network recovery. Additionally, an emergency optical network prototype is implemented on a two-vendor optical network test-bed to address sub-problem (2). Demonstrations of both the data-plane and the control-and-management-plane validate the feasibility of the multi-vendor interconnection approach in disaster recovery.

  • The Fault-Tolerant Hamiltonian Problems of Crossed Cubes with Path Faults

    Hon-Chan CHEN  Tzu-Liang KUNG  Yun-Hao ZOU  Hsin-Wei MAO  

     
    PAPER-Switching System

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2116-2122

    In this paper, we investigate the fault-tolerant Hamiltonian problems of crossed cubes with a faulty path. More precisely, let P denote any path in an n-dimensional crossed cube CQ_n for n ≥ 5, and let V(P) be the vertex set of P. We show that CQ_n - V(P) is Hamiltonian if |V(P)| ≤ n and is Hamiltonian connected if |V(P)| ≤ n-1. Compared with the previous results showing that the crossed cube is (n-2)-fault-tolerant Hamiltonian and (n-3)-fault-tolerant Hamiltonian connected for arbitrary faults, the contribution of this paper indicates that the crossed cube can tolerate more faulty vertices if these vertices happen to form some specific types of structures.

  • An Energy-Efficient Floorplan Driven High-Level Synthesis Algorithm for Multiple Clock Domains Design

    Shin-ya ABE  Youhua SHI  Kimiyoshi USAMI  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1376-1391

    In this paper, we first propose an HDR-mcd architecture, which integrates periodically all-in-phase based multiple clock domains and multi-cycle interconnect communication into high-level synthesis. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay should be considered during high-level synthesis. Next, we propose a high-level synthesis algorithm for HDR-mcd, which can reduce energy consumption by optimizing configuration and placement of huddles. Experimental results show that the proposed method achieves 32.5% energy-saving compared with the existing single clock domain based methods.

  • A Floorplan-Driven High-Level Synthesis Algorithm for Multiplexer Reduction Targeting FPGA Designs

    Koichi FUJIWARA  Kazushi KAWAMURA  Shin-ya ABE  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1392-1405

    Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications such as computerized stock trading and reconfigurable network processing. In HLS for FPGA designs, we need to consider the module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-driven HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider the module floorplan in HLS. To reduce multiplexer cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and latency by up to 22% compared with conventional approaches while the number of required control steps is almost the same.

  • A High-Level Synthesis Algorithm with Inter-Island Distance Based Operation Chainings for RDR Architectures

    Kotaro TERADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1366-1375

    In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm targeting RDR architectures that uses operation chaining to reduce the overall latency. Our algorithm consists of three steps. The first step enumerates candidate operations for chaining. The second step introduces the maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on the RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for the RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. Our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches in some cases.

  • The Case for Network Coding for Collective Communication on HPC Interconnection Networks Open Access

    Ahmed SHALABY  Ikki FUJIWARA  Michihiro KOIBUCHI  

     
    PAPER-Information Network

      Publicized:
    2014/12/11
      Vol:
    E98-D No:3
      Page(s):
    661-670

    Recently, network bandwidth has become a performance concern, particularly for collective communication, since bisection bandwidths of supercomputers have become far less than their full bisection bandwidths. In this context we propose the use of a network coding technique to reduce the number of unicasts and the size of data transferred in latency-sensitive collective communications in supercomputers. Our proposed network coding scheme has a hierarchical multicasting structure with intra-group and inter-group unicasts. Quantitative analysis shows that the aggregate path hop counts with our hierarchical network coding decrease by as much as 94% compared to conventional unicast-based multicasts. We validate these results by cycle-accurate network simulations. In 1,024-switch networks, the network reduces the execution time of collective communications by as much as 70%. We also show that our hierarchical network coding is beneficial for any packet size.

  • An Optimized Algorithm for Dynamic Routing and Wavelength Assignment in WDM Networks with Sparse Wavelength Conversion

    Liangrui TANG  Sen FENG  Jianhong HAO  Bin LI  Xiongwen ZHAO  Xin WU  

     
    PAPER-Fiber-Optic Transmission for Communications

      Vol:
    E98-B No:2
      Page(s):
    296-302

    The dynamic routing and wavelength assignment (RWA) problem in wavelength division multiplexing (WDM) optical networks with sparse wavelength conversion has been a hot research topic in recent years. An optimized algorithm based on a multiple-layered interconnected graphic model (MIG) for the dynamic RWA is presented in this paper. The MIG is constructed to reflect the actual WDM network topology. Based on the MIG, the link cost is given by the conditions of available lightpath to calculate an initial solution set of optimal paths, and by combination with path length, the optimized solution using objective function is determined. This approach simultaneously solves the route selection and wavelength assignment problem. Simulation results demonstrate the proposed MIG-based algorithm is effective in reducing blocking probability and boosting wavelength resource utilization compared with other RWA methods.

  • Completely Independent Spanning Trees on Some Interconnection Networks

    Kung-Jui PAI  Jinn-Shyong YANG  Sing-Chen YAO  Shyue-Ming TANG  Jou-Ming CHANG  

     
    LETTER-Information Network

      Vol:
    E97-D No:9
      Page(s):
    2514-2517

    Let T_1, T_2, ..., T_k be spanning trees in a graph G. If, for any two vertices u, v of G, the paths joining u and v on the k trees are mutually vertex-disjoint, then T_1, T_2, ..., T_k are called completely independent spanning trees (CISTs for short) of G. The construction of CISTs can be applied in fault-tolerant broadcasting and secure message distribution on interconnection networks. Hasunuma (2001) first introduced the concept of CISTs and conjectured that there are k CISTs in any 2k-connected graph. Unfortunately, this conjecture was disproved by Péterfalvi recently. In this note, we give a necessary condition for k-connected k-regular graphs with ⌊k/2⌋ CISTs. Based on this condition, we provide more counterexamples for Hasunuma's conjecture. By contrast, we show that there are two CISTs in 4-regular chordal rings CR(N,d) with N=k(d-1)+j under the condition that k ≥ 4 is even and 0 ≤ j ≤ 4. In particular, the diameter of each constructed CIST is derived.
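
The CIST definition above can be checked directly on small graphs. The sketch below is a brute-force checker, not the construction from the letter. It reads "mutually vertex-disjoint" in the usual sense for this definition, namely that the paths share no edge and no vertex other than the endpoints u and v. That reading is an assumption, and the function names are illustrative.

```python
from itertools import combinations


def tree_path(tree_edges, u, v):
    """Return the unique u-v path in a tree given as a list of edges."""
    adj = {}
    for a, b in tree_edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        if x == v:
            break
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                stack.append(y)
    # Walk parent pointers back from v to u, then reverse.
    path = [v]
    while path[-1] != u:
        path.append(parent[path[-1]])
    return path[::-1]


def path_edges(path):
    """Undirected edge set of a path given as a vertex list."""
    return {frozenset(e) for e in zip(path, path[1:])}


def are_cists(vertices, trees):
    """Check pairwise disjointness of the u-v paths for every vertex pair.

    Each element of trees is assumed to be a spanning tree on vertices.
    """
    for u, v in combinations(vertices, 2):
        paths = [tree_path(t, u, v) for t in trees]
        for p, q in combinations(paths, 2):
            if set(p[1:-1]) & set(q[1:-1]):      # shared internal vertex
                return False
            if path_edges(p) & path_edges(q):    # shared edge
                return False
    return True


# Two spanning trees of the complete graph on vertices 0..3.
t1 = [(0, 1), (1, 2), (2, 3)]
t2 = [(0, 2), (0, 3), (1, 3)]
print(are_cists(range(4), [t1, t2]))
print(are_cists(range(4), [t1, t1]))
```

The checker recomputes a path for every vertex pair in every tree, so it is only practical for small graphs. It is meant to make the definition concrete, not to scale.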
