
Keyword Search Result

[Keyword] interconnect(320hit)

21-40hit(320hit)

  • Recent Progress in the Development of Large-Capacity Integrated Silicon Photonics Transceivers Open Access

    Yu TANAKA  

     
    INVITED PAPER

      Vol:
    E102-C No:4
      Page(s):
    357-363

    We report our recent progress in silicon photonics integrated device technology targeting on-chip-level, large-capacity optical interconnect applications. To realize high-capacity data transmission, we successfully developed on-package silicon photonics integrated transceivers and demonstrated simultaneous 400 Gbps operation. We also introduced 56 Gbps four-level pulse-amplitude modulation (PAM4) and wavelength-division multiplexing (WDM) technologies to enhance the transmission capacity.
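
    PAM4 carries two bits per symbol on four amplitude levels, so, assuming "56 Gbps" denotes the bit rate, the line runs at 28 GBd. The minimal sketch below uses Gray coding, a common choice that the abstract does not specify, so that adjacent levels differ in a single bit; the function names are illustrative.

```python
# Gray-coded PAM4 mapping (an assumption; the abstract does not state the coding).
# Adjacent amplitude levels differ in exactly one bit, so a one-level
# decision error corrupts only one of the two bits in a symbol.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def pam4_modulate(bits):
    """Map an even-length bit list to PAM4 amplitudes, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate_gbaud(bit_rate_gbps, bits_per_symbol=2):
    """Symbol rate needed for a given bit rate (PAM4 carries 2 bits per symbol)."""
    return bit_rate_gbps / bits_per_symbol


print(pam4_modulate([0, 0, 0, 1, 1, 1, 1, 0]))  # [-3, -1, 1, 3]
print(symbol_rate_gbaud(56))  # 28.0
```

    Halving the symbol rate relative to two-level signaling is what lets PAM4 raise capacity without raising the analog bandwidth required per lane.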

  • Optimizing Slot Utilization and Network Topology for Communication Pattern on Circuit-Switched Parallel Computing Systems

    Yao HU  Michihiro KOIBUCHI  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/11/16
      Vol:
    E102-D No:2
      Page(s):
    247-260

    In parallel computing systems, the interconnection network forms the critical infrastructure that enables robust and scalable communication among hundreds of thousands of nodes. Traditional packet-switched networks tend to suffer from long communication times when network congestion occurs. In this context, we explore the use of circuit switching (CS), replacing packet switches with custom hardware that supports circuit-based switching efficiently and with low latency. In our target CS network, a certain amount of bandwidth is guaranteed for each communicating pair, so network latency is predictable when a limited number of node pairs exchange messages. Because the number of time slots allocated in each switch directly affects end-to-end latency, we improve slot utilization and develop a network topology generator that minimizes the number of time slots for target applications whose communication patterns are predictable. Quantitative discrete-event simulation shows that, in a topology generated by our design methodology, the minimum necessary number of slots can be reduced to a small number while the network cost remains 50% lower than that of standard torus topologies.
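
    A simplified way to see why the slot count depends on topology: under an assumed time-division model in which each communicating pair reserves one slot on every link of its path, a link needs as many slots as the number of flows crossing it. The sketch and its example topologies are hypothetical and are not the authors' topology generator.

```python
from collections import Counter


def min_slots(paths):
    """Minimum time slots per link under a simple TDM model (an assumption):
    each flow reserves one slot on every directed link along its path, and a
    link carries at most one flow per slot, so the busiest link sets the count."""
    load = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            load[(u, v)] += 1
    return max(load.values(), default=0)


# Hypothetical 4-node ring 0-1-2-3-0: two flows both cross link (1, 2).
ring_flows = [[0, 1, 2], [1, 2, 3], [3, 0]]
print(min_slots(ring_flows))  # 2

# A hypothetical direct link 0-2 lets the first flow bypass link (1, 2).
shortcut_flows = [[0, 2], [1, 2, 3], [3, 0]]
print(min_slots(shortcut_flows))  # 1
```

    Fewer slots per link means a shorter TDM frame, which is the lever a communication-aware topology generator can pull.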

  • Accelerating Large-Scale Interconnection Network Simulation by Cellular Automata Concept

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    PAPER-Computer System

      Publicized:
    2018/10/05
      Vol:
    E102-D No:1
      Page(s):
    52-74

    State-of-the-art parallel systems employ a huge number of computing nodes connected by an interconnection network. The interconnection network (ICN) plays an important role in a parallel system, since it determines the system's communication capability. In general, an ICN exhibits non-linear phenomena in its communication performance, most of which are caused by congestion. Designing a large-scale parallel system therefore requires extensive evaluation through repeated simulation runs, which raises another problem: simulating large-scale systems at a reasonable cost. This paper presents a promising solution by introducing the cellular automata concept, which originated in our prior work. Assuming 2D-torus topologies to simplify the discussion, this paper discusses the fundamental design of router functions in terms of cellular automata, the data structure of packets, an alternative model of the router function, and miscellaneous optimizations. The proposed models have good affinity with GPGPU technology; as representative results, the GPU-based simulator accelerates simulation by up to about 1264 times compared with sequential execution on a single CPU. Furthermore, since the proposed models are also applicable to the shared-memory model, multithreaded implementations of the proposed methods achieve speed-ups of up to about 162 times.
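
    A toy version of the cellular-automaton idea: each router of an $n \times n$ torus is a cell holding at most one packet, and all cells are updated synchronously from the current state only, which is what makes the model data-parallel. The one-packet-per-cell rule, dimension-order routing, and fixed-priority arbitration are simplifying assumptions, not the router model of the paper.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def next_hop(cur: Coord, dst: Coord, n: int) -> Coord:
    """Dimension-order (X then Y) minimal routing on an n x n torus."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        fwd = (dx - x) % n
        return ((x + (1 if fwd <= n - fwd else -1)) % n, y)
    if y != dy:
        fwd = (dy - y) % n
        return (x, (y + (1 if fwd <= n - fwd else -1)) % n)
    return cur


def ca_step(cells: Dict[Coord, Coord], n: int) -> Dict[Coord, Coord]:
    """One synchronous update. cells maps router -> destination of the single
    packet it holds; every decision reads only the current state."""
    requests: Dict[Coord, list] = {}
    for pos, dst in cells.items():
        if pos != dst:
            nxt = next_hop(pos, dst, n)
            if nxt not in cells:  # move only into a cell that is empty now
                requests.setdefault(nxt, []).append(pos)
    # Fixed-priority arbitration: the smallest coordinate wins each target cell.
    moves = {min(srcs): tgt for tgt, srcs in requests.items()}
    new_cells = {}
    for pos, dst in cells.items():
        if pos == dst:
            continue  # packet has arrived and is ejected
        new_cells[moves.get(pos, pos)] = dst
    return new_cells


def simulate(cells: Dict[Coord, Coord], n: int, max_steps: int = 10_000) -> int:
    """Number of synchronous steps until every packet has been delivered."""
    steps = 0
    while cells and steps < max_steps:
        cells = ca_step(cells, n)
        steps += 1
    return steps


print(simulate({(0, 0): (2, 0)}, 4))  # 3
print(simulate({(0, 0): (1, 0), (2, 0): (1, 0)}, 4))  # 4 (one packet waits)
```

    Because `ca_step` reads only the previous state, each cell's update could run as an independent GPU thread or worker; the sequential loop here is only for clarity.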

  • A Genetic Approach for Accelerating Communication Performance by Node Mapping

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    LETTER-Architecture

      Publicized:
    2018/09/18
      Vol:
    E101-D No:12
      Page(s):
    2971-2975

    This paper aims to reduce the completion time of typical collective communications. We introduce a logical addressing system separate from the physical one and, by properly rearranging the logical node addresses, reduce communication overheads so that communication approaches the ideal. A key issue is how to rearrange the logical addresses. We apply a genetic algorithm (GA) as a meta-heuristic solution, together with a random search strategy. Our GA-based method achieves a speedup of up to 2.50 times in the three-traffic-pattern cases.
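
    One way to picture GA-based address rearrangement is the sketch below. It assumes a hypothetical cost model (logical nodes placed on a ring of physical positions, cost = total hop count of the communicating pairs) and uses order crossover, swap mutation, and truncation selection; these operator choices are illustrative, not the authors'.

```python
import random


def ring_distance(a: int, b: int, n: int) -> int:
    d = abs(a - b) % n
    return min(d, n - d)


def mapping_cost(perm, pairs, n):
    """Total hops when logical node i sits at physical ring position perm[i] (toy model)."""
    return sum(ring_distance(perm[s], perm[d], n) for s, d in pairs)


def order_crossover(p1, p2, rng):
    """OX: copy a random slice of p1, fill the other positions in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n + 1), 2))
    kept = set(p1[i:j])
    fill = iter([g for g in p2 if g not in kept])
    return [p1[k] if i <= k < j else next(fill) for k in range(n)]


def ga_mapping(pairs, n, pop_size=30, generations=200, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)

    def cost(p):
        return mapping_cost(p, pairs, n)

    # Include the identity mapping so the result is never worse than no remapping.
    population = [list(range(n))] + [rng.sample(range(n), n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # truncation selection; elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:
                a, b = rng.sample(range(n), 2)
                child[a], child[b] = child[b], child[a]  # swap mutation
            children.append(child)
        population = elite + children
    return min(population, key=cost)


# Hypothetical traffic: logical node i talks to node i XOR 4 on an 8-node ring.
pairs = [(i, i ^ 4) for i in range(8)]
best = ga_mapping(pairs, 8)
print(mapping_cost(list(range(8)), pairs, 8), "->", mapping_cost(best, pairs, 8))
```

    In a real setting, the cost function would be replaced by a communication-time measurement or simulation for the target collective.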

  • The Panpositionable Pancyclicity of Locally Twisted Cubes

    Hon-Chan CHEN  

     
    PAPER-Graph Algorithms

      Publicized:
    2018/09/18
      Vol:
    E101-D No:12
      Page(s):
    2902-2907

    In a multiprocessor system, processors are connected according to various network topologies. A network topology is usually represented by a graph. Let $G$ be a graph and $u$, $v$ be any two distinct vertices of $G$. We say that $G$ is pancyclic if $G$ has a cycle $C$ of every length $l(C)$ satisfying $3 \leq l(C) \leq |V(G)|$, where $|V(G)|$ denotes the total number of vertices in $G$. Moreover, $G$ is panpositionably pancyclic from $r$ if for any integer $m$ satisfying $r \leq m \leq \frac{|V(G)|}{2}$, $G$ has a cycle $C$ containing $u$ and $v$ such that $d_C(u,v) = m$ and $2m \leq l(C) \leq |V(G)|$, where $d_C(u,v)$ denotes the distance between $u$ and $v$ in $C$. In this paper, we investigate the panpositionable pancyclicity problem for the $n$-dimensional locally twisted cube $LTQ_n$, a popular topology derived from the hypercube. Let $D(LTQ_n)$ denote the diameter of $LTQ_n$. We show that for $n \geq 4$ and for any integer $m$ satisfying $D(LTQ_n) + 2 \leq m \leq \frac{|V(LTQ_n)|}{2}$, there exists a cycle $C$ of $LTQ_n$ such that $d_C(u,v) = m$, where (i) $2m+1 \leq l(C) \leq |V(LTQ_n)|$ if $m = D(LTQ_n) + 2$ and $n$ is odd, and (ii) $2m \leq l(C) \leq |V(LTQ_n)|$ otherwise. This improves on the recent result that $u$ and $v$ can be positioned at a given distance on $C$ only under the condition that $l(C) = |V(LTQ_n)|$. In parallel and distributed computing, if cycles of different lengths can be embedded, the number of simulated processors can be adjusted, which increases flexibility in meeting demand. This paper demonstrates that in $LTQ_n$, cycle embedding containing any two distinct vertices at a feasible distance is extremely flexible.
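
    The locally twisted cube can be generated from its usual definition: dimensions 0 and 1 flip a single address bit, while dimension $k \geq 2$ flips bit $k$ and additionally flips bit $k-1$ when bit 0 is 1. The sketch assumes that definition (the abstract does not restate it), computes $D(LTQ_n)$ by breadth-first search, and evaluates $d_C(u,v)$ for a cycle given as a vertex list; it does not search for the cycles whose existence the paper proves, and the function names are illustrative.

```python
from collections import deque


def ltq_neighbors(x: int, n: int):
    """Neighbours of vertex x in LTQ_n (n >= 2), assumed standard definition."""
    nbrs = [x ^ 1, x ^ 2]
    for k in range(2, n):
        mask = 1 << k
        if x & 1:  # the "twist": also flip bit k-1 when bit 0 is set
            mask |= 1 << (k - 1)
        nbrs.append(x ^ mask)
    return nbrs


def ltq_diameter(n: int) -> int:
    """D(LTQ_n) as the largest BFS eccentricity over all 2^n vertices."""
    size = 1 << n
    worst = 0
    for src in range(size):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in ltq_neighbors(u, n):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


def cycle_distance(cycle, u, v) -> int:
    """d_C(u, v): the shorter way around cycle C between u and v."""
    i, j = cycle.index(u), cycle.index(v)
    d = abs(i - j)
    return min(d, len(cycle) - d)


for n in range(3, 7):
    print(n, ltq_diameter(n))
```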

  • Cycle Embedding in Generalized Recursive Circulant Graphs

    Shyue-Ming TANG  Yue-Li WANG  Chien-Yi LI  Jou-Ming CHANG  

     
    PAPER-Graph Algorithms

      Publicized:
    2018/09/18
      Vol:
    E101-D No:12
      Page(s):
    2916-2921

    Generalized recursive circulant graphs (GRCGs for short) are a generalization of recursive circulant graphs and provide a new type of topology for interconnection networks. A graph of $n$ vertices is said to be $s$-pancyclic for some $3 \leq s \leq n$ if it contains cycles of every length $t$ for $s \leq t \leq n$. The pancyclicity of recursive circulant graphs was investigated by Araki and Shibata (Inf. Process. Lett. vol.81, no.4, pp.187-190, 2002). In this paper, we are concerned with the $s$-pancyclicity of GRCGs.
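
    The $s$-pancyclicity definition can be checked by brute force on small graphs. The sketch uses the ordinary recursive circulant $G(N, d)$ (vertices $0, \dots, N-1$, with $i$ adjacent to $i \pm d^k \bmod N$ for every $d^k < N$) rather than the generalized GRCG construction, and its backtracking search is exponential, so it is only meant for toy sizes.

```python
def rc_jumps(N: int, d: int):
    """Jump set of the recursive circulant G(N, d): +-d^k mod N for all d^k < N."""
    if d < 2:
        raise ValueError("d must be at least 2")
    jumps, p = set(), 1
    while p < N:
        jumps.update({p % N, (-p) % N})
        p *= d
    return sorted(jumps)


def has_cycle_through_0(N: int, jumps, t: int) -> bool:
    """Backtracking search for a cycle of length t (t >= 3) through vertex 0.
    Circulants are vertex-transitive, so a t-cycle exists iff one passes through 0."""
    if t < 3:
        raise ValueError("cycle length must be at least 3")
    path, used = [0], {0}

    def extend() -> bool:
        u = path[-1]
        if len(path) == t:
            return (0 - u) % N in jumps  # close the cycle back to vertex 0
        for j in jumps:
            v = (u + j) % N
            if v not in used:
                used.add(v)
                path.append(v)
                if extend():
                    return True
                used.discard(v)
                path.pop()
        return False

    return extend()


J = rc_jumps(16, 4)
print(J)  # [1, 4, 12, 15]
print(has_cycle_through_0(16, J, 3), has_cycle_through_0(16, J, 5))  # False True
```

    For instance, $G(16, 4)$ contains a 5-cycle such as 0, 4, 3, 2, 1 but no triangle, so it cannot be 3-pancyclic.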

  • NEST: Towards Extreme Scale Computing Systems

    Yunfeng LU  Huaxi GU  Xiaoshan YU  Kun WANG  

     
    LETTER-Information Network

      Publicized:
    2018/08/20
      Vol:
    E101-D No:11
      Page(s):
    2827-2830

    High-performance computing (HPC) has spread into various research fields, yet the growth in computing power is limited by conventional electrical interconnections. The proposed architecture, NEST, exploits wavelength routing in arrayed waveguide grating routers (AWGRs) to achieve a scalable, low-latency, and high-throughput network. For intra-pod and inter-pod communication, the symmetric topology of NEST reduces the network diameter, which in turn reduces latency. Moreover, the proposed architecture enables exponential growth of network size. Simulation results demonstrate that, on average, NEST improves latency by 36% and throughput by 30% compared with the dragonfly topology.
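
    Wavelength routing in an AWGR is often modelled cyclically: a signal entering input port $i$ on wavelength index $w$ leaves output port $(i + w) \bmod N$. Assuming that idealized model (real devices and the NEST design may index ports and wavelengths differently), a source selects its destination purely by wavelength, and inputs using the same wavelength never collide.

```python
def awgr_output(in_port: int, wavelength: int, n: int) -> int:
    """Cyclic N x N AWGR model (assumed): wavelength w entering port i exits (i + w) mod N."""
    return (in_port + wavelength) % n


def awgr_wavelength(src: int, dst: int, n: int) -> int:
    """Wavelength index a source must use to reach a given output port."""
    return (dst - src) % n


N = 8
# Every input reaches every output, each on a distinct wavelength...
assert all({awgr_output(i, w, N) for w in range(N)} == set(range(N)) for i in range(N))
# ...and on any single wavelength the inputs map to distinct outputs (no contention).
assert all(len({awgr_output(i, w, N) for i in range(N)}) == N for w in range(N))
print(awgr_wavelength(2, 7, N))  # 5
```

    This passive, contention-free all-to-all connectivity is the property that lets AWGR-based designs avoid active switching inside the router.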

  • Waffle: A New Photonic Plasmonic Router for Optical Network on Chip

    Chao TANG  Huaxi GU  Kun WANG  

     
    LETTER-Computer System

      Publicized:
    2018/05/29
      Vol:
    E101-D No:9
      Page(s):
    2401-2403

    Optical interconnect is a promising candidate for networks on chip. As the key element of a network on chip, the routers greatly affect the performance of the whole system. In this letter, we propose a new router architecture, Waffle, based on compact 2×2 hybrid photonic-plasmonic switching elements. We also design an optimized architecture, Waffle-XY, for networks that employ the XY routing algorithm. Both Waffle and Waffle-XY are strictly non-blocking architectures and can be employed in popular mesh-like networks. Theoretical analysis shows that Waffle and Waffle-XY perform better than several representative routers.
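
    XY routing is deterministic dimension-order routing: a packet first travels along X until its column matches the destination, then along Y. The sketch (coordinates and mesh size are illustrative) shows that a Y-to-X turn never occurs, which is presumably what lets an XY-specific router such as Waffle-XY omit the switching resources for those turns.

```python
def xy_route(src, dst):
    """Dimension-order XY routing on a 2D mesh: finish the X offset, then Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def turns(path):
    """List the turns a path makes, as 'X->Y' or 'Y->X'."""
    axes = ["X" if a[1] == b[1] else "Y" for a, b in zip(path, path[1:])]
    return [f"{p}->{q}" for p, q in zip(axes, axes[1:]) if p != q]


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]

# Over every source/destination pair of a 4 x 4 mesh, no Y->X turn occurs.
mesh = [(x, y) for x in range(4) for y in range(4)]
assert all("Y->X" not in turns(xy_route(s, d)) for s in mesh for d in mesh)
```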

  • A Design for Testability of Open Defects at Interconnects in 3D Stacked ICs

    Fara ASHIKIN  Masaki HASHIZUME  Hiroyuki YOTSUYANAGI  Shyue-Kung LU  Zvi ROTH  

     
    PAPER-Dependable Computing

      Publicized:
    2018/05/09
      Vol:
    E101-D No:8
      Page(s):
    2053-2063

    A design-for-testability method and an electrical interconnect test method are proposed to detect open defects occurring at interconnects among dies and input/output pins in 3D stacked ICs. As part of the design method, an nMOS transistor and a diode are added to each input interconnect. The test method is based on measuring the quiescent current that is made to flow through the interconnect under test. Testability is examined both by SPICE simulation and by experiment. In experiments, the test method detected open defects occurring at the newly designed interconnects of dies at a test speed of 1 MHz. The simulation results reveal that, at a test speed of 200 MHz, the test method can detect an open defect that generates an additional delay of 279 ps, in addition to open defects that generate no logic errors.

  • Si-Photonics-Based Layer-to-Layer Coupler Toward 3D Optical Interconnection Open Access

    Nobuhiko NISHIYAMA  JoonHyun KANG  Yuki KUNO  Kazuto ITOH  Yuki ATSUMI  Tomohiro AMEMIYA  Shigehisa ARAI  

     
    INVITED PAPER

      Vol:
    E101-C No:7
      Page(s):
    501-508

    To realize three-dimensional (3D) optical interconnection on large-scale integration (LSI) circuits, layer-to-layer couplers based on a Si-photonics platform are reviewed. To suppress optical crosstalk, a layer distance of more than 1 µm is required for 3D interconnection. To meet this requirement, we proposed two types of layer-to-layer couplers: a pair of grating couplers with metal mirrors for coupling over multi-layer distances, and taper-type directional couplers for coupling between neighboring layers. Both structures achieved high coupling efficiency with relatively compact (∼100 µm) device sizes, using a fabrication process compatible with complementary metal-oxide-semiconductor (CMOS) technology.

  • An Optimization Algorithm to Build Low Congestion Multi-Ring Topology for Optical Network-on-Chip

    Lijing ZHU  Kun WANG  Duan ZHOU  Liangkai LIU  Huaxi GU  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/04/20
      Vol:
    E101-D No:7
      Page(s):
    1835-1842

    Ring-based topologies are popular for optical networks-on-chip. However, network congestion in ring topologies is serious, especially when optical circuit switching is employed. In this paper, we propose an algorithm to build a low-congestion multi-ring architecture for optical networks-on-chip without additional wavelength or scheduling overhead. We establish a network congestion model and define a new network congestion factor. An algorithm is developed to optimize the low-congestion multi-ring topology. Finally, a case study is presented, and simulation results obtained with OPNET verify its superiority over the traditional optical network-on-chip (ONoC) architecture.

  • Cyclic Vertex Connectivity of Trivalent Cayley Graphs

    Jenn-Yang KE  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/03/30
      Vol:
    E101-D No:7
      Page(s):
    1828-1834

    A vertex subset $F \subseteq V(G)$ is called a cyclic vertex-cut set of a connected graph $G$ if $G-F$ is disconnected such that at least two components in $G-F$ contain cycles. The cyclic vertex connectivity is the cardinality of a minimum cyclic vertex-cut set. In this paper, we show that the cyclic vertex connectivity of the trivalent Cayley graphs $TG_n$ is equal to eight for $n \geq 4$.
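
    The definition can be checked mechanically on small graphs. The brute-force sketch below is exponential in the number of vertices, so it suits only toy inputs; it does not construct $TG_n$, and the bow-tie example graph is hypothetical.

```python
from itertools import combinations


def components(vertices, edges):
    """Connected components of an undirected simple graph (iterative DFS)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def is_cyclic_vertex_cut(vertices, edges, F):
    rest = [v for v in vertices if v not in F]
    sub = [(u, v) for u, v in edges if u not in F and v not in F]
    comps = components(rest, sub)
    # A connected simple graph contains a cycle iff it has at least as many edges as vertices.
    cyclic = sum(1 for c in comps if sum(1 for u, _ in sub if u in c) >= len(c))
    return len(comps) >= 2 and cyclic >= 2


def cyclic_vertex_connectivity(vertices, edges):
    """Size of a minimum cyclic vertex-cut, or None if the graph has none."""
    for k in range(len(vertices) + 1):
        for F in combinations(vertices, k):
            if is_cyclic_vertex_cut(vertices, edges, set(F)):
                return k
    return None


# Hypothetical "bow-tie": triangles abc and def joined through hub w.
V = ["a", "b", "c", "d", "e", "f", "w"]
E = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("f", "d"),
     ("w", "a"), ("w", "d")]
print(cyclic_vertex_connectivity(V, E))  # 1 (removing w leaves two triangles)
```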

  • 25-Gbps 3-mW/Gbps/ch VCSEL Driver Circuit in 65-nm CMOS for Multichannel Optical Transmitter

    Toru YAZAKI  Norio CHUJO  Takeshi TAKEMOTO  Hiroki YAMASHITA  Akira HYOGO  

     
    PAPER

      Vol:
    E101-A No:2
      Page(s):
    402-409

    This paper describes the design and experimental results of a 25 Gbps vertical-cavity surface-emitting laser (VCSEL) driver circuit for a multichannel optical transmitter. To compensate for the non-linearity of the VCSEL and achieve high-data-rate communication, an asymmetric pre-emphasis technique is proposed for the VCSEL driver. An asymmetric pre-emphasis signal can be created by adjusting the duty ratio of the emphasis signal. The VCSEL driver adopts a double cascode connection that can apply a drive current from a high-voltage DC bias, and feed-forward compensation that can enhance the bandwidth for a common-cathode VCSEL. For the optical module structure, a two-tier low-temperature co-fired ceramic (LTCC) package is adopted to minimize the wire bonding between the signal pad on the LTCC and the anode pad on the VCSEL. This structure and circuit reduce the simulated deterministic jitter from 12.7 to 4.1 ps. A test chip was fabricated in a 65-nm standard CMOS process and demonstrated to work as an optical transmitter. An experimental evaluation showed that this VCSEL driver with asymmetric pre-emphasis reduced the total deterministic jitter by up to 8.6 ps and improved the vertical eye-opening ratio by 3% compared with symmetric pre-emphasis at 25 Gbps with a $2^9-1$ pseudo-random bit sequence (PRBS) test signal. The power consumption of the VCSEL driver was 3.0 mW/Gbps/ch at 25 Gbps. An optical transmitter including the VCSEL driver achieved 25-Gbps, 4-ch fully optical links.
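
    A $2^9-1$ PRBS test pattern is a maximal-length sequence from a 9-stage linear feedback shift register. The sketch assumes the usual realization with feedback taps at stages 9 and 5 (the PRBS9 polynomial $x^9 + x^5 + 1$); bit-ordering conventions differ between instruments, so the exact bit stream may not match a particular pattern generator.

```python
def prbs9_step(state: int):
    """Advance a 9-bit Fibonacci LFSR with feedback taps at stages 9 and 5.
    Returns (next_state, output_bit)."""
    out = (state >> 8) & 1
    feedback = ((state >> 8) ^ (state >> 4)) & 1
    return ((state << 1) | feedback) & 0x1FF, out


def prbs9_period(seed: int = 0x1FF) -> int:
    """Number of steps until the register returns to its seed state."""
    if not 0 < seed <= 0x1FF:
        raise ValueError("seed must be a non-zero 9-bit value")
    state, steps = prbs9_step(seed)[0], 1
    while state != seed:
        state = prbs9_step(state)[0]
        steps += 1
    return steps


def prbs9_bits(seed: int = 0x1FF, length: int = 511):
    """Output bits of the LFSR; 511 bits is exactly one period."""
    state, bits = seed, []
    for _ in range(length):
        state, bit = prbs9_step(state)
        bits.append(bit)
    return bits


print(prbs9_period())     # 511 = 2**9 - 1
print(sum(prbs9_bits()))  # 256 ones per period (a maximal-length property)
```

    The long runs of identical bits in such a sequence (up to nine ones) are what stress pattern-dependent effects such as the VCSEL non-linearity that pre-emphasis targets.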

  • A Static Packet Scheduling Approach for Fast Collective Communication by Using PSO

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    PAPER-Interconnection networks

      Publicized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2781-2795

    The interconnection network is an indispensable component of parallel computers, since it determines the communication capabilities of the system. It affects system-level performance as well as the physical and logical structure of the system. Although many studies have been reported on enhancing interconnection network technology, many issues remain to be addressed. One of the most important is congestion management. In an interconnection network, many packets are transferred simultaneously and interfere with each other, and congestion arises as a result of this interference. Congestion spreads quickly, seriously degrades communication performance, and persists for a long time. We should therefore control the network appropriately to suppress congestion and maintain maximum performance. Many studies address this problem and present effective methods; however, the maximum performance achievable in an ideal situation has not been sufficiently clarified. Finding this ideal performance is, in general, an NP-hard problem. This paper introduces particle swarm optimization (PSO) to overcome the problem. We first formalize the optimization problem in a form suitable for PSO and present a simple PSO application as a naive model. Then, we discuss reducing the size of the search space and introduce three practical variations of the PSO computation model: the repetitive, expansion, and coding models. We also introduce some non-PSO methods for comparison. Our evaluation results reveal the high potential of the PSO method. The repetitive and expansion models accelerate collective communication significantly, by up to 1.72 times compared with the bursty communication condition.
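
    A minimal global-best PSO, shown on a generic cost function. In the paper the cost would come from evaluating a packet schedule; here the continuous position vector, the bounds, and the coefficient values (the common constriction-style setting $w = 0.729$, $c_1 = c_2 = 1.49445$) are assumptions for illustration, not the authors' repetitive, expansion, or coding models.

```python
import random


def pso_minimize(cost, dim, bounds, n_particles=30, iters=200,
                 w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Global-best particle swarm optimization over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                 # each particle's best position
    pbest_cost = [cost(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            c = cost(xs[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = xs[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = xs[i][:], c
    return gbest, gbest_cost


best, best_cost = pso_minimize(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))
print(best_cost)
```

    To apply it to static scheduling, one hypothetical encoding is to round each coordinate to an integer injection delay for one packet and use the simulated completion time of the collective as the cost.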

  • Implementing Exchanged Hypercube Communication Patterns on Ring-Connected WDM Optical Networks

    Yu-Liang LIU  Ruey-Chyi WU  

     
    PAPER-Interconnection networks

      Publicized:
    2017/08/04
      Vol:
    E100-D No:12
      Page(s):
    2771-2780

    The exchanged hypercube, denoted by $EH(s,t)$, is a graph obtained by systematically removing edges from the corresponding hypercube while preserving many of the hypercube's attractive properties. Moreover, the ring-connected topology is one of the most promising topologies for wavelength-division multiplexing (WDM) optical networks. Let $R_n$ denote a ring-connected topology. In this paper, we address the routing and wavelength assignment problem for implementing the $EH(s,t)$ communication pattern on $R_n$, where $n = s+t+1$. We design an embedding scheme and, based on it, propose a near-optimal wavelength assignment algorithm using $2^{s+t-2} + \lfloor 2^t/3 \rfloor$ wavelengths. We also show that, compared with an optimal wavelength assignment algorithm, the proposed algorithm uses at most 25 percent (or $\lfloor 2^{t-1}/3 \rfloor$) additional wavelengths.
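
    The stated counts can be checked numerically. The sketch assumes the wavelength count is $2^{s+t-2} + \lfloor 2^t/3 \rfloor$ and that the excess over an optimal assignment is at most $\lfloor 2^{t-1}/3 \rfloor$ (exponents read from the abstract's flattened notation), and treats their difference as a lower bound on the optimum; under these assumptions the overhead ratio stays below 25%, approaching it when $s = 1$ and $t$ grows.

```python
def eh_wavelengths(s: int, t: int) -> int:
    """Wavelengths used by the proposed assignment for EH(s, t) (assumed formula)."""
    return 2 ** (s + t - 2) + 2 ** t // 3


def eh_extra_bound(t: int) -> int:
    """Assumed bound on wavelengths beyond an optimal assignment."""
    return 2 ** (t - 1) // 3


def overhead_ratio(s: int, t: int) -> float:
    """Extra wavelengths relative to the implied lower bound on the optimum."""
    used = eh_wavelengths(s, t)
    lower = used - eh_extra_bound(t)
    return (used - lower) / lower


for s, t in [(1, 3), (1, 7), (2, 7), (1, 15)]:
    print(s, t, eh_wavelengths(s, t), round(overhead_ratio(s, t), 3))
```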

  • A Bitwidth-Aware High-Level Synthesis Algorithm Using Operation Chainings for Tiled-DR Architectures

    Kotaro TERADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E100-A No:12
      Page(s):
    2911-2924

    As application hardware increasingly must be designed and implemented in a short time, high-level synthesis has become an ever more essential EDA technique. In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, so distributed-register and distributed-controller architectures (DR architectures) have been proposed to cope with this problem. It is also beneficial to take data bitwidths into account in high-level synthesis. In this paper, we propose a bitwidth-aware high-level synthesis algorithm using operation chaining for Tiled-DR architectures. The proposed algorithm optimizes the bitwidths of functional units (FUs) and uses vacant tiles by adding extra FUs, realizing effective operation chaining that generates high-performance circuits without increasing the total area. Experimental results show that, by eliminating unnecessary bitwidths and adding efficient extra FUs for Tiled-DR architectures, the proposed algorithm reduces overall latency by up to 47% compared with the conventional approach, without area overhead.
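
    A rough illustration of why bitwidth awareness helps operation chaining: exact result widths follow from operand widths, and under a hypothetical linear delay model (0.1 ns per bit, not taken from the paper) narrower adders leave room to chain more dependent operations into one clock cycle.

```python
def result_width(op: str, wa: int, wb: int) -> int:
    """Unsigned bitwidth needed to hold the exact result of a binary operation."""
    if op == "+":
        return max(wa, wb) + 1
    if op == "*":
        return wa + wb
    raise ValueError(f"unsupported operation: {op}")


def chain_delay_ns(widths, ns_per_bit=0.1):
    """Hypothetical ripple-carry model: each adder's delay grows linearly with width."""
    return sum(ns_per_bit * w for w in widths)


def can_chain(widths, clock_ns, ns_per_bit=0.1):
    """True if a chain of dependent additions fits within one clock cycle."""
    return chain_delay_ns(widths, ns_per_bit) <= clock_ns


# t1 = a + b and t2 = t1 + c with 8-bit inputs need only 9- and 10-bit adders.
w1 = result_width("+", 8, 8)
w2 = result_width("+", w1, 8)
print(w1, w2, can_chain([w1, w2], clock_ns=2.0))  # 9 10 True
print(can_chain([32, 32], clock_ns=2.0))          # False with full-width adders
```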

  • A Layout-Oriented Routing Method for Low-Latency HPC Networks

    Ryuta KAWANO  Hiroshi NAKAHARA  Ikki FUJIWARA  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Interconnection networks

      Publicized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2796-2807

    End-to-end network latency has become an important issue for parallel applications on large-scale high-performance computing (HPC) systems. It has been reported that randomly connected inter-switch networks can lower end-to-end network latency. This latency reduction comes at the cost of a large amount of routing information: minimal routing on irregular networks is achieved by using routing tables that cover all destinations in the network. In this work, a novel distributed routing method called LOREN (Layout-Oriented Routing with Entries for Neighbors), which achieves low latency with a small routing table, is proposed for irregular networks whose link length is limited. The routing tables contain both physically and topologically nearby neighbor nodes to ensure livelock freedom and a small number of hops between nodes. Experimental results show that LOREN reduces the average latency by 5.8% and improves network throughput by up to 62% compared with a conventional compact routing method. Moreover, the number of required routing table entries is reduced by up to 91%, which improves scalability and flexibility of implementation.

  • Sub-fF-Capacitance Photonic-Crystal Photodetector Towards fJ/bit On-Chip Receiver Open Access

    Kengo NOZAKI  Shinji MATSUO  Koji TAKEDA  Takuro FUJII  Masaaki ONO  Abdul SHAKOOR  Eiichi KURAMOCHI  Masaya NOTOMI  

     
    INVITED PAPER

      Vol:
    E100-C No:10
      Page(s):
    750-758

    An ultra-compact InGaAs photodetector (PD) based on a photonic crystal (PhC) waveguide is demonstrated to meet the demand for photoreceivers in future dense photonic integration. Although the PhC-PD has a length of only 1.7 µm and a capacitance of less than 1 fF, a high responsivity of 1 A/W was obtained both theoretically and experimentally. Such a low-capacitance PD makes it possible to envisage a resistor-loaded receiver that requires no electrical amplifiers. We fabricated a resistor-loaded PhC-PD for light-to-voltage conversion and demonstrated a kV/W efficiency with a GHz bandwidth without using amplifiers. This will lead to a photoreceiver with an ultralow energy consumption of less than 1 fJ/bit, a step toward a dense photonic network and processor on a chip.
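
    As a rough illustration (assuming an ideal resistive load $R_L$ and neglecting parasitics, neither of which is specified in the abstract), the light-to-voltage conversion and RC-limited bandwidth of a resistor-loaded PD with responsivity $\mathcal{R}$ and capacitance $C$ are:

$$
V_{\mathrm{out}} = R_L\, I_{\mathrm{ph}} = R_L\, \mathcal{R}\, P_{\mathrm{in}}
\quad\Rightarrow\quad
\frac{V_{\mathrm{out}}}{P_{\mathrm{in}}} = R_L\, \mathcal{R},
\qquad
f_{3\,\mathrm{dB}} = \frac{1}{2\pi R_L C}
$$

    With $\mathcal{R} = 1$ A/W, an efficiency of 1 kV/W corresponds to $R_L = 1$ kΩ, and with $C = 1$ fF the RC limit is $1/(2\pi \cdot 10^{3}\,\Omega \cdot 10^{-15}\,\mathrm{F}) \approx 159$ GHz. Even $R_L = 10$ kΩ would still give an RC limit of about 16 GHz, which is why a sub-fF capacitance lets a large load resistor (high V/W) coexist with GHz-class bandwidth.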

  • The Performance Evaluation of a 3D Torus Network Using Partial Link-Sharing Method in NoC Router Buffer

    Naohisa FUKASE  Yasuyuki MIURA  Shigeyoshi WATANABE  M.M. HAFIZUR RAHMAN  

     
    PAPER-Computer System

      Publicized:
    2017/06/30
      Vol:
    E100-D No:10
      Page(s):
    2478-2492

    A high-performance network-on-chip (NoC) router that uses minimal hardware resources to minimize layout area is essential for NoC design. In this paper, we propose a memory-sharing method for a wormhole-routed NoC architecture to alleviate the area overhead of a NoC router. In the proposed method, a memory is shared by multiple physical links by using a multi-port memory. We also propose a partial link-sharing method and evaluate the communication performance achieved with it. The results reveal that the proposed methods achieve higher communication performance than the conventional method, and that the improvement ratio is higher for the 3D-torus network than for the 2D-torus network. The improvement in communication performance with the partial link-sharing method is achieved with only a slight increase in hardware cost.

  • Stochastic Fault-Tolerant Routing in Dual-Cubes

    Junsuk PARK  Nobuhiro SEKI  Keiichi KANEKO  

     
    LETTER-Dependable Computing

      Publicized:
    2017/05/10
      Vol:
    E100-D No:8
      Page(s):
    1920-1921

    For interconnection network topologies, a low degree and a small diameter are desirable. For the same number of nodes, a dual-cube has about half the degree of a hypercube, while its diameter is larger by only one. Hence, it is a promising topology for the interconnection networks of massively parallel systems. We propose a stochastic fault-tolerant routing algorithm that finds a non-faulty path from a source node to a destination node in a dual-cube.
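
    The degree and diameter comparison can be checked on small cases. The sketch assumes the usual dual-cube construction $DC(n)$, which the abstract does not spell out: $2n+1$ address bits, a class bit, and two $n$-bit fields, where class-0 nodes form cubes over the low field, class-1 nodes form cubes over the middle field, and each node has one cross edge flipping the class bit. It does not implement the stochastic routing algorithm itself.

```python
from collections import deque


def dual_cube_neighbors(u: int, n: int):
    """Neighbours of node u in the dual-cube DC(n) (assumed construction).
    Bit 2n is the class bit; class-0 nodes flip one of bits 0..n-1,
    class-1 nodes flip one of bits n..2n-1, and every node has one
    cross edge that flips the class bit."""
    cls = (u >> (2 * n)) & 1
    base = n if cls else 0
    return [u ^ (1 << (base + i)) for i in range(n)] + [u ^ (1 << (2 * n))]


def diameter(num_nodes: int, neighbors) -> int:
    """Largest BFS eccentricity over all nodes (graph assumed connected)."""
    worst = 0
    for src in range(num_nodes):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


for n in (1, 2, 3):
    size = 1 << (2 * n + 1)
    dc = diameter(size, lambda u: dual_cube_neighbors(u, n))
    print(f"DC({n}): {size} nodes, degree {n + 1}, diameter {dc}; "
          f"hypercube Q_{2 * n + 1}: degree {2 * n + 1}, diameter {2 * n + 1}")
```

    For the same $2^{2n+1}$ nodes, the dual-cube needs degree $n+1$ instead of $2n+1$, and its diameter is $2n+2$ instead of $2n+1$, matching the trade-off described in the abstract.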
