The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] wormhole(11hit)

1-11hit
  • TTN: A High Performance Hierarchical Interconnection Network for Massively Parallel Computers

    M.M. Hafizur RAHMAN  Yasushi INOGUCHI  Yukinori SATO  Susumu HORIGUCHI  

     
    PAPER-Computer Systems

      Vol:
    E92-D No:5
      Page(s):
    1062-1078

    Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Tori connected Torus Network (TTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 2D-torus networks that are hierarchically interconnected for higher-level networks. This paper addresses the architectural details of the TTN and explores aspects such as node degree, network diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the TTN using four virtual channels and evaluate the network's dynamic communication performance using the proposed routing algorithm under uniform and various non-uniform traffic patterns. We evaluate the dynamic communication performance of TTN, TESH, MH3DT, mesh, and torus networks by computer simulation. It is shown that the TTN possesses several attractive features, including constant node degree, small diameter, low cost, small average distance, moderate (neither too low, nor too high) bisection width, and high throughput and very low zero load latency, which provide better dynamic communication performance than that of other conventional and hierarchical networks.

  • A Binary Tree Based Methodology for Designing an Application Specific Network-on-Chip (ASNOC)

    Yuan-Long JEANG  Jer-Min JOU  Win-Hsien HUANG  

     
    PAPER-VLSI Architecture

      Vol:
    E88-A No:12
      Page(s):
    3531-3538

    In this paper, a methodology based on a mix-mode interconnection architecture is proposed for constructing an application specific network on chip to minimize the total communication time. The proposed architecture uses a globally asynchronous communication network and a locally synchronous bus (or cross-bar or multistage interconnection network MIN). First, a local bus is given for a group of IP cores so that the communications within this local bus can be arranged to be exclusive in time. If the communications of some IP cores should be required to be completed within a given amount of time, then a non-blocking MIN or a crossbar switch should be made for those IP cores instead of a bus. Then, a communication ratio (CR) for each pair of local buses is provided by users, and based on the Huffman coding philosophy, a process is applied to construct a binary tree (BT) with switches on the internal nodes and buses on the leaves. Since the binary tree system is deadlock free (no cycle exists in any path), the router is just a relatively simple and cheap switch. Simulation results show that the proposed methodology and architecture of NOC is better on switching circuit cost and performance than the SPIN and the mesh architecture using our developed deadlock-free router.

  • Modified Hierarchical 3D-Torus Network

    M.M. Hafizur RAHMAN  Yasushi INOGUCHI  Susumu HORIGUCHI  

     
    PAPER-Computer Systems

      Vol:
    E88-D No:2
      Page(s):
    177-186

    Three-dimensional (3D) wafer stacked implementation (WSI) has been proposed as a promising technology for massively parallel computers. A hierarchical 3D-torus (H3DT) network, which is a 3D-torus network of multiple basic modules in which the basic modules are 3D-mesh networks, has been proposed for efficient 3D-WSI. However, the restricted use of physical links between basic modules in the higher level networks reduces the dynamic communication performance of this network. A torus network has better dynamic communication performance than a mesh network. Therefore, we have modified the H3DT network by replacing the 3D-mesh modules by 3D-tori, calling it a Modified H3DT (MH3DT) network. This paper addresses the architectural details of the MH3DT network and explores aspects such as degree, diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the MH3DT network using two virtual channels and evaluate the network's dynamic communication performance under the uniform traffic pattern, using the proposed routing algorithm. It is shown that the MH3DT network possesses several attractive features including small diameter, small cost, small average distance, better bisection width, and better dynamic communication performance.

  • Dynamic Communication Performance of a Hierarchical Torus Network under Non-uniform Traffic Patterns

    M. M. Hafizur RAHMAN  Susumu HORIGUCHI  

     
    PAPER-Computer Systems

      Vol:
    E87-D No:7
      Page(s):
    1887-1896

    Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Hierarchical Torus Network (HTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 3D-torus networks that are hierarchically interconnected for higher level networks. The static network performance of the HTN has already been studied and has been shown to be good. Dynamic communication performance has been evaluated under uniform traffic pattern but not under non-uniform traffic patterns. In this paper, we present a deadlock-free routing algorithm for the HTN using 3 virtual channels and evaluate the network's dynamic communication performance under three non-uniform traffic patterns, using the proposed routing algorithm. We evaluate the dynamic communication performance of HTN, H3D-mesh, H3D-torus, TESH, and mesh networks by computer simulation. We find that the dynamic communication performance of HTN is better than that of the H3D-mesh, H3D-torus, TESH, and mesh networks.

  • An Improvement of Tree-Based Multicasting for Irregular Switch-Based Networks with Wormhole Routing

    Nen-Chung WANG  Tzung-Shi CHEN  Chih-Ping CHU  

     
    PAPER-Computer Systems

      Vol:
    E85-D No:5
      Page(s):
    812-823

    In this paper, we propose an efficient dual-tree-based multicasting scheme with three destination-switch partition strategies on irregular switch-based networks. The dual-tree-based routing scheme supports adaptive, distributed, and deadlock-free multicast on irregular networks with double channels. We first describe a dual-tree structure constructed from the irregular networks and prove that the multicasting based on such a structure is deadlock-free. Then, an efficient multicast routing algorithm with three destination-switch partition strategies: source-switch-based partition, destination-switch-based partition, and all-switches-based partition, is proposed. Finally, we perform simulations to evaluate our proposed algorithm under various impact parameters: system size, message length, and startup time. The experimental results show that the improved tree-based multicasting scheme outperforms the usual tree-based multicasting scheme. The dual-tree-based multicasting scheme with destination-switch-based partition is shown to be the best for all situations.

  • Complete Exchange Algorithms in Wormhole-Routed Torus Networks

    Si-Gwan KIM  Seung Ryoul MAENG  Jung Wan CHO  

     
    PAPER-Algorithms

      Vol:
    E83-D No:4
      Page(s):
    766-776

    We present efficient all-to-all personalized communication algorithms for a 2D torus in wormhole-routed networks. Our complete exchange algorithms reduce the number of start-up by a factor of up to 2, which is a good metric for network performance in wormhole networks. Our algorithms divide the whole network into 22 networks, giving two contention-free networks with N/2N/2. After specially designated nodes called master nodes have collected messages, whose destinations are the rest of the basic cells, only master nodes perform complete exchange with a reduced network size. When finished with this complete exchange among master nodes, these nodes distribute messages to the rest of the master nodes, which results in the desired complete exchange. Then, we present a modified algorithm that further reduces the data transmission time sacrificing the start-up time. After we present our new algorithms, we analyze time complexities and compare several algorithms. We show that our practical algorithm is efficient by a factor of 2 in the required start-up time which means that our algorithms are suitable for wormhole-routed networks.

  • A Fault-Tolerant Deadlock-Free Multicast Algorithm for Wormhole Routed Hypercubes

    Shih-Chang WANG  Jeng-Ping LIN  Sy-Yen KUO  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E82-D No:3
      Page(s):
    677-686

    In this paper, we propose a novel fault-tolerant multicast algorithm for n-dimensional wormhole routed hypercubes. The multicast algorithm will remain functional if the number of faulty nodes in an n-dimensional hypercube is less than n. Multicast is the delivery of the same message from one source node to an arbitrary number of destination nodes. Recently, wormhole routing has become one of the most popular switching techniques in new generation multicomputers. Previous researches have focused on fault-tolerant one-to-one routing algorithms for n-dimensional meshes. However, little research has been done on fault-tolerant one-to-many (multicast) routing algorithms due to the difficulty in achieving deadlock-free routing on faulty networks. We will develop such an algorithm for faulty hypercubes. Our approach is not based on adding physical or virtual channels to the network topology. Instead, we integrate several techniques such as partitioning of nodes, partitioning of channels, node label assignments, and dual-path multicast to achieve fault tolerance. Both theoretical analysis and simulation are performed to demonstrate the effectiveness of the proposed algorithm.

  • Fault-Tolerant Adaptive Wormhole Routing in 2D Mesh

    Seong-Pyo KIM  Taisook HAN  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E81-D No:10
      Page(s):
    1064-1071

    A fault-tolerant wormhole routing algorithm on mesh-connected processors is proposed. The proposed algorithm is based on the solid fault model and allows the fault polygons to be overlapped. The algorithm compares the position of fault region relative to current channel with the fault direction field of a misrouted message to route around overlapped fault polygons. A node deactivating algorithm to convert non-solid fault region into solid fault region is also proposed. The proposed routing algorithm uses four virtual channels and is deadlock and livelock free.

  • A Fault-Tolerant Wormhole Routing Algorithm in Two Dimensional Mesh Networks

    Jinsoo KIM  Ji-Yun KIM  Hyunsoo YOON  Seung Ryoul MAENG  Jung Wan CHO  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E81-D No:6
      Page(s):
    532-544

    We propose a fault-tolerant routing algorithm for 2D meshes. Our routing algorithm can tolerate any number of concave fault regions. It is based on xy-routing and uses the concept of the fault ring/chain composed of fault-free elements surrounding faults. Three virtual channels per physical link are used for deadlock-free routing on a fault ring. Four virtual channels are needed for a fault chain. For a concave fault ring, fault-free nodes in the concave region have been deactivated to avoid deadlock in the previous algorithms, which results in excessive loss of the computational power. Our algorithm ensures deadlock-freedom by restricting the virtual channel usage in the concave region, and it minimizes the loss of the computational power. We also extend the proposed routing scheme for adaptive fault-tolerant routing. The adaptive version requires the same number of virtual channels as the deterministic one.

  • A Link-Disjoint Submesh for Processor Allocation in Mesh Computers

    Kyu-Hyun SHIM  Sung Hoon JUNG  Kyu Ho PARK  

     
    PAPER-Computer Systems

      Vol:
    E80-D No:12
      Page(s):
    1155-1165

    A processor allocation scheme for mesh computers greatly affects their system utilization. The performance of an allocation scheme is largely dependent on its ability to detect available submeshes. We propose a new type of submesh, called a link-disjoint submesh, for processor allocation in mesh computers. This type of submesh increases the submesh recognition capability of an allocation scheme. A link-disjoint submesh is not a contiguous submesh as in the previous scheme, but this submesh still has no common communication link with any other submesh. When wormhole routing or circuit switching is used, the communication delay caused by non-contiguous processor allocation is minor. Through simulation, the performance of our scheme is measured and compared to the previous schemes in terms of such parameters as finish time and system utilization. It is shown through simulation that the link-disjoint submesh increases the performance of an allocation scheme.

  • A Comparison of Blocking and Non-blocking Packet Switching Techniques in Hierarchical Ring Networks

    Govindan RAVINDRAN  Michael STUMM  

     
    PAPER-Interconnection Networks

      Vol:
    E79-D No:8
      Page(s):
    1130-1138

    This paper presents the results of a simulation study of blocking and non-blocking switching for hierarchical ring networks. The switching techniques include wormhole, virtual cut-through, and slotted ring. We conclude that slotted ring network performs better than the more popular wormhole and virtual cut-through networks. We also show that the size of the node buffers is an important parameter and that choosing them too large can hurt performance in some cases. Slotted rings have the advantage that the choice of buffer size is easier in that larger than necessary buffers do not hurt performance and hence a single choice of buffer size performs well for all system configurations. In contrast, the optimal buffer size for virtual cut-through and wormhole switching nodes varies depending on the system configuration and the level in the hierarchy in which the switching node lies.