1-13hit |
Ryuta KAWANO Ryota YASUDO Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO
Recently proposed irregular networks can reduce the latency for both on-chip and off-chip systems with a large number of computing nodes and thus can improve the performance of parallel applications. However, these networks usually suffer from deadlocks in routing packets when using a naive minimal path routing algorithm. To solve this problem, we focus attention on a lately proposed theory that generalizes the turn model to maintain the network performance with deadlock-freedom. The theorems remain a challenge of applying themselves to arbitrary topologies including fully irregular networks. In this paper, we advance the theorems to completely general ones. Moreover, we provide a feasible implementation of a deadlock-free routing method based on our advanced theorem. Experimental results show that the routing method based on our proposed theorem can improve the network throughput by up to 138 % compared to a conventional deterministic minimal routing method. Moreover, when utilized as the escape path in Duato's protocol, it can improve the throughput by up to 26.3 % compared with the conventional up*/down* routing.
Ryuta KAWANO Hiroshi NAKAHARA Seiichi TADE Ikki FUJIWARA Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO
Inter-switch networks for HPC systems and data-centers can be improved by applying random shortcut topologies with a reduced number of hops. With minimal routing in such networks; however, deadlock-freedom is not guaranteed. Multiple Virtual Channels (VCs) are efficiently used to avoid this problem. However, previous works do not provide good trade-offs between the number of required VCs and the time and memory complexities of an algorithm. In this work, a novel and fast algorithm, named ACRO, is proposed to endorse the arbitrary routing functions with deadlock-freedom, as well as consuming a small number of VCs. A heuristic approach to reduce VCs is achieved with a hash table, which improves the scalability of the algorithm compared with our previous work. Moreover, experimental results show that ACRO can reduce the average number of VCs by up to 63% when compared with a conventional algorithm that has the same time complexity. Furthermore, ACRO reduces the time complexity by a factor of O(|N|⋅log|N|), when compared with another conventional algorithm that requires almost the same number of VCs.
Lian ZENG Tieyuan PAN Xin JIANG Takahiro WATANABE
As the semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in the future Chip-Multiprocessors (CMPs) design. Three-Dimensional Network-on-Chips (3D NoCs) has become an attractive solution which can provide impressive high performance. An efficient and deadlock-free routing algorithm is a critical to achieve the high performance of network-on-chip. Traditional methods based on deterministic and turn model are deadlock-free, but they are unable to distribute the traffic loads over the network. In this paper, we propose an efficient, adaptive and deadlock-free algorithm (EAR) based on a novel routing selection strategy in 3D NoC, which can distribute the traffic loads not only in intra-layers but also in inter-layers according to congestion information and path diversity. Simulation results show that the proposed method achieves the significant performance improvement compared with others.
Zewen SHI Xiaoyang ZENG Zhiyi YU
Manufacturing defects in the deep sub-micron VLSI process and aging resulted problems of devices during lifecycle are inevitable, and fault-tolerant routing algorithms are important to provide the required communication for NoCs in spite of failures. The proposed algorithm, referred to as scalable and reconfigurable fault-tolerant distributed routing (RFDR), partitions the system into nine regions using the concept of divide-and-conquer. It is a distributed algorithm, and each router guarantees fault-tolerance within one's own region and the system can be still sustained with multiple fault areas. The proposed RFDR has excellent scalability with hardware cost keeping constant independent of system size. Also it is completely reconfigurable when new nodes fail. Simulations under various synthetic traffic patterns show its better performance compared to Extended-XY routing algorithm. Moreover, there is almost no hardware overhead compared to Logic-Based Distributed Routing (LBDR), but the fault-tolerance capacity is enhanced in the proposed algorithm. Hardware cost is reduced 37% compared to Reconfigurable Distributed Scalable Predictable Interconnect Network (R-DSPIN) which only supports single fault region.
M.M. Hafizur RAHMAN Yukinori SATO Yasushi INOGUCHI
A Modified Hierarchical 3D-Torus (MH3DT) network is a 3D-torus network consisting of multiple basic modules, in which each basic module itself is a 3D-torus network. Inter-node communication performance has been evaluated using dimension-order routing and 2 virtual channels (VCs) under uniform traffic patterns but not under non-uniform traffic patterns. In this paper, we evaluate the inter-node communication performance of MH3DT under five non-uniform traffic patterns and compare it with other networks. We found that under non-uniform traffic patterns, the MH3DT yields high throughput and low latency, providing better inter-node communication performance compared to H3DT, TESH, mesh, and torus networks. Also, we found that non-uniform traffic patterns have higher throughput than uniform traffic in the MH3DT network.
M.M. Hafizur RAHMAN Yasushi INOGUCHI Yukinori SATO Susumu HORIGUCHI
Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Tori connected Torus Network (TTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 2D-torus networks that are hierarchically interconnected for higher-level networks. This paper addresses the architectural details of the TTN and explores aspects such as node degree, network diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the TTN using four virtual channels and evaluate the network's dynamic communication performance using the proposed routing algorithm under uniform and various non-uniform traffic patterns. We evaluate the dynamic communication performance of TTN, TESH, MH3DT, mesh, and torus networks by computer simulation. It is shown that the TTN possesses several attractive features, including constant node degree, small diameter, low cost, small average distance, moderate (neither too low, nor too high) bisection width, and high throughput and very low zero load latency, which provide better dynamic communication performance than that of other conventional and hierarchical networks.
M.M. Hafizur RAHMAN Yasushi INOGUCHI Susumu HORIGUCHI
Three-dimensional (3D) wafer stacked implementation (WSI) has been proposed as a promising technology for massively parallel computers. A hierarchical 3D-torus (H3DT) network, which is a 3D-torus network of multiple basic modules in which the basic modules are 3D-mesh networks, has been proposed for efficient 3D-WSI. However, the restricted use of physical links between basic modules in the higher level networks reduces the dynamic communication performance of this network. A torus network has better dynamic communication performance than a mesh network. Therefore, we have modified the H3DT network by replacing the 3D-mesh modules by 3D-tori, calling it a Modified H3DT (MH3DT) network. This paper addresses the architectural details of the MH3DT network and explores aspects such as degree, diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the MH3DT network using two virtual channels and evaluate the network's dynamic communication performance under the uniform traffic pattern, using the proposed routing algorithm. It is shown that the MH3DT network possesses several attractive features including small diameter, small cost, small average distance, better bisection width, and better dynamic communication performance.
This paper addresses the scheduling problem of a class of automated manufacturing systems with multiple resource requests. In the automated manufacturing system model, a set of jobs is to be processed and each job requires a sequence of operations. Each operation may need more than one resource type and multiple identical units with the same resource type. Upon the completion of an operation, resources needed in the next operation of the same job cannot be released and the remaining resources cannot be released until the start of the next operation. The scheduling problem is formulated by Timed Petri nets model under which the scheduling goal consists in sequencing the transition firing sequence in order to avoid the deadlock situation and to minimize the makespan. In the proposed genetic algorithm with deadlock-free constraint, Petri net transition sequence is coded and a deadlock detection method based on D-siphon technology is proposed to reschedule the sequence of transitions. The enabled transitions should be fired as early as possible and thus the quality of solutions can be improved. In the fitness computation procedure, a penalty item for the infeasible solution is involved to prevent the search process from converging to the infeasible solution. The method proposed in this paper can get a feasible scheduling strategy as well as enable the system to achieve good performance. Numerical results presented in the paper show the efficiency of the proposed algorithm.
M. M. Hafizur RAHMAN Susumu HORIGUCHI
Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Hierarchical Torus Network (HTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 3D-torus networks that are hierarchically interconnected for higher level networks. The static network performance of the HTN has already been studied and has been shown to be good. Dynamic communication performance has been evaluated under uniform traffic pattern but not under non-uniform traffic patterns. In this paper, we present a deadlock-free routing algorithm for the HTN using 3 virtual channels and evaluate the network's dynamic communication performance under three non-uniform traffic patterns, using the proposed routing algorithm. We evaluate the dynamic communication performance of HTN, H3D-mesh, H3D-torus, TESH, and mesh networks by computer simulation. We find that the dynamic communication performance of HTN is better than that of the H3D-mesh, H3D-torus, TESH, and mesh networks.
Nen-Chung WANG Tzung-Shi CHEN Chih-Ping CHU
In this paper, we propose an efficient dual-tree-based multicasting scheme with three destination-switch partition strategies on irregular switch-based networks. The dual-tree-based routing scheme supports adaptive, distributed, and deadlock-free multicast on irregular networks with double channels. We first describe a dual-tree structure constructed from the irregular networks and prove that the multicasting based on such a structure is deadlock-free. Then, an efficient multicast routing algorithm with three destination-switch partition strategies: source-switch-based partition, destination-switch-based partition, and all-switches-based partition, is proposed. Finally, we perform simulations to evaluate our proposed algorithm under various impact parameters: system size, message length, and startup time. The experimental results show that the improved tree-based multicasting scheme outperforms the usual tree-based multicasting scheme. The dual-tree-based multicasting scheme with destination-switch-based partition is shown to be the best for all situations.
A fault-tolerant wormhole routing algorithm on mesh-connected processors is proposed. The proposed algorithm is based on the solid fault model and allows the fault polygons to be overlapped. The algorithm compares the position of fault region relative to current channel with the fault direction field of a misrouted message to route around overlapped fault polygons. A node deactivating algorithm to convert non-solid fault region into solid fault region is also proposed. The proposed routing algorithm uses four virtual channels and is deadlock and livelock free.
Jinsoo KIM Ji-Yun KIM Hyunsoo YOON Seung Ryoul MAENG Jung Wan CHO
We propose a fault-tolerant routing algorithm for 2D meshes. Our routing algorithm can tolerate any number of concave fault regions. It is based on xy-routing and uses the concept of the fault ring/chain composed of fault-free elements surrounding faults. Three virtual channels per physical link are used for deadlock-free routing on a fault ring. Four virtual channels are needed for a fault chain. For a concave fault ring, fault-free nodes in the concave region have been deactivated to avoid deadlock in the previous algorithms, which results in excessive loss of the computational power. Our algorithm ensures deadlock-freedom by restricting the virtual channel usage in the concave region, and it minimizes the loss of the computational power. We also extend the proposed routing scheme for adaptive fault-tolerant routing. The adaptive version requires the same number of virtual channels as the deterministic one.
Recursive Diagonal Torus (RDT) is a class of interconnection network for massively parallel computers with 216 nodes. In this paper, message transfer algorithms on the RDT are proposed and discussed. First, a simple one-to-one message routing algorithm called the vector routing is introduced and its practical extension called the floating vector routing is proposed. In the floating vector routing both the diameter and average distance are improved compared with the fixed vector routing. Next, broadcasting and hypercube emulation algorithm scheme on the RDT are shown. Finally, deadlock-free message routing algorithms on the RDT are discussed. By a simple modification of the e-cube routing and a small numbers of additional virtual channels, both one-to-one message transfer and broadcast can be achieved without deadlock.