Masashi MIZOGUCHI Toshimitsu USHIO
The Smith method has been used to control physical plants with dead time components, where plant states after the dead time is elapsed are predicted and a control input is determined based on the predicted states. We extend the method to the symbolic control and design a symbolic Smith controller to deal with a nondeterministic embedded system. Due to the nondeterministic transitions, the proposed controller computes all reachable plant states after the dead time is elapsed and determines a control input that is suitable for all of them in terms of a given control specification. The essence of the Smith method is that the effects of the dead time are suppressed by the prediction, however, which is not always guaranteed for nondeterministic systems because there may exist no control input that is suitable for all predicted states. Thus, in this paper, we discuss the existence of a deadlock-free symbolic Smith controller. If it exists, it is guaranteed that the effects of the dead time can be suppressed and that the controller can always issue the control input for any reachable state of the plant. If it does not exist, it is proved that the deviation from the control specification is essentially inevitable.
Ryuta KAWANO Ryota YASUDO Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO
Recently proposed irregular networks can reduce the latency for both on-chip and off-chip systems with a large number of computing nodes and thus can improve the performance of parallel applications. However, these networks usually suffer from deadlocks in routing packets when using a naive minimal path routing algorithm. To solve this problem, we focus attention on a lately proposed theory that generalizes the turn model to maintain the network performance with deadlock-freedom. The theorems remain a challenge of applying themselves to arbitrary topologies including fully irregular networks. In this paper, we advance the theorems to completely general ones. Moreover, we provide a feasible implementation of a deadlock-free routing method based on our advanced theorem. Experimental results show that the routing method based on our proposed theorem can improve the network throughput by up to 138 % compared to a conventional deterministic minimal routing method. Moreover, when utilized as the escape path in Duato's protocol, it can improve the throughput by up to 26.3 % compared with the conventional up*/down* routing.
Bo ZHAO Guangliang REN Huining ZHANG
Pre-weighting based Contention Resolution Diversity Slotted ALOHA-like (PW-CRDSA-like) schemes with joint multi-user multi-slot detection (JMMD) algorithm are proposed to improve the throughput of random access (RA) in geostationary earth orbit (GEO) satellite networks. The packet and its replicas are weighted by different pre-weighting factors at each user terminal, and are sent in randomly selected slots within a frame. The correlation of channels between user terminals and satellite node in different slots are removed by using pre-weighting factors. At the gateway station, after the decoding processing of CRDSA, the combinations of remained signals in slots that can construct virtual multiple-input multiple-output (MIMO) signal models are found and decoded by the JMMD algorithm. Deadlock problems that can be equivalent to virtual MIMO signal models in the conventional CRDSA-like schemes can be effectively resolved, which improves the throughput of these CRDSA-like schemes. Simulation results show that the PW-CRDSA-like schemes with the JMMD significantly outperform the conventional CRDSA-like schemes in terms of the throughput under equal packet loss ratio (PLR) conditions (e.g. PLR =10-2), and as the number of the transmitted replicas increases, the throughput of the PW-CRDSA-like schemes also increases, and the normalized maximum throughput of the PW-CRDSA-5 (i.e., PW-CRDSA with 5 replicas) scheme can reach 0.95.
Ryuta KAWANO Hiroshi NAKAHARA Seiichi TADE Ikki FUJIWARA Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO
Inter-switch networks for HPC systems and data-centers can be improved by applying random shortcut topologies with a reduced number of hops. With minimal routing in such networks; however, deadlock-freedom is not guaranteed. Multiple Virtual Channels (VCs) are efficiently used to avoid this problem. However, previous works do not provide good trade-offs between the number of required VCs and the time and memory complexities of an algorithm. In this work, a novel and fast algorithm, named ACRO, is proposed to endorse the arbitrary routing functions with deadlock-freedom, as well as consuming a small number of VCs. A heuristic approach to reduce VCs is achieved with a hash table, which improves the scalability of the algorithm compared with our previous work. Moreover, experimental results show that ACRO can reduce the average number of VCs by up to 63% when compared with a conventional algorithm that has the same time complexity. Furthermore, ACRO reduces the time complexity by a factor of O(|N|⋅log|N|), when compared with another conventional algorithm that requires almost the same number of VCs.
Petri Net (PN) is a frequently-used model for deadlock detection. Among various detection methods on PN, reachability analysis is the most accurate one since it never produces any false positive or false negative. Although suffering from the well-known state space explosion problem, reachability analysis is appropriate for small- and medium-scale programs. In order to mitigate the explosion problem several kinds of techniques have been proposed aiming at accelerating the reachability analysis, such as net reduction and abstraction. However, these techniques are for general PN and do not take the particularity of application into consideration, so their optimization potential is not adequately developed. In this paper, the feature of mutual exclusion-based program is considered, therefore several strategies are proposed to accelerate the reachability analysis. Among these strategies a customized net reduction rule aims at reducing the scale of PN, two marking compression methods and two pruning methods can reduce the volume of reachability graph. Reachability analysis on PN can only report one deadlock on each path. However, the reported deadlock may be a false alarm in which situation real deadlocks may be hidden. To improve the detection efficiency, we proposed a deadlock recovery algorithm so that more deadlocks can be detected in a shorter time. To validate the efficiency of these methods, a prototype is implemented and applied to SPLASH2 benchmarks. The experimental results show that these methods accelerate the reachability analysis for mutual exclusion-based deadlock detection significantly.
Lian ZENG Tieyuan PAN Xin JIANG Takahiro WATANABE
As the semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in the future Chip-Multiprocessors (CMPs) design. Three-Dimensional Network-on-Chips (3D NoCs) has become an attractive solution which can provide impressive high performance. An efficient and deadlock-free routing algorithm is a critical to achieve the high performance of network-on-chip. Traditional methods based on deterministic and turn model are deadlock-free, but they are unable to distribute the traffic loads over the network. In this paper, we propose an efficient, adaptive and deadlock-free algorithm (EAR) based on a novel routing selection strategy in 3D NoC, which can distribute the traffic loads not only in intra-layers but also in inter-layers according to congestion information and path diversity. Simulation results show that the proposed method achieves the significant performance improvement compared with others.
GuanJun LIU ChangJun JIANG MengChu ZHOU Atsushi OHTA
Petri nets are a kind of formal language that are widely applied in concurrent systems associated with resource allocation due to their abilities of the natural description on resource allocation and the precise characterization on deadlock. Weighted System of Simple Sequential Processes with Resources (WS3PR) is an important subclass of Petri nets that can model many resource allocation systems in which 1) multiple processes may run in parallel and 2) each execution step of each process may use multiple units from a single resource type but cannot use multiple resource types. We first prove that the liveness problem of WS3PR is co-NP-hard on the basis of the partition problem. Furthermore, we present a necessary and sufficient condition for the liveness of WS3PR based on two new concepts called Structurally Circular Wait (SCW) and Blocking Marking (BM), i.e., a WS3PR is live iff each SCW has no BM. A sufficient condition is also proposed to guarantee that an SCW has no BM. Additionally, we show some advantages of using SCW to analyze the deadlock problem compared to other siphon-based ones, and discuss the relation between SCW and siphon. These results are valuable to the further research on the deadlock prevention or avoidance for WS3PR.
Zewen SHI Xiaoyang ZENG Zhiyi YU
Manufacturing defects in the deep sub-micron VLSI process and aging resulted problems of devices during lifecycle are inevitable, and fault-tolerant routing algorithms are important to provide the required communication for NoCs in spite of failures. The proposed algorithm, referred to as scalable and reconfigurable fault-tolerant distributed routing (RFDR), partitions the system into nine regions using the concept of divide-and-conquer. It is a distributed algorithm, and each router guarantees fault-tolerance within one's own region and the system can be still sustained with multiple fault areas. The proposed RFDR has excellent scalability with hardware cost keeping constant independent of system size. Also it is completely reconfigurable when new nodes fail. Simulations under various synthetic traffic patterns show its better performance compared to Extended-XY routing algorithm. Moreover, there is almost no hardware overhead compared to Logic-Based Distributed Routing (LBDR), but the fault-tolerance capacity is enhanced in the proposed algorithm. Hardware cost is reduced 37% compared to Reconfigurable Distributed Scalable Predictable Interconnect Network (R-DSPIN) which only supports single fault region.
M.M. Hafizur RAHMAN Yukinori SATO Yasushi INOGUCHI
A Modified Hierarchical 3D-Torus (MH3DT) network is a 3D-torus network consisting of multiple basic modules, in which each basic module itself is a 3D-torus network. Inter-node communication performance has been evaluated using dimension-order routing and 2 virtual channels (VCs) under uniform traffic patterns but not under non-uniform traffic patterns. In this paper, we evaluate the inter-node communication performance of MH3DT under five non-uniform traffic patterns and compare it with other networks. We found that under non-uniform traffic patterns, the MH3DT yields high throughput and low latency, providing better inter-node communication performance compared to H3DT, TESH, mesh, and torus networks. Also, we found that non-uniform traffic patterns have higher throughput than uniform traffic in the MH3DT network.
M.M. Hafizur RAHMAN Yasushi INOGUCHI Yukinori SATO Susumu HORIGUCHI
Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Tori connected Torus Network (TTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 2D-torus networks that are hierarchically interconnected for higher-level networks. This paper addresses the architectural details of the TTN and explores aspects such as node degree, network diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the TTN using four virtual channels and evaluate the network's dynamic communication performance using the proposed routing algorithm under uniform and various non-uniform traffic patterns. We evaluate the dynamic communication performance of TTN, TESH, MH3DT, mesh, and torus networks by computer simulation. It is shown that the TTN possesses several attractive features, including constant node degree, small diameter, low cost, small average distance, moderate (neither too low, nor too high) bisection width, and high throughput and very low zero load latency, which provide better dynamic communication performance than that of other conventional and hierarchical networks.
M.M. Hafizur RAHMAN Yasushi INOGUCHI Susumu HORIGUCHI
Three-dimensional (3D) wafer stacked implementation (WSI) has been proposed as a promising technology for massively parallel computers. A hierarchical 3D-torus (H3DT) network, which is a 3D-torus network of multiple basic modules in which the basic modules are 3D-mesh networks, has been proposed for efficient 3D-WSI. However, the restricted use of physical links between basic modules in the higher level networks reduces the dynamic communication performance of this network. A torus network has better dynamic communication performance than a mesh network. Therefore, we have modified the H3DT network by replacing the 3D-mesh modules by 3D-tori, calling it a Modified H3DT (MH3DT) network. This paper addresses the architectural details of the MH3DT network and explores aspects such as degree, diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the MH3DT network using two virtual channels and evaluate the network's dynamic communication performance under the uniform traffic pattern, using the proposed routing algorithm. It is shown that the MH3DT network possesses several attractive features including small diameter, small cost, small average distance, better bisection width, and better dynamic communication performance.
This paper addresses the scheduling problem of a class of automated manufacturing systems with multiple resource requests. In the automated manufacturing system model, a set of jobs is to be processed and each job requires a sequence of operations. Each operation may need more than one resource type and multiple identical units with the same resource type. Upon the completion of an operation, resources needed in the next operation of the same job cannot be released and the remaining resources cannot be released until the start of the next operation. The scheduling problem is formulated by Timed Petri nets model under which the scheduling goal consists in sequencing the transition firing sequence in order to avoid the deadlock situation and to minimize the makespan. In the proposed genetic algorithm with deadlock-free constraint, Petri net transition sequence is coded and a deadlock detection method based on D-siphon technology is proposed to reschedule the sequence of transitions. The enabled transitions should be fired as early as possible and thus the quality of solutions can be improved. In the fitness computation procedure, a penalty item for the infeasible solution is involved to prevent the search process from converging to the infeasible solution. The method proposed in this paper can get a feasible scheduling strategy as well as enable the system to achieve good performance. Numerical results presented in the paper show the efficiency of the proposed algorithm.
Jaeyong SHIM Minkyu LEE Dongsoo HAN
A workflow definition containing errors might cause serious problems for an enterprise especially when it involves mission critical business processes or inter-organizational interaction. So workflow definitions should be defined in a strict and rigorous way. In this paper, we develop a workflow definition language and analysis methods for the language to support strict and rigorous workflow definitions. Faults or mistakes causing communication deadlock, access conflicts, and improper exception specification in workflow definitions can be detected and notified automatically using the methods. The proposed workflow definition language borrows structured constructs of conventional programming languages because many good features of conventional programming languages also can be used effectively in expressing workflow processes. With slight modifications and scope restrictions, the developed analysis techniques in this paper can be used in any workflow definition languages and they can help workflow designers define workflow processes in much more safe and reliable manner.
M. M. Hafizur RAHMAN Susumu HORIGUCHI
Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Hierarchical Torus Network (HTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 3D-torus networks that are hierarchically interconnected for higher level networks. The static network performance of the HTN has already been studied and has been shown to be good. Dynamic communication performance has been evaluated under uniform traffic pattern but not under non-uniform traffic patterns. In this paper, we present a deadlock-free routing algorithm for the HTN using 3 virtual channels and evaluate the network's dynamic communication performance under three non-uniform traffic patterns, using the proposed routing algorithm. We evaluate the dynamic communication performance of HTN, H3D-mesh, H3D-torus, TESH, and mesh networks by computer simulation. We find that the dynamic communication performance of HTN is better than that of the H3D-mesh, H3D-torus, TESH, and mesh networks.
Concurrency in the computing systems using a deadlock avoidance strategy varies largely according to the way that resource usage plan of process is used in testing the possibility of deadlocks. If a detailed resource usage plan of process is to be taken into account, the deadlock avoidance problem is known to be NP-complete. A decomposition model to manage resources is proposed, where process is logically partitioned into a number of segments each of which uses at least one resource. It is shown that one of our deadlock avoidance algorithms achieves the best concurrency among the polynomial-bounded algorithms. We also present a heuristic algorithm that achieves concurrency close to the optimal one. Finally, we analyze concurrency of various algorithms.
We suggest a new probe message structure and an efficient probe-based deadlock detection and recovery algorithm that can be used in distributed database systems. We determine the characteristics of the probe messages and suggest an algorithm that can reduce the communication cost required for deadlock detection and recovery.
This paper describes an efficient symbolic model checking algorithm for verification of deadlock free property of high level robot control program called Task Control Architecture (TCA). TCA is a model of concurrent robot control processes. The verification tool we used is the Symbolic Model Verifier (SMV). Since the SMV is not so efficient for verification of liveness properties of many concurrent processes such as deadlock free property, we first described the deadlock free property by using safety properties that SMV can verify efficiently. In addition, we modify the symbolic model checking algorithm of the SMV so that it can handle many concurrent processes efficiently. Experimental measurements show that we can obtain more than 1000 times speed-up by these methods.
Nen-Chung WANG Tzung-Shi CHEN Chih-Ping CHU
In this paper, we propose an efficient dual-tree-based multicasting scheme with three destination-switch partition strategies on irregular switch-based networks. The dual-tree-based routing scheme supports adaptive, distributed, and deadlock-free multicast on irregular networks with double channels. We first describe a dual-tree structure constructed from the irregular networks and prove that the multicasting based on such a structure is deadlock-free. Then, an efficient multicast routing algorithm with three destination-switch partition strategies: source-switch-based partition, destination-switch-based partition, and all-switches-based partition, is proposed. Finally, we perform simulations to evaluate our proposed algorithm under various impact parameters: system size, message length, and startup time. The experimental results show that the improved tree-based multicasting scheme outperforms the usual tree-based multicasting scheme. The dual-tree-based multicasting scheme with destination-switch-based partition is shown to be the best for all situations.
Toshinori TAKABATAKE Masato KITAKAMI Hideo ITO
In interconnection networks, deadlock recovery has been studied in routing strategy. The routing strategy for the deadlock recovery is intended to optimize the routing performance when deadlocks do not occur. On the other hand, it is important to improve the routing performance by handling deadlocks if they occur. In this paper, a routing strategy for suspensive deadlock recovery called an escape-restoration routing is proposed and its performance is evaluated. In the principle of the proposed techniques, a small amount of exclusive buffer (escape-buffer) at each router is prepared for handling one of deadlocked packets. The transmission of the packet is suspended by temporarily escaping it to the escape-buffer. After the other deadlocked packets were sent, the suspended transmission resumes by restoring the escaped packet. Evaluation results show that the proposed techniques can improve the routing performance more than that of the previous recovery-based techniques in handling deadlocks.
Deogkyoo LEE Daekeun MOON Ilgu YUN Hagbae KIM
Since components faults occurring at arbitrary places (primarily on the links) affect seriously network performance and reliability, the multicomputers operating in harsh environments should be designed to guarantee normal network-missions in presence of those faults. One solution to the end is a fault-tolerant routing scheme, which enables messages to safely reach their destinations avoiding failed links when transmission of messages is blocked by certain faults. In the paper, we develop a fault-tolerant routing algorithm with deadlock freedom in an n-dimensional meshed network, and validate its efficiency and effectiveness through proper simulations. The aspects of fault-tolerance is adopted by appending partial-adaptiveness and detouring to the e-cube algorithm, while using a wormhole routing for the backbone routing method. The phenomenon of deadlock incurred due to its adaptiveness is eliminated by classifying a physical channel into a couple of virtual channels.