In this paper, we extend the notion of bijective connection graphs to introduce directed bijective connection graphs. We propose algorithms that solve the node-to-set node-disjoint paths problem and the node-to-node node-disjoint paths problem in a directed bijective connection graph. The time complexities of the algorithms are both O(n^4), and the maximum path lengths are both 2n-1.
Manabu KOBAYASHI Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
F.P. Preparata et al. have proposed a fault diagnosis model to find all faulty units in a multicomputer system by using the outcomes of tests in which each unit tests some other units. In this paper, for probabilistic diagnosis models, we present an efficient diagnosis algorithm that obtains the a posteriori probability that each unit is faulty given the test outcomes. Furthermore, we propose a method to analyze the diagnostic error probability of this algorithm.
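The abstract above does not include the algorithm itself; the following is a minimal brute-force sketch of computing such posterior fault probabilities under an assumed probabilistic test model. The prior fault probability P_FAULT and the faulty-tester behaviour Q are hypothetical parameters, and the exhaustive enumeration is exactly the cost an efficient algorithm is designed to avoid.

    # Brute-force posterior computation for a probabilistic diagnosis model.
    # Illustrative sketch only: it enumerates all 2^n fault patterns, whereas
    # the paper's algorithm is designed to avoid that cost.
    # Assumed model: each unit is faulty independently with probability P_FAULT;
    # a fault-free tester reports its target's true state; a faulty tester
    # reports "faulty" with probability Q regardless of the truth.
    from itertools import product

    P_FAULT = 0.1   # prior probability that a unit is faulty (assumed)
    Q = 0.5         # probability a faulty tester reports "faulty" (assumed)

    def posterior_fault_probs(n, tests):
        """tests: list of (tester, target, outcome) with outcome 1 = 'faulty'."""
        marginal = [0.0] * n
        total = 0.0
        for pattern in product((0, 1), repeat=n):   # 1 = faulty
            # prior probability of this fault pattern
            prob = 1.0
            for s in pattern:
                prob *= P_FAULT if s else (1.0 - P_FAULT)
            # likelihood of the observed test outcomes
            for tester, target, outcome in tests:
                if pattern[tester] == 0:            # fault-free tester is truthful
                    p_out = 1.0 if outcome == pattern[target] else 0.0
                else:                               # faulty tester is unreliable
                    p_out = Q if outcome == 1 else 1.0 - Q
                prob *= p_out
            total += prob
            for u in range(n):
                if pattern[u]:
                    marginal[u] += prob
        return [m / total for m in marginal]

    # Example: unit 0 tests unit 1 and reports it faulty; unit 1 tests unit 2
    # and reports it fault-free.
    print(posterior_fault_probs(3, [(0, 1, 1), (1, 2, 0)]))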
Junsuk PARK Nobuhiro SEKI Keiichi KANEKO
Topologies for interconnection networks should have both low degree and small diameter. For the same number of nodes, a dual-cube has almost half the degree of a hypercube while its diameter is larger by only one. Hence, it is a promising topology for interconnection networks of massively parallel systems. We propose a stochastic fault-tolerant routing algorithm that finds a non-faulty path from a source node to a destination node in a dual-cube.
The Möbius cube is a variant of the hypercube. Its advantage is that it connects the same number of nodes as a hypercube with almost half the diameter. We propose an algorithm that solves the node-to-node disjoint paths problem in an n-Möbius cube in time polynomial in n. We prove the correctness of the algorithm and show that its time complexity is O(n^2) and the maximum path length is 3n-5.
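For reference, the adjacency rule of the n-Möbius cube assumed by this and the following abstract is usually stated as follows: for a node X = x_{n-1}...x_1x_0, the neighbour along dimension i is obtained by flipping bit i when x_{i+1} = 0 and by complementing bits i down to 0 when x_{i+1} = 1, with the missing bit x_n fixed to 0 or 1 to give the 0-Möbius or 1-Möbius cube. The sketch below implements this rule for illustration; it is not code from the papers.

    # Neighbor generation for the n-Mobius cube (0-MQ_n or 1-MQ_n).
    # Nodes are n-bit integers; bit i refers to x_i in X = x_{n-1}...x_1 x_0.
    # Illustrative sketch of the standard adjacency rule, not the papers' code.

    def mobius_neighbors(x, n, variant=0):
        """Return the n neighbors of node x in the n-Mobius cube.
        variant = 0 or 1 selects the 0-Mobius or 1-Mobius cube (value of x_n)."""
        neighbors = []
        for i in range(n):
            # x_{i+1}; for i = n-1 it is replaced by the cube variant.
            upper = variant if i == n - 1 else (x >> (i + 1)) & 1
            if upper == 0:
                y = x ^ (1 << i)                 # flip bit i only
            else:
                y = x ^ ((1 << (i + 1)) - 1)     # complement bits i..0
            neighbors.append(y)
        return neighbors

    # Example: neighbors of node 000 in the 3-dimensional 0-Mobius cube.
    print([format(v, '03b') for v in mobius_neighbors(0b000, 3, variant=0)])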
David KOCIK Yuki HIRAI Keiichi KANEKO
This paper proposes an algorithm that solves the node-to-set disjoint paths problem in an n-Möbius cube in time polynomial in n. It also gives a proof of correctness of the algorithm and estimates the time complexity, O(n^4), and the maximum path length, 2n-1. A computer experiment is conducted for n=1,2,...,31 to measure the average performance of the algorithm. The results show that the average time complexity gradually approaches O(n^3) and that the maximum path length is rarely attained over the range of n in the experiment.
In many parallel programs, run-time array redistribution is required to enhance data locality and reduce remote memory accesses on distributed memory multicomputers. Array distribution can be classified as regular or irregular according to the distribution fashion. Many methods for performing regular array redistribution have been presented in the literature. For heterogeneous computation environments, however, irregular array redistribution is needed to adjust the data assignment at run time. In this paper, an Essential Cycle Calculation (ECC) method for array redistribution with unequal block sizes is presented. In the ECC method, a processor first computes the source/destination processor/data sets of the array elements in the first essential cycle of the local array it owns. From these sets, packing/unpacking pattern tables are constructed. Since every essential cycle has the same communication pattern, a processor can pack and unpack array elements efficiently based on the pattern tables. To evaluate the performance of the ECC method, we implemented it on an IBM SP2 parallel machine and compared it with the Sequence method. Cost models for both methods are also presented. The experimental results show that the ECC method greatly outperforms the Sequence method for all test samples.
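The key observation in the abstract above, that the source-to-destination mapping repeats with the length of one essential cycle, can be illustrated with a small sketch. The per-processor block sizes below are hypothetical, the cycle length is taken as the least common multiple of the two distributions' periods, and the table computed for the first cycle is simply reused for every later cycle; this is an illustration of the idea, not the ECC implementation.

    # Sketch of the essential-cycle idea behind ECC-style redistribution
    # (illustrative only). A generalized block distribution is given by a list
    # of block sizes, one per processor; the pattern of length sum(blocks)
    # then repeats cyclically over the global array.
    from math import lcm

    def owner_map(blocks):
        """Map offset within one distribution period -> owning processor."""
        owners = []
        for proc, size in enumerate(blocks):
            owners.extend([proc] * size)
        return owners

    def first_cycle_pattern(src_blocks, dst_blocks):
        """For each offset in the first essential cycle, record
        (source processor, destination processor). The same table applies to
        every later cycle, so it only has to be computed once."""
        src_owner = owner_map(src_blocks)
        dst_owner = owner_map(dst_blocks)
        cycle = lcm(len(src_owner), len(dst_owner))   # essential cycle length
        return [(src_owner[i % len(src_owner)],
                 dst_owner[i % len(dst_owner)]) for i in range(cycle)]

    # Example: 3 source processors with unequal blocks (4,2,3) redistributed
    # to 2 destination processors with blocks (5,4).
    pattern = first_cycle_pattern([4, 2, 3], [5, 4])
    print(len(pattern), pattern[:6])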
In most distributed memory computations, node programs are executed on processors according to the owner computes rule. However, the owner computes rule is not best suited to irregular application codes. In such codes, the use of indirection in accessing the left-hand-side array makes it difficult to partition the loop iterations, and because of the indirection in accessing right-hand-side elements, total communication may be reduced by using heuristics other than the owner computes rule. In this paper, we propose a communication-cost-reducing computes rule for irregular loop partitioning, called the least communication computes rule: each loop iteration is assigned to the processor on which executing that iteration incurs the minimal communication cost. After all iterations have been partitioned among the processors, we present a global vs. local data transformation rule, indirection array remapping, and communication optimization methods. The experimental results show that, in most cases, our approach achieves better performance than other loop partitioning rules.
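As a minimal illustration of the least communication computes rule, the sketch below assigns each iteration to the processor that already owns the largest number of the array elements that iteration references, i.e. the processor with the fewest remote accesses. The block ownership function and the cost measure are simplifying assumptions for this sketch, not the paper's cost model.

    # Illustrative sketch of a "least communication" loop-partitioning rule:
    # assign each iteration to the processor that owns the most of the data it
    # touches, so the number of remote accesses is minimized.
    # The block ownership function is a simplifying assumption for this sketch.

    def owner(index, n_elements, n_procs):
        """Assumed block distribution: array element -> owning processor."""
        block = (n_elements + n_procs - 1) // n_procs
        return index // block

    def partition_iterations(accesses, n_elements, n_procs):
        """accesses[i] is the list of global array indices (both left- and
        right-hand side) referenced by iteration i. Returns a processor id
        for each iteration."""
        assignment = []
        for refs in accesses:
            # count locally owned references per processor
            local_count = [0] * n_procs
            for idx in refs:
                local_count[owner(idx, n_elements, n_procs)] += 1
            # remote accesses = len(refs) - local_count[p]; pick the minimum
            assignment.append(max(range(n_procs), key=lambda p: local_count[p]))
        return assignment

    # Example: 4 iterations with indirect accesses, 8 elements, 2 processors.
    print(partition_iterations([[0, 1, 5], [6, 7, 7], [2, 4], [3, 3, 3]], 8, 2))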
In this paper, we present deterministic parallel algorithms for the convex hull of sorted points and their application to a related problem. The algorithms are proposed for the coarse grained multicomputer (CGM) model. We first propose a cost-optimal parallel algorithm that solves the problem in a constant number of communication rounds for n/p ≥ p^2, where n is the size of the input and p is the number of processors. Next we propose a cost-optimal, though more complicated, algorithm for n/p ≥ p^ε, where 0 < ε < 2. Combining the two results, we can compute the convex hull of sorted points in O(n/p) computation time and a constant number of communication rounds for n/p ≥ p^ε, where ε > 0. Finally we show an application of our convex hull algorithms: we solve the convex layers problem for d lines in O((n log n)/p) computation time with a constant number of communication rounds. This algorithm is also cost optimal for the problem.
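The sequential fact such CGM algorithms build on is that the convex hull of points already sorted by x-coordinate can be computed in linear time by a single Graham-type scan. The sketch below shows that scan for illustration; it is not the parallel algorithm of the paper.

    # Upper and lower hulls of points sorted by x-coordinate via a Graham-type
    # scan. Runs in O(n) time on sorted input, which is the sequential fact
    # the CGM algorithms build on. Illustrative sketch, not the paper's method.

    def cross(o, a, b):
        """Cross product of vectors o->a and o->b (positive = left turn)."""
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def convex_hull_sorted(points):
        """points: list of (x, y) sorted by x (ties broken by y), no duplicates.
        Returns the hull vertices in counter-clockwise order."""
        if len(points) <= 2:
            return list(points)
        lower, upper = [], []
        for p in points:                       # lower hull, left to right
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(points):             # upper hull, right to left
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        return lower[:-1] + upper[:-1]         # endpoints appear in both halves

    print(convex_hull_sorted([(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)]))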
In this paper, we propose an efficient task scheduling scheme, called CTS (Class-based Task Scheduling), to obtain high system utilization and low waiting times for tasks. While a better submesh allocation scheme can improve system performance, an allocation policy alone cannot improve performance significantly, because the FCFS task scheduling policy leads to large external fragmentation. The CTS strategy maintains four separate queues, one for each class of incoming tasks, which avoids the blocking property incurred by FCFS scheduling. To reduce external fragmentation, a job in the CTS strategy tends to wait for an occupied submesh of the same size instead of using a new submesh. Simulation results indicate that the proposed scheduling strategy improves performance compared to the FCFS scheduling policy by significantly reducing the average waiting delay.
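As a minimal illustration of the class-based queueing idea, the sketch below keeps four queues indexed by a size class and hands a released submesh to a waiting task of the same class first, so that the freed region can be reused without carving out a new submesh. The class boundaries and dispatch rule are assumptions made for this example, not the exact CTS policy.

    # Sketch of class-based task scheduling with four queues (illustrative
    # only; the class boundaries and dispatch rule are assumptions).
    from collections import deque

    NUM_CLASSES = 4

    def task_class(width, height, mesh_side):
        """Classify a task by the fraction of the mesh side its submesh needs."""
        frac = max(width, height) / mesh_side
        if frac <= 0.25: return 0
        if frac <= 0.50: return 1
        if frac <= 0.75: return 2
        return 3

    class ClassBasedScheduler:
        def __init__(self, mesh_side):
            self.mesh_side = mesh_side
            self.queues = [deque() for _ in range(NUM_CLASSES)]

        def submit(self, task):
            """task: (name, width, height); enqueue into its class queue."""
            _, w, h = task
            self.queues[task_class(w, h, self.mesh_side)].append(task)

        def on_submesh_released(self, width, height):
            """Prefer a waiting task of the same class as the freed submesh,
            so it can reuse the region without creating new fragmentation."""
            c = task_class(width, height, self.mesh_side)
            for cls in [c] + [k for k in range(NUM_CLASSES) if k != c]:
                if self.queues[cls]:
                    return self.queues[cls].popleft()
            return None

    sched = ClassBasedScheduler(mesh_side=16)
    sched.submit(("A", 4, 4)); sched.submit(("B", 12, 12))
    print(sched.on_submesh_released(12, 12))   # picks B, the same-class task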