1-6hit |
Tree task structures occur frequently in many applications where parallelization may be desirable. We present a formal treatment of non-preemptively scheduling task trees on distributed memory multiprocessors and show that the fundamental problems of scheduling (i) a task tree in absence of any inter-task communication on a fixed number of processors and (ii) a task tree with inter-task communication on an unbounded number of processors are NP-complete. For task trees that satisfy certain constraints, we present an optimal scheduling algorithm. The algorithm is shown optimal over a wider set of task trees than previous works.
We consider the problem of placing resources in a distributed computing system so that certain performance requirements may be met while minimizing the number of resource copies needed. Resources include special I/O processors, expensive peripheral devices, or such software modules as compilers, library routines, and data files. Due to the delay in accessing each of these resources, system performance degrades as the distance between each processor and its nearest resource copy increases. Thus, every processor must be within a given distance k1 of at least one resource copy, which is called the k-bounded placement problem. The structure of a distributed computing system is represented by a graph. The k-bounded placement problem is first transformed into the problem of finding smallest k-dominating sets in a graph. Searching for smallest k-dominating sets is formulated as a state-space search problem. We derive heuristic information to speed up the search, which is then used to solve the problem with the well-known A* algorithm. An illustrative example and some experimental results are presented to demonstrate the effectiveness of the heuristic search.
Dingchao LI Yuji IWAHORI Tatsuya HAYASHI Naohiro ISHII
Reducing communication overhead is a key goal of program optimization for current scalable multiprocessors. A well-known approach to achieving this is to map tasks (indivisible units of computation) to processors so that communication and computation overlap as much as possible. In an earlier work, we developed a look-ahead scheduling heuristic for efficiently reducing communication overhead with the aim of decreasing the completion time of a given parallel program. In this paper, we report on an extension of the algorithm, which fills in the idle time slots created by interprocessor communication without increasing the algorithm's time complexity. The results of experiments emphasize the importance of optimally filling idle time slots in processors.
Minyi GUO Yoshiyuki YAMASHITA Ikuo NAKATA
Array redistribution is required very often in programs on distributed memory parallel computers. It is essential to use efficient algorithms for redistribution, otherwise the performance of programs may degrade considerably. In this paper, we focus on automatic generation of communication routines for multi-dimensional redistribution. The principal advantage of this work is to gain the ability to handle redistribution between arbitrary source and destination processor sets and between arbitrary source and destination distribution schemes. We have implemented these algorithms using Parallelware communication library. Some experimental results show the efficiency and flexibility of our techniques compared to the other redistribution works.
Tomoo INOUE Takaharu FUJII Hideo FUJIWARA
The problem of test generation for VLSI circuits computationally requires prohibitive costs. Parallel processing on a multiprocessor system is one of available methods in order to speedup the process for such time-consuming problems. In this paper, we analyze the performance of parallel test generation for combinational circuits. We present two types of parallel test generation systems in which the communication methods are different; vector broadcasting (VB) and fault broadcasting (FB) systems, and analyze the number of generated test vectors, the costs of test vector generation, fault simulation and communication, and the speedup of these parallel test generation systems, where the two types of communication factors; the communication cut-off factor and the communication period, are applied. We also present experimental results on the VB and FB systems implemented on a network of workstations using ISCAS'85 and ISCAS'89 benchmark circuits. The analytical and experimental results show that the total number of test vectors generated in the VB system is the same as that in the FB system, the speedup of the FB system is larger than that of the VB, and it is effective in reducing the communication cost to switch broadcasted data from vectors to faults.
Modeling and performance analysis have played an important role in the economical design and efficient operation of switching systems, and is currently becoming more important because the switching systems should handle a wide range of traffic characteristics, meeting the grade of service requirements of each traffic type. Without these techniques we could no longer achieve economy and efficiency of the switching systems in complex traffic characteristic environments. From the beginning of research on electronic switching systems offering circuit-switched applications, Stored Program Control (SPC) technology has posed challenges in the area of modeling and performance analysis as well as queueing structure, efficient scheduling, and overload control strategy design. Not only teletraffic engineers and performance analysts, but also queueing theorists have been attracted to this new field, and intensive research activities, both in theory and in practice, have continued over the past two decades, now evolving to even a broader technical field including traditional performance analysis. This article reviews a number of important issues that have been raised and solved, and whose solutions have been reflected in the design of SPC switching systems. It first discusses traffic problems for centralized control systems. It next discusses traffic problems inherent in distributed switching systems.