Jyh-Chang UENG Ce-Kuen SHIEH Su-Cheong MAC An-Chow LAI Tyng-Yue LIANG
This paper describes the design and implementation of Cohesion, a multi-threaded Distributed Shared Memory (DSM) system that provides high programming flexibility and latency masking and supports load balancing. Cohesion offers a parallel programming environment very similar to that of a multiprocessor system. Threads can be created recursively in this environment, and users are not required to manage thread locations. Instead of supporting a shared-variable model, Cohesion provides a global shared address space among all nodes in the system. The space is divided into three regions, i.e., release, conventional, and object-based memory, each governed by a different consistency protocol. The design issues of an ordinary thread system, such as thread management, load balancing, and synchronization, are reconsidered in light of the memory management provided by the DSM system. Several real applications are used to evaluate the performance of the system. The results show that multi-threading usually outperforms single-threading because network latency can be masked by overlapping communication with computation. However, the gain depends on program behavior and on the number of threads executed on each node.
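The latency-masking claim can be illustrated with a toy model. The sketch below is not Cohesion's scheduler; it is a minimal simulation under stated assumptions: one CPU per node, non-preemptive threads, zero context-switch cost, and each thread alternating a fixed compute burst with a fixed remote-access latency. The function name `node_run_time` and all parameters are hypothetical.

```python
import heapq

def node_run_time(num_threads, rounds, compute, latency):
    """Simulated time for one node to finish all of its threads.

    Assumed model (not taken from the paper): each thread repeats `rounds`
    times: run on the single CPU for `compute` time units, then wait
    `latency` units for a remote memory access. While one thread waits,
    the CPU may run another ready thread.
    """
    # Heap entries: (time the thread becomes ready, thread id, rounds left)
    ready = [(0, tid, rounds) for tid in range(num_threads)]
    heapq.heapify(ready)
    cpu_free = 0   # time at which the CPU is next idle
    finish = 0     # completion time of the last remote access
    while ready:
        ready_at, tid, left = heapq.heappop(ready)
        start = max(cpu_free, ready_at)   # wait for the CPU or the data
        cpu_free = start + compute        # compute burst occupies the CPU
        done = cpu_free + latency         # remote access overlaps other threads
        finish = max(finish, done)
        if left > 1:
            heapq.heappush(ready, (done, tid, left - 1))
    return finish

# Same total work (8 rounds) split across 1, 2, 4, and 8 threads.
for k in (1, 2, 4, 8):
    print(k, node_run_time(k, 8 // k, compute=1, latency=3))
```

In this model the run time falls as threads are added until the CPU is saturated, after which extra threads bring no further gain, which is consistent with the observation that the benefit depends on the number of threads per node.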
Tyng-Yeu LIANG Ce-Kuen SHIEH Deh-Cheng LIU
This paper first examines the issues involved in scheduling loop applications on a software distributed shared memory (DSM) system. A dynamic scheduling scheme is then developed from these issues to enhance the performance of loop applications on DSM. Compared with previous work, the proposed scheme has several distinguishing features. First, the workload of processors can be balanced effectively even when the computational capabilities of processors and the computational needs of threads are not identical. Second, it divides thread mapping into two phases, each with a single consideration, i.e., load balance or communication cost, and adopts thread migration and thread exchange in the two phases, respectively. Third, it exploits data sharing among threads to reduce data-consistency communication. Finally, it counters the negative effect of the unnecessary inter-node sharing caused by thread re-mapping. The proposed scheme has been implemented on a page-based DSM system called Cohesion. Our experiments show that the proposed scheme improves the performance of the test programs more effectively than related schemes.
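The two-phase idea can be sketched as follows. This is an illustrative approximation, not the paper's algorithm: the greedy migration rule, the restriction of exchanges to equal-cost threads, and every name (`finish_time`, `balance_by_migration`, `reduce_sharing_by_exchange`, `cost`, `speed`, `shared`) are assumptions made for the example.

```python
from itertools import combinations

def finish_time(mapping, cost, speed):
    """Estimated finish time per node: sum of thread costs / node speed."""
    t = {n: 0.0 for n in speed}
    for th, n in mapping.items():
        t[n] += cost[th] / speed[n]
    return t

def balance_by_migration(mapping, cost, speed):
    """Phase 1 (assumed greedy rule): migrate one thread at a time from the
    latest-finishing node to the earliest-finishing node while that helps."""
    mapping = dict(mapping)
    while True:
        t = finish_time(mapping, cost, speed)
        src, dst = max(t, key=t.get), min(t, key=t.get)
        if src == dst:                      # all nodes finish together
            return mapping
        best = None
        for th in [x for x, n in mapping.items() if n == src]:
            new_max = max(t[src] - cost[th] / speed[src],
                          t[dst] + cost[th] / speed[dst])
            if new_max < t[src] and (best is None or new_max < best[0]):
                best = (new_max, th)
        if best is None:                    # no migration improves balance
            return mapping
        mapping[best[1]] = dst

def cross_node_sharing(mapping, shared):
    """Total sharing weight between thread pairs placed on different nodes."""
    return sum(w for (a, b), w in shared.items() if mapping[a] != mapping[b])

def reduce_sharing_by_exchange(mapping, cost, shared):
    """Phase 2 (assumed rule): swap equal-cost threads across nodes when the
    swap lowers cross-node sharing; node loads stay unchanged."""
    mapping = dict(mapping)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(mapping), 2):
            if mapping[a] == mapping[b] or cost[a] != cost[b]:
                continue
            before = cross_node_sharing(mapping, shared)
            mapping[a], mapping[b] = mapping[b], mapping[a]
            if cross_node_sharing(mapping, shared) < before:
                improved = True
            else:                           # undo a swap that did not help
                mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping

# Hypothetical setup: node n0 is twice as fast as n1; six unit-cost threads.
speed = {"n0": 2.0, "n1": 1.0}
cost = {f"t{i}": 1.0 for i in range(6)}
shared = {("t0", "t1"): 4, ("t2", "t3"): 4, ("t4", "t5"): 4}
initial = {f"t{i}": ("n0" if i % 2 == 0 else "n1") for i in range(6)}

balanced = balance_by_migration(initial, cost, speed)
final = reduce_sharing_by_exchange(balanced, cost, shared)
```

Separating the phases keeps each decision simple: migration changes node loads without regard to sharing, and exchange then lowers communication without disturbing the balance reached in the first phase.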