1-5hit |
Chen CHEN Kai LU Xiaoping WANG Xu ZHOU Zhendong WU
Most existing deterministic multithreading systems are costly on pipeline parallel programs due to load imbalance. In this letter, we propose a Load-Balanced Deterministic Runtime (LBDR) for pipeline parallelism. LBDR deterministically takes some tokens from non-synchronization-intensive threads to synchronization-intensive threads. Experimental results show that LBDR outperforms the state-of-the-art design by an average of 22.5%.
Wenzhe ZHANG Kai LU Xiaoping WANG Jie JIAN
New volatile memory (e.g. Phase Change Memroy) presents fast access, large capacity, byte-addressable, and non-volatility features. These features will bring impacts on the design of current software system. It has become a hot research topic of how to manage it and provide what kind of interface for upper application to use it. This paper proposes FP-Heap. FP-Heap supports direct access to non-volatile memory through a persistent heap interface. With FP-Heap, traditional persistent object systems can benefit directly from the byte-persistency of non-volatile memory. FP-Heap extends current virtual memory manager (VMM) to manage non-volatile memory and maintain a persistent mapping relationship. Also, FP-Heap offers a lightweight transaction mechanism to support atomic update of persistent data, a simple namespace to facilitate data indexing, and a basic access control mechanism to support data sharing. Compared with previous work Mnemosyne, FP-Heap achieves higher performance by its customized VMM and optimized transaction mechanism.
Xu ZHOU Kai LU Xiaoping WANG Wenzhe ZHANG Kai ZHANG Xu LI Gen LI
The nondeterminism of message-passing communication brings challenges to program debugging, testing and fault-tolerance. This paper proposes a novel deterministic message-passing implementation (DMPI) for parallel programs in the distributed environment. DMPI is compatible with the standard MPI in user interface, and it guarantees the reproducibility of message with high performance. The basic idea of DMPI is to use logical time to solve message races and control asynchronous transmissions, and thus we could eliminate the nondeterministic behaviors of the existing message-passing mechanism. We apply a buffering strategy to alleviate the performance slowdown caused by mismatch of logical time and physical time. To avoid deadlocks introduced by deterministic mechanisms, we also integrate DMPI with a lightweight deadlock checker to dynamically detect and solve these deadlocks. We have implemented DMPI and evaluated it using NPB benchmarks. The results show that DMPI could guarantee determinism with incurring modest runtime overhead (14% on average).
Chen CHEN Kai LU Xiaoping WANG Xu ZHOU Zhendong WU
Strongly deterministic multithreading provides determinism for multithreaded programs even in the presence of data races. A common way to guarantee determinism for data races is to isolate threads by buffering shared memory accesses. Unfortunately, buffering all shared accesses is prohibitively costly. We propose an approach called DRDet to efficiently make data races deterministic. DRDet leverages the insight that, instead of buffering all shared memory accesses, it is sufficient to only buffer memory accesses involving data races. DRDet uses a sound data-race detector to detect all potential data races. These potential data races, along with all accesses which may access the same set of memory objects, are flagged as data-race-involved accesses. Unsurprisingly, the imprecision of static analyses makes a large fraction of shared accesses to be data-race-involved. DRDet employs two optimizations which aim at reducing the number of accesses to be sent to query alias analysis. We implement DRDet on CoreDet, a state-of-the-art deterministic multithreading system. Our empirical evaluation shows that DRDet reduces the overhead of CoreDet by an average of 1.6X, without weakening determinism and scalability.
Xu LI Kai LU Xiaoping WANG Bin DAI Xu ZHOU
Existing large-scale systems suffer from various hardware/software failures, motivating the research of fault-tolerance techniques. Checkpoint-restart techniques are widely applied fault-tolerance approaches, especially in scientific computing systems. However, the overhead of checkpoint largely influences the overall system performance. Recently, the emerging byte-addressable, persistent memory technologies, such as phase change memory (PCM), make it possible to implement checkpointing in arbitrary data granularity. However, the impact of data granularity on the checkpointing cost has not been fully addressed. In this paper, we investigate how data granularity influences the performance of a checkpoint system. Further, we design and implement a high-performance checkpoint system named AG-ckpt. AG-ckpt is a hybrid-granularity incremental checkpointing scheme through: (1) low-cost modified-memory detection and (2) fine-grained memory duplication. Moreover, we also formulize the performance-granularity relationship of checkpointing systems through a mathematical model, and further obtain the optimum solutions. We conduct the experiments through several typical benchmarks to verify the performance gain of our design. Compared to conventional incremental checkpoint, our results show that AG-ckpt can reduce checkpoint data amount up to 50% and provide a speedup of 1.2x-1.3x on checkpoint efficiency.