1-5hit |
Xiangyu ZHANG Yangdong DENG Shuai MU
General purpose computing on GPU (GPGPU) has become a popular computing model for high-performance, data-intensive applications. Accordingly, there is a strong need to develop highly efficient data structures to ease the development of GPGPU applications. In this work, we proposed an efficient concurrent queue data structure for GPU computing. The GPU based provably correct, lock-free FIFO queue allows a massive number of concurrent producers and consumers. Warp-centric en-queue and de-queue procedures are introduced to better match the underlying Single-Instruction, Multiple-Thread execution model of modern GPUs. It outperforms the best previous GPU queues by up to 40 fold. The correctness of the proposed queue operations is formally validated by linearizability criteria.
Hisashi IWAMOTO Yuji YANO Yasuto KURODA Koji YAMAMOTO Shingo ATA Kazunari INOUE
Network traffic keeps increasing due to the increasing popularity of video streaming services. Routers and switches in wire-line networks require guaranteed line rates as high as 20 Gbp/s as well as advanced quality of service (QoS). Hybrid SRAM and DRAM architecture previously presented with the benefit of high-speed and high-density, but it requires complex memory management. As a result, it has hardly supported large numbers of queue, which is an effective approach to satisfying the QoS requirements. This paper proposes an intelligent memory management unit (MMU) which is based on the hybrid architecture, where over 16k multi queues are integrated. The performance examined by the system board is zero-packet loss under the seamless traffic with 60–1.5 kByte packet-length (deterministic manner). Noticeable feature in this paper's architecture is eliminating the need for any premium memories but only low-cost commodity SRAMs and DRAMs are used. The intelligent MMU employs the head buffer architecture, which is suitable for supporting a large numbers of FIFO queues. An experimental board based on this architecture is embedded into a Router system to evaluate the performance. Using 16k queues at 20 Gbps, zero-packet loss is examined with 64-Byte to 1,500-Byte packet-length.
Changwoo MIN Hyung Kook JUN Won Tae KIM Young Ik EOM
A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architecture. We achieve better scalability by reducing contention among concurrent threads, and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
Recently, the Guaranteed Frame Rate (GFR) service was proposed as a new service category of ATM to support non-realtime data applications and to provide the minimum rate guarantee. To keep the simplicity of GFR as much as possible and overcome defects of FIFO-based mechanisms, we propose a FIFO-based algorithm extending DFBA one to improve the fairness and provide the minimum rate guarantee for a wider range of Minimum Cell Rate (MCR). The key idea is controlling the number of CLP1 cells which are occupying more buffer space than the fair share even when the queue length is below Low Buffer Occupancy (LBO).
Michiko INOUE Toshimitsu MASUZAWA Nobuki TOKURA
We consider linearizable implementations of shared FIFO queues and general deterministic objects on a distributed message-passing system which provides a real-time timer. The efficiency of an implementation is measured by the worst-case response time res_time(op) for each operation op of the implemented objects. We show the following results under the assumption that all message delays are in the range [d-u,d] for some constants d and u (0 u d). We first present an implementation of deterministic objects with res_time(opa)=u for any ack-type operation opa and res_time(opv)=2d for any val-type operation opv, where an ack-type operation is an operation which always returns a unique response and a val-type operation is an operation which is not ack-type. We also consider an implementation of FIFO queues, which have two kinds of operations, enq(v) and deq. We show that, for any implementation of FIFO queues, (1) res_time(enq(v)) u(n-1)/n holds for some v where n is the number of processes, and (2) res_time(deq) d+u/2 holds in the case of u (2/3)d.