IEICE global.ieice.org Site

Keyword Search Result

[Keyword] FIFO(11hit)

1-11hit

A SOI Multi-V_DD Dual-Port SRAM Macro for Serial Access Applications
Nobutaro SHIBATA Mayumi WATANABE Takako ISHIHARA

PAPER-Integrated Electronics

Vol: E100-C No: 11 Page(s): 1061-1068
Multiport SRAMs are frequently installed in network and/or telecommunication VLSIs to implement smart functions. This paper presents a high-speed and low-power dual-port (i.e., 1W+1R two-port) SRAM macro customized for serial access operations. To reduce the wasted power dissipation due to subthreshold leakage currents, the supply voltage for 10T memory cells is lowered to 1 V and a power switch is prepared for every 64 word drivers. The switch is activated with look-ahead decoder-segment activation logic, so there is no penalty when selecting a wordline. The data I/O circuitry with a new column-based configuration makes it possible to hide the bitline precharge operation with the sensing operation in the read cycle ahead of it; that is, we have successfully reduced the read latency by a half clock cycle, resulting in a pure two-stage pipeline. The SRAM macro installed in a 4K-entry × 33-bit FIFO memory, fabricated with a 0.3-µm fully-depleted-SOI CMOS process, achieved 500-MHz operation under typical conditions of 2- and 1-V power supplies and 25°C. The power consumption during standby was less than 1.0 mW, and that at a practical operating frequency of 400 MHz was in the range of 47-57 mW, depending on the bit-stream data pattern.
Toward Concurrent Lock-Free Queues on GPUs
Xiangyu ZHANG Yangdong DENG Shuai MU

LETTER-Fundamentals of Information Systems

Vol: E97-D No: 7 Page(s): 1901-1904
General purpose computing on GPU (GPGPU) has become a popular computing model for high-performance, data-intensive applications. Accordingly, there is a strong need to develop highly efficient data structures to ease the development of GPGPU applications. In this work, we propose an efficient concurrent queue data structure for GPU computing. The GPU-based, provably correct, lock-free FIFO queue allows a massive number of concurrent producers and consumers. Warp-centric en-queue and de-queue procedures are introduced to better match the underlying Single-Instruction, Multiple-Thread execution model of modern GPUs. It outperforms the best previous GPU queues by up to 40-fold. The correctness of the proposed queue operations is formally validated by linearizability criteria.
Design and Demonstration of a Single-Flux-Quantum Multi-Stop Time-to-Digital Converter for Time-of-Flight Mass Spectrometry
Kyosuke SANO Yuki YAMANASHI Nobuyuki YOSHIKAWA

PAPER

Vol: E97-C No: 3 Page(s): 182-187
We have been developing a superconducting time-of-flight mass spectrometry (TOF-MS) system, which utilizes a superconductive strip ion detector (SSID) and a single-flux-quantum (SFQ) multi-stop time-to-digital converter (TDC). The SFQ multi-stop TDC can measure the time intervals between multiple input signals and directly convert them into binary data. In this study, we designed and implemented 24-bit SFQ multi-stop TDCs with a 3×24-bit FIFO buffer using the AIST Nb standard process (STP2); their time resolution and dynamic range are 100 ps and 1.6 ms, respectively. The timing jitter of the TDC was investigated by comparing two types of TDCs: one uses an on-chip SFQ clock generator (CG) and the other uses a microwave oscillator at room temperature. We confirmed the correct operation of both TDCs and evaluated their timing jitter. The experimentally obtained timing jitter is about 40 ns and 700 ps for the TDCs with and without the on-chip SFQ CG, respectively, for a measured time interval of 50 µs, and it increases linearly with the measured time interval.
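As a rough consistency check on the figures quoted above, the following sketch computes the full-scale span of a 24-bit counter with 100 ps per count. It assumes the dynamic range is simply the full-scale count times the resolution, which the abstract does not state explicitly; the function name is illustrative only.

```python
def tdc_full_scale_seconds(bits: int, resolution_s: float) -> float:
    """Full-scale time span of an ideal counter-based TDC.

    Assumed model (not taken from the paper): span = 2**bits * resolution.
    """
    return (2 ** bits) * resolution_s


# 24-bit TDC with 100 ps resolution, as quoted in the abstract.
span = tdc_full_scale_seconds(24, 100e-12)
print(f"{span * 1e3:.2f} ms")  # prints "1.68 ms"
```

Under this assumption the span comes out at about 1.68 ms, which is in line with the quoted dynamic range of 1.6 ms.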
Deterministic Packet Buffer System with Multi FIFO Queues for the Advanced QoS
Hisashi IWAMOTO Yuji YANO Yasuto KURODA Koji YAMAMOTO Shingo ATA Kazunari INOUE

PAPER-Network System

Vol: E96-B No: 7 Page(s): 1819-1825
Network traffic keeps increasing due to the growing popularity of video streaming services. Routers and switches in wire-line networks require guaranteed line rates as high as 20 Gbps as well as advanced quality of service (QoS). A hybrid SRAM and DRAM architecture has previously been presented that offers both high speed and high density, but it requires complex memory management. As a result, it has hardly supported large numbers of queues, which are an effective approach to satisfying QoS requirements. This paper proposes an intelligent memory management unit (MMU) based on the hybrid architecture, in which over 16k queues are integrated. The performance measured on the system board is zero packet loss under seamless traffic with packet lengths from 60 Byte to 1.5 kByte (deterministic manner). A noticeable feature of this paper's architecture is that it eliminates the need for any premium memories; only low-cost commodity SRAMs and DRAMs are used. The intelligent MMU employs the head buffer architecture, which is suitable for supporting a large number of FIFO queues. An experimental board based on this architecture is embedded into a router system to evaluate the performance. Using 16k queues at 20 Gbps, zero packet loss is observed with packet lengths from 64 Byte to 1,500 Byte.
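To give a sense of the timing budget implied by the line rate above, the sketch below computes how long one packet occupies a 20 Gbps link for the two packet lengths quoted in the abstract. It is plain arithmetic that ignores framing overhead such as preamble and inter-frame gap (an assumption); the function name is hypothetical.

```python
def packet_time_ns(packet_bytes: int, line_rate_bps: float) -> float:
    """Time one packet occupies the link, in nanoseconds.

    Assumption: payload bits only, no preamble or inter-frame gap.
    """
    return packet_bytes * 8 / line_rate_bps * 1e9


for size in (64, 1500):
    # A deterministic buffer must complete its queue bookkeeping
    # for each packet within roughly this time window.
    print(size, "bytes:", round(packet_time_ns(size, 20e9), 1), "ns")
```

Under these assumptions a 64-Byte packet leaves about 25.6 ns per packet, while a 1,500-Byte packet leaves about 600 ns, so the shortest packets set the worst-case memory-management budget.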
Scalable Cache-Optimized Concurrent FIFO Queue for Multicore Architectures
Changwoo MIN Hyung Kook JUN Won Tae KIM Young Ik EOM

LETTER

Vol: E95-D No: 12 Page(s): 2956-2957
A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architecture. We achieve better scalability by reducing contention among concurrent threads, and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
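The following sketch illustrates only the semantics of a concurrent FIFO queue shared by several producers and consumers; it is not the algorithm proposed in the letter. It uses Python's standard lock-based `queue.Queue`, and the helper names `run` and `per_producer_order_ok` are hypothetical.

```python
import queue
import threading


def run(num_producers=4, num_consumers=2, items_per_producer=1000):
    """Push (producer_id, sequence_number) pairs through one shared queue."""
    q = queue.Queue()
    seen = [[] for _ in range(num_consumers)]  # one private log per consumer

    def producer(pid):
        for seq in range(items_per_producer):
            q.put((pid, seq))

    def consumer(cid):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this consumer
                break
            seen[cid].append(item)

    producers = [threading.Thread(target=producer, args=(p,))
                 for p in range(num_producers)]
    consumers = [threading.Thread(target=consumer, args=(c,))
                 for c in range(num_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:  # one sentinel per consumer, queued after all items
        q.put(None)
    for t in consumers:
        t.join()
    return seen


def per_producer_order_ok(items):
    """True if each producer's sequence numbers appear in increasing order."""
    last = {}
    for pid, seq in items:
        if seq <= last.get(pid, -1):
            return False
        last[pid] = seq
    return True


logs = run()
print(sum(len(log) for log in logs), all(per_producer_order_ok(log) for log in logs))
```

Because the queue is FIFO, each consumer observes any single producer's items in the order they were enqueued, even though items from different producers interleave. A single internal lock serializes every operation in this baseline, which is the kind of contention that scalable designs aim to reduce.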
Two-Level FIFO Buffer Design for Routers in On-Chip Interconnection Networks
Po-Tsang HUANG Wei HWANG

PAPER-VLSI Design Technology and CAD

Vol: E94-A No: 11 Page(s): 2412-2424
The on-chip interconnection network (OCIN) is an integrated solution for system-on-chip (SoC) designs. The buffer architecture and size, however, dominate the performance of OCINs and affect the design of routers. This work analyzes different buffer architectures and uses a data-link two-level FIFO (first-in first-out) buffer architecture to implement high-performance routers. The concepts of shared buffers and multiple accesses for buffers are developed using the two-level FIFO buffer architecture. The proposed two-level FIFO buffer architecture increases the utilization of the storage elements via the centralized buffer organization and reduces the area and power consumption of routers while achieving the same performance as other buffer architectures. According to cycle-accurate simulation, the proposed data-link two-level FIFO buffer can realize performance similar to that of conventional virtual channels while using 25% of the buffers. Consequently, the two-level FIFO buffer can achieve about a 22% power reduction relative to conventional virtual channels of similar performance, using UMC 65 nm CMOS technology.
Power-Efficient LDPC Decoder Architecture Based on Accelerated Message-Passing Schedule
Kazunori SHIMIZU Tatsuyuki ISHIKAWA Nozomu TOGAWA Takeshi IKENAGA Satoshi GOTO

PAPER-VLSI Architecture

Vol: E89-A No: 12 Page(s): 3602-3612
In this paper, we propose a power-efficient LDPC decoder architecture based on an accelerated message-passing schedule. The proposed decoder architecture is characterized as follows: (i) Partitioning a pipelined operation so that intermediate messages are not read and written simultaneously enables the accelerated message-passing schedule to be implemented with single-port SRAMs. (ii) FIFO-based buffering reduces the number of SRAM banks and words of the LDPC decoder based on the accelerated message-passing schedule. The proposed LDPC decoder keeps a single message for each non-zero bit in a parity check matrix, as a classical schedule does, while achieving the accelerated message-passing schedule. Implementation results in 0.18-µm CMOS technology show that the proposed decoder architecture reduces the area of the LDPC decoder by 43% and the power dissipation by 29% compared to the conventional architecture based on the accelerated message-passing schedule.
A Low Latency Asynchronous FIFO Combining a Wave Pipeline with a Handshake Scheme
Jeong-Gun LEE Suk-Jin KIM Jeong-A LEE Kiseon KIM

PAPER-VLSI Design Technology and CAD

Vol: E88-A No: 4 Page(s): 1031-1037
This paper presents a new asynchronous FIFO design to reduce forward latency in a linear structure. The operation mode of each cell can be reconfigured dynamically as either of two schemes, wave pipelining or handshaking, according to the data flow in the FIFO. Adopting wave pipelining in the conventional self-timed FIFO can reduce the overhead of handshaking as well as of latching control in each stage. Initial pre-layout simulations indicate about a twofold improvement in latency over a state-of-the-art asynchronous FIFO, while retaining its throughput.
Performance Evaluation and Fairness Improvement of TCP over ATM GFR in FIFO-Based Mechanisms
Yong-Gu JEON Hong-Shik PARK

PAPER-Switching

Vol: E84-B No: 8 Page(s): 2227-2236
Recently, the Guaranteed Frame Rate (GFR) service was proposed as a new service category of ATM to support non-realtime data applications and to provide a minimum rate guarantee. To keep GFR as simple as possible and overcome the defects of FIFO-based mechanisms, we propose a FIFO-based algorithm that extends the DFBA algorithm to improve fairness and provide the minimum rate guarantee for a wider range of Minimum Cell Rate (MCR). The key idea is to control the number of CLP1 cells that occupy more buffer space than the fair share even when the queue length is below Low Buffer Occupancy (LBO).
A Partial Order Semantics for FIFO-Nets
Cinzia BERNARDESCHI Nicoletta De FRANCESCO Gigliola VAGLINI

PAPER-Automata,Languages and Theory of Computing

Vol: E81-D No: 8 Page(s): 773-782
In this work, we give a true concurrency semantics for FIFO-nets, which are Petri nets in which places behave as queues, tokens take values in a finite alphabet, and the firing of a transition depends on sequences over the alphabet. We introduce fn-processes to represent the concurrent behavior of a FIFO-net N during a sequence of transition firings. Fn-processes are modeled by a mapping from a simple FIFO-net without queue sharing and cycles, named a FIFO-occurrence net, to N. Moreover, the relation among the firings expressed by the FIFO-occurrence net has been enriched by an ordering relation among the elements of the FIFO-occurrence net representing values entered into the same queue of N. We give a way to build fn-processes step by step in correspondence with a sequence of transition firings, and the fn-processes built operationally are exactly those defined abstractly. The FIFO-occurrence nets of fn-processes have some interesting properties; for example, such nets are always discrete and, consequently, there is at least one transition sequence corresponding to each fn-process.
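The firing rule described above can be sketched as follows. This is a minimal illustration that assumes the common FIFO-net convention in which each arc carries a word over the alphabet: a transition is enabled when every input queue begins with the word on its arc, and firing removes those words from the queue heads and appends the output words to the queue tails. The class and method names are hypothetical and do not come from the paper.

```python
from collections import deque


class FifoNet:
    """Toy FIFO-net: places are queues of letters (assumed firing rule)."""

    def __init__(self, queues):
        # queues: place name -> initial content, given as a string of letters
        self.queues = {place: deque(word) for place, word in queues.items()}
        self.transitions = {}

    def add_transition(self, name, consume, produce):
        # consume / produce: place name -> word read from head / written to tail
        self.transitions[name] = (consume, produce)

    def enabled(self, name):
        consume, _ = self.transitions[name]
        # Every input queue must start with the word on its arc.
        return all(list(self.queues[p])[:len(w)] == list(w)
                   for p, w in consume.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        consume, produce = self.transitions[name]
        for place, word in consume.items():
            for _ in word:
                self.queues[place].popleft()  # remove from the queue head
        for place, word in produce.items():
            self.queues[place].extend(word)   # append to the queue tail


net = FifoNet({"p": "ab", "q": ""})
net.add_transition("t1", {"p": "a"}, {"q": "c"})
net.add_transition("t2", {"p": "a"}, {"q": "d"})
net.fire("t1")
print("".join(net.queues["p"]), "".join(net.queues["q"]), net.enabled("t2"))
```

After `t1` fires, the head of `p` is `b`, so `t2` is no longer enabled. This is the order sensitivity that distinguishes queue places from ordinary Petri-net places, where tokens are unordered.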
Efficient Linearizable Implementation of Shared FIFO Queues and General Objects on a Distributed System
Michiko INOUE Toshimitsu MASUZAWA Nobuki TOKURA

PAPER

Vol: E81-A No: 5 Page(s): 768-775
We consider linearizable implementations of shared FIFO queues and general deterministic objects on a distributed message-passing system which provides a real-time timer. The efficiency of an implementation is measured by the worst-case response time res_time(op) for each operation op of the implemented objects. We show the following results under the assumption that all message delays are in the range [d-u, d] for some constants d and u (0 ≤ u ≤ d). We first present an implementation of deterministic objects with res_time(opa) = u for any ack-type operation opa and res_time(opv) = 2d for any val-type operation opv, where an ack-type operation is an operation which always returns a unique response and a val-type operation is an operation which is not ack-type. We also consider an implementation of FIFO queues, which have two kinds of operations, enq(v) and deq. We show that, for any implementation of FIFO queues, (1) res_time(enq(v)) ≥ u(n-1)/n holds for some v, where n is the number of processes, and (2) res_time(deq) ≥ d+u/2 holds in the case of u ≤ (2/3)d.
