IEICE global.ieice.org Site

Keyword Search Result

[Keyword] content addressable memory(24hit)

1-20hit(24hit)

Reducing Aging Effects on Ternary CAM
Ing-Chao LIN Yen-Han LEE Sheng-Wei WANG

PAPER-Integrated Electronics

Vol: E99-C No:7
Page(s): 878-891
Ternary content addressable memory (TCAM), which can store 0, 1, or X in its cells, is widely used to store routing tables in network routers. Negative bias temperature instability (NBTI) and positive bias temperature instability (PBTI), which increase Vth and degrade transistor switching speed, have become major reliability challenges. This study analyzes the signal probability of routing tables. The results show that many cells retain static stress and suffer significant degradation caused by NBTI and PBTI effects. The bit flipping technique is improved and proactive power gating recovery is proposed to mitigate NBTI and PBTI effects. In order to maintain the functionality of TCAM after bit flipping, a novel TCAM cell design is proposed. Simulation results show that compared to the original architecture, the bit flipping technique improves read static noise margin (SNM) for data and mask cells by 16.84% and 29.94%, respectively, and reduces search time degradation by 12.95%. The power gating technique improves read SNM for data and mask cells by 12.31% and 20.92%, respectively, and reduces search time degradation by 17.57%. When both techniques are used, read SNM for data and mask cells is improved by 17.74% and 30.53%, respectively, and search time degradation is reduced by 21.01%.
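The ternary matching that this abstract refers to (cells storing 0, 1, or X) can be sketched in software as follows. This is a minimal model, not the cell design from the paper; the `(value, mask)` encoding, the `tcam_search` name, and the 8-bit example table are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Each entry is (value, mask). A mask bit of 1 means "compare this bit";
# a mask bit of 0 means the cell stores X (don't care).
Entry = Tuple[int, int]


def tcam_search(entries: List[Entry], key: int) -> Optional[int]:
    """Return the index of the first (highest-priority) matching entry."""
    for index, (value, mask) in enumerate(entries):
        # Bits that differ between key and value only matter where mask is 1.
        if (key ^ value) & mask == 0:
            return index
    return None


# Hypothetical 8-bit table, ordered from most to least specific.
table = [
    (0b10100000, 0b11110000),  # 1010XXXX
    (0b10000000, 0b11000000),  # 10XXXXXX
    (0b00000000, 0b00000000),  # XXXXXXXX (matches every key)
]

if __name__ == "__main__":
    print(tcam_search(table, 0b10101111))
```

A hardware TCAM compares all entries in parallel and a priority encoder picks the winner; the loop above only reproduces the result, not the timing.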
Design a Fast CAM-Based Exact Pattern Matching System on FPGA and 0.18µm CMOS Process
Duc-Hung LE Katsumi INOUE Cong-Kha PHAM

LETTER-VLSI Design Technology and CAD

Vol: E96-A No:9
Page(s): 1883-1888
A CAM-based system for fast exact pattern matching is implemented in hardware on both an FPGA and an ASIC. The system has a simple structure and requires neither a central processing unit (CPU) nor complicated computations. The design takes advantage of content addressable memory (CAM), which supports a parallel multi-match mode. The system can perform fast pattern matching for a variety of required search patterns without using search principles. In this paper, the authors present a CAM-based system for fast exact pattern matching on 2-D data.
A Fast Power Estimation Method for Content Addressable Memory by Using SystemC Simulation Environment
Kun-Lin TSAI I-Jui TUNG Feipei LAI

PAPER-VLSI Design Technology and CAD

Vol: E96-A No:8
Page(s): 1723-1729
Content addressable memory is widely used for fast lookup-table searching, but it often consumes considerable power. Moreover, designing a suitable content addressable memory architecture for a specific application is time-consuming, since behavioral simulation is often done at the transistor level. SystemC is a system-level modeling language and simulation platform that provides high simulation efficiency for hardware/software co-design. Unfortunately, SystemC does not provide a function for estimating the power dissipation of a structural design. In this paper, a fast SystemC-based power estimation method is presented for estimating the power dissipation of the match-line circuit, the search-line circuit, and the storage cell array of a content addressable memory in the early design stage. Mathematical equations and behavioral patterns are used as the inputs of the power estimation model. Simulation results based on 10 MiBench benchmarks show that the proposed method is on average 1233 times faster than the HSPICE simulator, with an error rate of only 3.51%.
An FPGA-Based Information Detection Hardware System Employing Multi-Match Content Addressable Memory
Duc-Hung LE Katsumi INOUE Masahiro SOWA Cong-Kha PHAM

PAPER-VLSI Design Technology and CAD

Vol: E95-A No:10
Page(s): 1708-1717
A new information detection method is proposed for a very fast and efficient search engine. The method is implemented in hardware on an FPGA. The design takes advantage of content addressable memory (CAM), which provides a matching mode. The CAM blocks are built from the memory blocks available on the FPGA device to reduce the access time of the whole system. The entire memory can return multi-match results concurrently. The system uses the CAMs to perform pattern matching in parallel and outputs the addresses of all matching entries. Because of these parallel multi-match operations, the system can be applied to pattern matching under various constraint conditions without using any search principles. Multi-match results are obtained in 60 ns at an operating frequency of 50 MHz. This increases the search performance of an information detection system that uses this method as its core.
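The multi-match behavior described in this abstract, where one search returns the addresses of all matching entries, can be modeled roughly as below. The `MultiMatchCAM` class and its method names are hypothetical and only mimic the functional result; they say nothing about the FPGA implementation.

```python
from typing import List, Optional


class MultiMatchCAM:
    """Software stand-in for a CAM that reports every matching address."""

    def __init__(self, depth: int) -> None:
        # None marks an empty (never written) location.
        self.words: List[Optional[int]] = [None] * depth

    def write(self, address: int, word: int) -> None:
        self.words[address] = word

    def search(self, key: int) -> List[int]:
        # Hardware compares all locations at once; here we scan them.
        return [addr for addr, word in enumerate(self.words) if word == key]


if __name__ == "__main__":
    cam = MultiMatchCAM(depth=8)
    for addr, word in enumerate([0x41, 0x42, 0x41, 0x43, 0x41]):
        cam.write(addr, word)
    print(cam.search(0x41))
```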
FPS-RAM: Fast Prefix Search RAM-Based Hardware for Forwarding Engine
Kazuya ZAITSU Koji YAMAMOTO Yasuto KURODA Kazunari INOUE Shingo ATA Ikuo OKA

PAPER-Network System

Vol: E95-B No:7
Page(s): 2306-2314
Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limit the deployment of large TCAM capacity in IP routers. In this paper, we propose a new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed FPS-RAM hardware with the intent of maintaining the same search performance and physical user interface as TCAM because our objective is to replace the TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and has dramatically reduced the costs and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
Design of an 8-nsec 72-bit-Parallel-Search Content-Addressable Memory Using a Phase-Change Device
Satoru HANZAWA Takahiro HANYU

PAPER-Integrated Electronics

Vol: E94-C No:8
Page(s): 1302-1310
This paper presents a content-addressable memory (CAM) using a phase-change device. A hierarchical match-line structure and a one-hot-spot block code are indispensable to suppress the resistance ratio of the phase-change device and the area overhead of match detectors. As a result, an 8-nsec 72-bit-parallel-search CAM is implemented using a phase-change-device/MOS-hybrid circuitry, where high and low resistances are higher than 2.3 MΩ and lower than 97 kΩ, respectively, while maintaining one-day retention.
A New TCAM Architecture for Managing ACL in Routers
Haesung HWANG Shingo ATA Koji YAMAMOTO Kazunari INOUE Masayuki MURATA

PAPER-Network

Vol: E93-B No:11
Page(s): 3004-3012
Ternary Content Addressable Memory (TCAM) is a special type of memory used in routers to achieve high-speed packet forwarding and classification. Packet forwarding is done by referring to the rules written in the routing table, whereas packet classification is performed by referring to the rules in the Access Control List (ACL). TCAM uses more transistors than Random Access Memory (RAM), resulting in high power consumption and high production cost. Therefore, it is necessary to reduce the entries written in the TCAM to reduce the transistor count. In this paper, we propose a new TCAM architecture by using Range Matching Devices (RMD) integrated within the TCAM's control logic with an optimized prefix expansion algorithm. The proposed method reduces the number of entries required to express ACL rules, especially when specifying port ranges. With less than 10 RMDs, the total number of lines required to write port ranges in the TCAM can be reduced to approximately 50%.
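To see why port ranges inflate the number of TCAM entries, the sketch below performs the conventional range-to-prefix expansion that a plain TCAM needs. It is the textbook expansion, not the optimized algorithm or the RMD hardware proposed in the paper; the function name `range_to_prefixes` is an assumption.

```python
from typing import List


def range_to_prefixes(lo: int, hi: int, width: int) -> List[str]:
    """Split the integer range [lo, hi] into a minimal list of ternary prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo...
        size = lo & -lo if lo else 1 << width
        # ...and shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        wildcard_bits = size.bit_length() - 1
        fixed_bits = width - wildcard_bits
        fixed = format(lo >> wildcard_bits, f"0{fixed_bits}b") if fixed_bits else ""
        prefixes.append(fixed + "*" * wildcard_bits)
        lo += size
    return prefixes


if __name__ == "__main__":
    # A 4-bit range that needs several entries in a plain TCAM.
    print(range_to_prefixes(1, 14, 4))
    # The 16-bit port range 1024-65535.
    print(len(range_to_prefixes(1024, 65535, 16)))
```

Each returned string corresponds to one TCAM line, so a single rule with an awkward port range can occupy many lines; this is the overhead that a range matching device aims to avoid.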
Routing Table Compaction for TCAM-Based IP Address Lookup
Pi-Chung WANG Yi-Ting FANG Tzung-Chian HUANG

LETTER-Network

Vol: E93-B No:5
Page(s): 1272-1275
In this work, we propose a routing table compaction scheme for IP forwarding engines based on ternary content addressable memory (TCAM). Our scheme transforms the original routing table into a form with only disjoint prefixes. The most prevalent next hop of the routing table is then calculated, and the route prefixes corresponding to that next hop are replaced by one TCAM entry. In combination with the Espresso-II logic minimization algorithm, the proposed scheme reduces the TCAM storage requirements by more than 75% compared to the original routing tables. We also present an effective approach to support incremental updates.
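The "most prevalent next hop" step in this abstract can be illustrated with a small sketch. It assumes that the disjoint prefixes cover the whole address space, so that a single catch-all entry is a safe replacement; the function names, the dictionary representation, and the toy table are hypothetical, and the Espresso-II minimization step is not included.

```python
from collections import Counter
from typing import Dict, Optional


def compact_table(table: Dict[str, str]) -> Dict[str, str]:
    """Replace all prefixes of the most common next hop by one catch-all entry.

    `table` maps disjoint binary prefixes (e.g. "10") to next hops.
    """
    if not table:
        return {}
    most_common_hop, _ = Counter(table.values()).most_common(1)[0]
    compacted = {p: h for p, h in table.items() if h != most_common_hop}
    compacted[""] = most_common_hop  # zero-length prefix: every bit is X
    return compacted


def lookup(table: Dict[str, str], address_bits: str) -> Optional[str]:
    """Longest-prefix match, standing in for TCAM priority ordering."""
    for length in range(len(address_bits), -1, -1):
        hop = table.get(address_bits[:length])
        if hop is not None:
            return hop
    return None


if __name__ == "__main__":
    original = {"00": "A", "01": "B", "10": "A", "11": "A"}
    print(compact_table(original))
```

In the toy table, four entries shrink to two, and every 2-bit address still resolves to the same next hop because the catch-all entry is consulted only when no longer prefix matches.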
Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Takayuki GYOHTEN Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO

PAPER

Vol: E91-C No:9
Page(s): 1409-1418
This paper presents an architecture that integrates content addressable memory (CAM) with a massive-parallel memory-embedded SIMD matrix to construct a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way to process the repeated arithmetic operation types in multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed architecture can realize efficient and versatile multimedia data processing. Evaluation results for the proposed CAM-enhanced massive-parallel SIMD matrix processor on the frequently used JPEG image-compression application show that the required number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The performance in Mpixel/mm² is better by factors of 3.3 and 4.4 than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
Race-Free Mixed Serial-Parallel Comparison for Low Power Content Addressable Memory
Seong-Ook JUNG Sei-Seung YOON

LETTER-VLSI Design Technology and CAD

Vol: E91-A No:3
Page(s): 895-898
This letter presents a race-free mixed serial-parallel comparison (RFMSPC) scheme that uses both serial and parallel CAMs in a match line. A self-reset search line scheme for the serial CAM is proposed to avoid the timing race problem and additional timing penalties. Various 32-entry CAMs are designed in a 90 nm, 1.2 V CMOS process to verify the proposed RFMSPC scheme. The results show that RFMSPC reduces power consumption by 40%, 53%, and 63% at the cost of a 4%, 6%, and 16% increase in search time for 1, 2, and 4 serial CAM bits in a match line, respectively.
Effective Bit Selection Methods for Improving Performance of Packet Classifications on IP Routers
Gang QIN Shingo ATA Ikuo OKA Chikato FUJIWARA

PAPER-Switching for Communications

Vol: E90-B No:5
Page(s): 1090-1097
This paper investigates fast packet classification techniques in which a large routing table is first divided into many much smaller tables by an index key; the resulting small tables are much easier to search. The traditional approach is to use the front bits as the index key, but we show that this is not an effective way to divide a routing table. In this paper, we propose three bit selection methods for the division. They can be implemented with a CAM or a hash structure. Simulations show that the bit selection methods decrease the classification delay by 50% compared to the traditional method. We also propose an optimized method adapted to biased traffic patterns, which shows a 70% improvement in our simulation.
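The effect of the index key on table division can be illustrated as follows. The sketch only shows the division step for a given set of bit positions and does not reproduce any of the paper's three selection methods; the rule strings, the absence of wildcard bits, and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def divide(rules: List[str], bit_positions: List[int]) -> Dict[str, List[str]]:
    """Group fixed-width binary rules by the bits at `bit_positions`."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for rule in rules:
        key = "".join(rule[p] for p in bit_positions)
        buckets[key].append(rule)
    return dict(buckets)


def largest_bucket(rules: List[str], bit_positions: List[int]) -> int:
    """Size of the biggest sub-table, a rough proxy for worst-case search cost."""
    return max(len(bucket) for bucket in divide(rules, bit_positions).values())


# Hypothetical rules that all share the same leading bits.
rules = ["10000001", "10000110", "10001011", "10001100"]

if __name__ == "__main__":
    print(largest_bucket(rules, [0, 1]))  # front bits as the index key
    print(largest_bucket(rules, [6, 7]))  # two other bit positions
```

With these rules, the front bits put every rule into a single sub-table, while the last two bits spread them evenly, which is the kind of imbalance a bit selection method tries to avoid.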
Real-Time Huffman Encoder with Pipelined CAM-Based Data Path and Code-Word-Table Optimizer
Takeshi KUMAKI Yasuto KURODA Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO

PAPER-Image Processing and Video Processing

Vol: E90-D No:1
Page(s): 334-345
This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. The exploitation of CAM technology enables fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of received input symbols and is updated in real time. Since these two functions work in parallel, the proposed architecture realizes fast parallel encoding and maintains a consistently high compression ratio. Evaluation results for the JPEG application show that the proposed architecture can achieve up to 28% smaller encoded picture sizes than conventional architectures. The encoding time can be reduced by 95% in comparison to a conventional SRAM-based architecture, which makes the encoder suitable even for the latest end-user devices requiring fast frame rates. Furthermore, the proposed architecture provides the only encoder that can simultaneously realize small compressed data size and fast processing speed.
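The two ingredients named in this abstract, a code word table derived from symbol frequencies and encoding by table lookup, can be sketched in software as below. This is the standard Huffman construction using Python's `heapq`, not the pipelined CAM data path or the real-time optimizer from the paper; the function names are assumptions.

```python
import heapq
from collections import Counter
from typing import Dict


def build_code_table(data: bytes) -> Dict[int, str]:
    """Build a Huffman code word table from the symbol frequencies in `data`."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap items are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dictionaries from being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data: bytes, table: Dict[int, str]) -> str:
    """Encode by looking up each symbol in the code word table."""
    return "".join(table[symbol] for symbol in data)


if __name__ == "__main__":
    sample = b"aaaabbc"
    code_table = build_code_table(sample)
    print(code_table, encode(sample, code_table))
```

In the paper's setting, the lookup in `encode` is what a CAM performs in parallel, while rebuilding the table as frequencies change is the optimizer's job.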
Scalable FPGA/ASIC Implementation Architecture for Parallel Table-Lookup-Coding Using Multi-Ported Content Addressable Memory
Takeshi KUMAKI Yutaka KONO Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH

PAPER-Image Processing and Video Processing

Vol: E90-D No:1
Page(s): 346-354
This paper presents a scalable FPGA/ASIC implementation architecture for high-speed parallel table-lookup coding using multi-ported content addressable memory, aiming at facilitating effective table-lookup-coding solutions. The multi-ported CAM adopts Flexible Multi-ported Content Addressable Memory (FMCAM) technology, which represents an effective parallel processing architecture and was previously reported in [1]. To achieve a high-speed parallel table-lookup-coding solution, FMCAM is extended with a single search mode and a counting value setting mode, so that it permits fast parallel table-lookup-coding operations. Evaluation results for Huffman encoding within the JPEG application show that a synthesized semi-custom ASIC implementation of the proposed architecture can reduce the required number of clock cycles by 93% in comparison to a conventional DSP. Furthermore, the performance per unit area, measured in MOPS/mm², can be improved by a factor of 3.8 in comparison to DSPs operated in parallel. Consequently, the proposed architecture is very suitable for FPGA/ASIC implementation and is a promising solution for small-area integrated realization of real-time table-lookup-coding applications.
Hierarchical Multi-Chip Architecture for High Capacity Scalability of Fully Parallel Hamming-Distance Associative Memories
Yusuke OIKE Makoto IKEDA Kunihiro ASADA

PAPER

Vol: E87-C No:11
Page(s): 1847-1855
In this paper, we present a hierarchical multi-chip architecture that employs fully digital, word-parallel associative memories based on Hamming distance. High capacity scalability is critically important for associative memories, since the required database capacity depends on the application. A multi-chip structure is the most efficient way to scale capacity, as it is for standard memories; however, it is difficult to apply to conventional nearest-match associative memories. The present digital implementation is capable of detecting all the template data in order of exact Hamming distance. Therefore, a hierarchical multi-chip structure is realized simply by using extra register buffers and an inter-chip pipelined priority decision circuit hierarchically embedded in multiple chips. It achieves fully chip- and word-parallel Hamming distance search with no throughput decrease, an additional clock latency of O(log P), and O(P) inter-chip wires in a P-chip structure. The feasibility of the architecture and circuit implementation has been demonstrated by post-layout simulations. The performance has also been estimated based on measurement results of a single-chip implementation.
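The search behavior described in this abstract, returning templates in order of exact Hamming distance and combining the results of several chips, can be approximated in software as below. The function names and the list-of-lists "chip" model are hypothetical; the pipelined priority decision circuit is replaced by a plain `min`.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def ranked_search(templates: List[int], query: int) -> List[int]:
    """Template indices ordered by Hamming distance (ties broken by index)."""
    return sorted(
        range(len(templates)),
        key=lambda i: (hamming_distance(templates[i], query), i),
    )


def multi_chip_nearest(chips: List[List[int]], query: int) -> Tuple[int, int, int]:
    """Return (distance, chip_id, local_index) of the nearest template overall.

    Each chip reports its local winner; the winners are then compared.
    Assumes every chip holds at least one template.
    """
    local_winners = []
    for chip_id, templates in enumerate(chips):
        best = ranked_search(templates, query)[0]
        local_winners.append((hamming_distance(templates[best], query), chip_id, best))
    return min(local_winners)


if __name__ == "__main__":
    print(ranked_search([0b1111, 0b1010, 0b0000], 0b1010))
    print(multi_chip_nearest([[0b1111, 0b0000], [0b1000, 0b1011]], 0b1010))
```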
A Hardware/Software Cosynthesis System for Processor Cores with Content Addressable Memories
Nozomu TOGAWA Takao TOTSUKA Tatsuhiko WAKUI Masao YANAGISAWA Tatsuo OHTSUKI

PAPER

Vol: E86-A No:5
Page(s): 1082-1092
Content addressable memory (CAM) is a functional memory that realizes word-parallel equivalence search. Since a CAM unit is generally used in a particular application program, we consider that CAM units should be designed according to the requirements of that application program. This paper proposes a hardware/software cosynthesis system for CAM processors. The input of the system is an application program written in C that includes CAM functions, together with a constraint on execution time (or on CAM processor area). Its output is a hardware description of a synthesized processor and binary code to be executed on it. Based on the branch-and-bound method, the system determines which CAM functions are realized in hardware and which in software so as to meet the given timing constraint (or area constraint) while minimizing the CAM processor area (or the execution time of the application program). We expect that this yields an optimal CAM processor design for an application program. Experimental results for several application programs show that we can obtain a CAM processor with minimum area that meets the given timing constraint.
CAM Processor Synthesis Based on Behavioral Descriptions
Nozomu TOGAWA Tatsuhiko WAKUI Tatsuhiko YODEN Makoto TERAJIMA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER-Co-design and High-level Synthesis

Vol: E83-A No:12
Page(s): 2464-2473
CAM (Content Addressable Memory) units are generally designed so that they can be applied to a variety of application programs. However, when a particular application runs on CAM units, some functions of the CAM units may be used often and others may never be used. We consider that CAM units should be designed according to the requirements of a given application program. This paper proposes a CAM processor synthesis system based on behavioral descriptions. The input of the system is an application program written in C that includes CAM functions, and its output is a hardware description of a synthesized processor and binary code to be executed on it. Since the system determines the functions of the CAM units and synthesizes a CAM processor according to the requirements of an application program, we expect that a synthesized CAM processor can execute the application program with small processor area and delay. Experimental results demonstrate its efficiency and effectiveness.
CAM-Based Array Converter for URR Floating-Point Arithmetic
Kuei-Ming LU Keikichi TAMARU

PAPER-Computer Applications

Vol: E81-D No:10
Page(s): 1120-1130
To lessen overflow and underflow problems in numerical computation, several new floating-point arithmetics have been proposed. The significant advantage of these arithmetics is that numbers can be represented over a wider range, since the widths of the exponent and mantissa fields change depending on the magnitude of the number. The main issues for these arithmetics are how to find the boundary between exponent and mantissa and how to convert quickly between the new floating-point format and fixed-point format. In this paper, a CAM-based array converter for the Universal Representation of Real number (URR) floating-point arithmetic is described. Using a CAM as a match retrieval device, the boundary can be detected faster than with conventional circuits. By arranging the basic cells into an iterative array structure, fast separation/connection operations are achieved. The speed, area, and power consumption of the converter are estimated.
CAM-Based Highly-Parallel Image Processing Hardware
Takeshi OGURA Mamoru NAKANISHI

INVITED PAPER

Vol: E80-C No:7
Page(s): 868-874
This paper describes content addressable memory (CAM)-based hardware that serves as a highly parallel, compact, and real-time image-processing system. The novel concept of a highly-parallel integrated circuits and system (HiPIC), in which a large-capacity CAM tuned for parallel data processing is a key element, is introduced. Several hardware algorithms for highly parallel image processing based on a HiPIC with a CAM are presented to demonstrate that the HiPIC concept is effective for compact and real-time image processing. Two kinds of HiPIC-dedicated CAM have been developed. One is embedded on a 0.5-µm CMOS gate array; an embedded CAM of up to 64 kbit and logic of up to 40 kgate can be integrated on a single chip. The other is a 0.5-µm CMOS full-custom CAM LSI tuned for parallel data processing; a fully parallel 336-kbit CAM LSI has been successfully developed. The HiPIC concept and the CAM-based hardware described here promise to be an important step towards the realization of a compact and real-time image-processing system.
A CAM-Based Parallel Fault Simulation Algorithm with Minimal Storage Size
Shinsuke OHNO Masao SATO Tatsuo OHTSUKI

PAPER

Vol: E78-A No:12
Page(s): 1755-1764
CAMs (Content Addressable Memories) are functional memories that provide functions such as word-parallel equivalence search, bilateral 1-bit data shifting between consecutive words, and word-parallel writing. Since CAMs can be densely integrated because of their regular structure, massively parallel CAM functions can be executed. Taking advantage of CAMs, Ishiura and Yajima have proposed a parallel fault simulation algorithm using a CAM. This algorithm, however, requires a large amount of CAM storage to simulate large-scale circuits. In this paper, we propose a new massively parallel fault simulation algorithm that requires less CAM storage, and compare it with Ishiura and Yajima's algorithm. Experimental results of the algorithm on CHARGE, the CAM-based hardware engine developed in our laboratory, are also reported.
A Flexible Search Managing Circuitry for High-Density Dynamic CAMs
Takeshi HAMAMOTO Tadato YAMAGATA Masaaki MIHARA Yasumitsu MURAI Toshifumi KOBAYASHI Hideyuki OZAKI

PAPER-General Technology

Vol: E77-C No:8
Page(s): 1377-1384
New circuit techniques are proposed to realize a high-density and high-performance content addressable memory (CAM). A dynamic register that functions as a status flag and some logic circuits are organically combined to perform complex search operations flexibly, despite the compact layout area. Any logic operation on the search results, namely AND, OR, INVERT, and their combinations, can be performed in every word simultaneously. These circuits are implemented in an experimental 288 kbit dynamic CAM using 0.8 µm CMOS process technology. We consider these techniques to be indispensable for high-density and high-performance dynamic CAMs.
