This paper presents an architecture for a table-lookup (TLU) engine that allows the real-time operation of complicated TLU for telecommunications, such as the longest prefix match (LPM) and the long-bit match in packet classification. The engine consists of many CAM (Content Addressable Memory) chips, which are classified into several groups. When actual TLU is performed, the entries in each CAM group are searched simultaneously, and the best entry candidate in each group is selected by an intra-group arbiter. The final output, the entry desired, is decided by an inter group arbiter that selects one group. This hierarchical structure of arbitration is the key to the scalability of the engine. To accelerate the operation speed of the engine, we introduce a novel mechanism called "hit-flag look-ahead" that sends a hit-flag signal from each matched CAM chip to the inter group arbiter before each intra group arbiter calculates the best CAM output in the group. We show that a TLU engine based on the above architecture achieves significantly fast performance compared to engines based on conventional techniques, especially in the case of a large number of entries with long-bit matching. Furthermore, our architecture can realize an 33.3 Mlps (lookups per second) within a 128 bit 300,000-entry table at wire speed.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Tsunemasa HAYASHI, Toshiaki MIYAZAKI, "FLASH: Fast and Scalable Table-Lookup Engine Architecture for Telecommunications" in IEICE TRANSACTIONS on Information,
vol. E85-D, no. 10, pp. 1636-1644, October 2002, doi: .
Abstract: This paper presents an architecture for a table-lookup (TLU) engine that allows the real-time operation of complicated TLU for telecommunications, such as the longest prefix match (LPM) and the long-bit match in packet classification. The engine consists of many CAM (Content Addressable Memory) chips, which are classified into several groups. When actual TLU is performed, the entries in each CAM group are searched simultaneously, and the best entry candidate in each group is selected by an intra-group arbiter. The final output, the entry desired, is decided by an inter group arbiter that selects one group. This hierarchical structure of arbitration is the key to the scalability of the engine. To accelerate the operation speed of the engine, we introduce a novel mechanism called "hit-flag look-ahead" that sends a hit-flag signal from each matched CAM chip to the inter group arbiter before each intra group arbiter calculates the best CAM output in the group. We show that a TLU engine based on the above architecture achieves significantly fast performance compared to engines based on conventional techniques, especially in the case of a large number of entries with long-bit matching. Furthermore, our architecture can realize an 33.3 Mlps (lookups per second) within a 128 bit 300,000-entry table at wire speed.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_10_1636/_p
Copy
@ARTICLE{e85-d_10_1636,
author={Tsunemasa HAYASHI, Toshiaki MIYAZAKI, },
journal={IEICE TRANSACTIONS on Information},
title={FLASH: Fast and Scalable Table-Lookup Engine Architecture for Telecommunications},
year={2002},
volume={E85-D},
number={10},
pages={1636-1644},
abstract={This paper presents an architecture for a table-lookup (TLU) engine that allows the real-time operation of complicated TLU for telecommunications, such as the longest prefix match (LPM) and the long-bit match in packet classification. The engine consists of many CAM (Content Addressable Memory) chips, which are classified into several groups. When actual TLU is performed, the entries in each CAM group are searched simultaneously, and the best entry candidate in each group is selected by an intra-group arbiter. The final output, the entry desired, is decided by an inter group arbiter that selects one group. This hierarchical structure of arbitration is the key to the scalability of the engine. To accelerate the operation speed of the engine, we introduce a novel mechanism called "hit-flag look-ahead" that sends a hit-flag signal from each matched CAM chip to the inter group arbiter before each intra group arbiter calculates the best CAM output in the group. We show that a TLU engine based on the above architecture achieves significantly fast performance compared to engines based on conventional techniques, especially in the case of a large number of entries with long-bit matching. Furthermore, our architecture can realize an 33.3 Mlps (lookups per second) within a 128 bit 300,000-entry table at wire speed.},
keywords={},
doi={},
ISSN={},
month={October},}
Copy
TY - JOUR
TI - FLASH: Fast and Scalable Table-Lookup Engine Architecture for Telecommunications
T2 - IEICE TRANSACTIONS on Information
SP - 1636
EP - 1644
AU - Tsunemasa HAYASHI
AU - Toshiaki MIYAZAKI
PY - 2002
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E85-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2002
AB - This paper presents an architecture for a table-lookup (TLU) engine that allows the real-time operation of complicated TLU for telecommunications, such as the longest prefix match (LPM) and the long-bit match in packet classification. The engine consists of many CAM (Content Addressable Memory) chips, which are classified into several groups. When actual TLU is performed, the entries in each CAM group are searched simultaneously, and the best entry candidate in each group is selected by an intra-group arbiter. The final output, the entry desired, is decided by an inter group arbiter that selects one group. This hierarchical structure of arbitration is the key to the scalability of the engine. To accelerate the operation speed of the engine, we introduce a novel mechanism called "hit-flag look-ahead" that sends a hit-flag signal from each matched CAM chip to the inter group arbiter before each intra group arbiter calculates the best CAM output in the group. We show that a TLU engine based on the above architecture achieves significantly fast performance compared to engines based on conventional techniques, especially in the case of a large number of entries with long-bit matching. Furthermore, our architecture can realize an 33.3 Mlps (lookups per second) within a 128 bit 300,000-entry table at wire speed.
ER -