
Author Search Result

[Author] Kazunari INOUE (10 hits)

Results 1-10
  • Deterministic Packet Buffer System with Multi FIFO Queues for the Advanced QoS

    Hisashi IWAMOTO, Yuji YANO, Yasuto KURODA, Koji YAMAMOTO, Shingo ATA, Kazunari INOUE

     
    PAPER-Network System
    Vol: E96-B No:7  Page(s): 1819-1825

    Network traffic keeps increasing with the growing popularity of video streaming services. Routers and switches in wire-line networks must guarantee line rates as high as 20 Gbps as well as advanced quality of service (QoS). Hybrid SRAM and DRAM architectures presented previously offer both high speed and high density, but they require complex memory management and have therefore rarely supported large numbers of queues, which is an effective way to satisfy QoS requirements. This paper proposes an intelligent memory management unit (MMU) based on the hybrid architecture that integrates over 16k queues. Evaluated on a system board, it achieves zero packet loss under continuous traffic with packet lengths from 64 Byte to 1.5 kByte, in a deterministic manner. A notable feature of the architecture is that it needs no premium memories; only low-cost commodity SRAMs and DRAMs are used. The intelligent MMU employs a head-buffer architecture, which is well suited to supporting a large number of FIFO queues. An experimental board based on this architecture is embedded in a router system to evaluate its performance: using 16k queues at 20 Gbps, zero packet loss is confirmed with packet lengths from 64 to 1,500 Bytes.
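
    A minimal software sketch of the head-buffer idea described above, under the assumption that each FIFO queue keeps only its head packets in fast (SRAM-like) storage while the remainder spills into bulk (DRAM-like) storage; the class and its parameters are illustrative, not the paper's hardware design.

      from collections import deque

      class HeadBufferQueue:
          """Toy model of one FIFO queue in a head-buffer MMU.

          The head of the queue lives in a small fast buffer (standing in for
          SRAM); the overflow is parked in bulk storage (standing in for DRAM)
          and moved into the head buffer in the background, so the dequeue
          path never has to wait on the slow memory.
          """

          def __init__(self, head_capacity=4):
              self.head = deque()              # fast memory: served on dequeue
              self.body = deque()              # bulk memory: holds the spill-over
              self.head_capacity = head_capacity

          def enqueue(self, packet):
              # New packets enter the fast head only while it has room and the
              # bulk store is empty; otherwise they go to bulk memory to keep
              # FIFO order.
              if not self.body and len(self.head) < self.head_capacity:
                  self.head.append(packet)
              else:
                  self.body.append(packet)

          def refill(self):
              # Background task: top up the head buffer from bulk memory.
              while self.body and len(self.head) < self.head_capacity:
                  self.head.append(self.body.popleft())

          def dequeue(self):
              # Line-rate path: always served from the fast head buffer.
              packet = self.head.popleft() if self.head else None
              self.refill()
              return packet

      # Over 16k independent queues, as in the configuration above.
      queues = [HeadBufferQueue() for _ in range(16 * 1024)]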

  • Embedded Low-Power Dynamic TCAM Architecture with Transparently Scheduled Refresh

    Hideyuki NODA, Kazunari INOUE, Hans Jurgen MATTAUSCH, Tetsushi KOIDE, Katsumi DOSAKA, Kazutami ARIMOTO, Kazuyasu FUJISHIMA, Kenji ANAMI, Tsutomu YOSHIHARA

     
    PAPER-Memory
    Vol: E88-C No:4  Page(s): 622-629

    This paper describes a dynamic TCAM architecture with planar complementary capacitors, transparently scheduled refresh (TSR), autonomous power management (APM) and an address-input-free writing scheme. The complementary cell structure of the planar dynamic TCAM (PD-TCAM) allows a small cell size of 4.79 µm² in 130 nm CMOS technology and realizes stable TCAM operation even with very small storage capacitance. Thanks to the TSR architecture, the PD-TCAM maintains functional compatibility with a conventional SRAM-based TCAM. The combined effects of the compact PD-TCAM array and the APM technique reduce total power consumption during search operation by up to 50%. In addition, an intelligent address-input-free writing scheme is introduced to make the PD-TCAM easier for the user to apply. Consequently, the proposed architecture is attractive for realizing compact, low-power embedded TCAM macros in system VLSI designs for networking applications.

  • A New TCAM Architecture for Managing ACL in Routers

    Haesung HWANG, Shingo ATA, Koji YAMAMOTO, Kazunari INOUE, Masayuki MURATA

     
    PAPER-Network
    Vol: E93-B No:11  Page(s): 3004-3012

    Ternary Content Addressable Memory (TCAM) is a special type of memory used in routers to achieve high-speed packet forwarding and classification. Packet forwarding is done by referring to the rules written in the routing table, whereas packet classification is performed by referring to the rules in the Access Control List (ACL). TCAM uses more transistors than Random Access Memory (RAM), resulting in high power consumption and high production cost, so it is necessary to reduce the number of entries written into the TCAM and thus the transistor count. In this paper, we propose a new TCAM architecture that uses Range Matching Devices (RMDs) integrated into the TCAM's control logic together with an optimized prefix expansion algorithm. The proposed method reduces the number of entries required to express ACL rules, especially when specifying port ranges: with fewer than 10 RMDs, the total number of TCAM entries required to express port ranges can be reduced to approximately 50%.
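
    To illustrate why port ranges inflate TCAM usage, the sketch below expands a 16-bit port range into ternary prefixes with the standard greedy expansion; it is a generic routine for illustration, not the paper's optimized prefix expansion or the RMD hardware.

      def range_to_prefixes(lo, hi, width=16):
          """Expand the inclusive range [lo, hi] into ternary prefixes.

          Each prefix is a string of '0'/'1'/'*' of the given width, i.e. the
          form a TCAM entry would take.  Generic greedy expansion only.
          """
          prefixes = []
          while lo <= hi:
              # Largest power-of-two block aligned at `lo` that fits the range.
              size = lo & -lo if lo else 1 << width
              while size > hi - lo + 1:
                  size //= 2
              fixed = width - (size.bit_length() - 1)     # number of non-'*' bits
              head = format(lo, f'0{width}b')[:fixed] if fixed else ''
              prefixes.append(head + '*' * (width - fixed))
              lo += size
          return prefixes

      # Destination ports 1024-65535 need 6 TCAM entries; an arbitrary 16-bit
      # range can need up to 30, which is the blow-up the RMD approach avoids.
      print(range_to_prefixes(1024, 65535))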

  • A Low-Voltage 42.4 G-BPS Single-Ended Read-Modify-Write Bus and Programmable Page-Size on a 3D Frame-Buffer

    Kazunari INOUE, Hideaki ABE, Kaori MORI, Shuji FUKAGAWA

     
    PAPER
    Vol: E83-C No:2  Page(s): 195-204

    Various kinds of high-bandwidth architecture using embedded DRAM technology have been presented previously. In most cases they rely on a wide bus and/or a fast bus speed, both of which incur penalties in die area and power consumption. The proposed single-ended read-modify-write bus doubles the bandwidth while keeping the same bus width and the same bus speed. The data bus comprises a 1 k-bit read bus and a 1 k-bit write bus that operate concurrently, with a signal swing of 0 V to 1 V, so the measured power consumption is only 0.3 W at 166 MHz. A programmable page size reduces the page-miss rate and improves the effective bandwidth to a level comparable with the wide-bus, fast-speed approach. All the proposed features are implemented on a 3D frame buffer to achieve 42.4 G-BPS of bandwidth.
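
    As a rough sanity check on the headline figure, assuming G-BPS denotes gigabytes per second and that the two 1 k-bit buses run fully concurrently at the stated 166 MHz:

      # Back-of-the-envelope check of the peak bandwidth (assumptions:
      # 1 k-bit = 1024 bits per bus, read and write buses fully concurrent).
      read_bits, write_bits = 1024, 1024
      freq_hz = 166e6

      bytes_per_second = (read_bits + write_bits) * freq_hz / 8
      print(f"{bytes_per_second / 1e9:.1f} GB/s")   # ~42.5 GB/s, in line with 42.4 G-BPS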

  • High-Speed Design of Conflictless Name Lookup and Efficient Selective Cache on CCN Router

    Atsushi OOKA, Shingo ATA, Kazunari INOUE, Masayuki MURATA

     
    PAPER-Network
    Vol: E98-B No:4  Page(s): 607-620

    Content-centric networking (CCN) is an innovative network architecture that is being considered as a successor to the Internet. In recent years, CCN has received increasing attention from all over the world because its novel technologies (e.g., caching, multicast, aggregating requests) and communication based on names that act as addresses for content have the potential to resolve various problems facing the Internet. Implementing these technologies, however, requires routers with performance far superior to that offered by today's Internet routers. Although many researchers have proposed various router components, such as caching and name lookup mechanisms, there are few router-level designs incorporating all the necessary components. The design and evaluation of a complete router is the primary contribution of this paper. We provide a concrete hardware design for a router model that uses three basic tables, the forwarding information base (FIB), pending interest table (PIT), and content store (CS), and incorporates two entities that we propose. One of these entities is the name lookup entity, which looks up a name address within a few cycles from content-addressable memory with the help of a Bloom filter; the other is the interest count entity, which counts interest packets that request certain content and selects content worth caching. Our contributions are (1) presenting a proper algorithm for looking up and matching name addresses in CCN communication, (2) proposing a method to process CCN packets in a way that achieves high throughput and very low latency, and (3) demonstrating feasible performance and cost on the basis of a concrete hardware design using distributed content-addressable memory.
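
    The sketch below illustrates the general principle behind the name lookup entity, namely a Bloom filter used as a cheap membership pre-check in front of the exact lookup (a CAM in the hardware, a plain dictionary here); the table sizes and hash construction are illustrative assumptions, not the paper's design.

      import hashlib

      class BloomPrefilteredFIB:
          """Bloom-filter pre-check in front of an exact name table.

          A Bloom miss means the name is definitely absent, so the exact
          lookup is consulted only on (possible) hits.  Sizes and hash
          functions are illustrative choices.
          """

          def __init__(self, m_bits=1 << 20, k_hashes=4):
              self.bits = bytearray(m_bits // 8)
              self.m, self.k = m_bits, k_hashes
              self.exact = {}                       # stands in for the CAM

          def _positions(self, name):
              digest = hashlib.sha256(name.encode()).digest()
              for i in range(self.k):
                  yield int.from_bytes(digest[4 * i:4 * i + 4], 'big') % self.m

          def insert(self, name, next_hop):
              for p in self._positions(name):
                  self.bits[p // 8] |= 1 << (p % 8)
              self.exact[name] = next_hop

          def lookup(self, name):
              if any(not (self.bits[p // 8] >> (p % 8) & 1) for p in self._positions(name)):
                  return None                       # definitely not in the table
              return self.exact.get(name)           # resolves Bloom false positives

      fib = BloomPrefilteredFIB()
      fib.insert("/video/movie1/segment3", next_hop="port2")
      print(fib.lookup("/video/movie1/segment3"))   # port2
      print(fib.lookup("/video/unknown"))           # None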

  • FPS-RAM: Fast Prefix Search RAM-Based Hardware for Forwarding Engine

    Kazuya ZAITSU, Koji YAMAMOTO, Yasuto KURODA, Kazunari INOUE, Shingo ATA, Ikuo OKA

     
    PAPER-Network System
    Vol: E95-B No:7  Page(s): 2306-2314

    Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limit the lookup-table capacity that can be deployed in IP routers. In this paper, we propose a new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed the FPS-RAM hardware to maintain the same search performance and physical user interface as TCAM, because our objective is to replace TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and dramatically reduces cost and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
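
    The abstract does not describe the FPS-RAM internals, so the sketch below shows one well-known way of doing longest-prefix match with ordinary RAM instead of TCAM, a hash table per prefix length, purely to illustrate the design space the paper targets; it is not the paper's circuit.

      # Generic RAM-based longest-prefix match: one hash table per prefix
      # length, scanned from shortest to longest (kept linear for clarity).
      import ipaddress

      def build_tables(routes):
          """routes: dict mapping 'prefix/len' -> next hop."""
          tables = {}                               # prefix length -> {network int -> hop}
          for cidr, hop in routes.items():
              net = ipaddress.ip_network(cidr)
              tables.setdefault(net.prefixlen, {})[int(net.network_address)] = hop
          return tables

      def lookup(tables, addr):
          ip = int(ipaddress.ip_address(addr))
          best = None
          for length in sorted(tables):
              masked = ip & ((0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF)
              if masked in tables[length]:
                  best = tables[length][masked]     # keep the longest match seen so far
          return best

      tables = build_tables({"10.0.0.0/8": "A", "10.1.0.0/16": "B"})
      print(lookup(tables, "10.1.2.3"))             # 'B' (the longest matching prefix)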

  • A CAM-Based Signature-Matching Co-processor with Application-Driven Power-Reduction Features

    Kazunari INOUE, Hideyuki NODA, Kazutami ARIMOTO, Hans Jurgen MATTAUSCH, Tetsushi KOIDE

     
    PAPER-Integrated Electronics
    Vol: E88-C No:6  Page(s): 1332-1342

    A signature-matching co-processor in 130 nm CMOS technology for applications in the network-security field is presented. Two key search technologies, implemented with fully parallel CAM-based search cores, enable the removal of misused packets from Gigabit-per-second (G-bps) networks in real time without disturbing normal network traffic. The first technology is a thorough search through the packet header as well as the payload in a byte-shifting manner, capable of detecting viruses even if they are hidden at an arbitrary position within the packet. A 1.125 Mbit ternary CAM, operating at 125 Mega-searches per second (M-sps), holds the primary lookup table for this thorough packet search. The second technology applies an additional relational search with programmable logical operations to detect the more complicated misused packets that have appeared recently. A small 192-bit binary CAM operating at 31.25 M-sps is also included for this purpose. Power dissipation, a major concern for CAM-based application-specific LSIs, is addressed in light of the signature-matching application, which has a high probability of multiple matches and does not require masking individual bits of the search word. Consequently, two application-driven power-reduction methods are implemented: an improved pipelined search that efficiently reduces power even when a large number of entries match simultaneously, and a search-line encoding that cuts search-line-related power dissipation. As a result, the signature-matching co-processor dissipates only 0.4 W to 1.1 W for the best-case and worst-case search configurations, respectively.
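
    A functional illustration of the byte-shifting search described above, in which every signature is compared against the packet at every byte offset of the header and payload, which is what allows a pattern hidden at an arbitrary position to be found; in the co-processor the comparison is done in parallel by the CAM, so the loops below are only a software model.

      def byte_shifting_search(packet, signatures):
          """Software model of byte-shifting signature matching.

          The hardware compares all signatures against the current search
          window in one CAM cycle and shifts the window by one byte per
          cycle; here the same coverage is written as nested loops.
          """
          hits = []
          for offset in range(len(packet)):         # shift the window one byte at a time
              window = packet[offset:]
              for sig in signatures:
                  if window.startswith(sig):
                      hits.append((offset, sig))
          return hits

      packet = b"GET /index.html\r\n\x90\x90EVIL_PAYLOAD\x90"
      print(byte_shifting_search(packet, [b"EVIL_PAYLOAD", b"cmd.exe"]))
      # -> [(19, b'EVIL_PAYLOAD')]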

  • A 0.18 µm 32 Mb Embedded DRAM Macro for 3-D Graphics Controller

    Akira YAMAZAKI, Takeshi FUJINO, Kazunari INOUE, Isamu HAYASHI, Hideyuki NODA, Naoya WATANABE, Fukashi MORISHITA, Katsumi DOSAKA, Yoshikazu MOROOKA, Shinya SOEDA, Kazutami ARIMOTO, Setsuo WAKE, Kazuyasu FUJISHIMA, Hideyuki OZAKI

     
    PAPER-Electronic Circuits
    Vol: E85-C No:9  Page(s): 1697-1708

    A 23.3 mm² 32 Mb embedded DRAM (eDRAM) macro has been fabricated using a 0.18 µm triple-well, 4-metal embedded DRAM process to realize an accelerated 3-D graphics controller. The array architecture, which uses a dual-port sense amplifier, achieves a column access latency of two cycles at 222 MHz and a peak data rate of 14.2 GB/s with 4 macros. The process cost is kept low by using VT-MOS circuit technology and taking advantage of a characteristic of the dual-gate-oxide process. A tRAC of 11.6 ns at 2.0 V is achieved using a 'pre-detect redundancy' circuit.

  • A 250 Msps, 0.5 W eDRAM-Based Search Engine Dedicated Low Power FIB Application

    Hisashi IWAMOTO, Yuji YANO, Yasuto KURODA, Koji YAMAMOTO, Kazunari INOUE, Ikuo OKA

     
    PAPER-Integrated Electronics
    Vol: E96-C No:8  Page(s): 1076-1082

    Ternary content addressable memory (TCAM) is a popular LSI for high-throughput forwarding engines on routers. However, the unique cell structure used in TCAM consumes a huge amount of power, which restricts the lookup-table capacity that IP routers can handle. In this paper, we propose a commodity-memory-based hardware architecture for the forwarding information base (FIB) application that solves these fundamental problems of power and density. The proposed architecture is examined with a test chip fabricated in 40 nm embedded DRAM (eDRAM) technology: the verified power consumption is far lower than that of conventional TCAM-based designs, the energy metric reaches 0.01 fJ/bit/search, and the power consumption is about 0.5 W at 250 Msps with 8M entries.
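
    A rough check of how the quoted figures relate: 0.5 W at 250 Msps corresponds to 2 nJ per search, and spreading that energy over the bits examined in an 8M-entry table gives a number on the order of the quoted 0.01 fJ/bit/search; the per-entry width below is an assumption, since the abstract does not state it.

      # Back-of-the-envelope relation between the quoted numbers.  The entry
      # width is NOT given in the abstract; 25 bits is assumed here purely to
      # show how an fJ/bit/search figure is formed.
      power_w = 0.5
      search_rate = 250e6                 # searches per second
      entries = 8e6
      assumed_bits_per_entry = 25         # assumption for illustration only

      energy_per_search = power_w / search_rate                       # 2e-9 J = 2 nJ
      energy_per_bit = energy_per_search / (entries * assumed_bits_per_entry)
      print(f"{energy_per_bit * 1e15:.3f} fJ/bit/search")             # ~0.010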

  • A Programmable Geometry Processor with Enhanced Four-Parallel SIMD Type Processing Core for PC-Based 3D Graphics

    Hiroyuki KAWAI, Yoshitsugu INOUE, Junko KOBARA, Robert STREITENBERGER, Hiroaki SUZUKI, Hiroyasu NEGISHI, Masatoshi KAMEYAMA, Kazunari INOUE, Yasutaka HORIBA, Kazuyasu FUJISHIMA

     
    PAPER-Integrated Electronics
    Vol: E85-C No:5  Page(s): 1200-1210

    This paper describes a 3D graphics geometry processor architecture for high performance-per-cost 3D graphics, its application to a real chip, and the results of a performance evaluation. To achieve high-speed geometry processing, dedicated hardware is introduced to accelerate special operations such as power calculations, clip tests, and program address generation. The dedicated hardware consists of a modified floating-point multiplier in a four-parallel SIMD processing core, a clip-test unit, and an internal program address generation scheme optimized for each geometry processing mode. Special instructions corresponding to these dedicated schemes are also defined and added. The parallelism of the SIMD core is matched to the geometry data structure. Employing dedicated hardware and software significantly accelerates the complicated operations arising from geometry algorithms, and the collaboration of hardware and software design considerably reduces the instruction step count for complex processing. The proposed architecture handles two kinds of program: a special-case program containing few conditional jump instructions, and a general-case program combining many program routines. The proposed program address generation scheme automatically selects the program optimized for each geometry processing mode. Through this scheme and the two program types, the frequency of conditional jump operations, which usually disturb pipeline operation, is minimized in practical use. Additionally, the programmable design and the program address generation scheme facilitate load balancing of the geometry calculations with the CPU. As an application of this architecture, a programmable geometry processor was fabricated in a 0.35 µm CMOS process. A total of 1.3 million transistors are integrated in an 11.84 × 12.07 mm² die. The gate count added by all the dedicated hardware totals 24 K gates, only about a 7.4% increase in the total gate count. The chip operates at 150 MHz and achieves a processing performance of 5.8 M vertex/sec. The results show that the proposed programmable architecture (ESIMD: Enhanced SIMD) is 2.3 times more cost-effective than a previously reported programmable geometry LSI.
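
    For intuition about the four-parallel SIMD organisation matched to the geometry data structure, the sketch below transforms a homogeneous vertex by a 4x4 matrix, with each of the four dot products corresponding to one SIMD lane, and computes the cycle budget implied by the figures above (150 MHz / 5.8 M vertex/s); it is a functional illustration, not the chip's datapath.

      # Functional illustration of four-parallel (SIMD) geometry processing:
      # one lane per row of the 4x4 transform matrix.
      def transform_vertex(m, v):
          """m: 4x4 matrix (list of 4 rows); v: homogeneous vertex (x, y, z, w)."""
          # In the ESIMD core the four dot products would execute in parallel,
          # one per SIMD lane.
          return [sum(m[row][i] * v[i] for i in range(4)) for row in range(4)]

      translate_x5 = [[1, 0, 0, 5],
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]]
      print(transform_vertex(translate_x5, [1.0, 2.0, 3.0, 1.0]))   # [6.0, 2.0, 3.0, 1.0]

      # Cycle budget implied by the reported figures.
      print(150e6 / 5.8e6)   # ~25.9 cycles per vertex on average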