Shigehiko USHIJIMA Hiroyuki ICHIKAWA Katsunori NORITAKE Naoya WATANABE
We propose a hardware-based packet forwarder for multi-gigabit IP backbone networks. The conventional Internet deploys routers as key building blocks, but their software-controlled architecture makes it hard to scale up packet forwarders, especially the table-lookup process. We propose a pure connectionless (CL) switching approach with a hardware-based forwarder to construct the core of a scalable multi-gigabit IP backbone. Compared with a software-based forwarder, the table-lookup time is reduced to 100 ns by using content-addressable memory. This hardware-based pipeline implementation achieves a maximum forwarding performance of up to 9.6 Gbps, or 23 million packets per second, for applications ranging from traditional best-effort IP applications to newly emerging time-critical ones. We also consider additional processing during IP packet transfer to enhance best-effort quality. This is done using selective packet-level discarding, including early packet discard and its enhancement, to provide a minimum-bandwidth guaranteed service at the packet level. We discuss the IP backbone scalability issue from the viewpoint of new IP-forwarder technologies, paying special attention to connection-oriented (CO) vs. CL switching and hardware vs. software implementation. A pure CL switching solution consisting of a CL server (CLS) and a CL client (CLC) is proposed to balance hardware- and software-based CL transport functions. As a first step toward this solution, a compact CLS has been developed. It supports 600-Mbps throughput and up to 9.6-Gbps forwarding capacity using a modular architecture. It was evaluated in an ATM field trial using an experimental network. The results show the effectiveness of our approach in providing enhanced best-effort services.
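The abstract names early packet discard (EPD) without describing it. The following is a minimal sketch of the textbook EPD rule for an ATM cell buffer, assuming AAL5-style framing where each cell carries first-cell and last-cell flags and the switch keeps a per-VC drop state. The `Cell` and `EpdQueue` names and the threshold values are hypothetical, and the authors' enhanced variant for minimum-bandwidth guarantees is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    vc: int      # virtual-circuit identifier of the cell
    first: bool  # True for the first cell of a packet (frame)
    last: bool   # True for the last cell of a packet (AAL5 end-of-frame)


@dataclass
class EpdQueue:
    capacity: int   # buffer size in cells
    threshold: int  # EPD threshold in cells (assumed <= capacity)
    buffer: list = field(default_factory=list)
    dropping: set = field(default_factory=set)  # VCs whose current packet is being discarded

    def arrive(self, cell: Cell) -> bool:
        """Queue the cell and return True, or discard it and return False."""
        if cell.first:
            # EPD rule: decide for the whole packet when its first cell arrives.
            if len(self.buffer) >= self.threshold:
                self.dropping.add(cell.vc)
            else:
                self.dropping.discard(cell.vc)
        if cell.vc in self.dropping:
            if cell.last:
                self.dropping.discard(cell.vc)  # packet finished; the next one is judged afresh
            return False
        if len(self.buffer) >= self.capacity:
            # Overflow in mid-packet: the remaining cells are useless, so drop them too
            # (partial packet discard).
            if not cell.last:
                self.dropping.add(cell.vc)
            return False
        self.buffer.append(cell)
        return True
```

Dropping whole packets at the first cell avoids wasting buffer and link capacity on fragments the receiver would discard anyway. A real switch would also drain the buffer at the link rate; that side is omitted here.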
Mitsuru MARUYAMA Naohisa TAKAHASHI Takeshi MIEI Tsuyoshi OGURA Tetsuo KAWANO Satoru YAGI
A parallel IP router that uses off-the-shelf workstations and interconnecting switches is presented. This router, called CORErouter-I, is a medium-grained, functionally distributed parallel system consisting of four kinds of processors: for routing, routing-table searching, servicing, and line interfacing. Issues related to the implementation of CORErouter-I are also discussed, especially routing-protocol processing and packet forwarding. The performance characteristics of CORErouter-I are clarified through several experiments that evaluate maximum throughput, analyze packet-forwarding time, and estimate the effect of parallel processing on the route-flapping problem.
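The abstract does not say which search structure CORErouter-I's routing-table-searching processors use. As a hypothetical illustration of the operation such a processor performs for IPv4 forwarding, here is a textbook binary trie doing longest-prefix match; the `RouteTable` class and its methods are assumptions, not the authors' design.

```python
import ipaddress
from typing import Optional


class _Node:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children = [None, None]  # child for bit 0 and bit 1
        self.next_hop = None          # set if a prefix ends at this node


class RouteTable:
    """Binary trie of IPv4 prefixes supporting longest-prefix-match lookup."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Optional[str]:
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop  # default route, if any
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # a longer matching prefix wins
        return best
```

Each lookup walks at most 32 nodes. Dedicating a processor to this walk keeps it off the path of routing-protocol processing, which is the kind of functional split the abstract describes.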
Hiroaki NISHI Ken-ichiro ANJO Tomohiro KUDOH Hideharu AMANO
JUMP-1 is currently under development by seven Japanese universities to establish techniques for building an efficient distributed shared memory on a massively parallel processor. It provides a coherent cache with a reduced hierarchical bit-map directory scheme to achieve cost-effective, high-performance management. Cache-coherence messages are transferred through a fat tree on the RDT (Recursive Diagonal Torus) interconnection network. The RDT router supports versatile functions, including multicast and acknowledge combining, for the reduced hierarchical bit-map directory scheme. Implemented in 0.5-µm BiCMOS SOG technology, the router transfers all packets synchronously with a single CPU clock (50 MHz). Coaxial cables up to 4 m long are driven directly by the chip's ECL interface. Because the packet buffers use dual-port RAM, a flit of a packet can be pushed into and pulled out of a buffer simultaneously.
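A simplified model of how a reduced hierarchical bit-map directory trades precision for storage may help. It assumes a complete tree of `branching`-ary switches and keeps only one bit-map per tree level, formed by OR-ing the child indices used by all sharers at that level; a multicast that fans out according to these bit-maps reaches a superset of the true sharers. The actual JUMP-1 encoding and RDT packet formats are not modeled, and all function names are hypothetical.

```python
from itertools import product


def to_digits(node, branching, levels):
    """Split a node number into per-level child indices, root level first."""
    digits = []
    for _ in range(levels):
        digits.append(node % branching)
        node //= branching
    return tuple(reversed(digits))


def reduced_bitmaps(sharers, branching, levels):
    """One bit-map per tree level: bit i is set if any sharer uses child i at that level."""
    maps = [0] * levels
    for node in sharers:
        for level, d in enumerate(to_digits(node, branching, levels)):
            maps[level] |= 1 << d
    return maps


def multicast_targets(maps, branching):
    """Nodes reached when every switch at a level forwards to all children set in that level's bit-map."""
    per_level = [[i for i in range(branching) if (m >> i) & 1] for m in maps]
    targets = set()
    for digits in product(*per_level):
        node = 0
        for d in digits:
            node = node * branching + d
        targets.add(node)
    return targets
```

For example, with 16 nodes in a 4-ary, 2-level tree, sharers {1, 6} produce bit-maps that also reach nodes 2 and 5. The directory stays small, at the cost of some unnecessary coherence messages.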
Shigeo MATSUZAWA Ken'ichi NAGAMI Akiyoshi MOGI Tatsuya JINMEI Hiroshi ESAKI Yasuhiro KATSUBE
An overview of the Cell Switch Router (CSR) and the CSR prototype system is given. The CSR can simultaneously support both connection-oriented and connectionless IP flows. It contains a cell switch fabric and an IP packet switch fabric to achieve high-throughput IP forwarding. IP packets are forwarded either by cut-through transmission, in which packets are forwarded without reassembly or IP header processing, or by conventional hop-by-hop IP packet forwarding. This paper proposes and describes a mechanism for forwarding connectionless IP packet flows at the CSR. A CSR prototype system, which uses PVC connections to transfer IP packets, has been developed. Measurements on the prototype confirm that the CSR achieves a high aggregate throughput of 2.4 Gbps. End-to-end TCP/IP transmission exceeds 90 Mbps, and real-time transmission of 30-Mbps video is achieved.
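The choice between the two forwarding paths can be sketched as a per-VC table lookup. In this hypothetical model, a cell arriving on an (input port, VPI, VCI) that has a cut-through binding is relabeled and switched in the cell fabric; any other cell goes to reassembly and hop-by-hop IP forwarding. The key layout, `CutThrough`, `HopByHop`, and `classify_cell` are assumptions for illustration, not the CSR's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

VcKey = Tuple[int, int, int]  # (port, VPI, VCI); hypothetical key layout


@dataclass(frozen=True)
class CutThrough:
    out_port: int
    out_vpi: int
    out_vci: int


@dataclass(frozen=True)
class HopByHop:
    reason: str


def classify_cell(key: VcKey, bindings: Dict[VcKey, VcKey]) -> Union[CutThrough, HopByHop]:
    """Choose the forwarding path for an arriving cell."""
    out = bindings.get(key)
    if out is not None:
        # Cut-through: relabel and switch the cell in the cell fabric,
        # with no reassembly and no IP header processing.
        return CutThrough(*out)
    # Default path: reassemble the packet and forward it hop by hop
    # through the IP packet switch fabric.
    return HopByHop("no cut-through binding for this VC")
```

Because the cut-through decision is a single exact-match lookup per cell, its cost does not depend on IP header processing, which is what lets bound flows bypass the packet fabric.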
Akira WATANABE Yuuji KOUI Shoichiro SENO Tetsuo IDEGUCHI
We propose an architecture for a high-speed internetworking device that uses a central control method. Cooperation between hardware and software is required to achieve high relay performance. For the hardware, we have designed an original bus arbitration control method to achieve high data-bus throughput. For the software, we have separated normal relay processing from other processing and built it into the monitor as a basic function. This approach improves relay performance dramatically through the combined effect of reduced software overhead and an improved cache hit ratio. We have developed a prototype device and confirmed the effects of the proposed method.
Andrew FLAVELL Yoshizo TAKAHASHI
We propose a new high-performance message router for k-ary n-cube multicomputer systems, called the Tokky router. The router uses a small number of queues at the outputs of its communication ports to allow fully adaptive routing, misrouting to prevent deadlock, and randomization to prevent livelock. Uncongested network performance is improved by the inclusion of the packet expressway. Accurate models are developed to predict the switch and buffer performance of routers of varying radix and dimension, and these models can be used in the design of routers for networks other than those investigated here. The simulated performance of the router exceeds published results for oblivious routers and equals or exceeds those reported for other adaptive routers. These performance predictions are especially encouraging when the simplicity of the control structures required to implement the router is taken into consideration.
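As an illustration of adaptive routing with misrouting and randomization in a k-ary n-cube, the sketch below first computes the productive (distance-reducing) directions in each dimension, taking wraparound into account. It then picks randomly among free productive outputs, and misroutes to any free output when none is available. This is a generic policy under stated assumptions: the Tokky router's actual selection rules, queue structure, and packet expressway are not modeled, and the function names are hypothetical.

```python
import random


def productive_directions(cur, dst, k):
    """(dimension, step) pairs that move a packet one hop closer to dst in a k-ary n-cube."""
    moves = []
    for dim, (c, d) in enumerate(zip(cur, dst)):
        if c == d:
            continue
        forward = (d - c) % k   # hops in the + direction, with wraparound
        backward = (c - d) % k  # hops in the - direction
        if forward < backward:
            moves.append((dim, +1))
        elif backward < forward:
            moves.append((dim, -1))
        else:                   # exactly half-way round (k even): both are minimal
            moves.extend([(dim, +1), (dim, -1)])
    return moves


def choose_output(cur, dst, k, free, rng):
    """Pick an output channel for a packet not yet at its destination (cur != dst).

    free: set of (dimension, step) output channels currently available.
    Returns None if every output is busy and the packet must wait.
    """
    productive = [m for m in productive_directions(cur, dst, k) if m in free]
    if productive:
        return rng.choice(productive)  # random choice among minimal paths
    if free:
        return rng.choice(sorted(free))  # misroute: take any free channel
    return None
```

Random selection is what keeps a repeatedly misrouted packet from cycling deterministically, which is the livelock concern the abstract mentions.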
Masayuki HAYASHI Shuji TSUKIYAMA
In this paper, we propose a hybrid hierarchical global router for multi-layer VLSIs that executes routing and layering simultaneously. This approach combines top-down and bottom-up hierarchical routing and may prove to be a useful routing technique. We also present experimental results that demonstrate the superiority of the hybrid hierarchical approach. The approach may also be applicable in various other fields.
Masahiko TOYONAGA Chie IWASAKI Yoshiaki SAWADA Toshiro AKINO
We present a new multi-layer over-the-cell channel router for standard-cell layout design using simulated annealing. This new approach, STANZA-M, has two key features. The first is a new simulated-annealing scheme that uses a cost function evaluating both the total net length and the channel heights, together with an efficient annealing process restricted to a limited range, to obtain an optimal channel wiring in practical time. The second is a basic layer-assignment procedure that assigns all horizontal wiring inside a channel to feasible layers by considering the channel height, including the cell region, through a one-dimensional channel compaction process. We implemented our three-layer channel router in C on a Solbourne Series 5 workstation (22 MIPS). Experimental results for benchmarks such as Deutsch's Difficult Example and MCNC's PRIMARY1 channel routing problems indicate that STANZA-M achieves better results than conventional routers, and its processing times are short despite the use of simulated annealing.
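To illustrate simulated annealing with a weighted cost function and limited-range moves, here is a minimal sketch on a toy track-assignment problem. Horizontal segments are closed column intervals, the cost is a weighted channel height plus a heavy penalty for overlapping segments on the same track, and each move shifts one segment by at most `window` tracks. This is a hypothetical stand-in: it omits net length, over-the-cell layers, and STANZA-M's actual cooling schedule, and every parameter value is an assumption.

```python
import math
import random


def overlaps(a, b):
    """Closed column intervals (left, right) overlap if they share at least one column."""
    return a[0] <= b[1] and b[0] <= a[1]


def cost(tracks, intervals, w_height=1.0, w_conflict=10.0):
    """Weighted channel height plus a penalty for each overlapping pair on the same track."""
    height = max(tracks) + 1
    conflicts = sum(
        1
        for i in range(len(intervals))
        for j in range(i + 1, len(intervals))
        if tracks[i] == tracks[j] and overlaps(intervals[i], intervals[j])
    )
    return w_height * height + w_conflict * conflicts


def anneal(intervals, n_tracks, window=1, t0=10.0, alpha=0.995, steps=5000, seed=0):
    rng = random.Random(seed)
    state = [0] * len(intervals)  # start with every segment on track 0
    cur = cost(state, intervals)
    best, best_cost = state[:], cur
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(intervals))
        cand = state[:]
        # Limited-range move: shift one segment by at most `window` tracks.
        cand[i] = min(n_tracks - 1, max(0, cand[i] + rng.randint(-window, window)))
        delta = cost(cand, intervals) - cur
        # Metropolis rule: accept improvements, accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state, cur = cand, cur + delta
            if cur < best_cost:
                best, best_cost = state[:], cur
        temp *= alpha  # geometric cooling
    return best, best_cost
```

Restricting each move to a small window keeps candidate layouts close to the current one, which tends to make each cost evaluation cheaper to update incrementally in a real router.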
Mototaka KURIBAYASHI Masaaki YAMADA Takashi MITSUHASHI Nobuyuki GOTO
A fast and efficient heuristic hierarchical global router for Sea-of-Gates (SOG) with embedded macro-blocks is described. The key point of the method is to carry out a new optimal domain-decomposition scheduling at every hierarchical level. This scheduling is intended to avoid wiring through macro-blocks and to reduce the wiring congestion near macro-blocks that may otherwise occur at lower levels. In evaluations with several actual SOG circuits, including a 300K-gate master chip, and with benchmark data supplied by MCNC, the new global router yielded better results than previous hierarchical routers and a non-hierarchical maze router. For macro-block-embedded data, overflows were reduced to between one-half and one-quarter of those of previous hierarchical routers. In running time, the router greatly outperformed the non-hierarchical maze router, which took more than 390 times longer on the large test data.
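For reference, the non-hierarchical baseline can be illustrated by a Lee-style maze router: a breadth-first search on a routing grid that treats blocked cells (such as macro-blocks) as obstacles. This single-layer, single-net sketch is a generic textbook version with hypothetical names, not the specific maze router used in the comparison. Each search may expand every free cell of the grid, which is one reason a flat maze router scales poorly on large chips.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Shortest 4-connected path on a grid; cells with value 1 are blocked.

    Returns the list of (row, col) cells from src to dst, or None if unroutable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}  # back-pointers double as the visited set
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:  # backtrace from target to source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

A hierarchical router instead solves many small coarse-grid problems, so no single search has to cover the full chip area.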
Takashi SHIMAMOTO Isao SHIRAKAWA Hidetaka HANE Nobuyasu YUI Nobuyuki NISHIGUCHI
A distributed processing system dedicated to multilayer SOG routing is described. The system consists of global and detailed routers, each based on a different rip-up-and-reroute procedure, and runs on a computer network of workstations. Implementation results for a five-layer SOG are also presented to demonstrate the practicality of the system.
Masayuki HAYASHI Hiroyoshi YAMAZAKI Shuji TSUKIYAMA Nobuyuki NISHIGUCHI
We propose a hierarchical multi-layer global router for Sea-of-Gates VLSIs that differs from conventional global routers in that routing and layering are executed simultaneously. The main problems to be solved in global routing for a multi-layer VLSI are which wire segments to lay out on upper layers and how to connect them to terminals located on lower layers. The main objective is to minimize the maximum local congestion over all layers. We solve these problems hierarchically by routing from upper layers to lower layers.
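The min-max objective can be made concrete. Local congestion on a layer is the usage divided by the capacity of each grid edge, and the quantity to minimize is the largest such ratio over all layers. The greedy layer chooser below is only a hypothetical illustration of that objective, assuming uniform per-edge capacity and segments that are already routed. It is not the authors' hierarchical algorithm, which also decides the routes and the connections to lower-layer terminals.

```python
def assign_layers(segments, n_layers, capacity):
    """Greedily place each wire segment on the layer that keeps its worst edge congestion lowest.

    segments: list of non-empty lists of grid-edge ids occupied by each segment.
    capacity: tracks available per grid edge on every layer (uniform, hypothetical).
    Returns (layer chosen per segment, maximum congestion over all layers and edges).
    """
    usage = [dict() for _ in range(n_layers)]  # usage[layer][edge] -> tracks used
    choice = []
    for edges in segments:
        best_layer, best_peak = None, None
        for layer in range(n_layers):
            # Congestion this segment would cause at its most crowded edge on this layer.
            peak = max((usage[layer].get(e, 0) + 1) / capacity for e in edges)
            if best_peak is None or peak < best_peak:
                best_layer, best_peak = layer, peak
        for e in edges:
            usage[best_layer][e] = usage[best_layer].get(e, 0) + 1
        choice.append(best_layer)
    max_congestion = max(
        (u / capacity for layer in usage for u in layer.values()), default=0.0
    )
    return choice, max_congestion
```

A greedy pass like this depends on segment order; deciding layers level by level, as the abstract describes, gives the router a global view at each level instead.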