Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
In this paper, we propose a novel architecture for low-power direct-mapped instruction caches, called the "history-based tag-comparison (HBTC) cache." The cache attempts to reuse tag-comparison results to avoid unnecessary tag checks. Execution footprints are recorded in an extended BTB (Branch Target Buffer). In our evaluation, the energy for tag comparison is reduced by more than 90% in many applications.
Kiyoshi FUKUCHI Kayato SEKIYA Risato OHHIRA Yutaka YANO Takashi ONO
A 1.6-Tb/s dense WDM signal was successfully transmitted over 480 km using the carrier-suppressed return-to-zero (CS-RZ) modulation format. The CS-RZ format was chosen because it exhibited better transmission performance over a wide fiber-input power window than the NRZ and RZ formats in a 40-Gb/s-based WDM transmission experiment with 100-GHz channel spacing, confirming its nonlinearity-insensitive nature in dense WDM systems. With the wide power window of CS-RZ, we achieved stable transmission of 40 × 40-Gb/s WDM signals over a 480-km (6 × 80 km) standard SMF line with only the C-band, in which a spectral ripple remained during transmission. Distributed Raman amplification and forward error correction were not used, providing a margin for already installed transmission lines.
Hidehiro TAKATA Rei AKIYAMA Tadao YAMANAKA Haruyuki OHKUMA Yasue SUETSUGU Toshihiro KANAOKA Satoshi KUMAKI Kazuya ISHIHARA Atsuo HANAMI Tetsuya MATSUMURA Tetsuya WATANABE Yoshihide AJIOKA Yoshio MATSUDA Syuhei IWADE
An MPEG-2 encoder LSI with a multimedia processor and 64 Mb of on-chip embedded DRAM has been developed. To implement this large-scale, high-speed LSI, we developed hierarchical skew control for multiple clocks, timing verification that takes cross-talk noise into account, and simple countermeasures against IR drop in the power lines using decoupling capacitors. As a result, the target performance of 263 MHz at 1.5 V has been attained and verified with cross-talk noise considered, and the IR drop has been held to 166 mV in the 162 MHz operation block.
Hiroshi NAGAHASHI Mohamed IMINE
This paper develops a simple algorithm for calculating a polynomial curve or surface in a parallel way. The number of arithmetic operations and the necessary time for the calculation are evaluated in terms of polynomial degree and resolution of a curve and the number of processors used. We made some comparisons between our method and a conventional method for generating polynomial curves and surfaces, especially in computation time and approximation error due to the reduction of the polynomial degree. It is shown that our method can perform fast calculation within tolerable error.
Hiroshi KAWAGUCHI Gang ZHANG Seongsoo LEE Youngsoo SHIN Takayasu SAKURAI
An LSI has been fabricated and measured to demonstrate the feasibility of the VDD-hopping scheme at the embedded-system level by executing an MPEG4 CODEC. In VDD-hopping, the supply voltage of a processor is dynamically controlled by a hardware-software cooperative mechanism according to the processor's workload. When the workload is about half, VDD-hopping is shown to reduce power to less than a quarter of that of the conventional fixed-VDD scheme. The power saving is achieved without degrading the real-time features of the MPEG4 CODEC.
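The "less than a quarter" figure can be made plausible with a first-order estimate. The sketch below assumes the textbook dynamic-power model P ∝ C·VDD²·f, ignores leakage and voltage-transition overhead, and takes the fixed-VDD baseline to be full voltage at full clock frequency. The function name and the 0.7 voltage ratio are hypothetical illustration values, not figures from the paper.

```python
def relative_power(freq_ratio, vdd_ratio):
    """Dynamic CMOS power relative to full-speed, full-voltage operation.

    Assumes P is proportional to VDD^2 * f (first-order model only;
    leakage and hopping overhead are ignored).
    """
    return freq_ratio * vdd_ratio ** 2


# Hypothetical operating point: half workload, so half the clock frequency,
# with the supply lowered to 70% of nominal (an assumed value).
half_load = relative_power(freq_ratio=0.5, vdd_ratio=0.7)
print(f"relative power at half workload: {half_load:.3f}")
```

Under this model, any supply ratio below about 0.707 at half frequency gives less than a quarter of the baseline power, because 0.5 × VDD² < 0.25 requires VDD² < 0.5.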
Hyun Ho KIM Sang Joon AHN Tai Myoung CHUNG Young Ik EOM
A mobile computing system is a set of functions in a distributed environment organized to support mobile hosts. In this environment, mobile hosts should be able to move without constraint and remain connected to the network while moving. They should also be able to obtain necessary information regardless of their current location and time. Distributed mutual exclusion methods for supporting distributed algorithms have hitherto been designed only for networks with static hosts. With the emergence of mobile computing environments, however, a new distributed mutual exclusion method is needed to integrate mobile hosts with the underlying distributed systems. In this sense, many of the issues to be considered stem from three essential properties of mobile computing systems: wireless communication, portability, and mobility. Thus far, distributed mutual exclusion methods for mobile computing environments have been based on a token ring structure, which has the drawback of incurring high costs to locate mobile hosts. In this paper, we propose a distributed mutual exclusion method that reduces these costs by organizing the entire system as a tree-based logical structure, together with recovery schemes that can be applied when a node failure occurs. Finally, we evaluate the operation costs of the mutual exclusion scheme and the recovery scheme.
Satoru YAMAGUCHI Keiichiro ITOH Yukiharu OHNO Yoshio SHIMODA Tsuyoshi HAYASHI Toshio ASHIDA Tetsuo MIKAZUKI
This paper describes an innovative, high-speed optical backboard bus composed of an optical star coupler, optical-transmitter modules, optical-receiver modules, and optical multi-mode glass fibers. A highly efficient optical coupling structure with an aspherical lens and a laser diode was designed to achieve a coupling efficiency of 90%, enabling distribution of optical signals at up to 1 Gb/s to 50 function boards. Embedded optical fibers in a printed circuit board were used to achieve precise control of the optical propagation delay times and permit a high packaging density. We developed small laser-diode and photo-diode modules suitable for optical coupling with the embedded fibers. A fabricated prototype optical backboard bus controlled by a controller IC mounted on a function board was able to successfully distribute high-speed optical signals to function boards with a high packaging density.
Peter M. KRUMMRICH Erich GOTTWALD Nancy E. HECKER Claus-Jorg WEISKE Andreas SCHOPFLIN Andreas FARBERT Klaus KOTTEN
Channel bit rates of 40 Gbit/s are the next step after 2.5 and 10 Gbit/s in the SONET/SDH hierarchy. They enable multi-Tbit/s transmission of live traffic over a single fiber. All recent optical transmission records for aggregate capacity per fiber were achieved using this technology. Comparing the limiting effects of 2.5, 10 and 40 Gbit/s system configurations reveals that 40 Gbit/s allows for the longest regenerator-free distance on NZDSF. In this paper we describe transmitter and receiver designs as well as results from field trials. The first trial demonstrated transmission of live traffic with a record aggregate capacity of 3.2 Tbit/s, whereas the second successfully demonstrated a doubling of the channel capacity to 80 Gbit/s using polarization multiplexing with automated polarization control.
In 1998, Jan and Tseng proposed two integrated schemes of user authentication and access control which can be used to implement a protection system in distributed computer systems. This paper will analyze the security of both schemes and show that an intruder can easily forge a login, be accepted and logged in as a legal user, and access system resources. We will then propose a modified scheme to withstand our proposed attacks.
Media processing has become one of the dominant computing workloads. In this context, SIMD instructions have been introduced in current processors to raise performance, often the main goal of microprocessor designers. Today, however, designers have become concerned with power consumption, and in some cases (for example, laptops) low power is the main design goal. In this paper, we show that SIMD ISA extensions on a superscalar processor can be one solution for reducing power consumption while keeping a high performance level. We reduce the average power consumption by decreasing the number of instructions and the number of cache references, and by using dynamic power management to convert the performance speedup into a reduction in power consumption.
Tatsuo TERUYAMA Tetsuo KAMADA Masashi SASAHARA Shardul KAZI
The strong demand for complex, high-performance systems-on-a-chip requires a high-performance microprocessor core and a quick-turnaround design methodology. We have developed a 128-bit synthesizable core processor and a tile-based quick-turnaround design methodology. The core is a 200 MHz MIPS-compatible processor with a 128-bit SIMD extension and is targeted at consumer electronics. We also developed an ASSP, mainly for network applications, that includes the processor core, an SDRAM controller, 2 PCI and 2 MAC. For SOC development, we developed a tile-based design methodology aimed at quick design convergence. The initial RTL design is synthesized and partitioned into several tiles by an in-house tiling tool. The methodology promises quick turnaround from RTL design to tape-out by exploiting concurrency in the back-end design.
Hiroyuki TAKANO Takashi MIYAMORI Yasuhiro TANIGUCHI Yoshihisa KONDO
A 4-GOPS, 3-way VLIW image recognition processor for an automobile system has been developed. The processor is based on a configurable and extensible media processor that enables optimization for a specific application by means of design-time configuration. Using the VLIW coprocessor extension, the processor can satisfy the performance requirements of the system. The overhead of VLIW-mode instructions is only 7%. The VLIW coprocessor occupies only 12% of the die area. Thus, this configurable media processor can achieve good cost-performance for media processing in each embedded system.
Sheng-He SUN Xiao-Dan MEI Zhao-Li ZHANG
A novel rough neural network (RNN) structure and its application are proposed in this paper. We principally introduce its architecture and training algorithms: the genetic training algorithm (GA) and the tabu search training algorithm (TSA). We first compare RNN with the conventional NN trained by the BP algorithm in two-dimensional data classification. Then we compare RNN with NN by the same training algorithm (TSA) in functional approximation. Experiment results show that the proposed RNN is more effective than NN, not only in computation time but also in performance.
Kouji WADA Kouichi NAKAGAWA Osamu HASHIMOTO Hiroshi HARADA
A simple method for improving out-of-band characteristics of a planar microwave filter is proposed. We clarify the close relationship among 'tap connection,' 'attenuation pole' and 'spurious responses' in filter design, theoretically and experimentally. Firstly, the basic characteristics of the resonator depending on the excitation method are examined. We show that skirt characteristics can be improved and spurious responses can be suppressed by using the tap connection technique. Secondly, the application examples of bandpass filters (BPFs) on the basis of the resonator with our principle are provided. It is confirmed that the resonator depending on the excitation method is useful for improving out-of-band characteristics of the planar microwave filter.
This paper proposes constructive timing violation (CTV) and evaluates its potential. CTV can be used both to increase clock frequency and to reduce energy consumption. Increasing the clock frequency beyond that determined by the critical paths causes timing violations. Similarly, while reducing the supply voltage can yield substantial power savings, it also increases gate delay, so the clock must be slowed down in order not to violate the timing constraints of the critical paths. However, if mechanisms that tolerate timing violations are provided, the constraints need not be kept. Rather, the violations become constructive, enabling a higher clock frequency or energy savings. From these observations, we propose CTV, which is supported by a tolerance mechanism based on contemporary speculative execution mechanisms. We evaluate CTV using a cycle-by-cycle simulator and show that its potential is considerably promising.
Much has been said and written about the changes in analog IC technology such as shrinking line widths, vanishingly low supply voltages, severe power limitations, and digital noise. But beyond these technology changes and their subsequent methodology changes, a far more subtle revolution is happening in the nature of the profession itself. Technology, software, and product evolution have all conspired to create a new kind of analog IC designer, one very different from the IC designers of the past.
Sangook MOON Yong Joo LEE Jae Min PARK Byung In MOON Yong Surk LEE
A new approach to designing a finite field multiplier architecture is proposed. The proposed architecture trades hardware resources for a reduction in the number of clock cycles. The architecture features high performance, a simple structure, scalability, and independence from the choice of the finite field, and can be used in high-security cryptographic applications such as elliptic curve cryptosystems over large prime Galois Fields (GF(2^m)).
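As a point of reference for the cycles-versus-resources trade-off, the following is a minimal software model of a generic MSB-first shift-and-add multiplier in GF(2^m). It is not the architecture proposed in the paper. The function name `gf2m_mul` and the `digit` parameter (the number of multiplier bits handled per modelled clock cycle) are assumptions introduced for illustration. Elements are integers whose bits are polynomial coefficients, and `poly` is the irreducible polynomial including its x^m term.

```python
def gf2m_mul(a, b, m, poly, digit=1):
    """Multiply a and b in GF(2^m) and count modelled clock cycles.

    Each modelled cycle consumes up to `digit` bits of b, MSB first.
    A larger `digit` stands for more parallel hardware and fewer cycles.
    Returns (product, cycles).
    """
    result = 0
    cycles = 0
    bit = m - 1
    while bit >= 0:
        for _ in range(digit):
            if bit < 0:
                break
            result <<= 1                 # multiply the partial result by x
            if (result >> m) & 1:
                result ^= poly           # reduce modulo the field polynomial
            if (b >> bit) & 1:
                result ^= a              # add (XOR) the multiplicand
            bit -= 1
        cycles += 1
    return result, cycles


# GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
print(gf2m_mul(0x57, 0x83, 8, 0x11B))           # bit-serial
print(gf2m_mul(0x57, 0x83, 8, 0x11B, digit=4))  # 4 bits per cycle
```

The product is the same in both calls; only the modelled cycle count changes, from m cycles in the bit-serial case to ceil(m / digit) when more bits are processed per cycle.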
Jonggil LEE Hyunchul KANG Seung-Kuk CHOI
The jitter characteristics of the synchronous residual time stamp (SRTS) method used in ATM adaptation layer type 1 (AAL1) are analyzed. In this letter, the root-mean-square amplitude of the filtered SRTS jitter is calculated, and computer simulations are carried out to show the jitter of the SRTS method, also taking into account the phase time error of the network clocks.
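For readers unfamiliar with SRTS, the sketch below generates the residual time stamps themselves. It assumes the standard AAL1 parameters (an RTS period of 3008 service clock cycles and a 4-bit RTS) and uses a 1.544 MHz service clock with a 2.43 MHz derived network clock (155.52 MHz / 64) as an example. The function name is hypothetical, both clocks are treated as ideal, and the filtering and network-clock phase error analysed in the letter are not modelled.

```python
from fractions import Fraction
from math import floor

N_CYCLES = 3008   # service clock cycles per RTS period (assumed AAL1 value)
RTS_BITS = 4      # width of the residual time stamp (assumed AAL1 value)


def rts_sequence(f_service_hz, f_ref_hz, periods):
    """Return the RTS sampled at the end of each RTS period.

    A free-running counter of derived-network-clock cycles is sampled
    every N_CYCLES service clock cycles; only its low RTS_BITS are kept.
    """
    ratio = Fraction(f_ref_hz, f_service_hz)
    modulus = 1 << RTS_BITS
    values = []
    for n in range(1, periods + 1):
        ref_count = floor(n * N_CYCLES * ratio)  # whole reference cycles so far
        values.append(ref_count % modulus)
    return values


rts = rts_sequence(1_544_000, 2_430_000, 32)
# Per-period reference-cycle count modulo 16, as seen by the receiver.
steps = [(b - a) % 16 for a, b in zip(rts, rts[1:])]
print(rts)
print(steps)
```

Because the number of reference cycles per period is not an integer, the per-period step alternates between two adjacent values. This quantization is the kind of effect that appears as jitter once the receiver filters the recovered clock.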
Yoshiharu FUJISAKU Masatoshi KAGAWA Toshio NAKAMURA Hitoshi MURAI Hiromi T. YAMADA Shigeru TAKASAKI Kozo FUJII
A 40 Gbit/s optical transceiver using a novel OTDM MUX module has been developed. The OTDM (Optical-Time-Division-Multiplexing) MUX module, the core component of the transmitter, consists of an optical splitter, two electro-absorption (EA) modulators, and a combiner in a small sealed package. Because the split optical paths run through "air" inside the module, a highly stable optical phase relation between bit-interleaved pulses can be maintained. With the OTDM MUX module, selection between the conventional Return-to-Zero (conventional-RZ) format and the carrier-suppressed RZ (CS-RZ) format is performed by slightly changing the wavelength of the laser diode. In the receiver, the 40 Gbit/s optical data train is optically demultiplexed to a 10 Gbit/s optical train before being detected by the O/E receiver for the 10 Gbit/s RZ format. Back-to-back MUX-DEMUX evaluations of the transceiver exhibited good sensitivities of under -30 dBm, measured at the 40 Gbit/s optical input, for a bit-error rate (BER) of 10^-9. Another unique feature of the transceiver system is its spectrum-switching capability. Stable RZ and CS-RZ multiplexing operation was confirmed in the experiment. Once the 40 Gbit/s optical signal is adjusted to the CS-RZ format, the optical spectrum maintains its CS shape for a long time, which benefits stable long-distance transmission. In a recirculating-loop experiment employing the OTDM MUX transceiver, a larger power margin was observed with the CS-RZ format than with the conventional-RZ format, indicating that proper encoding of conventional-RZ and CS-RZ was realized with this prototype transceiver. With the CS-RZ format, error-free (BER < 10^-9) transmission over 720 km was achieved with a long repeater amplifier span of 120 km.
Hiromitsu KIMURA Takahiro HANYU Michitaka KAMEYAMA
A new logic-in-memory circuit is proposed for a fine-grain pipelined VLSI system. Dynamic-storage elements are distributed over a logic-circuit plane. A functional pass gate is a key component, where a linear summation and threshold function are merged compactly using charge-storage and charge-coupling effect with a DRAM-cell-based circuit structure. The use of dynamic logic based on pass-transistor network using functional pass gates makes it possible to realize any logic circuits compactly with small power dissipation. As a typical example, a 54-bit pipelined multiplier is implemented by using the proposed circuit technology. Its power dissipation and chip area are reduced to about 63 percent and 72 percent, respectively, in comparison with those of a corresponding binary CMOS implementation under 0.35-µm CMOS technology.