Takahiro INOUE Tetsuo MOTOMURA Ryoko MATSUO Fumio UENO
New OTA-based analog circuits for realizing fuzzy membership functions and maximum (MAX) and minimum (MIN) operations are proposed. The synthesis of these circuits based on a bounded-difference operation and their SPICE simulations are described.
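The bounded-difference identities behind MAX and MIN can be checked numerically. The sketch below assumes the usual definition a ⊖ b = max(0, a − b). It uses that definition to build MAX, MIN and a triangular membership function. The function names and the triangular shape are illustrative choices and do not reproduce the proposed OTA circuits.

```python
def bdiff(a: float, b: float) -> float:
    """Bounded difference a ⊖ b = max(0, a - b)."""
    return max(0.0, a - b)


def fuzzy_max(a: float, b: float) -> float:
    # MAX(a, b) = a + (b ⊖ a): add the excess of b over a, if any
    return a + bdiff(b, a)


def fuzzy_min(a: float, b: float) -> float:
    # MIN(a, b) = a - (a ⊖ b): remove the excess of a over b, if any
    return a - bdiff(a, b)


def triangular(x: float, center: float, slope: float, height: float = 1.0) -> float:
    # |x - c| = (x ⊖ c) + (c ⊖ x); the membership grade is height ⊖ slope * |x - c|
    dist = bdiff(x, center) + bdiff(center, x)
    return bdiff(height, slope * dist)
```

Each operation uses only bounded differences, sums and scaling. This is why a single bounded-difference building block is enough to synthesize all three functions.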
Kazuaki MURAKAMI Morihiro KUGA Oubong GWUN Shinji TOMITA
Superscalar processors can improve uniprocessor performance beyond that of RISC processors by exploiting spatial instruction-level parallelism. Superscalar processor design presents more opportunities for tradeoffs than conventional RISC design. To make use of the processor resources added by the superscalar approach, processors must be carefully designed and implemented. This paper examines various aspects of superscalar processors and discusses their design features and tradeoffs. The specific aspects examined include instruction fetch boundary, instruction-cache line crossing, branch prediction, data-hazard resolution, control-hazard resolution, and precise or imprecise interrupts. This paper uses a superscalar simulator that models a DDU (Dynamically-hazard-resolved, Dynamic-code-scheduled, Uniform) superscalar architecture, called SIMP (Single Instruction stream/Multiple instruction Pipelining), and evaluates many different SIMP hardware organizations. This paper concludes that a superscalar processor can increase performance with five major hardware features: instruction aligning, branch prediction with a branch-target buffer, code scheduling, speculative execution with conditional mode, and imprecise interrupts. However, it has been claimed that the first three functions can be performed by compilers rather than by hardware.
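One of the five features named above is branch prediction with a branch-target buffer. The sketch below is a generic textbook BTB, assumed for illustration: it is direct-mapped, uses 2-bit saturating counters and allocates entries only on taken branches. It is not SIMP's actual organization. The class names, the 64-entry size and the 4-byte instruction width are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    tag: int          # full branch address (no aliasing in this sketch)
    target: int       # last observed taken-target address
    counter: int = 2  # 2-bit saturating counter; >= 2 means "predict taken"


class BranchTargetBuffer:
    """Direct-mapped BTB with 2-bit counters (hypothetical sizes)."""

    def __init__(self, num_entries: int = 64, inst_bytes: int = 4):
        self.num_entries = num_entries
        self.inst_bytes = inst_bytes
        self.entries = [None] * num_entries  # each slot: BTBEntry or None

    def _index(self, pc: int) -> int:
        return (pc // self.inst_bytes) % self.num_entries

    def predict(self, pc: int) -> int:
        """Predicted next fetch address for the instruction at pc."""
        e = self.entries[self._index(pc)]
        if e is not None and e.tag == pc and e.counter >= 2:
            return e.target
        return pc + self.inst_bytes  # miss or predicted not taken: fall through

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the BTB with the resolved outcome of the branch at pc."""
        i = self._index(pc)
        e = self.entries[i]
        if e is None or e.tag != pc:
            if taken:  # allocate an entry only for taken branches
                self.entries[i] = BTBEntry(tag=pc, target=target)
            return
        e.target = target
        e.counter = min(3, e.counter + 1) if taken else max(0, e.counter - 1)
```

Because the lookup happens at fetch time and returns a full target address, a taken branch can redirect fetch without waiting for decode. That is the property a wide fetch stage depends on.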
Suguru YAMAGUCHI Kiyohiko OKAYAMA Hideo MIYAHARA
In a large-scale distributed environment or a large open network such as the WIDE Internet, an academic and research network in Japan, the authentication system is the fundamental building block for providing security mechanisms. We have developed a trusted third-party authentication system called SPLICE|AS for the WIDE Internet. The authentication protocol adopted in SPLICE|AS uses a public-key cryptosystem and is based on the protocol originally proposed by Needham. We made several extensions to detect certain security attacks, such as replay attacks, that were not considered in Needham's original approach. Furthermore, since management principals are scattered across the WIDE Internet, a domain-based management scheme and corresponding protocol extensions are introduced into our system. The whole network is logically subdivided into several domains based on network management policies, and each domain is managed by a single authentication server. The domain concept is then applied hierarchically to provide inter-domain access: an authentication server in an upper domain authorizes and controls inter-domain accesses between its subdomains. This paper describes the design of SPLICE|AS and its implementation.
Tetsuya MATSUMURA Masahiko YOSHIMOTO Atsushi MAEDA Yasutaka HORIBA
This paper describes a high-performance reconfigurable line memory macrocell for video signal processing ASICs. The macrocell features a three-transistor memory cell array with a divided word line structure for the write word lines. The transistor sizes of the memory cell were determined by analyzing the access time so as to achieve a throughput rate of more than 50 MHz for various aspect ratios. A testing circuit embedded in the macrocell offers video-rate testing and high fault coverage with a minimum circuit count. Moreover, the macrocell is highly reconfigurable in word length, bit width, and aspect ratio. A 1152-word × 8-bit line memory has been implemented experimentally using 1.0 µm CMOS technology, and 60 MHz operation has been observed, allowing real-time processing of HDTV signals. By applying the macrocells to HDTV system LSIs, the reconfigurability and the usefulness of the testing circuits have been verified.
Kunihiko ISHIMA Shuji TSUKIYAMA
In this letter, we consider the following problem. We are given a weighted digraph G = [V, Ea ∪ Eb] such that the subgraph Ga = [V, Ea] of G is acyclic with a single source vertex, and the weight of each edge in Eb is negative. If G has positive cycles, the task is to locate as many of them as possible; otherwise, it is to compute the longest path length from the source vertex to each vertex of V. We then propose an algorithm for this problem and present experimental results demonstrating its efficiency.
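The problem statement can be illustrated with a generic baseline. The sketch below adapts Bellman-Ford to longest paths: if a label can still grow after |V| − 1 rounds, a positive cycle is reachable, and walking the predecessor pointers recovers one. This is an assumed stand-in, not the authors' algorithm. It does not exploit the acyclic structure of Ga, and it reports only one cycle rather than as many as possible. The function name is hypothetical.

```python
from math import inf


def longest_paths_or_positive_cycle(n, edges, source):
    """Bellman-Ford adapted to longest paths.

    n: vertices are 0..n-1; edges: list of (u, v, w); source: start vertex.
    Returns ("dist", dist) if no positive cycle is reachable from source,
    otherwise ("cycle", [x, ..., x]) listing one positive cycle in edge order.
    """
    dist = [-inf] * n
    pred = [None] * n
    dist[source] = 0
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] != -inf and dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                changed = True
        if not changed:
            return "dist", dist  # labels are final: no positive cycle reachable
    for u, v, w in edges:
        if dist[u] != -inf and dist[u] + w > dist[v]:
            # A label can still grow, so a positive cycle exists; after
            # re-pointing pred[v], walking n predecessor steps lands on it.
            pred[v] = u
            x = v
            for _ in range(n):
                x = pred[x]
            cycle, y = [x], pred[x]
            while y != x:
                cycle.append(y)
                y = pred[y]
            cycle.append(x)
            cycle.reverse()  # predecessor order -> edge order
            return "cycle", cycle
    return "dist", dist
```

For example, an acyclic edge set 0→1 (2), 1→2 (3), 2→3 (1), 0→2 (1) gives longest lengths [0, 2, 5, 6]. Adding the negative back edge 2→1 (−1) creates the positive cycle 1→2→1 of weight 2, which the function then reports instead.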
Yuliang ZHENG Tsutomu MATSUMOTO Hideki IMAI
A challenge-and-response identification protocol consists of three moves of messages between a prover and a verifier. Move 1: the prover claims to the verifier that his/her identity is ID. Move 2: the verifier challenges the prover with a question related to ID. Move 3: the prover responds with the answer to the question. The verifier accepts the prover if the answer is correct. The main contribution of this paper is to show that this folklore protocol can be made provably secure under the sole assumption that one-way functions exist.
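The three moves can be traced in code. The sketch below assumes a pre-shared secret key per identity and uses HMAC-SHA256 over (ID, challenge) as the answer to the challenge. That is a common practical stand-in; it is not the provably secure construction from one-way functions that the paper develops. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets


class Verifier:
    def __init__(self, keys):
        self.keys = keys    # ID -> shared secret key (assumed pre-distributed)
        self.pending = {}   # ID -> outstanding challenge

    def challenge(self, identity: str) -> bytes:
        # Move 2: a fresh random challenge tied to the claimed ID
        c = secrets.token_bytes(16)
        self.pending[identity] = c
        return c

    def verify(self, identity: str, response: bytes) -> bool:
        # Each challenge is usable once, so a recorded answer cannot be replayed
        c = self.pending.pop(identity, None)
        key = self.keys.get(identity)
        if c is None or key is None:
            return False
        expected = hmac.new(key, identity.encode() + c, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


class Prover:
    def __init__(self, identity: str, key: bytes):
        self.identity, self.key = identity, key

    def claim(self) -> str:
        return self.identity  # Move 1: claim an identity

    def respond(self, c: bytes) -> bytes:
        # Move 3: answer = keyed MAC over (ID, challenge)
        return hmac.new(self.key, self.identity.encode() + c, hashlib.sha256).digest()
```

The security of the whole exchange rests on the verifier choosing the question freshly and on the answer being infeasible to compute without the prover's secret.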
Hironori YAMAUCHI Hiroshi MIYANAGA
Some dedicated floating-point hardware arithmetic modules designed as processing elements for butterfly operations are described. They consist of Input Data Converters (IDC), Output Data Converters (ODC), and a 2's complement 24-bit (16E8) floating-point Butterfly Execution Unit (BEU). The BEU executes the four multiplications and six additions/subtractions required for a complex butterfly operation in each 25-ns execution cycle by using four multipliers and four 3-input adders/subtracters. The arithmetic modules are fabricated in 0.8-µm CMOS technology. An overview of the hardware unit is presented, with special attention given to the BEU for parallel pipelined processing. In addition, module design methodologies for hardware implementation and some sophisticated high-speed execution techniques for floating-point multiplication and addition are discussed.
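The operation count above comes from a radix-2 butterfly X = A + W·B, Y = A − W·B. In the conventional count, the complex product W·B takes 4 real multiplications and 2 additions/subtractions, and forming A ± W·B takes 4 more, for a total of 6. The sketch below folds each product's add/subtract into one 3-input add/subtract per output component. This is one plausible reading of how four 3-input adders/subtracters could be used, not the BEU's documented data path, and it uses ordinary Python floats rather than the 16E8 format.

```python
def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 butterfly X = A + W*B, Y = A - W*B in real arithmetic.

    The four products are the four real multiplications; each output
    component is then a single 3-input add/subtract (assumed mapping).
    """
    m1, m2 = w.real * b.real, w.imag * b.imag   # Re(W*B) = m1 - m2
    m3, m4 = w.real * b.imag, w.imag * b.real   # Im(W*B) = m3 + m4
    x = complex(a.real + m1 - m2, a.imag + m3 + m4)
    y = complex(a.real - m1 + m2, a.imag - m3 - m4)
    return x, y
```

For instance, with A = 1 + 2j, B = 3 − j and W = j, the product W·B is 1 + 3j, so the outputs are X = 2 + 5j and Y = −j.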
Ryuichi YAMAGUCHI Joel BONEY Douglas DUSCHATKO Tetsuya TANAKA Jiro MIYAKE Yoshito NISHIMICHI Hisakazu EDAMATSU Shigeru WATARI Shigeo KUNINOBU
This paper describes a performance evaluation, and the methodology used for it, of a two-level cache composed of the internal caches of a 64-bit RISC microprocessor, an external cache, and a write buffer. The internal caches of this processor consist of a 6 Kbyte instruction cache and a 2 Kbyte data cache. We configured the two-level cache by adding an external 256 Kbyte cache and a write buffer to this processor, and analyzed the system performance. Cache behavior is analyzed using direct measurement as well as software simulation. Direct measurement shortens the time required for cache analysis compared with software simulation, and using software simulation together with direct measurement makes it easy to model various configurations. We used SPEC benchmarks for the performance analysis. The results show that the external cache and the write buffer are effective in increasing system performance.
Takehisa HAYASHI Toshio DOI Mikio YAMAGISHI Kazuo KOIDE Akira ISHIYAMA Masataka HIRAMATSU Akira YAMAGIWA
A 64 b CMOS mainframe execution unit macrocell with error detecting circuits is proposed. Conventional techniques for maintaining high reliability have been parity checking and duplication of the ALU (Arithmetic Logic Unit). However, the time required to generate the parity from the sum output of the ALU is undesirable for high-speed operation. To achieve a short ALU delay time, a parity-predicting logic structure is adopted. With this structure, a one-bit-error detecting function is integrated without duplicating every ALU circuit. A novel CMOS precharged circuit is also developed to shorten the time required to precharge the whole circuit. When the number of circuit stages is reduced, the precharge time as well as the delay time limits the ALU cycle time; this new circuitry solves the precharge-time accumulation problem of conventional circuits. A 64 b BCD ALU adopting this technology has been designed and fabricated. The parity-predict architecture and the high-speed precharge circuit reduce the delay time by 23% and the precharge time by 42%, and a 30% faster cycle time has been achieved with only a small (4%) increase in ALU area. The execution unit macrocell, which includes this ALU, contains 45 k transistors and occupies 4.3 mm × 4.1 mm in 0.8 µm triple-metal-layer CMOS technology.
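Parity prediction rests on a simple identity. Each sum bit is s_i = a_i ⊕ b_i ⊕ c_i, where c_i is the carry into bit i, so parity(S) = parity(A) ⊕ parity(B) ⊕ parity(C). The predicted parity therefore never waits for the sum output. The sketch below checks this for a plain binary adder; the paper's ALU is BCD, and real hardware takes the carries from the carry logic in parallel rather than from a sequential loop. The function names are hypothetical.

```python
def parity(x: int) -> int:
    """Parity bit of a non-negative integer (1 if it has an odd number of 1s)."""
    return bin(x).count("1") & 1


def add_with_predicted_parity(a: int, b: int, width: int = 64, cin: int = 0):
    """Return (sum, predicted_parity) for a width-bit binary adder.

    parity(S) = parity(A) ^ parity(B) ^ parity(C), where bit i of C is the
    carry INTO bit i. The prediction does not look at the sum output.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    carries = 0
    c = cin
    for i in range(width):
        carries |= c << i
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c = (ai & bi) | (ai & c) | (bi & c)  # carry out of bit i
    s = (a + b + cin) & mask
    return s, parity(a) ^ parity(b) ^ parity(carries)
```

A checker compares the predicted parity with the parity of the actual sum. Any single-bit error in the sum flips its parity, so the comparison fails without a duplicated ALU.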
Kenichi YASUDA Kiyohiro FURUTANI Atsushi MAEDA Shoichi WAKANO Hiroshi NAKASHIMA Yasutaka TAKEDA Michihiro YAMADA
We have developed a VLSI intelligent cache memory chip that constitutes one processor element of a Parallel Inference Machine (PIM/m) system. The chip contains 610 k transistors, including 80 kbits of memory cells; it measures 14.47 mm × 14.84 mm and is fabricated in 1.0 µm double-metal CMOS technology. The chip implements a hardware support mechanism called the "Trail Buffer", which is suited to the execution of logic programming languages. We determined the cache memory size by practical simulation, taking into account the relationship between chip size and cache hit ratio. The scan test method and special commands for accessing every memory cell are applied to enhance testability. The chip itself operates at a clock frequency of 30 MHz. The typical power consumption is 2.5 W with a 5.5 V power supply at 16.7 MHz operation. With this cache memory chip, the CPU board of the PIM/m is now tuned for 16.7 MHz operation and has attained 1.5 MLIPS (mega logical inferences per second), which is the highest performance of any inference machine in the world.
Masao MIZUKAMI Yasuo MIKAMI Osamu MATSUBARA Yoichi SATOH Koichi SUDOH
This paper describes circuit techniques for a 5 ns, 4 kw × 9 b embedded RAM for standard cell ASICs using 0.8 µm pure CMOS triple-metal technology. The design goals of these techniques were high speed, low power consumption, and stable access time even when the RAM configuration is changed in the numbers of words and bits. A one-chip 4,096-channel time switch VLSI for digital switching systems is also described as an example of applying these RAMs to standard cell CMOS ASICs. This chip consumes 600 mW at 32 MHz operation.
Saed SAMADI Akinori NISHIHARA Nobuo FUJII
The group-delay sensitivity of the Gray and Markel allpass lattice filter realizing a complex transfer function is studied theoretically. Recursive expressions previously derived for the phase and group delay of the real lattice are rederived for the complex case. These expressions are used to obtain upper bounds on the group-delay sensitivity. The minimum number of frequencies at which the group-delay sensitivity becomes zero is discussed, and the corresponding results for the real allpass lattice are also shown. The phase sensitivity properties of these filters are analyzed and compared with existing results, and a new bound on the phase sensitivity is obtained.
Fumito SATO Noriaki YOSHIKAI Motoo HOSHI
As the telecommunications network evolves to provide a greater variety of services and to incorporate new technologies in an increasingly multi-vendor environment, it has become important to ensure that the network evolves gracefully. As part of the effort to ensure this graceful evolution, Nodal System Architecture has been studied and applied. It aims to standardize the interfaces between communications systems, called modules in the Architecture, that are located in the same building. This paper describes the requirements considered in specifying Nodal System Architecture. Specifically, it describes the classification of network functionality used to determine the allocation of functions to modules, and the need to define the conditions for being a module. It also estimates the volumes of three types of information that flow through inter-module interfaces, in order to determine the protocol structure to be used for each type of information. Finally, the paper presents examples of inter-module links for different node sizes.
This paper describes a new method of state estimation for a stochastic system on the energy scale with a decibel observation mechanism. The problem is to obtain a decibel-valued estimate of the energy state variable from decibel-valued noisy observation data, whereas the stochastic system itself is usually driven physically on an energy scale. The main attention is paid to reconciling the energy quantity on the physical countermeasure side with the decibel quantity on the human evaluation side. The state estimation is based on Bayes' theorem, which is applicable to any non-Gaussian and/or non-linear nature of a real stochastic system. It is then expanded into a form suited to successive decibel-valued observations. Thus, based on the mutual relation between energy and decibel statistics, any kind of statistic connected with Lx evaluation on the human side can be estimated from the decibel-valued noisy observation data. (Lx is defined in acoustics as the (100-x)% point of the sound-level distribution; it is often used as an environmental noise assessment standard because human hearing is very sensitive to the tail of the sound-level distribution.) Finally, the validity and effectiveness of the proposed method have been confirmed by applying it to actual room acoustics data.
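The gap between the energy and decibel scales, and the definition of Lx, can be made concrete. The sketch below shows only these conversions, not the Bayesian estimator. It assumes a reference energy of 1 and a nearest-rank percentile convention for Lx; other percentile conventions exist. Lx is taken as the (100 − x)% point of the level samples, that is, the level exceeded x% of the time. The function names are hypothetical.

```python
import math


def to_db(energy: float, ref: float = 1.0) -> float:
    return 10.0 * math.log10(energy / ref)


def to_energy(level_db: float, ref: float = 1.0) -> float:
    return ref * 10.0 ** (level_db / 10.0)


def l_x(levels_db, x: float) -> float:
    """(100 - x)% point of the level distribution (nearest-rank convention)."""
    s = sorted(levels_db)
    k = max(1, math.ceil((100.0 - x) * len(s) / 100.0))  # 1-based rank
    return s[k - 1]


def energy_mean_db(levels_db) -> float:
    """Average on the energy scale, then convert back to decibels."""
    return to_db(sum(to_energy(level) for level in levels_db) / len(levels_db))
```

The energy average of 60 dB and 70 dB is about 67.4 dB, not 65 dB. This is the kind of mismatch between the physical energy scale and the decibel evaluation scale that the method has to reconcile.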
Fumihide HATTORI Kazunori SHIRAISHI Nobuo SHIGENO Koichi GEN-EI Koyu CHINEN
A 1.3 µm DFB-LD module was developed for multi-channel AM transmission systems. The relationships between laser characteristics and the carrier-to-noise ratio (CNR), composite second order distortion (CSO), and composite triple beat (CTB) were investigated in 42-channel analog AM transmission experiments using a 15 km single-mode fiber. To achieve a CSO lower than −62 dBc, the laser I-L curve linearity should be less than 5%. The relation between CSO and the number of second order intermodulation products was investigated theoretically and experimentally. To achieve a CNR greater than 52 dB (4 MHz), the relative intensity noise (RIN) of the laser should be less than −155 dB/Hz, as predicted by a theoretical model. The lowest CSO and CTB values were obtained by optimizing the laser bias current at a given modulation depth. There was a trade-off between CSO, CTB, and CNR in optimizing the laser bias current and modulation depth.
Katsuyuki KANEKO Ichiro OKABAYASHI Shingo KARINO Yasuhiro NAKAKURA Tetsuji KISHI Manabu MIGITA
A VLSI implementation of the network for a highly parallel computer system, using a three-ASIC chip set, is described together with some related results. The chip set consists of two network-component chips, BMU and SRC, with which a crossbar-like network of arbitrary size can be realized, and a versatile network controller, TCU. A new FIFO circuit, an inter-chip pipelined data transmission scheme, and a novel method of cooperation between computation and communication called 'FIFO emulation' were introduced. These address communication overheads in parallel processing and thereby enhance the effective computing performance and the actual transfer rate. The chip set employs a synchronous 9-bit bus with a peak data transmission rate of 20 MB/s per channel. All chips are fabricated in 1.2 µm two-layer Al N-well CMOS technology and contain 350 k, 25 k, and 165 k transistors, respectively. Experiments show that the measured communication overhead is less than 30% when the ratio of computation (in MFLOPS) to communication (in Mword/s) is 1, and that the 'FIFO emulation' scheme reduces communication time by 21% in the same case.
Hideo TAMAMOTO Tatsumi OHTAKA Yuichi NARITA
We consider random testing of memories in which addresses are generated randomly but every cell is accessed equally often, and we analyze the fault detection probability. As a result, random testing turns out to be an effective testing method for the BIST of embedded memories.
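One simple way to make addresses random yet evenly accessed is to visit a fresh random permutation of all addresses on every pass. The sketch below assumes that scheme, together with a simplified fault model: a single stuck-at cell, and a random bit written to each cell and read back immediately. Under these assumptions the faulty cell is accessed exactly once per pass. Each access exposes the fault with probability 1/2, so k passes detect it with probability 1 − 2^−k. The scheme, the fault model and the function names are illustrative, not the paper's analysis; BIST hardware would more likely use an LFSR-style address generator.

```python
import random


def address_sequence(size: int, passes: int, rng: random.Random):
    """Random addresses in which every cell is accessed exactly once per pass."""
    for _ in range(passes):
        order = list(range(size))
        rng.shuffle(order)  # random order, but a permutation: no cell skipped or repeated
        yield from order


def detects_stuck_at(size, stuck_addr, stuck_value, passes, rng):
    """Write a random bit to each visited cell and read it back immediately.

    Models one stuck-at fault: the cell at stuck_addr always holds stuck_value
    (pass stuck_addr=None for a fault-free memory).
    """
    cells = [0] * size
    for addr in address_sequence(size, passes, rng):
        v = rng.randint(0, 1)
        cells[addr] = stuck_value if addr == stuck_addr else v
        if cells[addr] != v:
            return True
    return False


def detection_probability(passes: int) -> float:
    # One access per pass; each access reveals the fault with probability 1/2
    return 1.0 - 0.5 ** passes
```

With purely independent uniform addresses, some cells could go untested for many accesses. The permutation guarantees that every cell receives the same number of accesses.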
Kaoru ARAKAWA Yasuhiko ARAKAWA
A novel digital signal processing technique, fuzzy filtering, is proposed for estimating nonstationary signals with ambiguous changes that are contaminated by additive white Gaussian noise. In this filter, fuzzy clustering is used to classify signal components into groups within which the signal characteristics are considered similar. Since the boundaries between the signal groups are ambiguous, fuzzy clustering performs better than crisp clustering. Moreover, the filter exhibits robust characteristics over various parameter values and types of processed signals. Computer simulations demonstrate its superior filtering capability.
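The abstract does not say which fuzzy clustering algorithm the filter uses. The sketch below assumes standard fuzzy c-means on scalar samples to show how soft memberships differ from a crisp assignment. Each sample receives a grade in every cluster, and the grades sum to 1. The fuzzifier m = 2, the iteration count and the small eps guard are assumed defaults; the filtering step built on top of the clusters is not reproduced.

```python
def fuzzy_c_means(xs, centers, m=2.0, iters=50, eps=1e-12):
    """Standard fuzzy c-means on scalar samples (assumed algorithm).

    xs: samples; centers: initial cluster centers; m > 1: fuzzifier.
    Returns (centers, memberships), where memberships[k][i] is the grade of
    sample k in cluster i from the last iteration; each row sums to 1.
    """
    centers = list(centers)
    c = len(centers)
    u = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1))
        u = []
        for x in xs:
            d = [abs(x - v) + eps for v in centers]  # eps avoids division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for i in range(c)])
        # Center update: mean of the samples weighted by u_ki^m
        centers = [
            sum(u[k][i] ** m * xs[k] for k in range(len(xs)))
            / sum(u[k][i] ** m for k in range(len(xs)))
            for i in range(c)
        ]
    return centers, u
```

A sample lying between two groups gets intermediate grades in both, instead of being forced into one. This is the behavior that makes fuzzy clustering preferable when the group boundaries are ambiguous.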
Takashi SEKI Mitsunori SAITO Mitsunobu MIYAGI
Porous anodic alumina films were studied for realizing artificial metal-dielectric composites for polarizers. Using improved anodizing and electroplating procedures, nickel was successfully deposited in the pores to a thickness of as much as 50 µm. The optical transmittance of the film was measured in the wavelength range of 0.63-1.55 µm, and a notable polarizing function was confirmed. The extinction ratio and the insertion loss were evaluated to be 41 dB and 1.3 dB, respectively, at a wavelength of 1.55 µm.
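The reported figures can be converted back to transmittances. The sketch below assumes the usual definitions, which the abstract does not spell out. The extinction ratio is 10·log10(T_pass / T_block) and the insertion loss is −10·log10(T_pass), where T_pass and T_block are the transmittances for the transmitted and the blocked polarization. The variable and function names are hypothetical.

```python
import math


def extinction_ratio_db(t_pass: float, t_block: float) -> float:
    return 10.0 * math.log10(t_pass / t_block)


def insertion_loss_db(t_pass: float) -> float:
    return -10.0 * math.log10(t_pass)


# Transmittances implied by 1.3 dB insertion loss and 41 dB extinction ratio
t_pass = 10 ** (-1.3 / 10)            # about 0.74 of the pass polarization
t_block = t_pass * 10 ** (-41 / 10)   # about 6e-5 of the blocked polarization
```

Under these definitions, about 74% of the light in the pass polarization is transmitted, while the orthogonal polarization is suppressed by a further factor of roughly 12,600.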