Hiroshi ISHII Hiroaki NISHIKAWA Yuji INOUE
This paper discusses and clarifies the effectiveness of a data-driven implementation of a protocol handling system for accessing the TINA (Telecommunications Information Networking Architecture) network and the Internet. TINA is a networking architecture that provides networking services and management ubiquitously for users and networks. Many TINA-related ACTS (Advanced Communication Technologies and Services) projects have been organized in Europe. In Japan, The TINA Trial (TTT), which aimed to achieve ATM network management and services based on the TINA architecture, was conducted by NTT and several manufacturers from April 1997 to April 1999. In these studies and trials, much effort has been devoted to developing software based on the service architecture and network architecture being standardized in TINA-C (TINA Consortium). To achieve a TINA environment universally on both the customer and network sides, we have to consider how to deploy the TINA environment on the user side and how to use access transmission capacity as efficiently as possible. Recent technology, e.g., Java, makes it easy to download applications and environments from the network side to the user side. In accessing the network, there are several possible bottlenecks in information exchange on the customer side, such as PC processing capability, access protocol handling capability, and intra-house wiring bandwidth. In parallel with the study of the TINA software architecture, the authors have been studying the diverse requirements for a hardware platform for the TINA network. In those studies, we have clarified that the stream-oriented data-driven processors the authors have been studying and developing offer high reliability, high multiprocessing capability, and multimedia information processing capability. Based on these studies, this paper first shows, through mathematical and emulation studies, that a von Neumann-based protocol handler is ineffective for multiprocessing. Then, we show by an emulation study that our data-driven protocol handling can effectively realize access protocol handling. Next, we describe the result of the first step of implementing data-driven TCP/IP protocol handling. This result shows that our TCP/IP hub based on the data-driven processor is applicable not only to TINA/CORBA networks but also to ordinary Internet access. Finally, we show a possible customer premises network configuration that resolves the bottleneck in accessing the TINA network through ATM access.
Makiko ITOH Yoshinori TAKEUCHI Masaharu IMAI Akichika SHIOMI
A synthesizable HDL generation method for pipelined processors is proposed. With the proposed method, data-path and control logic descriptions of a target processor are generated from a clock-based instruction set specification. Experimental results demonstrate the feasibility of the proposed method and show that processor design time is drastically reduced compared with conventional manual RT-level design in HDL.
Jing-ling YANG Chiu-sing CHOY Cheong-Fat CHAN
Detecting stuck-at-pass faults in event-driven latches is the main difficulty in testing latch-based asynchronous pipelines. In this paper we propose a parallel test structure to ease this problem.
Katsuya SHINOHARA Norimasa OHTSUKI Yoshinori TAKEUCHI Masaharu IMAI
This paper proposes a performance optimization method for application-specific instruction set processors (ASIPs) that takes clock frequency into account. The performance of an instruction set processor can be measured by the execution time of an application program, which is the number of clock cycles needed to run the program divided by the clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of functional units (FUs) when clock frequency is considered.
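The relation between cycle count, clock frequency, and execution time can be illustrated with a minimal sketch. The candidate FU combinations, their cycle counts, frequencies, and area figures below are hypothetical values chosen for illustration only, and the exhaustive search is a stand-in, not the optimization method of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical label for a functional-unit (FU) combination
    cycles: int      # clock cycles needed to run the application
    clock_hz: float  # achievable clock frequency of this configuration
    area: float      # cost in arbitrary units, checked against the constraint


def execution_time(c: Candidate) -> float:
    """Execution time in seconds: clock cycles divided by clock frequency."""
    return c.cycles / c.clock_hz


def best_candidate(candidates, max_area):
    """Return the feasible candidate with the shortest execution time."""
    feasible = [c for c in candidates if c.area <= max_area]
    if not feasible:
        raise ValueError("no candidate satisfies the area constraint")
    return min(feasible, key=execution_time)


# Hypothetical design points: adding FUs lowers the cycle count
# but may also lower the achievable clock frequency.
candidates = [
    Candidate("single ALU", cycles=1_000_000, clock_hz=100e6, area=1.0),
    Candidate("ALU + multiplier", cycles=600_000, clock_hz=80e6, area=1.6),
    Candidate("ALU + multiplier + divider", cycles=550_000, clock_hz=50e6, area=2.2),
]

if __name__ == "__main__":
    for c in candidates:
        print(f"{c.name}: {execution_time(c) * 1e3:.2f} ms")
    print("selected:", best_candidate(candidates, max_area=3.0).name)
```

In this made-up data set the configuration with the fewest cycles is not the fastest, because its lower clock frequency outweighs the cycle savings; this is the kind of trade-off the abstract refers to.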
This paper describes the design of a scalable pipelined memory buffer for a shared scalable buffer ATM switch. The memory architecture provides high speed and scalability, and eliminates the restriction of memory cycle time in a shared buffer ATM switch. It provides versatile performance in a shared buffer ATM switch using its scalability. The architecture consists of a 2-D array configuration of small memory banks. Increasing the array configuration enlarges the entire memory capacity. The maximum cycle time of the designed scalable memory is 4 ns. The designed memory is embedded in the prototype chip of a shared scalable buffer ATM switch with a 4 × 4 configuration of 4160-bit SRAM memory banks. It is integrated in 0.6-µm double-metal single-poly CMOS technology.
Tadaaki KIMIJIMA Kiyoshi NISHIKAWA Hitoshi KIYA
In this paper we propose a new pipelined architecture for the LMS adaptive filter which can be implemented with less than half the amount of calculation needed for the conventional architectures. Even though the proposed architecture reduces the required calculation, it can simultaneously produce good convergence characteristics, a short latency and high throughput characteristics.
Ri-A JU Dong-Ho LEE Sang-Dae YU
This paper describes a 10-bit 40-MS/s pipelined A/D converter implemented in a 0.8-µm double-poly, double-metal CMOS process. The converter achieves a low power dissipation of 36 mW at a 5-V power supply. A 1.5-bit/stage pipelined architecture allows a large correction range for comparator offset and performs fast interstage signal processing. For high-speed, low-power operation, the sample-and-hold amplifier is designed using an op-amp sharing technique and a dynamic comparator. In addition, a fully differential folded-cascode op amp with a gain-boosting stage is designed by an automatic design tool. When a 10-MHz input signal is applied, the SNDR is 55.0 dB and the SNR is 56.7 dB. The DNL and INL are 0.6 LSB and +1/-0.75 LSB, respectively.
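Why a 1.5-bit/stage architecture tolerates comparator offset can be seen in a small behavioural model. The sketch below is the textbook idealized 1.5-bit stage (thresholds near ±Vref/4, residue gain of two), not the circuit of the paper; the stage count of 9, the normalized reference of 1.0, and the offset values are assumptions made for illustration.

```python
import random


def pipeline_adc(v, vref=1.0, stages=9, offsets=None):
    """Idealized 1.5-bit/stage pipeline.

    Each stage decides d in {-1, 0, +1} using two comparators with nominal
    thresholds +vref/4 and -vref/4, then passes the residue 2*v - d*vref on.
    `offsets` is an optional list of (upper, lower) comparator offsets per
    stage (hypothetical values). Returns (decisions, final_residue).
    """
    if offsets is None:
        offsets = [(0.0, 0.0)] * stages
    decisions = []
    for off_hi, off_lo in offsets:
        if v > vref / 4 + off_hi:
            d = 1
        elif v < -vref / 4 + off_lo:
            d = -1
        else:
            d = 0
        decisions.append(d)
        v = 2 * v - d * vref  # residue amplified by two for the next stage
    return decisions, v


def reconstruct(decisions, vref=1.0):
    """Digital estimate: vref * sum(d_i * 2**-i) over stages i = 1..N."""
    return vref * sum(d * 2.0 ** -(i + 1) for i, d in enumerate(decisions))


if __name__ == "__main__":
    random.seed(0)
    v_in = 0.3
    # Offsets kept below vref/4 in magnitude, the correction range of this model.
    offs = [(random.uniform(-0.2, 0.2), random.uniform(-0.2, 0.2)) for _ in range(9)]
    ideal, _ = pipeline_adc(v_in)
    skewed, _ = pipeline_adc(v_in, offsets=offs)
    print("ideal decisions :", ideal, "->", reconstruct(ideal))
    print("offset decisions:", skewed, "->", reconstruct(skewed))
```

In this model the individual stage decisions may differ when comparators are offset, but as long as each offset stays below Vref/4 in magnitude the residue never leaves the ±Vref range, so the reconstructed value stays within Vref/2^N of the input.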
Hongbing ZHU Mamoru SASAKI Takahiro INOUE
In this paper, by making good use of the parallel-transit-evaluation algorithm and the sparsity of the connections between neurons, a pipeline structure is successfully introduced into the sequential Boltzmann machine processor. The novel structure is nine times faster than the previous one, with only a 12% rise in hardware resources under 10,000 neurons. The performance is confirmed by designing the processor with 1.2-µm CMOS standard cells and analyzing the probability of state change.
This paper describes an algorithm and its prototype system, VeriProc/1.1, which can prove the correctness of pipelined and superscalar processor controls automatically without a pipeline invariant, human interaction, or additional information. The algorithm is based on behavior-covering and partial unfolding. No timing relation such as an abstract function or β-relation is required. The only information required is the location of the selectors in the design. Partial unfolding makes it possible to derive superscalar specifications from conventional specifications. A correctness proof of the partial unfolding is given. The prototype system can verify various superscalar control designs of simple processors.
Akio HARADA Kiyoshi NISHIKAWA Hitoshi KIYA
A pipelined architecture is proposed for the normalized least mean square (NLMS) adaptive digital filter (ADF). A pipelined implementation of the NLMS has not previously been proposed; the proposed architecture is the first attempt to implement the NLMS ADF in a pipelined fashion. The architecture is based on an equivalent expression of the NLMS derived in this study. It is shown that the proposed architecture achieves a short, constant critical path without producing output latency. In addition, it retains the advantage of the NLMS, i.e., that the step size that assures convergence is determined automatically. Computer simulation results confirm that the proposed architecture achieves convergence characteristics identical to those of the NLMS.
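For reference, the conventional (non-pipelined) NLMS recursion can be sketched as follows. This is the textbook algorithm, not the equivalent expression or the pipelined architecture derived in the paper; the filter length, the step size `mu`, the regularization `eps`, and the test system `h` are assumptions chosen for illustration.

```python
import random


def nlms(x, d, taps, mu=0.5, eps=1e-8):
    """Textbook NLMS adaptive FIR filter.

    x: input samples, d: desired samples, taps: filter length.
    The update is normalized by the input power in the tap-delay line,
    which is what makes the effective step size adapt automatically.
    Returns (final_weights, error_sequence).
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # tap-delay line, newest sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # filter output
        e = dn - y                                   # a priori error
        norm = eps + sum(bi * bi for bi in buf)      # input power (regularized)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# System identification demo with a hypothetical 3-tap unknown system.
random.seed(1)
h = [0.5, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w, errors = nlms(x, d, taps=len(h))

if __name__ == "__main__":
    print("estimated weights:", [round(wi, 6) for wi in w])
```

The weight update depends on the error computed in the same iteration, which is the feedback loop that makes a straightforward pipelined implementation difficult.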
Daisuke MIYAZAKI Shoji KAWAHITO Yoshiaki TADOKORO
This paper presents a new scheme for a low-power, area-efficient pipelined A/D converter using a single-ended amplifier. The proposed multiply-by-two single-ended amplifier using switched-capacitor circuits has a smaller DC bias current than the conventional fully differential scheme and a small capacitor mismatch sensitivity, allowing us to use a smaller capacitance. The simple high-gain dynamic-biased regulated cascode amplifier also has an excellent switching response. These properties lead to the low-power, area-efficient design of high-speed A/D converters. The estimated power dissipation of the 10-b pipelined A/D converter is less than 12 mW at 20 MSample/s.
Masahiro FUKUI Masakazu TANAKA Masaharu IMAI
This paper proposes a new flexible hardware model for pipelined design optimization. Used together with an RTL floorplanner, the flexible hardware model makes accurate, fine-grained design space exploration possible. It is quite effective for deep submicron technology, where estimation at a high level has become difficult and design tuning at lower levels of abstraction makes up the full design optimization task. The experimental results show that our approach reduces the slack time in the pipeline stages and thereby achieves higher performance with a smaller area.
Toru SHONAI Kazuhiko MATSUMOTO
A formal verification approach that combines verification based on binary decision diagrams (BDDs) and theorem-prover-based verification has been developed. This approach is called the incremental formal verification approach. It uses an incremental verifier based on BDDs and a conventional theorem-prover-based verifier. Inputs to the incremental verifier are specifications in higher-level descriptions given in terms of arithmetic expressions, lower-level design descriptions given in terms of Boolean expressions, and constraints. The incremental verifier limits the behavior of the design by using the constraints, and compares the partial behavior limited by the constraints with the specifications by using BDD-based Boolean matching. It also replaces the matched part of the lower design description with equivalent constructs in the higher descriptions. Successive uses of the incremental verifier with different constraints can produce higher design descriptions from the lower design descriptions in a step-by-step manner. These higher descriptions are then input to the theorem-prover-based verification which enables faster treatment of larger circuits. Preliminary experimental results show that the incremental verifier can successfully check the partial equivalence and replace the matched parts by higher constructs.
Shinichi YOROZU Yoshihito HASHIMOTO Shuichi TAHARA
We report the state of the art of superconducting network switching circuits and system technology. We mainly describe our switching core circuits and our efforts to demonstrate superconducting prototype systems. We also briefly review other approaches to superconducting digital communication. For our switching core circuits, a ring-pipeline architecture has been proposed, and the component circuits of the prototype chips have been fabricated and tested successfully. It is very important to demonstrate the prototype system in order to estimate the total performance of a system with superconducting devices. We have designed a multiprocessor system with a superconducting network as a prototype to demonstrate an interprocessor network system.
Hiroshi MATSUOKA Kazuaki OKAMOTO Hideo HIRONO Mitsuhisa SATO Takashi YOKOTA Shuichi SAKAI
In this paper we describe the pipeline design and enhanced hardware for fast message handling in the RICA-1 processor, a processing element (PE) of the RWC-1 multiprocessor. The RWC-1 is based on the reduced inter-processor communication architecture (RICA), in which communication is combined with computation in the processor pipeline. The pipeline is enhanced with hardware mechanisms to support fine-grain parallel execution. The data paths of the RICA-1 super-scalar processor are shared between communication and instruction execution to minimize implementation cost. A 128-PE system was built in January 1998 and is currently used for hardware debugging, software development, and performance evaluation.
Seung Ho OH Han Jun CHOI Moon Key LEE
This paper describes the design of a multistandard video encoder. The proposed encoder accepts conventional NTSC/PAL video signals. The encoder consists of four major building blocks: a color space converter, digital filters, a color modulator, and a timing generator. To support multistandard video signals, a programmable systolic architecture is adopted in designing the various digital filters. Interpolation digital filters are also used to enhance the SNR of the encoded video signals. The input to the encoder can be either a YCbCr signal or an RGB signal. The outputs are luminance (Y), chrominance (C), and composite video baseband (Y+C) signals. The architecture of the encoder is defined using a MATLAB program and modeled in Verilog HDL. The overall operation is verified using various video signals, such as color bar patterns and ramp signals. The encoder contains 36 k gates and is implemented in a 0.65-µm CMOS process.
Shyh-Jong CHEN Rung-Ji SHANG Xian-June HUANG Shang-Jang RUAN Feipei LAI
By treating each different output pattern as a state, we propose a low-power architecture for pipelined circuits using bipartition. The output of a pipelined circuit may transit mainly among a few of its states. If a few states dominate most of the time, we can partition the combinational portion of the pipelined circuit into two blocks: a small one that contains the few high-activity states, and a large one that contains the remaining low-activity states. The original pipelined circuit is thus bipartitioned into two individual pipelined circuits. An additional combinational logic block is introduced to select which of the two partitioned blocks operates. The power reduction is based on the observation that most of the time the small block is working and the large one is idle. To minimize the power consumption of this architecture, we present an algorithm that improves the efficiency of this additional control block. Experiments with MCNC benchmarks show a high percentage of power savings from our new architecture for low-power pipelined circuit design.
Takashi OKUDA Osamu MATSUMOTO Toshio KUMAMOTO Masao ITO Hiroyuki MOMONO Takahiro MIKI Takeshi TOKUDA
This paper describes a 10-bit 50-MS/s pipelined CMOS A/D converter using a "reference feed-forward architecture." In this architecture, the reference voltage generated in a reference generator block and the residual voltage from a DA/subtractor block are fed to the next stage. The reference generator block and DA/subtractor block are constructed using resistive-load, low-gain differential amplifiers. High-gain, high-speed amplifiers, which consume much power, are not used, so the power consumption of this ADC is reduced. Gain matching of the reference voltage with the internal signal range is achieved by giving the reference generator block the same characteristics as the DA/subtractor block. The offset voltage of each differential amplifier in the reference generator block and the DA/subtractor block is canceled individually by an offset cancellation technique. In addition, the front-end sample/hold circuit is eliminated to reduce power consumption. Because high-speed comparators based on a source follower and latch circuit are introduced into the first-stage A/D subconverter, the analog bandwidth is not degraded. This ADC has been fabricated in double-polysilicon, double-metal, 0.5-µm CMOS technology, and it operates at 50 MS/s with a power consumption of 300 mW (Vdd = 3.0 V). A differential linearity error of less than +/-1 LSB is obtained.
Tsuyoshi ISSHIKI Wayne Wei-Ming DAI Hiroaki KUNIEDA
In this paper, we show some significant results of a routability analysis of bit-serial pipeline datapath designs based on Rent's rule and Donath's observation. Our results show that all of the tested bit-serial benchmarks have Rent exponents below 0.4, indicating that the average wiring length of the circuit is expected to be independent of the circuit size. This study has important implications for the silicon utilization and time-area efficiency of bit-serial pipeline circuits on FPGAs and ASICs.
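Rent's rule relates the number of external terminals T of a block to its gate count G as T = t·G^p, where p is the Rent exponent. A minimal sketch of how p might be estimated from partition data by a log-log least-squares fit is given below; the function name `fit_rent` and the sample data are hypothetical and are not taken from the benchmarks of the paper.

```python
import math


def fit_rent(blocks):
    """Estimate (t, p) in T = t * G**p by least squares on log-log data.

    blocks: iterable of (gate_count, terminal_count) pairs, both positive.
    """
    xs = [math.log(g) for g, _ in blocks]
    ys = [math.log(term) for _, term in blocks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope of the log-log line
    t = math.exp(mean_y - p * mean_x)   # intercept gives the Rent coefficient
    return t, p


if __name__ == "__main__":
    # Hypothetical partition data generated from T = 3 * G**0.35.
    blocks = [(g, 3.0 * g ** 0.35) for g in (16, 64, 256, 1024)]
    t, p = fit_rent(blocks)
    print(f"t = {t:.3f}, p = {p:.3f}")
```

In Donath's wire-length estimate, an exponent below 0.5 corresponds to an average wire length that approaches a constant as the circuit grows, which is consistent with the size-independence stated in the abstract for exponents below 0.4.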
Hiroyuki OCHI Yoko KAMIDOI Hideyuki KAWABATA
This paper proposes a new approach that makes it possible for every undergraduate student to carry out experiments in developing a pipelined RISC processor within the limited time available for the course. The approach consists of four steps. In the first step, every student individually implements a pipelined RISC processor based on a given, very simple model; it has separate buses for instruction and data memory ("Harvard architecture") to avoid structural hazards, while it completely ignores data and control hazards to make implementation easy. Although it is such a "defective" processor, its functionality can be tested by supplying object code that contains enough NOP instructions to avoid hazards. In the second step, the NOP instructions are deleted and the behavior of the developed processor is observed carefully to understand data and control hazards. In the third step, benchmark problems are provided, and every student is challenged to improve the processor's performance. Finally, every student is asked to present how he or she improved the processor. This paper also describes a new educational FPGA board, ASAver.1, which is useful for experiments ranging from an introductory class to a computer architecture/system class. As a feasibility study, a 16-bit pipelined RISC processor, "ASAP-O," has been developed; it has eight 16-bit general-purpose registers, a 16-bit program counter, and a zero flag, with 10 essential instructions.