Japan's PARTNERS Project, one of the programmes of ISY advocated by UN, has just started. This letter is a brief introduction of the trials being carried out by the partners in the University of Electro-communications under the Project. The focus is on the distance education and training via ETS-V overcoming the geographical extent and the cultural diversity of the Asia-Pacific Region.
Yasuo YOSHIDA Kazuyoshi HORIIKE Kazuhiro FUJITA
The matrix whose eigenvectors are the basis vectors of the DCT is introduced. This matrix leads to a convolution-product property using the DCT. Based on the property, the parameter of uniform blur, such as motion blur or out-of-focus blur, is estimated from the local minima of the DCT energy spectrum of a blurred image. Computer experiments confirmed that the DCT is superior to the DFT for estimating the parameter.
Kazutoshi KOBAYASHI Keikichi TAMARU Hiroto YASUURA Hidetoshi ONODERA
We propose a new architecture of Functional Memory type Parallel Processor (FMPP) architectures called bit-parallel block-parallel (BPBP) FMPP. Design details of a prototype BPBP FMPP chip are also shown. FMPP is a massively parallel processor architecture that has a memory-based simple two-dimensional regular array structure suitable for memory VLSI technology. Computation space increases as integration density of memory increases. Computation time does not depend on the number of processors. So far, a bit-serial word-parallel (BSWP) implementation based on a content addressable memory (CAM) is mainly investigated as one of promising architectures of FMPP. In a BSWP FMPP, each word of a CAM works as a processor, and the amount of hardware is minimized by abopting a bit-serial operation, thus maximizing integration scale. The BSWP FMPP, however, does not allow operations between two words, which restriction limits the applicability of the BSWP FMPP. On the other hand, the proposed BPBP FMPP is designed to execute logical and arithmetic operations on two words. These operations are performed simultaneously on every group of words called a block. BPBP FMPP hereby achieves a high performance while maintaining high integration density of the BSWP, and is suitable for various applications.
Hidetoshi ONODERA Kiyoshi TAKESHITA Keikichi TAMARU
We propose a fully digital architecture for Kohonen network suitable for VLSI implementation. The proposed architecture adopts a functional memory type parallel processor (FMPP) architecture which has a structure similar to a content addressable memory (CAM). One word of CAM is regarded as a processing element and a group of elements forms a neuron. All processing elements execute the same operation in bit-serial but in processor-parallel. Thus the number of instructions for realizing the network algorithm is independent of the number of neurons in the network. With reference to a previously reported CAM, we estimate a network with 96 neurons for speech recognition could be integrated on three chips using a 1.2 µm process, and it operates 50 times faster than a sequential hardware. Owing to its highly regular structure of memories, the proposed hardware architecture is well compatible with current VLSI technology.
Three dimensional (3-D) optics offers potential advantages to the massively-parallel systems over electronics from the view point of information transfer. The purpose of this paper is to survey some aspects of the 3-D optical interconnection technology for the future massively-parallel computing systems. At first, the state-of-art of the current optoelectronic array devices to build the interconnection networks are described, with emphasis on those based on the semiconductor technology. Next, the principles, basic architectures, several examples of the 3-D optical interconnection systems in neural networks and multiprocessor systems are described. Finally, the issues that are needed to be solved for putting such technology into practical use are summarized.
A cyclic analog-to-digital (A/D) converter is developed which accomplishes an n-b conversion in n/2 clock cycles. The architecture consists of two 1-b quantizers connected in a loop. A CMOS design of the 1-b quantizer is given to evaluate the performance of the A/D converter when implemented using presently available process. Spice simulations and error analyses show that a resolution higher than 10-b and a sampling rate up to 1.4 Msps are attainable with a 3-µm CMOS process. A prototype converter breadboarded using discrete components has confirmed the principles of operation and error analyses. The device count and the power consumption are small compared to those of a successive-approximation A/D converter. A chip area required for the CMOS implementation is also small because only four unit capacitors are involved. Therefore, the architecture proposed herein is most suited for high accuracy, medium speed A/D conversion.
Bumchul KIM Michitaka KAMEYAMA Tatsuo HIGUCHI
The performance of processing elements can be improved by the progress of VLSI circuit technology, while the communication overhead can not be negligible in parallel processing system. This paper presents a unified scheduling that allocates tasks having different task processing times in multiple processing elements. The objective function is formulated to measure communication time between processing elements. By employing constraint conditions, the scheduling efficiently generates an optimal solution using an integer programming so that minimum communication time can be achieved. We also propose a VLSI processor for robotics whose latency is very small. In the VLSI processor, the data transfer between two processing elements can be done very quickly, so that the communication cycle time is greatly reduced.
In this paper we present an Overlapped Block Gauss-Seidel (OBGS) algorithm for the solution of large scale LSEs (Linear System of Equations) based on array architecture which we have already proposed. Better partitioning for processor array usually requires (1) balanced block size, and (2) minimum coupling between blocks for better convergence. These conditions can well be satisfied by overlapping some variables in computation algorithm. The mathematical implication of overlapped partitioning is discussed at first, and some examples show the effectiveness of OBGS algorithm. Conclusion points out that the convergence properties can well be improved by proper choice of overlapped variables. An efficient algorithm is given for choosing block and variables in order to realize above conditions.
Takao TSUKUTAKI Masaru ISHIDA Yutaka FUKUI
This letter presents a technique to cancel the parasitic effects of operational amplifier (op amp) in active filter design. To minimize the effects, an op amp model considering the parasitics (i.e. both parasitic poles and zeros) is utilized. It is shown that undesirable factors in the transfer function due to the parasitics can be canceled well by predistorting the passive element values of the circuit. As an example, an active-R highpass filter is evaluated both theoretically and numerically. In this way, the proposed technique can be effectively incorporated into the design of active filters.
Behavior of solutions related to an accuracy exp(-1/ε) is studied. Computer results are given, and examined from the view-point of non-standard analysis. The experimental results raise some important questions on the computer study of slow-fast systems.
This paper presents a hardware architecture design methodology for hidden markov model based recognition systems. With the aim of realizing more advanced and user-friendly systems, an effective architecture has been studied not only for decoding, but also learning to make it possible for the system to adapt itself to the user. Considering real-time decoding and the efficient learning procedures, a bi-directional ring array processor is proposed, that can handle various kinds of data and perform a large number of computations efficiently using parallel processing. With the array architecture, HMM sub-algorithms, the forward-backward and Baum-Welch algorithms for learning and the Viterbi algorithm for decoding, can be performed in a highly parallel manner. The indispensable HMM implementation techniques of scaling, smoothing, and estimation for multiple observations can be also carried out in the array without disturbing the regularity of parallel processing. Based on the array processor, we propose the configuration of a system that can realize all HMM processes including vector quantization. This paper also describes that a high PE utilization efficiency of about 70% to 90% can be achieved for a practical left-to-right type HMMs.
Tadao NAKAGAWA Tetsuo HIROTA Takashi OHIRA
A novel sampling comparator circuit is presented for extending the pull-in range of microwave phase-locked oscillators (PLOs). It performs both phase and frequency detection without any frequency dividers, and a GaAs MMIC prototype is developed and tested. The proposed comparator improves the pull-in range by about 10 times more than is possible with conventional sampling phase detectors.
Yoshinori TAKEUCHI Zhao-Chen HUANG Masatomo SAEKI Hiroaki KUNIEDA
This paper introduces the new application specific architecture RHINE (Reconfigurable Hierarchical Image Neo-multiprocessor Engine) that is a multiprocessor system for moving picture CODEC. The array processor is known to be originally suited for data parallel processing such as image signal processing which requires vast amount of computations and has the identical instruction sequences on data. However, the moving picture CODEC algorithm suffers from the large load imbalance in the processings on multi-processors with the separated sub-images. Some load balancing techniques are indispensable in such applications for the highest speed-up. RHINE gives one of the optimal solutions for such a load balancing due to its feature of the self reconfigurable architecture. RHINE consists of Block Processing Units (BPU) hierarchically, in each of which has a common bus architecture of multiprocessors with a block memory. Processors in a BPU move to the other BPU according to the load imbalance between BPUs by switching the bus connection between BPUs. The advantage of RHINE architecture is demonstrated by showing performance simulations for real moving pictures.
Hui ZHAO Xiaokang YUAN Toru SATO Iwane KIMURA
The Viterbi algorithm is a well-established technique for channel and source decoding in high performance digital communication systems. However, excessive time consumption makes it difficult to design an efficient high-speed decoder for practical application. This paper describes the implementation of parallel Viterbi algorithm by multi-microprocessors. Internal computations are performed in a parallel fashion. The use of microprocessors allows low-cost implementation with moderate complexity. The software and hardware implementations of the Viterbi algorithm on parallel multi-microprocessors for real-time decoding are presented. The implemented method is based on a combination of forming a set of tables and calculations. For efficient operation under fully parallel Viterbi decoding by microprocessors, we considered: (1) branch metrics processing, path metrics updating, path memory updating and decoding output for microprocessor, (2) efficient decomposition of the sequential Viterbi algorithm into parallel algorithms, (3) minimization of the communication among the microprocessors. The practical solutions for the problems of synchronization among the miroprocessors, interconnection network for communication among the microprocessors and memory management are discussed. Furthermore the performance and the speed of the parallel Viterbi decoding are given. For a fixed processing speed of given hardwares, parallel Viterbi decoding allows a linear speed up in the throughput rate with a linear increase in hardware complexity.
Hiroshi KIMURA Akira MATSUZAWA Takashi NAKAMURA Shigeki SAWADA
This paper describes a monolithic 10-b A/D converter that realized a maximum conversion frequency of 300 MHz. Through the development of the interpolated-parallel scheme, the severe requirement for the transistor Vbe matching can be alleviated drastically, which improves differential nonlinearity (DNL) significantly to within 0.4 LSB. Furthermore, an extremely small input capacitance of 8 pF can be attained, which translates into better dynamic performance such as SNR of 56 dB and THD of 59 dB for an input frequency of 10 MHz. Additionally, the folded differential logic circuit has been developed to reduce the number of elements, power dissipation, and die area drastically. Consequently, the A/D converter has been implemented as a 9.0 4.2-mm2 chip integrating 36K elements, which consumes 4.0 W using a 1.0-µm-rule, 25-GHz ft, double-polysilicon self-aligned bipolar technology.
Noboru TAKAGI Kyoichi NAKASHIMA Masao MUKAIDONO
Recently, fuzzy logic which is a kind of infinite multiple-valued logic has been studied to treat certain ambiguities, and its algebraic properties have been studied by the name of fuzzy logic functions. In order to treat modality (necessity, possibility) in fuzzy logic, which is an important concept of multiple-valued logic, the intuitionistic logical negation is required in addition to operations of fuzzy logic. Infinite multiple-valued logic functions introducing the intuitionistic logical negation into fuzzy logic functions are called Kleene-Stone logic functions, and they enable us to treat modality. The domain of modality in which Kleene-Stone logic functions can handle, however, is too limited. We will define α-KS logic functions as infinite multiple-valued logic functions using a unary operation instead of the intuitionistic logical negation of Kleene-Stone logic functions. In α-KS logic functions, modality is closer to our feelings. In this paper we will show some algebraic properties of α-KS logic functions. In particular we prove that any n-variable α-KS logic function is determined uniquely by all inputs of 7 values which are 7 specific truth values of the original infinite truth values. This means that there is a bijection between the set of α-KS logic functions and the set of 7-valued α-KS logic functions which are restriction of α-KS logic functions to 7 specific truth values. Finally, we show a necessary and sufficient condition for a 7-valued logic function to be a 7-valued α-KS logic function.
Fumio MURABAYASHI Tatsumi YAMAUCHI Masahiro IWAMURA Takashi HOTTA Tetsuo NAKANO Yutaka KOBAYASHI
With increases in frequency and density of RISC microprocessors due to rapid advances in architecture, circuit and fine device technologies, power consumption becomes a bigger concern. Supply voltage should be reduced from 5 V to 3.3 V. In this paper, several novel circuits using 0.5µm BiCMOS technology are proposed. These can be applied to a superscalar RISC microprocessor at 3.3 V power supply or below. High speed and low power consumption characteristics are achieved in a floating-point data path, an integer data path and a TLB by using the proposed circuits. The three concepts behind the proposed high speed circuit techniques at low voltage are summarized as follows. There are a number of heavy load paths in a microprocessor, and these become critical paths under low voltage conditions. To achieve high speed characteristics under heavy load conditions without increasing circuit area, low voltage swing operation of a circuit is effective. By exploiting the high conductance of a bipolar transistor, instead of using an MOS transistor, low swing operation can be got. This first concept is applied to a single-ended common-base sense circuit with low swing data lines in the register file of a floating and an integer data path. Both multi-series transistor connections and voltage drops by Vth of MOS transistors and Vbe of bipolar transistors also degrade the speed performance of a circuit. Then the second concept employed is a wired-OR logic circuit technique using bipolar transistors which is applied to a comparator in the TLB instead of multi-series transistor connections of CMOS circuits. The third concept to overcome the voltage drops by Vth and Vbe is addition of a pull up PMOS to both the path logic adder and the BiNMOS logic gate to ensure the circuits have full swing operation.
Shuichi MAEDA Takafumi AOKI Tatsuo HIGUCHI
A new computer architecture using multiwavelength optoelectronic integrated circuits (OEICs) is proposed to attack the problems caused by interconnection complexity. Multiwavelength-OEIC architecures, where various wavelengths are employed as information carriers, provide the wavelength as an extra dimension of freedom for parallel processing, so that we can perform several independent computations in parallel in a single optical module using the wavelength space. This multiplex computing" enables us to reduce the wiring area required by a network and improve their complexity. In this paper, we discuss the efficient multiplexing of Batcher's bitonic sorting networks, highly parallel computing architectures that require global interconnections inherently. A systematic multiplexing of interconnection topology is presented using a binary representation of the connectivities of interconnection paths. It is shown that the wiring area can be reduced by a factor of 1/r2 using r kinds of wavelength components.
Saneaki TAMAKI Michitaka KAMEYAMA Tatsuo HIGUCHI
Design of locally computable combinational circuits is a very important subject to implement high-speed compact arithmetic and logic circuits in VLSI systems. This paper describes a multiple-valued code assignment algorithm for the locally computable combinational circuits, when a functional specification for a unary operation is given by the mapping relationship between input and output symbols. Partition theory usually used in the design of sequential circuits is effectively employed for the fast search for the code assignment problem. Based on the partition theory, mathematical foundation is derived for the locally computable circuit design. Moreover, for permutation operations, we propose an efficient code assignment algorithm based on closed chain sets to reduce the number of combinations in search procedure. Some examples are shown to demonstrate the usefulness of the algorithm.
This paper deals with the theory and design method of an efficient radix-4 divider using carry-propagation-free adders based on redundant binary {-1,0,+1} representation. The usual method of normalizing the divisor in the range [1/2,1) eliminates the advantages of using a higher radix than two, bacause many digits of the partial remainder are required to select the quotient digits. In the radix-4 case, it is shown that it is possible to select the quotient digits to refer to only the four (in the usual normalizing method it is seven) most significant digits of the partial remainder, by scaling the divisor in the range [12/8,13/8). This leads to radix-4 dividers more effective than radix-2 ones. We use the hyperstring graph representation proposed in Ref.(18) for redundant binary adders.