Takao ONOYE Gen FUJITA Masamichi TAKATSU Isao SHIRAKAWA Nariyoshi YAMAI
A single chip motion estimator is described dedicatedly for MPEG2 MP@HL moving pictures. Adopting a two-level hierarchical searching algorithm in detecting motion vectors, the computational labor can be reduced by 1/70 in comparison with the conventional algorithm. A novel mechanism is introduced into the full-search procedure, which attempts the maximum possible reuse of reference pixels in order to reduce the bandwidth of the frame memory interface. The proposed motion estimator is integrated in a 0.6 µm triple-metal CMOS chip, which contains 1,450 K transistors on a 12.713.7 mm2 die. The input clock rate can be attained up to 133 MHz, which enables the real time motion estimation for MPEG2 MP@HL.
Nobuhiko SUGINO Satoshi IIMURO Akinori NISHIHARA Nobuo FUJII
In this paper, DSPs, of which memory addresses are pointed by special purpose registers (address registers: ARs), are assumed, and methods to derive an efficient memory access pattern for those DSPs proposed. In such DSPs, programmers must take care for efficient allocation of memory space as well as effective use of registers, in order to derive an efficient program in the sense of execution period. In this paper, memory addresses and AR update operations are modeled by an access graph, and a novel memory allocation method is presented. This method removes cycles and forks in a given access graph, and decides an address location of variables in memory space with less overhead. In order to utileze multiple ARs, methods to assign variables into ARs are investigated. The proposed methods are applied to the compiler for DSP56000 and are proved to be effective by generated codes for several examples.
Hiroshi SHIMOTAHIRA Fumie TAGA
We propose the Kernel MUSIC algorithm as an improvement over the conventional MUSIC algorithm. This algorithm is based on the orthogonality between the image and kernel space of an Hermitian mapping constructed from the received data. Spatial smoothing, needed to apply the MUSIC algorithm to coherent signals, is interpreted as constructing procedure of the Hermitian mapping into the subspace spanned by the constituent vectors of the received data. We also propose a new spatial smoothing technique which can remove the redundancy included in the image space of the mapping and discuss that the removal of redundancy is essential for improvement of resolution. By computer simulation, we show advantages of the Kernel MUSIC algorithm over the conventional one, that is, the reduction of processing time and improvement of resolution. Finally, we apply the Kernel MUSIC algorithm to the Laser Microvision, an optical misroscope we are developing, and verify that this algorithm has about two times higher resolution than that of the Fourier transform method.
Two drawbacks of pyramidal wavelet transforms for finite-length sequences are the lack of conservation of the support and the boundary effect. In this letter, the structure of cyclic wavelet transforms (CWT) is used to permute the input and output data to map them into a linear array. Systolic realization of cyclic wavelet packet transforms (CWPT) is also presented to adequately deal with finite-length sequences which have dominant information on high or median frequency channels. The VLSI architectures designed in this letter are very attractive because adaptive processing can be achieved by just programming the filter coefficients.
THe decimation-in-time (DIT) and the decimation-in-frequency (DIF) algorithms are the most well-known fast algorithms for computing the discrete Fourier transform(DFT). These algorithms constitute the basis of the fast Fourier transform (FFT) implementations, including the pipeline implementation and other parallel configurations. This paper derives an alternative generalization of the algorithms which applies for sequences whose lengths are not a power of two. The treatment is consistent with the radix-two DIF and DIT algorithms, and the generalization is useful for utilizing the accumulated technologies of the FFT algorithm for such sequences.
This paper describes a trial of evaluating the proper characteristics of multiple sound insulatain systems from their output responses contaminated by unknown background noises. The unknown parameters of sound insulation systems are first estimated on the basis of hte linear time series on an intensity scale, describing functionally the input-output relation of the systems. Then, their output probability distributions are predicted when an arbitrary input noise passes through these insulation systems.
This paper proposes a fast timing recovery method with a decision feedback equalizer for baudrate sampling. The proposed method features two special techniques. The first one is for coarse estimation of the sampling phase. Internal signals of the oversampled analog-to-digital converter at different phases are directly taken out for parallel evaluation. The second technique provides fine tuning with a phase-modification stepsize which is adaptively controlled by the residual intersymbol interference. Simulation results by a full-duplex digital transmission system with a multilevel line code show superiority of the proposed method. The coarse timing estimation and the fine tuning reduce 75% and 40% of the time required by the conventional method,respectively. The overall saving in timing recovery is almost 60% over the conventional method. The proposed method could easily be extended to other applications with a decision feedback equalizer.
The principles and design of current sense amplifiers for low-voltage MOS memories are described. The low input impedance of current sense amplifiers is explained using a simple model consisting of negative and positive resistance. A description of the model realized by a common-gate MOS amplifier employing transconductance enhancing techniques is also given. Some current sensing schemes for low-voltage ROM's and/or SRAM's are shown. For SRAM application, a current sensing scheme employing large-gain inverter-type amplifiers is proposed. A test chip including SRAM macrocells was designed and fabricated with 3.3-V 0.5-µm CMOS technology. An SRAM using current sense amplifiers was able to demonstrate that current sensing suppressed bitline delay to half that in conventional current-mirror types. The current sense amplifier had the same operating limit as the current-mirror type for low supply voltages. The measured operating limit of the STSM in this work was 1.3-V for threshold voltages of 0.55-V(n-channel) and -0.65-V(p-channel).
The effect of silicone vapour concentration on the contact failure was examined by using micro relays and motor brush-slip ring(commutator) contacts, [(CH3) 2SiO]4: D4 was used as a vapour source of silicone contamination. Because the influence of the vapour of the silicone on the contact surface can not be avoided at all times due to its gradual evaporation in the atmosphere. The contact failure caused by the silicone vapour was confirmed as formation of SiO2 on the contact surfaceby analysis of EPMA and XPS. A minimum limiting concentration level which does not affect contact reliability was found. This limiting level was 10 ppm(O.13mg/l). Validity of the limiting level was confirmed by the relationships among concentration, temperature, SiO2 film thickness and contact resistance. Furthermore, the effect of the degree of silicone polymerization on the limiting concentration was derived by an empirical formula. This silicone is found to have polymerization degree larger than D7: n=7. These results were confirmed by the contact failure data due to the silicone contamination.
2D (two-dimensional) convolution is a basic operation in image processing and requires intensive computation. Although the SIMD model is considered suitable for 2D convolution, previous 2D convolution algorithms on the SIMD model assume unbounded number of PEs (Processing Elements) available, which we call unbounded case. Unbounded case could not be satisfied on real computers. In this paper, time-optimal data-parallel 2D convolution is studied on mesh-connected SIMD computers with bounded number of PEs. Because the optimal computation complexity is not difficult to achieve, the main concern of this paper is how to achieve optimal communication complexity. Firstly the lower bound computation complexity is analyzed. Then the lower bound communication complexities are analyzed under two typical data-distribution strategies: block-mapping and cyclic-mapping. Based on the analysis result, an optimal algorithm is presented under the block-mapping. The algorithm achieves the lower bound complexity both in computation and in communication.
Akihiro FUJIWARA Michiko INOUE Toshimitsu MASUZAWA Hideo FUJIWARA
The medial axis transform (MAT) is an image representation scheme. For a binary image, the MAT is defined as a set of upright maximal squares which consist of pixels of value l entirely. The MAT plays an important role in image understanding. This paper presents a parallel algorithm for computing the MAT of an n n binary image. We show that the algorithm can be performed in O(log n) time using n2/log n processors on the EREW PRAM and in O(log log n) time using n2/log log n processors on the common CRCW PRAM. We also show that the algorithm can be performed in O(n2/p2 + n) time on a p p mesh and in O(n2/p2 + (n log p)/p) time on a p2 processor hypercube (for 1 p n). The algorithm is cost optimal on the PRAMs, on the mesh (for 1 p n) and on the hypercube (for 1 p n/log n).
Takashi YOKOTA Hiroshi MATSUOKA Kazuaki OKAMOTO Hideo HIRONO Shuichi SAKAI
This paper discusses a massively parallel interconnection scheme for multithreaded architecture and introduces a new class of direct interconnection networks called the hierarchical Multidimensional Directed Cycles Ensemble (hMDCE). Its suitability for massively parallel systems is discussed. The network is evolved from the Multidimensional Directed Cycles Ensemble (MDCE) network, where each node is substituted by lower-level sub-networks. The new network addresses some serious problems caused by the increasing scale of parallel systems, such as longer latency, limited throughput and high implementation cost. This paper first introduces the MDCE network and then presents and examines in detail the hierarchical MDCE network. Bisection bandwidth of hMDCE is considerably reduced from its ancestor MDCE and the network performs significantly higher throughput and lower latency under some practical implementation constraints. The gate count and delay time of the compiled circuit for the routing function are insignificant. These results reveal that the hMDCE network is an important candidate for massively parallel systems interconnection.
Michael JURCZYK Thomas SCHWEDERSKI
Nonuniform traffic can degrade the overall performance of multistage interconnection networks substantially. In this paper, this performance degradation is traced back to blocking effects that are not present under uniform traffic patterns within a network. This blocking phenomenon is not mentioned in the literature and is termed higher order Head-of-Line-blocking (HOLk-blocking) in this paper. Methods to determine the HOL-blocking order of multistage networks in order to classify the networks are presented. The performance of networks under hot-spot traffic as a function of their HOL-blocking characteristics is studied by simulation. It is shown that network bandwidth and packet delay improve under nonuniform traffics with increasing HOL-blocking order of a network.
Fabrizio LOMBARDI Nohpill PARK Susumu HORIGUCHI
This paper proposes new algorithms for diagnosing (detection, identification and location) baseline multistage interconnection networks (MIN) as one of the basic units in a massively parallel system. This is accomplished in the presence of single and multiple faults under a new fault model. This model referred to as the geometric fault model, considers defective crossing connections which are located between adjacent stages, internally to the MIN (therefore, a fault corresponds to a physical bridge fault between two connections). It is shown that this type of fault affects the correct geometry of the network, thus requiring a different testing approach than previous methods. Initially, an algorithm which detects the presence of bridge faults (both in the single and multiple fault cases), is presented. For a single bridge fault, the proposed algorithm locates the fault except in an unique pathological case under which it is logically impossible to differentiate between two equivalent locations of the fault (however, the switching element affected by this fault is uniquely located). The proposed algorithm requires log2 N test vectors to diagnose the MIN as fault free (where N is the number of input lines to the MIN). For fully diagnosing a single bridge fault, this algorithm requires at most 2 log2 N tests and terminates when multiple bridge faults are detected. Subsequently, an algorithm which locates all bridge faults is given. The number of required test vectors is O(N). Fault location of each bridge fault is accomplished in terms of the two lines in the bridge and the numbers of the stages between which it occurs. Illustrative examples are given.
Kazushi MURAKOSHI Tadashi KURATA
We develop a simulation environment for designing and examining a neural network model at the network level. The aim of our research is to enable researchers investigating neural network connective models to save time by being equipped with a graphical user interface and database of the network models. This environment consists of three parts: (1) the kernel of the simulation system, (2) NNDBMS (Neural Networks DataBase Management System), and (3) a system for displaying simulation results in various ways.
Atsushi IWATA Rauf IZMAILOV Duan-Shin LEE Bhaskar SENGUPTA G. RAMAMURTHY Hiroshi SUZUKI
We propose a new QOS routing algorithm for finding a path that guarantees several quality of service (QOS) parameters requested by users, for ATM networks. It is known that a routing problem is NP-complete, if the number of additive QOS parameters, such as delay and cost, are more than or equal to two. Although a number of heuristic algorithms have been proposed recently to solve this problem, the appropriate choice of routing algorithms is still an open issue. In this paper, we propose a new heuristic routing algorithm, while being compliant with PNNI routing and signaling specification in the ATM Forum. The performance of algorithms is evaluated by simulation with a various network topologies and loading scenarios. This simulation results demonstrate that the proposed scheme improves the performance while reducing computational complexity.
Naoto HIRANO Naoyasu IKEDA Shinichi HISHIDA Setsu KANEKO
A 33-cm-Diagonal High-Resolution(1280 1024RGB, which stands for red, green, and blue) TFT-LCD with low, uniform parasitic capacitance between gate electrodes and source/drain electrodes has been developed using Fully Self-Aligned a-Si TFTs. The fabricated TFT-LCD shows no visible seams between block shot exposure regions, even in the display of gray images. In this paper, we describe(1) our full self-alignment technology for the TFTs, including the fabrication process and the technology for reducing OFF current in the TFTs under illumination, (2) SPICE simulation for estimating pixel voltage shift in the fabricated TFT-LCD, and (3) performance results for the fabricated TFT-LCD.
This paper concerns optimized facility design for VLSI production. The methods proposed are applicable in planning LSI production facilities with a good balance between the number of machines and the number of operators. The sequence in each processing step is analyzed in detail. A new algorithm based on the queueing model is developed for estimating the simultaneous requirements for the two kinds of resources, machines and operators. This estimation system can be applied to complicated fabrication schemes, such as batch processing, continuous processing, and mixed technologies. This methodology yields guidelines for ASIC LSI production system design.
Fumie TAGA Hiroshi SHIMOTAHIRA
It is an important problem in fields of radar, sonar, and so on to estimate parameters of closely spaced multiple signals. The MUSIC algorithm with the forward-backward (FB) spatial smoothing is considered as the most effective technique at present for the problem with coherent signals in a variety of fields. We have applied this in Laser Microvision. Recently, Shimotahira has proposed the Kernel MUSIC algorithm, which is applicable to cases when signal vectors and noise vectors are orthogonal. It also utilizes Gaussian elimination of the covariance matrix instead of eigenvalue analysis to estimate noise vectors. Although the amount of computation by the Kernel MUSIC algorithm became lighter than that of the conventional MUSIC algorithm, the covariance matrix was formed to estimate noise vectors and also all noise vectors were used to analyze the MUSIC eigenspectrum. The heaviest amount of computation in the Kernel MUSIC algorithm exists in the transformation of the covariance matrix and the analysis of the MUSIC eigenspectrum. We propose a more straightforward algorithm to estimate noise vectors without forming a covariance matrix, easier algorithm to analyze the MUSIC eigenspectrum. The superior characteristics will be demonstrated by results of numerical simulation.
This paper presets a fast computation algorithm for connection admission control for heterogeneous delay sensitive traffic in ATM networks. Cell loss probability is adopted as the measure of quality of service. In our study, cells of each connection are allowed to have two different loss priorities and the aggregate traffic can have more than two. To cope with multiple quality of service requirements, the link capacity is divided into several bands. For simplicity, each traffic source is assumed to alternate between active and idle periods. However, the results can be extended to traffic sources having more than two states. Upper bounds of actual cell loss probabilities are derived based on the bufferless fluidflow model. Numerical results show that the upper bounds are close to the actual cell loss probabilities.