This paper introduces a new method to recover 3-D road plane from its 2-D monocular perspective image. The research is aimed at the reconstruction of depth information from the 2-D visual input in road following and navigation. Planar road model is considered and the road-centered coordinate system which forms slope and turn angles with camera-centered coordinate system is used to describe boundary points on road plane. We develop approaches to find matching points of boundaries of road and to obtain angular parameters thereafter. A way of finding depth of matching points from the perspective images and angular parameters together is proposed. Therefore the 3-D road reconstruction can be replicated without introducing any parameters of inverse perspective.
Nobuo FUNABIKI Seishi NISHIKAWA
This paper presents a binary neural network approach for link activation problems in multihop radio networks. The goal of the NP-complete problems is to find a conflict-free link activation schedule with the minimum number of time slots for specified communication requirements. The neural network is composed of NM binary neurons for scheduling N links in M time slots. The energy functions and the motion equations are newly defined with heuristic methods. The simulation results through 14 instances with up to 419 links show that the neural network not only surpasses the best existing neural network in terms of the convergence rate and the computation time, but also can solve large scale instances within a constant number of iteration steps.
Hiroaki KOBAYASHI Hitoshi YAMAUCHI Yuichiro TOH Tadao NAKAMURA
This paper proposes a hierarchical parallel processing system for the multipass rendering method. The multipass rendering method based on the integration of radiosity and ray-tracing can synthesize photo-realistic images. However, the method is also computationally expensive. To accelerate the multipass rendering method, the system, called (Mπ)2, employs two kinds of parallel processing schemes. As a coarse-grain parallel processing, object-space parallel processing with multiple processing elements based on the object-space subdivision is adapted, and each processing element (PE) is equipped with multiple pipelined units for a fine-grain parallel processing. To balance load among the system, static load balancing at the PE level and dynamic load balancing at the pipelined unit level within the PE are introduced. Especially, we propose a novel static load allocation scheme, skewed-distributed allocation, which can effectively distribute a three-dimensional object space to one- or two-dimensional processor configuration of the (Mπ)2 system. Simulation experiments show that the two-dimensional (Mπ)2 systems with the skewed-distributed allocation outperform the three-dimensional systems with the non-skewed distributed allocation. Since lower dimensional systems can be built at a lower cost than higher dimensional systems, the skewed-distributed allocation will be meritorious. Besides, by the combination of static load balancing by the skewed-distributed allocation and the dynamic load balancing by dynamic ray allocation within each PE, the system performance can be further boosted. We also propose a cached frame buffer system to relieve access collision on a frame buffer.
Efficient parallel algorithms for several problems on proper circular arc graphs are presented in this paper. These problems include finding a maximum matching, partitioning into a minimum number of induced subgraphs each of which has a Hamiltonian cycle (path), partitioning into induced subgraphs each of which has a Hamiltonian cycle (path) with at least k vertices for a given k, and adding a minimum number of edges to make the graph contain a Hamiltonian cycle (path). It is shown here that the above problems can all be solved in logarithmic time with a linear number of EREW PRAM processors, or in constant time with a linear number of BSR processors. A more important part of this work is perhaps the extension of basic BSR to allow simultaneous multiple BROADCAST instructions.
The purpose of this letter is to investigate the stability of the active two port networks having some restrictions on load and source terminations, and the stability conditions having two inequalities have been obtained. As the terminations making the active two port networks stable can be obtained from these inequalities, these stability conditions are very useful for designing high frequency amplifiers, especially, tuned amplifiers.
We present a minimal lattice realization of MIMO linear discrete-time systems which interpolate the desired Markov and covariance parameters. The minimal lattice realization is derived via a recursive construction algorithm based on the state space description and it parametrizes all the interpolants.
In this work, a new structure of M-channel linear-phase paraunitary filter banks is proposed, where M is even. Our proposed structure can be regarded as a modification of the conventional generalized linear-phase lapped orthogonal transforms (GenLOT) based on the discrete cosine transform (DCT). The main purpose of this work is to overcome the limitation of the conventional DCT-based GenLOT, and improve the performance of the fast implementation. It is shown that our proposed fast GenLOT is superior to that of the conventional technique in terms of the coding gain. This work also provides a recursive initialization design procedure so as to avoid insignificant local-minimum solutions in the non-linear optimization processes. In order to verify the significance of our proposed method, several design examples are given. Furthermore, it is shown that the fast implementation can be used to construct M-band linear-phase orthonormal wavelets with regularity.
Yoshiaki SAITOH Akira KANKE Isamu SHINOZAKI Tohru KIRYU Jun'ichi HORI
Adapting the principle of parametron oscillation, a small implantable temperature sensor requiring no internal power supply is described. Since this sensor's oscillation frequency is half that of the excitation frequency, the oscillated signal can be measured from the reception side, free of any signal, interference, simply by positioning the sensor and the excitation antenna so that; 1) they are separated up to 95 cm in the air; 2) a 41 cm gap, the phantom equivalent of the thickness of the human abdomen maintain between them. In the temperature-dependent quartz resonator sensor, oscillation occurs only when frequency and temperature correspond. The excitation power is then adjusted so that the frequency bandwidth narrows. As a result, the margin of error in measuring the temperature is minimized; (0.07).
Saed SAMADI Akinori NISHIHARA Nobuo FUJII
A classs of type 1 linear phase FIR digital filters is proposed. The filter can be realized using a parallel, modular and regular array structure. It is shown that, under some simple constraints, the consisting modules of the array can be realized free of multiplier coefficients. Such two dimensional mesh arrays are specially suitable for realization with special-purpose systolic hardware for high-speed digital signal processing tasks. Compared to the array structure, proposed by the authors, for multiplierless realization of maximally flat FIR digital filters, this class needs less adders to fulfill the same magnitude response requirements. Another attractive property of the proposed array is that a number of highpass or lowpass filters with different passband widths can be realized simultaneously in a very economical way.
Because the match phase in OPS5-type production systems requires most of the system's execution time and memory accesses, we proposed hash-based parallel production systems, CPPS (Clustered Parallel Production Systems), based on the RETE algorithm for distributed memory parallel computers, or multicomputers to reduce such a bottleneck. CPPS was effective in speeding up the match phase, but still left room for optimizations. In this paper, we introduce software cache techniques to memory nodes in the CPPS as one of the optimizations, and implement it on a multicomputer, nCUBE2. The benchmark results show that the CPPS with the software cache is about 2-fold faster than the original, and more than 7-fold faster than the simple hash method proposed by Acharya et al. for a large scale problem. The speed-up can be attributed to decreased communication costs.
This paper presents a method for mechanically transforming a parallel algorithm on an original network so that the algorithm can work on a target network. It is assumed that the networks are of cube-type such as the shuffle-exchange network, omega network, and hypercube. Were those networks isomorphic to each other, the algorithm transformation is an easy task. The proposed transformation method is based on a novel graphembedding scheme <φ: δ, κ, π, ψ>. In addition to the dilating operation δ of the usual embedding scheme <φ: δ>, the novel scheme uses three primitive graph-transformation operations; κ (= δ-1) for contracting a path into a node, π for pipelining a graph, and ψ (= π-1) for folding a pipelined graph. By applying the primitive operations, the cube-type networks can be transformed so as to be isomorphic to each other. Relationships between the networks are represented by the composition of applied operations. With the isomorphic mapping φ, an algorithm in a node of the original network can be simulated in the corresponding node(s) of the target network. Thus the algorithm transformation is reduced to routine work.
Takashi YOKOTA Hiroshi MATSUOKA Kazuaki OKAMOTO Hideo HIRONO Shuichi SAKAI
This paper discusses a massively parallel interconnection scheme for multithreaded architecture and introduces a new class of direct interconnection networks called the hierarchical Multidimensional Directed Cycles Ensemble (hMDCE). Its suitability for massively parallel systems is discussed. The network is evolved from the Multidimensional Directed Cycles Ensemble (MDCE) network, where each node is substituted by lower-level sub-networks. The new network addresses some serious problems caused by the increasing scale of parallel systems, such as longer latency, limited throughput and high implementation cost. This paper first introduces the MDCE network and then presents and examines in detail the hierarchical MDCE network. Bisection bandwidth of hMDCE is considerably reduced from its ancestor MDCE and the network performs significantly higher throughput and lower latency under some practical implementation constraints. The gate count and delay time of the compiled circuit for the routing function are insignificant. These results reveal that the hMDCE network is an important candidate for massively parallel systems interconnection.
Akihiro FUJIWARA Michiko INOUE Toshimitsu MASUZAWA Hideo FUJIWARA
The medial axis transform (MAT) is an image representation scheme. For a binary image, the MAT is defined as a set of upright maximal squares which consist of pixels of value l entirely. The MAT plays an important role in image understanding. This paper presents a parallel algorithm for computing the MAT of an n n binary image. We show that the algorithm can be performed in O(log n) time using n2/log n processors on the EREW PRAM and in O(log log n) time using n2/log log n processors on the common CRCW PRAM. We also show that the algorithm can be performed in O(n2/p2 + n) time on a p p mesh and in O(n2/p2 + (n log p)/p) time on a p2 processor hypercube (for 1 p n). The algorithm is cost optimal on the PRAMs, on the mesh (for 1 p n) and on the hypercube (for 1 p n/log n).
2D (two-dimensional) convolution is a basic operation in image processing and requires intensive computation. Although the SIMD model is considered suitable for 2D convolution, previous 2D convolution algorithms on the SIMD model assume unbounded number of PEs (Processing Elements) available, which we call unbounded case. Unbounded case could not be satisfied on real computers. In this paper, time-optimal data-parallel 2D convolution is studied on mesh-connected SIMD computers with bounded number of PEs. Because the optimal computation complexity is not difficult to achieve, the main concern of this paper is how to achieve optimal communication complexity. Firstly the lower bound computation complexity is analyzed. Then the lower bound communication complexities are analyzed under two typical data-distribution strategies: block-mapping and cyclic-mapping. Based on the analysis result, an optimal algorithm is presented under the block-mapping. The algorithm achieves the lower bound complexity both in computation and in communication.
Noritaka SHIGEI Hiromi MIYAJIMA Takayuki ISHIZAKA Sadayuki MURASHIMA
To enhance fabrication yield for processor arrays, many reconfiguration schemes for replacing faulty processing elements (PE's) with spare PE's have been proposed. An array grid model based on single-tracks is one of such models. For this model, some algorithms for reconfiguring processor arrays have been proposed. However, an algorithm which can reconfigure the array, whenever the array is reconfigurable, has not been proposed yet. This paper presents two types of methods for reconfiguration of processor arrays. Both the types use indirect replacements for reconfiguring arrays. For an indirect replacement of a faulty non-spare PE, one has a fixed direction, the other has at most four directions among which one is chosen. For the former, we consider the several distribution of spare PE's, and computer simulations show a tendency in the term of difference in the distributions. The latter algorithms consist of two phases. In the first phase, rows and columns of spare PE's are decided in accordance with a rule. Several rules for deciding spare PE's are considered in this paper. In the second phase, faulty non-spare PE's are replaced with healthy spare PE's. By simulations the performance of the algorithms are evaluated and a tendency is shown in the terms of difference in disposition of spare PE's.
A simple algorithm is proposed for representing nonseparable functions by equivalent separable functions. In this algorithm, functions are first represented by computational graphs, which are directed graphs representing the computational process of the functions. Then, the vertices of the computational graphs are searched in preorder or postorder, and the transformation to separable forms is performed at the places where it is necessary. By this repetition of the transformation, nonseparable functions are represented by separable functions automatically. The proposed algorithm will be useful in various fields of science and engineering because funcutions of one variable are easy to deal with.
Osamu AKIZUKI Shingo SUZUKI Kouichi MUTSUURA Shinjirou OOSHITA
In packet radio networks with TDMA, the throughput performance of network should be degraded due to the unequal traffic of each user. To overcome this problem, Mini-Slotted Alternating Priorities (MSAP) and TDMA with Parallel Transmission (TDMA/PT) were proposed. Especially, TDMA/PT can attain the thorughput performance more than one, even under unequal traffic. However, TDMA/PT cannot be used for mobile networks, because each terminal should know the location of every other terminal. In this paper, we propose an entirely new protocol named Slot Reservation TDMA with Parallel Transmissino: SR-TDMA/PT," which is suitable for mobile networks because a central station is able to locate every terminal easily. The central station also reserves time slots for each terminal so as to transmit packets in parallel as much as possible. Therefore, the throughput performance of SR-TDMA/PT is higher than TDMA/PT. We describe SR-TDMA/PT in detail and evaluate the performance of this protocol by simulation under various conditions.
Keisuke NAKANO Naoyuki KARASAWA Masakazu SENGOKU Shoji SHINODA Takeo ABE
This paper describes communication traffic characteristics in cellular systems employing the concept of reuse partitioning and Dynamic Channel Assignment. Such systems hava a problem of the spatial unbalance of blocking probability. The objective of this paper is overcoming this problem. To accomplish this objective, we use a method for analyzing communication traffic characteristics. We also show results on traffic characteristics in the systems.
In this paper, we propose an analytic model using a semi-markov process for parallel computers which provides hardware support for a cache coherence mechanism. The model proposed here, the Semi-markov Memory and Cache coherence Interference model, can be used for the performance prediction of cache coherence based parallel computers since it can be easily applied to descriptions of the waiting states due to network contention or memory interference of both normal data accesses and cache coherence requests. Conventional analytic models using stochastic processes to describe parallel computers have the problem of numerical explosion in the number of states necessary as the system size increases even for simple parallel computers without cache coherence mechanisms. The number of states required by constructing our proposing analytic model, however, does not depend on the system size but only on the kind of cache coherence protocol. For example, the number of states for the Synapse cache coherence protocol is only 20, as is described in this paper. Using the proposed analytic model, we investigate several comparative experiments with widely known simulation results. We found that there is only a 7.08% difference between the simulation and our analytic model, while our analytic model can predict the performance of a 1,024 processor system in the order of microseconds.
Hiroto SHINGAI Hiroyuki MATSUNAGA Kiichi URAHAMA
A method based on clustering is presented for restoring and segmenting gray scale images. An optimum clustering obtained by a gradient method gives an image with gray scale values which vary smoothly in each segmented region. The method is also applied to restoration from sparsely sampled data.