Norio TAGAWA Atsuya INAGAKI Akihiro MINAGAWA
Since the detection of optical flow (the two-dimensional motion field on an image) from image sequences is essentially an ill-posed problem, most conventional methods heuristically impose a smoothness constraint on the optical flow to obtain a reasonable result. However, little discussion exists regarding the degree of smoothness. Furthermore, to recover the relative three-dimensional motion and depth between a camera and a rigid object, the optical flow is generally detected first without a rigid motion constraint, and the motion and depth are then estimated from the detected flow. Strictly speaking, the optical flow should be detected under such a constraint, with the three-dimensional motion and depth determined as a consequence. To solve these problems, in this paper we apply a parametric model to the optical flow and construct an estimation algorithm based on this model.
Hirotoshi HONMA Shigeru MASUYAMA
If there exist two vertices in a graph G whose distance becomes longer when a vertex u is removed, then u is called a hinge vertex. Finding the set of hinge vertices of a graph can be used to identify critical nodes in an actual network. A number of studies on hinge vertices have been made in recent years. In general, it is known that more efficient sequential or parallel algorithms can be developed by restricting the class of graphs. For instance, Chang et al. presented an O(n+m) time algorithm for finding all hinge vertices of a strongly chordal graph, and Ho et al. presented a linear time algorithm for finding all hinge vertices of a permutation graph. In this paper, we propose a parallel algorithm that finds all hinge vertices of an interval graph in O(log n) time with O(n) processors on a CREW PRAM.
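The definition of a hinge vertex can be checked directly by brute force. The following is only a minimal sequential sketch of that definition for small graphs, not the parallel interval-graph algorithm proposed in the paper; the function names and the adjacency-dictionary representation are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source, removed=None):
    """Hop distances from `source`, optionally ignoring one removed vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w != removed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def hinge_vertices(adj):
    """Return every vertex u whose removal lengthens some x-y distance.

    `adj` maps each vertex to an iterable of its neighbours (undirected graph).
    A pair that becomes disconnected counts as an infinite distance.
    """
    inf = float("inf")
    base = {s: bfs_distances(adj, s) for s in adj}
    hinges = set()
    for u in adj:
        others = [v for v in adj if v != u]
        for x in others:
            after = bfs_distances(adj, x, removed=u)
            if any(after.get(y, inf) > base[x].get(y, inf) for y in others):
                hinges.add(u)
                break
    return hinges


# Hypothetical example graphs: a 3-vertex path and a 5-cycle.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
```

On the path, only the middle vertex is a hinge. On the 5-cycle, every vertex is a hinge, because removing any vertex forces its two neighbours onto the longer way around. This check runs one breadth-first search per (u, x) pair, so it is useful only as a reference for small inputs.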
Kazunori MIYOSHI Ichiro HATAKEYAMA Jun'ichi SASAKI Takahiro NAKAMURA
A 12-channel, DC to 622-Mbit/s/ch optical transmitter and receiver have been developed for high-capacity, relatively long (about 100 m) bit-parallel raw data transmission in intra- and inter-cabinet interconnections of large-scale switching, routing, and computing systems. Bit-parallel raw data transmission is achieved by using an automatic decision threshold control receiver circuit that operates bit by bit in a DC-coupled configuration, pin-PDs whose anodes and cathodes are separated channel by channel, and a receiver preamplifier with a low-pass filter. The transmitter consists of a 12-channel LD sub-assembly unit and an LD driver LSI. The LD sub-assembly unit consists of a 12-channel array of high-temperature-characteristic 1.3-µm planar buried hetero-structure (PBH) LDs and 62.5/125 graded-index multi-mode fibers (GI62.5 MMFs). The 1.3-µm PBH LDs and the GI62.5 MMFs are optically coupled by passive visual alignment on a Si V-groove. The receiver consists of a 12-channel pin-PD sub-assembly unit and a receiver LSI. The pin-PD sub-assembly unit consists of a 12-channel array of pin-PDs and GI62.5 MMFs, which are optically coupled by flip-chip bonding on a Si V-groove. The transmitter and receiver each have eleven data channels and one clock channel. Each module is as small as 3.6 cc, and the power consumption is 1.7 W for the transmitter and 1.35 W for the receiver. The modules transmitted bit-parallel raw data through a 100-meter ribbon of GI62.5 MMFs over an ambient temperature range of 0-70 °C. They provide a parallel link with a synchronous PECL interface and a single 3.3-V power supply.
Shinji NISHIMURA Tomohiro KUDOH Hiroaki NISHI Koji TASHO Katsuyoshi HARASAWA Shigeto AKUTSU Shuji FUKUDA Yasutaka SHIKICHI
RHiNET-2/SW is a network switch for the RHiNET-2 parallel computing system. RHiNET-2/SW enables high-speed, long-distance data transmission between PC nodes for parallel computing. In RHiNET-2/SW, a one-chip CMOS switch LSI and eight pairs of 800-Mbit/s 12-channel parallel optical interconnection modules are mounted on a single compact board. This switch allows high-speed 8-Gbit/s/port parallel optical data transmission over a distance of up to 100 m, and the aggregate throughput is 64 Gbit/s/board. The CMOS-ASIC switching LSI enables high-throughput (64 Gbit/s) packet switching with a single chip. The parallel optical interconnection modules enable high-speed, low-latency data transmission over a long distance. The structure and layout of the printed circuit board are optimized for high-speed, high-density device implementation to overcome electrical problems such as signal propagation loss and crosstalk. All of the electrical interfaces are composed of high-speed CMOS-LVDS logic (800 Mbit/s/pin). We evaluated the reliability of the optical I/O port through long-term data transmission. No errors were detected during 50 hours of continuous data transmission at a data rate of 800 Mbit/s × 10 bits (BER < 2.44 × 10^-14). This test result shows that RHiNET-2/SW can provide high-throughput, long-distance, and highly reliable data transmission in a practical parallel computing system.
Trong-Yen LEE Pao-Ann HSIUNG Sao-Jie CHEN
The hardware-software codesign of distributed embedded systems is a challenging task, because each phase of codesign, such as copartitioning, cosynthesis, cosimulation, and coverification, must consider the physical restrictions imposed by the distributed characteristics of such systems. Distributed systems often contain several similar parts to which design reuse techniques can be applied. An object-oriented (OO) codesign approach, which accommodates physical restrictions and object design reuse, is adopted in our newly proposed Distributed Embedded System Codesign (DESC) methodology. The DESC methodology uses three types of models: Object Modeling Technique (OMT) models for system description and input, Linear Hybrid Automata (LHA) models for internal modeling and verification, and SES/workbench simulation models for performance evaluation. A two-level partitioning algorithm is proposed specifically for distributed systems. Software is synthesized by task scheduling, and hardware is synthesized by system-level and object-oriented techniques. Design alternatives for the synthesized hardware-software systems are then checked for feasibility through rapid prototyping using hardware-software emulators. Through a case study on a Vehicle Parking Management System (VPMS), we illustrate each design phase of the DESC methodology to show the benefits of OO codesign and the necessity of a two-level partitioning algorithm.
Akira OHKI Mitsuo USUI Nobuo SATO Nobuyuki TANAKA Kosuke KATSURA Toshiaki KAGAWA Makoto HIKITA Koji ENBUTSU Shunichi TOHNO Yasuhiro ANDO
We have proposed a parallel optical interconnection technology, ParaBIT, for high-throughput, low-cost optical interconnections and have already developed a prototype parallel optical interconnect module called "ParaBIT-0," which has a total throughput of 28 Gb/s (700 Mb/s × 40 channels). We are now developing a compact, high-throughput module called "ParaBIT-1," which has a total throughput of 60 Gb/s (1.25 Gb/s × 48 channels) and is designed to achieve the highest-ever throughput density of 3.3 Gb/s/cc. In this paper, we describe the packaging structure, optical coupling structure, and transmission characteristics of ParaBIT-1. We also discuss the technical prospects for realizing a parallel optical interconnect module with a bit rate of 2.5 Gb/s/ch.
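The throughput figures quoted for the two modules follow from multiplying the per-channel rate by the channel count. The short sketch below reproduces that arithmetic; the helper name is hypothetical, and the module volume it derives is only what the stated design target of 3.3 Gb/s/cc would imply, not a figure given in the abstract.

```python
def aggregate_mbps(per_channel_mbps, channels):
    """Total throughput in Mb/s for identical parallel channels."""
    return per_channel_mbps * channels


parabit0_mbps = aggregate_mbps(700, 40)    # 28000 Mb/s = 28 Gb/s
parabit1_mbps = aggregate_mbps(1250, 48)   # 60000 Mb/s = 60 Gb/s

# Assumption: throughput density = total throughput / module volume,
# so the 3.3 Gb/s/cc target implies a volume of roughly 18 cc.
implied_volume_cc = (parabit1_mbps / 1000) / 3.3
```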
Youssef R. SENHAJI Takaya YAMAZATO Masaaki KATAYAMA Akira OGAWA
A modified version of the SAGE algorithm is presented for joint estimation of delay, azimuth, and attenuation parameters in a multiuser DS-CDMA system. The modification consists of using different time interval lengths when calculating the time correlations for optimizing the different channel parameters. It is intended to further reduce the algorithm's computational load when the received waves are sufficiently resolvable. Specifically, we found that short interval windows are sufficient for estimating delays and azimuth angles, which is quite effective in reducing the computational burden of their optimization processes. For the estimation of the attenuation parameters, a longer time window, equal to the preamble length, is used for more accurate estimation. Two other estimators are also proposed. The first combines the modified SAGE with a sequential estimation of the attenuation parameters and is suitable for slowly varying channels. The second is similar to the first and is primarily designed to alleviate the influence of strong interferers. Through a numerical example, the performance of the three estimation schemes is compared in terms of near-far resistance. It is shown that the second combined estimator outperforms the modified SAGE in environments with high MAI levels.
To reduce the amount of computation of the full search algorithm for motion estimation, we propose a new, fast matching algorithm that causes no degradation of the predicted images. The lossless computational reduction comes from an adaptive matching scan algorithm that follows the image complexity of the reference block in the current frame. Experimentally, the computational load is significantly reduced compared with the conventional full search algorithm.
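One way a full search can be sped up without changing its result is to abandon a candidate as soon as its partial matching error reaches the best error found so far, and to scan the most complex parts of the block first so that this happens early. The sketch below illustrates that idea under stated assumptions: the abstract does not specify the complexity measure or the scan unit, so the row-wise gradient measure, the sum-of-absolute-differences (SAD) criterion, and all names here are hypothetical choices, not the authors' algorithm.

```python
def row_complexity(block):
    """Assumed complexity measure: sum of absolute horizontal differences per row."""
    return [sum(abs(row[i + 1] - row[i]) for i in range(len(row) - 1))
            for row in block]


def full_search_pde(cur, prev, top, left, size, radius):
    """Exhaustive block matching with early termination.

    Finds the displacement (dy, dx) within +/-radius that minimises the SAD
    between the size x size block of `cur` at (top, left) and `prev`.
    Frames are lists of rows of integers. Returns ((dy, dx), best_sad).
    """
    block = [row[left:left + size] for row in cur[top:top + size]]
    comp = row_complexity(block)
    # Scan the most complex rows first so that poor candidates fail sooner.
    order = sorted(range(size), key=lambda r: -comp[r])
    height, width = len(prev), len(prev[0])
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue  # candidate block falls outside the previous frame
            sad = 0
            for r in order:
                prev_row = prev[y0 + r]
                sad += sum(abs(block[r][c] - prev_row[x0 + c])
                           for c in range(size))
                if sad >= best_sad:
                    break  # cannot beat the current best; stop early
            else:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad


# Hypothetical test frames: `cur` is `prev` shifted by one row and two columns.
prev = [[y * 16 + x for x in range(16)] for y in range(16)]
cur = [[prev[y + 1][x + 2] for x in range(12)] for y in range(12)]
motion_vector, sad = full_search_pde(cur, prev, 4, 4, 4, 3)
```

Because a candidate is rejected only when its partial SAD already equals or exceeds the best complete SAD, the selected vector and its error are the same as those of a plain full search with the same scan order over candidates; only the amount of arithmetic changes.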
Koichi OKAWA Kenichi HIGUCHI Mamoru SAWAHASHI
In order to increase the link capacity in the wideband direct sequence code division multiple access (W-CDMA) reverse link, employing a parallel-type coherent multi-stage interference canceller (COMSIC) is more practical than employing a serial (successive)-type due to its inherent advantage of a short processing delay, although its interference suppression effect is inferior to that of the serial-type. Therefore, this paper proposes a parallel-type COMSIC with iterative channel estimation (ICE) using both pilot and decision-feedback data symbols at each canceling stage in order to improve the interference suppression effect of the parallel-type COMSIC. Computer simulation results demonstrate that by applying the parallel-type COMSIC with ICE after FEC decoding, the capacity in an isolated cell can be increased by approximately 1.6 (2.5) times that of the conventional parallel-type COMSIC with channel estimation using only pilot symbols (the MF-based Rake receiver) at the required average transmit Eb/N0 of 15 dB, i.e. in the interference-limited channel. The results also show that, although the capacity in the isolated cell with the parallel-type COMSIC with ICE after FEC decoding is degraded by approximately 6% compared to that with the serial-type COMSIC with ICE after FEC decoding, the processing delay can be significantly decreased owing to the simultaneous parallel operation especially when the number of active users is large.
Moriya NAKAMURA Ken-ichi KITAYAMA
Error-free transmission of image fiber-optic two-dimensional (2-D) parallel interconnection using vertical-cavity surface-emitting laser (VCSEL)/photodiode (PD) arrays is demonstrated. Simple constructions of transmitter/receiver modules are proposed. Optical alignment is achieved without power-monitoring. Crosstalk from an adjacent channel was -34 dB. Misalignment tolerance for a BER of less than 10^-9 was 85 µm. The results clearly indicate that the interconnection system built around an image fiber and 2-D VCSEL/PD arrays has promise for use in the highly parallel high-density optical interconnects of the future.
Morikazu NAKAMURA Norifumi NAKADA Hideki KINJO Kenji ONAGA
Autonomous distributed scheduling is based on autonomous decentralized optimization and has recently attracted attention as a flexible scheduling technique that copes with dynamically changing situations better than traditional ones. This paper proposes an autonomous distributed scheduling scheme for the parallel machine scheduling problem. Through computer simulation, we observe that the proposed scheme reduces the total deadline over-time more quickly than a scheme from the literature and adapts flexibly to unusual situations (addition of jobs).
Boon-Keat TAN Ryuji YOSHIMURA Toshimasa MATSUOKA Kenji TANIGUCHI
A new architecture-based Dynamically Programmable Arithmetic Array processor (DPAA) is proposed for general-purpose digital signal processing applications. Parallelism and pipelining are achieved by using DPAA, which consists of various basic arithmetic blocks connected through a code-division multiple access bus interface. The proposed architecture offers 100% interconnection flexibility because connections are made virtually through code matching instead of physical wire connections. Compared to conventional multiplexing architectures, the proposed interconnection topology consumes less chip area, and thus more arithmetic blocks can be incorporated. A 16-bit prototype chip incorporating 10 multipliers and 40 other arithmetic blocks has been implemented on a 4.5 mm × 4.5 mm chip with a 0.6 µm CMOS process. DPAA also features simple programmability, as numerical formulas can be used to configure the processor without programming languages or specialized CAD tools.
Shinsuke KOBAYASHI Yoshinori TAKEUCHI Akira KITAJIMA Masaharu IMAI
In this paper, an architecture for a multi-threaded processor for embedded systems is proposed and evaluated in comparison with other processors for embedded systems. The experimental results show the trade-off between hardware cost and execution time among the processors. When the proposed multi-threaded processor is considered as an embedded processor, the design space of embedded systems is enlarged, and a more suitable architecture can be selected under given design constraints.
Kazuhiko USHIO Hideaki FUJIMOTO
First, we show that the necessary and sufficient condition for the existence of a balanced bowtie decomposition of the complete tripartite multi-graph $\lambda K_{n_1,n_2,n_3}$ is (i) $n_1=n_2=n_3\equiv 0 \pmod 6$ for $\lambda\equiv 1,5 \pmod 6$, (ii) $n_1=n_2=n_3\equiv 0 \pmod 3$ for $\lambda\equiv 2,4 \pmod 6$, (iii) $n_1=n_2=n_3\equiv 0 \pmod 2$ for $\lambda\equiv 3 \pmod 6$, and (iv) $n_1=n_2=n_3\geq 2$ for $\lambda\equiv 0 \pmod 6$. Next, we show that the necessary and sufficient condition for the existence of a balanced trefoil decomposition of the complete tripartite multi-graph $\lambda K_{n_1,n_2,n_3}$ is (i) $n_1=n_2=n_3\equiv 0 \pmod 9$ for $\lambda\equiv 1,2,4,5,7,8 \pmod 9$, (ii) $n_1=n_2=n_3\equiv 0 \pmod 3$ for $\lambda\equiv 3,6 \pmod 9$, and (iii) $n_1=n_2=n_3\geq 3$ for $\lambda\equiv 0 \pmod 9$.
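The two existence conditions can be encoded as simple predicates. The sketch below reads each case as a congruence on the common part size n1 = n2 = n3 selected by the residue of λ, and the last case of each list as a lower bound on that size. The function names are hypothetical, and restricting the inputs to positive integers is an assumption.

```python
def has_balanced_bowtie_decomposition(lam, n1, n2, n3):
    """Stated bowtie condition for the multi-graph lambda * K(n1, n2, n3)."""
    if lam < 1 or n1 < 1 or not (n1 == n2 == n3):
        return False
    n, r = n1, lam % 6
    if r in (1, 5):
        return n % 6 == 0
    if r in (2, 4):
        return n % 3 == 0
    if r == 3:
        return n % 2 == 0
    return n >= 2  # lambda is a multiple of 6


def has_balanced_trefoil_decomposition(lam, n1, n2, n3):
    """Stated trefoil condition for the multi-graph lambda * K(n1, n2, n3)."""
    if lam < 1 or n1 < 1 or not (n1 == n2 == n3):
        return False
    n, r = n1, lam % 9
    if r in (3, 6):
        return n % 3 == 0
    if r == 0:
        return n >= 3  # lambda is a multiple of 9
    return n % 9 == 0  # lambda is 1, 2, 4, 5, 7, or 8 modulo 9
```

For example, with λ = 1 a bowtie decomposition requires parts of size 6, 12, 18, and so on, while with λ = 6 any common part size of at least 2 suffices. These predicates only evaluate the conditions; they do not construct a decomposition.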
Wujian ZHANG Runde ZHOU Tsunehachi ISHITANI Ryota KASAI Toshio KONDO
This paper describes an improved multiresolution telescopic search algorithm (MRTlcSA) for block-matching motion estimation. The algorithm uses images with full and reduced bit resolution, and uses motion-track and adaptive-search-window strategies. Simulation results show that the proposed algorithm has low computational complexity and achieves good image quality. We have developed a systolic-architecture-based search engine that has split data paths. In the case of low bit-resolution, the throughput is increased by enhancing the operating parallelism. The new motion estimator works at a low clock frequency and a low supply voltage, and therefore has low power consumption.
Mitsuji MUNEYASU Kouichiro ASOU Yuji WADA Akira TAGUCHI Takao HINAMOTO
This paper presents a new implementation of fuzzy filters for edge-preserving smoothing of an image corrupted by impulsive and white Gaussian noise. This filter structure is expressed as an adaptive weighted mean filter that uses fuzzy control. The parameters of this filter can be adjusted by learning. Finally, simulation results demonstrate the effectiveness of the proposed technique.
This paper presents adaptive image enhancement algorithms that enhance hidden signals in pictures and describes their implementation for real-time video signals. An image enhancement algorithm proposed by T. Peli and J. S. Lim is extended for application to video signals. A fast implementation algorithm suited to parallel implementation is provided. The proposed algorithms are shown to run in real time on 200 MHz microprocessors with multimedia extensions for 720 × 480 pixel, 30 frames/s video signals.
Ryuichi FUJIMOTO Osamu WATANABE Fumie FUJII Hideyuki KAWAKITA Hiroshi TANIMOTO
Simple and scalable device-modeling techniques for inductors and capacitors are described. All model parameters are calculated from geometric parameters of the device, process parameters of the technology, and a substrate resistance parameter. Modeling techniques for other devices, such as resistors, varactor diodes, pads, and MOSFETs, are also described. Simulation results obtained with the proposed device-modeling techniques are compared with measured results, and the comparison indicates the adequacy of the proposed techniques.
Trong-Yen LEE Pao-Ann HSIUNG Sao-Jie CHEN
A novel Multi-Level Partitioning (MLP) technique that takes real-world constraints into account is proposed for hardware-software partitioning in Distributed Embedded Multiprocessor Systems (DEMS). The MLP algorithm uses a gradient metric based on hardware-software cost and performance as the core metric for selecting optimal partitions and consists of three nested levels. The innermost level is a simple binary search that allows quick evaluation of a large number of possible partitions. The middle level iterates over different possible allocations of processors (which execute software) to subsystems. The outermost level iterates over the number of processors and the hardware cost range. Heuristics are applied at each level to avoid an expensive exhaustive search. The application of MLP in a recently proposed Distributed Embedded System Codesign (DESC) methodology shows its feasibility. Comparisons between real-world examples partitioned using MLP and using other existing techniques demonstrate the contrasting strengths of MLP. Sharing, clustering, and a hierarchical system model are important features of MLP that contribute to better partition results.
Chong Seong HONG Jin Myung WON Jin Soo LEE
This paper presents a multi-thread evolutionary programming (MEP) technique that is composed of global, local, and minimal search units. An appropriate search routine is called depending on the current situation and the individuals are updated by using the selected routine. In each search routine, the individuals are updated with a normalized relative fitness function to improve the robustness of the algorithm. The proposed method is applied to the problem of backing up a truck-and-trailer system to a loading dock. A fuzzy logic controller is designed for a truck-and-trailer backer-upper system and the MEP algorithm is used to optimize the representative parameters of the fuzzy logic controller. The simulation results show that the proposed controller performs well even under a large variety of initial positions.