Akira OHKI Mitsuo USUI Nobuo SATO Nobuyuki TANAKA Kosuke KATSURA Toshiaki KAGAWA Makoto HIKITA Koji ENBUTSU Shunichi TOHNO Yasuhiro ANDO
We have proposed parallel optical interconnection technology, or ParaBIT, for high-throughput, low-cost optical interconnections, and have already developed a prototype parallel optical interconnect module called "ParaBIT-0," which has a total throughput of 28 Gb/s (700 Mb/s × 40 channels). We are now developing a compact, high-throughput module called "ParaBIT-1," which has a total throughput of 60 Gb/s (1.25 Gb/s × 48 channels) and is designed to achieve the highest-ever throughput density of 3.3 Gb/s/cc. In this paper, we describe the packaging structure, optical coupling structure, and transmission characteristics of ParaBIT-1. We also discuss the technical prospects for realizing a parallel optical interconnect module with a bit rate of 2.5 Gb/s/ch.
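The throughput figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the paper: the function name is hypothetical, and the module volume is not stated in the abstract but merely inferred here by dividing the total throughput by the quoted throughput density.

```python
def total_throughput_gbps(rate_mbps_per_channel, channels):
    """Aggregate throughput in Gb/s from a per-channel rate in Mb/s."""
    return rate_mbps_per_channel * channels / 1000

# Per-channel rates and channel counts as quoted in the abstract.
parabit0 = total_throughput_gbps(700, 40)    # ParaBIT-0
parabit1 = total_throughput_gbps(1250, 48)   # ParaBIT-1

# Assumption: volume inferred from the 3.3 Gb/s/cc density target;
# the abstract itself does not give the module volume.
implied_volume_cc = parabit1 / 3.3

print(parabit0, parabit1, round(implied_volume_cc, 1))  # 28.0 60.0 18.2
```

Under these assumptions, the ParaBIT-1 figures imply a module volume of roughly 18 cc.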
Junichi FUNASAKA Keizo SAISHO Akira FUKUDA
As NetNews traffic increases, keeping all articles has become a serious problem in terms of wasted network bandwidth and disk usage. In addition, users do not read all incoming articles. We have proposed several caching algorithms to overcome this problem and shown that a selective prefetch scheme gives the best system performance among them. However, because the selective prefetch scheme employed a simple selection policy, it gave a low hit ratio in some cases. This paper therefore aims to improve the selective prefetch scheme in terms of both disk usage and hit ratio. We divide the scheme into three factors: reference span, criterion, and threshold in the criterion. Through simulation experiments using actual NetNews logs, we investigate the influence of the reference span and the threshold on system performance. The results show that the reference span is a more significant factor than the threshold, and that the selective prefetch scheme with a reference span of around seven days keeps a high hit ratio while reducing disk usage.
Hirotoshi HONMA Shigeru MASUYAMA
If there exist any two vertices in G whose distance becomes longer when a vertex u is removed, then u is defined as a hinge vertex. Finding the set of hinge vertices in a graph can be used to identify critical nodes in an actual network. A number of studies concerning hinge vertices have been made in recent years. In general, it is known that more efficient sequential or parallel algorithms can be developed by restricting classes of graphs. For instance, Chang et al. presented an O(n+m) time algorithm for finding all hinge vertices of a strongly chordal graph. Ho et al. presented a linear time algorithm for all hinge vertices of a permutation graph. In this paper, we shall propose a parallel algorithm which runs in O(log n) time with O(n) processors on CREW PRAM for finding all hinge vertices of an interval graph.
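The definition of a hinge vertex given above can be made concrete with a brute-force check. The sketch below is a plain sequential illustration of the definition only; it is not the O(log n) CREW PRAM algorithm proposed in the paper, and it works on any unweighted graph rather than exploiting interval-graph structure. The function names and the adjacency-dictionary representation are assumptions made for illustration, and a pair of vertices that becomes disconnected is treated as having infinite distance.

```python
from collections import deque
from math import inf

def bfs_distances(adj, source, removed=None):
    """Shortest-path distances from source, optionally ignoring one vertex."""
    dist = {v: inf for v in adj if v != removed}
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w != removed and dist[w] == inf:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def hinge_vertices(adj):
    """Return every vertex u whose removal lengthens some pairwise distance."""
    base = {s: bfs_distances(adj, s) for s in adj}
    hinges = set()
    for u in adj:
        for s in adj:
            if s == u:
                continue
            reduced = bfs_distances(adj, s, removed=u)
            # Compare distances in G - u against distances in G.
            if any(reduced[t] > base[s][t] for t in reduced):
                hinges.add(u)
                break
    return hinges

# A 5-cycle: removing any vertex forces its two neighbours onto the long way round.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(hinge_vertices(cycle5))
```

This check runs one breadth-first search per (removed vertex, source) pair, so it is useful only as a reference for small graphs.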
The performance of conventional error concealment (EC) is significantly affected by the method of selecting candidate motion vectors (MVs). In order to obtain more robust EC results, this letter proposes a new and efficient way to choose candidate MVs. The proposed approach systematically utilizes available neighboring MVs by exploiting a well-known spatiotemporal correlation of block MVs. Through extensive simulations with H.263, this letter demonstrates that the proposed candidate MVs provide superior concealed video quality in comparison to the best results of other existing techniques.
Multimedia content is usually read-only and is composed of multimedia objects with spatial and temporal specifications. These specifications, given by the author, ensure that the display of objects is well organized for its context. When multimedia content is served over a network on an on-demand basis, the temporal relationship among the objects can be used to improve the performance of the service. This paper models the temporal relationship as a scenario that represents the presentation order of the objects, and proposes several scheduling methods that make it possible to rearrange the transmission order of the objects in a scenario. As a result, system resources such as computing power and network bandwidth can be highly utilized. Since the temporal relationship of a scenario is static, the scheduling overhead of a server can be reduced by pre-scheduling the scenarios currently being served. In addition, several simulation results are presented to compare and analyze the characteristics of the proposed methods.
Rainer HAINBERGER Yuki KOMAI Yasuyuki OZEKI Masahiro TSUCHIYA Kashiko KODATE Takeshi KAMIYA
An all-optical time-division demultiplexing module that combines all-optical saturable absorbers with diffractive optics is investigated. Following the authors' proposal, design, and test fabrication of the optical platform in a previous paper, this paper focuses on characterizing the switching performance. Using a multiple-quantum-well saturable absorber of InGaAs/InAlAs composition and gain-switched semiconductor laser pulses of 25 ps pulse width, the switching function was demonstrated experimentally at a wavelength of 1.55 µm. The switching on-off ratio was compared among a 4-lens configuration, a 2-lens configuration (2L), and a free-space collinear geometry. No degradation was observed with the 2-lens configuration in comparison with collinear illumination. Thus, the feasibility of an all-optical switch module with power efficiency and high speed is predicted, assuming progress in sub-micron lithography.
Koyo NITTA Toshihiro MINAMI Toshio KONDO Takeshi OGURA
This paper describes a unique motion estimation and compensation (ME/MC) hardware architecture for a scene-adaptive algorithm. By statistically analyzing the characteristics of the scene being encoded and controlling the encoding parameters according to the scene, the quality of the decoded image can be enhanced. The most significant feature of the architecture is that the two modules for ME/MC can work independently. Since a time interval can be inserted between the operations of the two modules, a scene-adaptive algorithm can be implemented in the architecture. The ME/MC architecture is loaded on a single-chip MPEG-2 video encoder.
Shinji NISHIMURA Tomohiro KUDOH Hiroaki NISHI Koji TASHO Katsuyoshi HARASAWA Shigeto AKUTSU Shuji FUKUDA Yasutaka SHIKICHI
RHiNET-2/SW is a network switch for the RHiNET-2 parallel computing system. RHiNET-2/SW enables high-speed, long-distance data transmission between PC nodes for parallel computing. In RHiNET-2/SW, a one-chip CMOS switch LSI and eight pairs of 800-Mbit/s 12-channel parallel optical interconnection modules are mounted on a single compact board. This switch allows high-speed 8-Gbit/s/port parallel optical data transmission over a distance of up to 100 m, and the aggregate throughput is 64 Gbit/s/board. The CMOS-ASIC switching LSI enables high-throughput (64 Gbit/s) packet switching with a single chip. The parallel optical interconnection modules enable high-speed, low-latency data transmission over a long distance. The structure and layout of the printed circuit board are optimized for high-speed, high-density device implementation to overcome electrical problems such as signal propagation loss and crosstalk. All of the electrical interfaces are composed of high-speed CMOS-LVDS logic (800 Mbit/s/pin). We evaluated the reliability of the optical I/O port through long-term data transmission. No errors were detected during 50 hours of continuous data transmission at a data rate of 800 Mbit/s × 10 bits (BER < 2.44 × 10^-14). This test result shows that RHiNET-2/SW can provide high-throughput, long-transmission-length, and highly reliable data transmission in a practical parallel computing system.
Yuko KAWAJIRI Shinji KOIKE Yoshimitsu ARAI Yasuhiro ANDO
We propose a compact multi-channel 90° optical deflection device for short-distance optical interconnection. The device consists of stacked bent multimode optical waveguides having reflecting mirrors with bending angles of 90°. The structure of the bent multimode optical waveguide with a bending angle of 90° was designed by ray-tracing simulations. The simulated insertion loss for each channel of the device was 0.5 dB. We also propose a simple fabrication process using a pair of multi-channel linear optical waveguides with symmetrical 45° mirrors. An 8-channel 90° optical deflection device was fabricated using polymer materials and basic operation was confirmed. Our device has good potential for use as a high-density optical interconnection device.
Kazunori MIYOSHI Ichiro HATAKEYAMA Jun'ichi SASAKI Takahiro NAKAMURA
A 12-channel DC to 622-Mbit/s/ch optical transmitter and receiver have been developed for high-capacity and rather long (about 100 m) bit-parallel raw data transmission in intra- and inter-cabinet interconnection of large-scale switching, routing, and computing systems. Bit-parallel raw data transmission is achieved by using a bit-by-bit operational automatic decision threshold control receiver circuit with a DC-coupled configuration, pin-PDs with their anodes and cathodes separated in a channel-by-channel manner, and a receiver preamplifier with a low-pass filter. The transmitter consists of a 12-channel LD sub-assembly unit and an LD driver LSI. The LD sub-assembly unit consists of a 12-channel array of high temperature characteristic 1.3-µm planar buried hetero-structure (PBH) LDs and 62.5/125 graded-index multi-mode fibers (GI62.5 MMFs). The 1.3-µm PBH LDs and the GI62.5 MMFs are optically coupled by passive visual alignment technology on the Si V-groove. The receiver consists of a 12-channel pin-PD sub-assembly unit and a receiver LSI. The pin-PD sub-assembly unit consists of a 12-channel array of pin-PDs and GI62.5 MMFs. They are optically coupled by flip-chip bonding on the Si V-groove. The transmitter and receiver each have eleven data channels and one clock channel. The size is as small as 3.6 cc for each module, and the power consumption is 1.7 W (transmitter) and 1.35 W (receiver). They transmitted bit-parallel raw data through a 100-meter ribbon of GI62.5 MMFs in an ambient temperature range of 0-70°C. They provide a synchronous PECL interface parallel link with a 3.3-V single power supply.
Yoshiaki YASUNO Yasunori SUTOH Masahiko MORI Masahide ITOH Toyohiko YATAGAI
An improved pulse shaper is proposed which is able to control both the spatial and temporal profile of femtosecond light pulses. Our pulse shaper exploits the spatio-temporal coupling effect seen in pulse shapers. Its properties are numerically analyzed by application of the Wigner distribution function. We confirm that the spatio-temporal output pulse track dictates the differentiation of the phase mask; that the degree of spatio-temporal coupling is determined by the focal length ratio of the lenses in the pulse shaper; and that space to spatial-frequency chirp results from misalignment of lenses.
Osamu TAKANASHI Tsutomu HAMADA Junji OKADA Takeshi KAMIMURA Hidenori YAMADA Masao FUNADA Takashi OZAWA
We propose a low-cost, high-uniformity, low-excess-loss star coupler. The proposed star coupler comprises a planar lightguide, a diffuser, and polymer optical fibers (POFs). High uniformity of the optical power distribution is achieved by utilizing diffused light transmission. Input light is diffused by the diffuser, which is attached between the input POFs and the planar lightguide, and is then transmitted through the planar lightguide. The optimum width-to-length ratio of the lightguide is clarified through simulations and experiments. We fabricated star couplers based on the optimum width-to-length ratio for evaluation. The fabricated 16×16 star coupler showed excellent uniformity, with a distribution ratio of 0.8 dB and an excess loss of 3.3 dB. The fabricated star coupler also provides a wide tolerance for misalignment. The maximum number of nodes that assures high transmission quality and the bandwidth of the proposed star coupler are discussed. The proposed star coupler is remarkably cost effective since it can be produced by injection-molding technology. The proposed star coupler enables easy multi-channel interconnection.
Toshiaki KAGAWA Osamu TADANAGA Hiroyuki UENOHARA Kouta TATENO Chikara AMANO
VCSEL output light polarization was controlled by fabricating devices on (311) substrate. Stability was improved by introducing compressive strain to the quantum wells in the active layer. In experiments, the power penalty due to polarization-dependent loss in the transmission line was negligible for both VCSELs with unstrained and strained quantum well active layers on (311)B substrate. The sensitivity at 2.5 Gbps was improved in a device with a strained active layer because the intensity noise due to the polarization instability was reduced. These characteristics are discussed and compared to calculated results.
Wujian ZHANG Runde ZHOU Tsunehachi ISHITANI Ryota KASAI Toshio KONDO
This paper describes an improved multiresolution telescopic search algorithm (MRTlcSA) for block-matching motion estimation. The algorithm uses images with full and reduced bit resolution, and uses motion-track and adaptive-search-window strategies. Simulation results show that the proposed algorithm has low computational complexity and achieves good image quality. We have developed a systolic-architecture-based search engine that has split data paths. In the case of low bit-resolution, the throughput is increased by enhancing the operating parallelism. The new motion estimator works at a low clock frequency and a low supply voltage, and therefore has low power consumption.
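To illustrate the idea of block matching on images with reduced bit resolution, the sketch below runs an exhaustive search that minimises the sum of absolute differences (SAD), optionally discarding low-order bits of each pixel before comparison. This is a generic textbook full search, not the MRTlcSA algorithm of the paper: the telescopic, motion-track, and adaptive-search-window strategies are not modelled, and the function names, the `shift` parameter, and the list-of-rows frame format are all assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n, shift=0):
    """SAD between the n x n block of cur at (bx, by) and ref displaced by (dx, dy).

    shift > 0 drops that many low-order bits per pixel (reduced bit resolution).
    """
    total = 0
    for y in range(n):
        for x in range(n):
            a = cur[by + y][bx + x] >> shift
            b = ref[by + y + dy][bx + x + dx] >> shift
            total += abs(a - b)
    return total

def full_search(cur, ref, bx, by, n, radius, shift=0):
    """Return (dx, dy, cost) of the best match within +/- radius pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n, shift)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

Calling `full_search(cur, ref, bx, by, n, radius, shift=4)` compares only the four most significant bits of 8-bit pixels, which reduces the width of each subtraction at the cost of a coarser match criterion.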
Trong-Yen LEE Pao-Ann HSIUNG Sao-Jie CHEN
The hardware-software codesign of distributed embedded systems is a challenging task, because each phase of codesign, such as copartitioning, cosynthesis, cosimulation, and coverification, must consider the physical restrictions imposed by the distributed characteristics of such systems. Distributed systems often contain several similar parts to which design reuse techniques can be applied. An object-oriented (OO) codesign approach, which allows for physical restrictions and object design reuse, is adopted in our newly proposed Distributed Embedded System Codesign (DESC) methodology. The DESC methodology uses three types of models: Object Modeling Technique (OMT) models for system description and input, Linear Hybrid Automata (LHA) models for internal modeling and verification, and SES/workbench simulation models for performance evaluation. A two-level partitioning algorithm is proposed specifically for distributed systems. Software is synthesized by task scheduling and hardware is synthesized by system-level and object-oriented techniques. Design alternatives for synthesized hardware-software systems are then checked for design feasibility through rapid prototyping using hardware-software emulators. Through a case study on a Vehicle Parking Management System (VPMS), we illustrate each design phase of the DESC methodology to show the benefits of OO codesign and the necessity of a two-level partitioning algorithm.
Masanori HARIYAMA Seunghwan LEE Michitaka KAMEYAMA
In a real-time vision system, parallel memory access is essential for highly parallel image processing. The use of multiple memory modules is one efficient technique for parallel access. In the technique, data stored in different memory modules can be accessed in parallel. This paper presents an optimal memory allocation methodology to map data to be read in parallel onto different memory modules. Based on the methodology, a high-performance VLSI processor for three-dimensional instrumentation is proposed.
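The allocation problem described above, in which data read in the same cycle must reside in different memory modules, can be viewed as colouring a conflict graph. The sketch below uses a simple greedy colouring heuristic to illustrate that view. It is not the optimal allocation methodology of the paper and does not guarantee a minimum number of modules; the function name and the input format (a list of groups of data items that are read in parallel) are assumptions made for illustration.

```python
def allocate_modules(parallel_groups):
    """Assign each data item a module index so that items read together differ.

    parallel_groups: iterable of groups; items in one group are read in parallel.
    """
    # Build the conflict graph: two items conflict if they share a group.
    conflicts = {}
    for group in parallel_groups:
        for item in group:
            others = (x for x in group if x != item)
            conflicts.setdefault(item, set()).update(others)

    module_of = {}
    # Greedy heuristic: place the most constrained items first.
    for item in sorted(conflicts, key=lambda v: (-len(conflicts[v]), v)):
        used = {module_of[n] for n in conflicts[item] if n in module_of}
        module = 0
        while module in used:
            module += 1
        module_of[item] = module
    return module_of

# Hypothetical access pattern: a, b, c are read together; later c and d are.
allocation = allocate_modules([("a", "b", "c"), ("c", "d")])
print(allocation)
```

Any conflict-free assignment of this kind lets every group be fetched in a single parallel access, provided the hardware has at least as many modules as the largest index used plus one.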
Masahiro OKUDA Sanjit K. MITRA Masaaki IKEHARA Shin-ichi TAKAHASHI
Most natural images are well modeled as smooth areas segmented by edges. The smooth areas can be represented with fewer coefficients by a wavelet transform with high regularity, which requires highpass filters with some vanishing moments. However, for the regions around edges, short highpass filters are preferable. In one recently proposed approach, this problem was solved by switching filter banks, using longer filters for smooth areas of the images and shorter filters for areas with edges. This approach was applied to lossy image coding, resulting in a reduction of ringing artifacts. As edges were predicted using neighboring pixels, the nonlinear transforms made the decorrelation more flexible. In this paper, we propose a time-varying filterbank and apply it to lossless image coding. In this scheme, we estimate the standard deviation of the neighboring pixels of the current pixel by solving the maximum likelihood problem. The filterbank is switched among three filter banks, depending on the estimated standard deviation.
This letter presents a new technique for transforming the series solution into an asymptotic solution for a perfectly conducting wedge illuminated by an E-polarized plane wave. This transformation gives an example of analytic manipulation of the Weber-Schafheitlin integral for a diffraction problem.
Moriya NAKAMURA Ken-ichi KITAYAMA
Error-free transmission of image fiber-optic two-dimensional (2-D) parallel interconnection using vertical-cavity surface-emitting laser (VCSEL)/photodiode (PD) arrays is demonstrated. Simple constructions of transmitter/receiver modules are proposed. Optical alignment is achieved without power-monitoring. Crosstalk from an adjacent channel was -34 dB. Misalignment tolerance for a BER of less than 10^-9 was 85 µm. The results clearly indicate that the interconnection system built around an image fiber and 2-D VCSEL/PD arrays has promise for use in the highly parallel high-density optical interconnects of the future.
Wujian ZHANG Runde ZHOU Tsunehachi ISHITANI Ryota KASAI Toshio KONDO
The ring-like systolic array architecture described in this paper, based on a conventional one-dimensional systolic array architecture, was created through operator rescheduling based on the symmetry of data flow. This eliminated high-latency delay due to the stuffing of the array pipeline in the conventional architecture. The new architecture requires a memory bandwidth no greater than the conventional architecture does, but increases throughput and processor utilization while reducing power consumption.