Sang Wook PARK Fengchao XIAO Dong Chul PARK Yoshio KAMI
The crosstalk phenomenon, wich occurs between transmission lines, is caused by electromagnetic fields of currents flowing through the lines. Crosstalk between two bent lines is studied by using a set of solutions of modified telegrapher's equations. By expressing electromagnetic fields in terms of voltages and currents in the line ends, the resultant network function in the form of an ABCD matrix is obtained. Electromagnetic fields caused by currents flowing in risers at transmission line ends are taken into account in addition to those fields in line sections. The validity of the proposed approach was confirmed by comparing experimental results with computed results and those simulated by a commercial electromagnetic solver for some bent-line models.
Fukuhito OOSHITA Susumu MATSUMAE Toshimitsu MASUZAWA
For execution of computation-intensive applications, one of the most important paradigms is to divide the application into a large number of small independent tasks and execute them on heterogeneous parallel computing environments (abbreviated by HPCEs). In this paper, we aim to execute independent tasks efficiently on HPCEs. We consider the problem to find a schedule that maximizes the throughput of task execution for a huge number of independent tasks. First, for HPCEs where the network forms a directed acyclic graph, we show that we can find, in polynomial time, a schedule that attains the optimal throughput. Secondly, for arbitrary HPCEs, we propose an (+ε)-approximation algorithm for any constant ε(ε>0). In addition, we also show that the framework of our approximation algorithm can be applied to other collective communications such as the gather operation.
Takeshi UENO Takafumi YAMAJI Tetsuro ITAKURA
This paper describes a 1.2-V, 12-bit, 200-MSample/s current-steering CMOS digital-to-analog (D/A) converter for wireless-communication terminals. To our knowledge, the supply voltage of this converter is the lowest for high-speed applications. To overcome increasing device mismatch in low-voltage operation, we propose an H-shaped, 3-dimensional structure for reducing influence of voltage drops (IR drops) along power supplies. This technique relaxes mismatch requirements and allows use of small devices with small parasitics. By using this technique, a low-voltage, high-speed D/A converter was realized. The converter was implemented in a 90-nm CMOS technology. The modulator achieves the intrinsic accuracy of 12 bits and a spurious-free dynamic range (SFDR) above 55 dB over a 100-MHz bandwidth.
Tsutomu NAGATSUKA Yoshihito HIRANO Yoji ISOTA
A highly accurate measurement method of parameters of MZ-type LN optical intensity modulators is presented. In this method, a CW optical signal is input to an optical terminal and small CW RF signal is applied to an electrode of the modulator. Then sideband levels of an output optical signal at different bias points are measured by using optical spectrum analyzer. By using 1st order sideband levels which are measured at two different bias conditions, and using a compensation method to measured levels, we can obtain accurate chirp parameter even when very small power of RF signal is applied to the modulator. In this method, the chirp parameter can be obtained in good accuracy when the input RF voltage is only 3% of the halfwave voltage.
Chester SHU Ka-Lun LEE Mable P. FOK
We report the generation of time- and wavelength-interleaved optical pulses using the principle of sub-harmonic pulse gating in a dispersion-managed fiber cavity. The pulsed source has been applied to the processing of electrical and optical signals including analog-to-digital conversion, wavelength multicast, and serial-to-parallel optical data conversion.
Shinya MIYAMOTO Kenta KASAI Kohichi SAKANIWA
Decoding performance of LDPC (Low-Density Parity-Check) codes is highly dependent on the degree distributions of the Tanner graphs which define the LDPC codes. We compare two LDPC code ensembles, one has a uniform degree distribution and the other a non-uniform one over a BEC (Binary Erasure Channel) and a BSC (Binary Symmetric Channel) thorough DE (Density Evolution). We then derive sufficient conditions on the erasure probability of a BEC and the error probability of a BSC, under which the LDPC code ensembles with uniform degree distributions outperform those with non-uniform degree distributions.
Yang SONG Zhenyu LIU Takeshi IKENAGA Satoshi GOTO
This paper presents two hardware-friendly low-power oriented fast motion estimation (ME) algorithms and their VLSI implementations. The basic idea of the proposed partial distortion sorting (PDS) algorithm is to disable the search points which have larger partial distortions during the ME process, and only keep those search points with smaller ones. To further reduce the computation overhead, a simplified local PDS (LPDS) algorithm is also presented. Experiments show that the PDS and LPDS algorithms can provide almost the same image quality as full search only with 36.7% computation complexity. The proposed two algorithms can be integrated into different FSBMA architectures to save power consumption. In this paper, the 1-D inter ME architecture [12] is used as an detailed example. Under the worst working conditions (1.62 V, 125) and 166 MHz clock frequency, the PDS algorithm can reduce 33.3% power consumption with 4.05 K gates extra hardware cost, and the LPDS can reduce 37.8% power consumption with 1.73 K gates overhead.
Taiji SASAOKA Hideyuki KAWABATA Toshiaki KITAMURA
Parallel programs for distributed memory machines are not easy to create and maintain, especially when they involve sparse matrix computations. In this paper, we propose a program translation system for generating parallel sparse matrix computation codes utilizing PSBLAS. The purpose of the development of the system is to offer the user a convenient way to construct parallel sparse code based on PSBLAS. The system is build up on the idea of bridging the gap between the easy-to-read program representations and highly-tuned parallel executables based on existing parallel sparse matrix computation libraries. The system accepts a MATLAB program with annotations and generates subroutines for an SPMD-style parallel program which runs on distributed-memory machines. Experimental results on parallel machines show that the prototype of our system can generate fairly efficient PSBLAS codes for simple applications such as CG and Bi-CGSTAB programs.
Koji CHIDA Go YAMAMOTO Koutarou SUZUKI Shigenori UCHIYAMA Noburou TANIGUCHI Osamu SHIONOIRI Atsushi KANAI
We propose a protocol for implementing secure circuit evaluation (SCE) based on the threshold homomorphic ElGamal encryption scheme and present the implementation results of the protocol. To the best of knowledge of the authors, the proposed protocol is more efficient in terms of computational complexity than previously reported protocols. We also introduce applications using SCE and estimate their practicality based on the implementation results.
Takeshi KUMAKI Yutaka KONO Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH
This paper presents a scalable FPGA/ASIC implementation architecture for high-speed parallel table-lookup-coding using multi-ported content addressable memory, aiming at facilitating effective table-lookup-coding solutions. The multi-ported CAM adopts a Flexible Multi-ported Content Addressable Memory (FMCAM) technology, which represents an effective parallel processing architecture and was previously reported in [1]. To achieve a high-speed parallel table-lookup-coding solution, FMCAM is improved by additional schemes for a single search mode and counting value setting mode, so that it permits fast parallel table-lookup-coding operations. Evaluation results for Huffman encoding within the JPEG application show that a synthesized semi-custom ASIC implementation of the proposed architecture can already reduce the required clock-cycle number by 93% in comparison to a conventional DSP. Furthermore, the performance per area unit, measured in MOPS/mm2, can be improved by a factor of 3.8 in comparison to parallel operated DSPs. Consequently, the proposed architecture is very suitable for FPGA/ASIC implementation, and is a promising solution for small area integrated realization of real-time table-lookup-coding applications.
Providing data availability in a high performance computing environment is very important, especially in this data-intensive world. Most clusters either equip with RAID (Redundant Array of Independent Disks) devices or use redundant nodes to protect data from loss. However, neither of these can really solve the reliability problem incurred in a striped file system. Striping provides an efficient way to increase I/O throughput both in the distributed and parallel paradigms. But it also reduces the overall reliability of a disk system by N fold, where N is the number of independent disks in the system. Parallel Virtual File System (PVFS) is an open source parallel file system which has been widely used in the Linux environment. Its striping structure is good for performance but provides no fault tolerance. We implement Reliable Parallel File System (RPFS) based on PVFS but with reliability support. Our quantitative analysis shows that MTTF (Mean Time To Failure) of our RPFS is better than that of PVFS. Besides, we propose a parity cache table (PCT) to alleviate the penalty of parity updating. The evaluation of our RPFS shows that its read performance is almost the same as that of PVFS (2% to 13% degradation). As to the write performance, 28% to 45% improvement can be achieved depending on the behavior of the operations.
Takeshi KUMAKI Yasuto KURODA Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. The exploitation of CAM technology enables fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of received input symbols and is up-dated in real-time. Since these two functions work in parallel, the proposed architecture realizes fast parallel encoding and keeps a constantly high compression ratio. Evaluation results for the JPEG application show that the proposed architecture can achieve up to 28% smaller encoded picture sizes than the conventional architectures. The obtained encoding time can be reduced by 95% in comparison to a conventional SRAM-based architecture, which is suitable even for the latest end-user-devices requiring fast frame-rates. Furthermore, the proposed architecture provides the only encoder that can simultaneously realize small compressed data size and fast processing speed.
Noriaki SUETAKE Eiji UCHINO Kanae HIRATA
Intelligent scissors is an interactive image segmentation algorithm which allows a user to select piece-wise globally optimal contour segment corresponding to a desired object boundary. However, the intelligent scissors is too sensitive to a noise and texture patterns in an image since it utilizes the gradient information concerning the pixel intensities. This paper describes a new intelligent scissors based on the concept of the separability in order to improve the object boundary extraction performance. The effectiveness of the proposed method has been confirmed by some experiments for actual images acquired by an ordinary digital camera.
Junghyun NAM Seungjoo KIM Sangjoon PARK Dongho WON
A remote user authentication scheme is a two-party protocol whereby an authentication server in a distributed system confirms the identity of a remote individual logging on to the server over an untrusted, open network. Recently, Lee et al. have proposed an efficient nonce-based scheme for remote user authentication using smart cards. This work reviews Lee et al.'s authentication scheme and provides a security analysis on the scheme. Our analysis shows that Lee et al.'s scheme does not achieve its basic aim of authenticating remote users and furthermore has a very hazardous method for changing passwords. In addition, we recommend some changes to the scheme so that it can attain at least its main security goal.
Xuan-Hieu PHAN Le-Minh NGUYEN Yasushi INOGUCHI Susumu HORIGUCHI
Conditional random fields (CRFs) have been successfully applied to various applications of predicting and labeling structured data, such as natural language tagging & parsing, image segmentation & object recognition, and protein secondary structure prediction. The key advantages of CRFs are the ability to encode a variety of overlapping, non-independent features from empirical data as well as the capability of reaching the global normalization and optimization. However, estimating parameters for CRFs is very time-consuming due to an intensive forward-backward computation needed to estimate the likelihood function and its gradient during training. This paper presents a high-performance training of CRFs on massively parallel processing systems that allows us to handle huge datasets with hundreds of thousand data sequences and millions of features. We performed the experiments on an important natural language processing task (text chunking) on large-scale corpora and achieved significant results in terms of both the reduction of computational time and the improvement of prediction accuracy.
Lihua WANG Eiji OKAMOTO Ying MIAO Takeshi OKAMOTO Hiroshi DOI
ID-SP-M4M scheme means ID-based series-parallel multisignature schemes for multi-messages. In this paper, we investigate series-parallel multisignature schemes for multi-messages and propose an ID-SP-M4M scheme based on pairings in which signers in the same subgroup sign the same message, and those in different subgroups sign different messages. Our new scheme is an improvement over the series-parallel multisignature schemes introduced by Doi et al.[6]-[8] and subsequent results such as the schemes proposed by Burmester et al.[4] and the original protocols proposed by Tada [20],[21], in which only one message is to be signed. Furthermore, our ID-SP-M4M scheme is secure against forgery signature attack from parallel insiders under the BDH assumption.
Takashi ISHIDA Masayuki GOTO Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
Recently, a word-valued source has been proposed as a new class of information source models. A word-valued source is regarded as a source with a probability distribution over a word set. Although a word-valued source is a nonstationary source in general, it has been proved that an entropy rate of the source exists and the Asymptotic Equipartition Property (AEP) holds when the word set of the source is prefix-free. However, when the word set is not prefix-free (non-prefix-free), only an upper bound on the entropy density rate for an i.i.d. word-valued source has been derived so far. In this paper, we newly derive a lower bound on the entropy density rate for an i.i.d. word-valued source with a finite non-prefix-free word set. Then some numerical examples are given in order to investigate the behavior of the bounds.
Naobumi MICHISHITA Takashi HIBINO Hiroyuki ARAI
In the design of an active integrated antenna, it is necessary to analyze problems such as unwanted emissions or mutual coupling between elements. In this paper, we clarify the problems in implementing S-parameters for an FDTD analysis. Cubic spline interpolation is suitable for the construction of the S-parameter data. The implementation methods of terminal resistors and vias are examined. The proposed FDTD analysis becomes stable after correcting the discrete time lag in the formation of the incident wave. The validity of the proposed method is verified in its application to the low pass filter and the frequency tunable band pass filter.
Chun-Ping CHEN Yu DONG Maode NIU Deming XU Zhewang MA Tetsuo ANADA
Frequency-variation method (FVM), reported in [1], was further studied for simultaneously measuring the both complex permittivity and complex permeability by intentionally changing the test frequency to obtain different reflections. An enhanced coaxial-probe-based in-situ measurement system has been established. The spectral domain full-wave model is derived to take place of the quasi-static one. A novel coaxial probe is designed so that the one-port calibration could be performed with Agilent-supplied precise cal-kit instead of the liquid standard. Criterions for a right order of the interpolation polynomial used to approximate the frequency-dependent EM parameters; measures to reduce the residual mismatch errors and random error in reflection measurements and to suppress the ambiguities in solving the transcendent equation system were experimentally studied to resolve the problems and improve the accuracy in dispersive absorbing materials' test. Several typical dispersive absorbing coatings have been tested via FVM. The good comparison between the measured results and reference ones validate the feasibility of the proposed improved technique.
Toshiki KANAMOTO Shigekiyo AKUTSU Tamiyo NAKABAYASHI Takahiro ICHINOMIYA Koutaro HACHIYA Atsushi KUROKAWA Hiroshi ISHIKAWA Sakae MUROMOTO Hiroyuki KOBAYASHI Masanori HASHIMOTO
In this letter, we discuss the impact of intrinsic error in parasitic capacitance extraction programs which are commonly used in today's SoC design flows. Most of the extraction programs use pattern-matching methods which introduces an improvable error factor due to the pattern interpolation, and an intrinsically inescapable error factor from the difference of boundary conditions in the electro-magnetic field solver. Here, we study impact of the intrinsic error on timing and crosstalk noise estimation. We experimentally show that the resulting delay and noise estimation errors show a scatter which is normally distributed. Values of the standard deviations will help designers consider the intrinsic error compared with other variation factors.