Johan BAUWELINCK Dieter VERHULST Peter OSSIEUR Xing-Zhi QIU Jan VANDEWEGE Benoit DE VOS
This paper presents a new approach based on current mode circuits for fast and accurate optical level monitoring with wide dynamic range of a gigabit burst-mode laser driver chip. Our proposed solution overcomes the drawbacks that voltage mode implementations show at higher bit rates or in other technologies. The main speed-limiting factor of the level monitoring circuitry is the parasitic capacitance of the back facet monitor photodiode. We propose the use of an active-input current mirror to reduce the impact of this parasitic capacitance. The mirror produces two copies of the photo current, one to be used for the "0" level measurement and another for the "1" level measurement. The mirrored currents are compared to two reference currents by two current comparators. Every reference current needs only one calibration at room temperature. A pattern detection block scans the incoming data for patterns of sufficiently long consecutive 0's or 1's. At the end of such a pattern a valid measurement is present at the output of one of the current comparators. Based on these measurements the digital Automatic Power Control (APC) will adjust the bias (IBIAS) and modulation current (IMOD) setting of the laser driver. Tests show that the chip can stabilize and track the launched optical power with a tolerance of less than 1 dB. In these tests the pattern detection was programmed to sample the current comparators after 5 bytes (32 ns at 1.25 Gbps) of consecutive 1's and 0's. Automatic power control on such short strings of data has not been demonstrated before. Although this laser transmitter was developed for FSAN GPON applications at a speed of 1.25 Gbps upstream, the design concept is generic and can be applied for developing a wide range of burst mode laser transmitters. This chip was developed in a 0.35 µm SiGe BiCMOS process.
Fabian J. THEIS Wakako NAKAMURA
The transformation of a data set using a second-order polynomial mapping to find statistically independent components is considered (quadratic independent component analysis or ICA). Based on overdetermined linear ICA, an algorithm together with separability conditions are given via linearization reduction. The linearization is achieved using a higher dimensional embedding defined by the linear parametrization of the monomials, which can also be applied for higher-order polynomials. The paper finishes with simulations for artificial data and natural images.
Yong-Jae KWAK So-Young PARK Joon-Ho LIM Hae-Chang RIM
In this paper, we propose a naïve probabilistic shift-reduce parsing model which can use contextual information more flexibly than the previous probabilistic GLR parsing models, and utilize the characteristics of agglutinative language in which the functional words are highly developed. Experimental results on Korean have shown that our model using the proposed contextual information improves the parsing accuracy more effectively than the previous models. Moreover, it is compact in model size, and is robust with a small training set.
In this letter, we show the effects of the chip waveform selection on the detection performance of the energy detector in DS/SS communications. Three chip waveforms such as rectangular, half-sine and raised-cosine are examined as the DS/SS chip waveform. It is demonstrated that the partial-band detection can enhance the detection performance of the energy detector approximately 50-70% compared with the full-band detection. When the chip rate is identical, the raised-cosine waveform shows lower detection probability due to its wider spreading bandwidth. However, when the spreading bandwidth is identical, the rectangular waveform shows lower detection probability due to its lower partial-band energy factor.
Yoshiki SAMESHIMA Hideaki SAISHO Kazuko OYANAGI Tsutomu MATSUMOTO
The authors present a multiparty signature generation (MSG) scheme of the Digital Signature Algorithm (FIPS 186-1). The scheme is based on a simple idea, however, it is much more convenient in usability in the real world than existing MSGs. The scheme has the following properties: (1) valid signatures are generated with odd n split private keys, (2) broadcast messages between the key holders are hidden from them, so that the n key holders do not need to process signature generation simultaneously, (3) even if up to t (= ) split keys are stolen, the adversary can get no information on the private key, (4) the scheme is as secure as the original signature algorithm against chosen message attack, and (5) the scheme is efficient in the sense that an implementation on smart card has demonstrated practical performance for interactive use with human user.
Hideyuki ITO Ryusuke KONISHI Hiroshi NAKADA Hideyuki TSUBOI Yuichi OKUYAMA Akira NAGOYA
Design points and the results seen in the development of a dynamically reconfigurable logic LSI, PCA-2, are described. PCA-2 enables the realization of flexible parallel processing based on the autonomous reconfiguration of logic circuits. To realize this feature, we introduce an asynchronous circuit design and a homogeneous cell array structure. PCA-2 represents an advance on the earlier LSI, PCA-1. Cutting edge CMOS technology is used to realize the structural merits of PCA hardware. Compared to PCA-1, PCA-2 offers 16 times greater integration level for programmable logic. Due to miniaturization and design refinement, PCA-2 provides a 6-fold increase in the circuit frequency of the configuration controller and a 3-fold increase in the operating frequency of the programmable logic. The results gained confirm the effects of refinement and the suitability of our architecture for device miniaturization.
Tsuyoki NISHIKAWA Hiroshi ABE Hiroshi SARUWATARI Kiyohiro SHIKANO Atsunobu KAMINUMA
We propose a new algorithm for overdetermined blind source separation (BSS) based on multistage independent component analysis (MSICA). To improve the separation performance, we have proposed MSICA in which frequency-domain ICA and time-domain ICA are cascaded. In the original MSICA, the specific mixing model, where the number of microphones is equal to that of sources, was assumed. However, additional microphones are required to achieve an improved separation performance under reverberant environments. This leads to alternative problems, e.g., a complication of the permutation problem. In order to solve them, we propose a new extended MSICA using subarray processing, where the number of microphones and that of sources are set to be the same in every subarray. The experimental results obtained under the real environment reveal that the separation performance of the proposed MSICA is improved as the number of microphones is increased.
This paper deals with a design problem of an adaptive robust control system for linear systems with structured uncertainties. The control law consists of a state feedback with a fixed gain designed by using the nominal system, a state feedback with an adaptive gain tuned by a parameter adjustment law and a compensation input. We show the parameter adjustment law and that sufficient conditions for the existence of the compensation input are given in terms of linear matrix inequalities (LMIs). Finally, a numerical example is included.
Chawalit BENJANGKAPRASERT Nobuaki TAKAHASHI Tsuyoshi TAKEBE
This paper proposes a new implementation of an adaptive noise canceller based upon a parallel block structure, which aims to raise the processing and convergence rates and to improve the steady-state performance. The procedure is as follows: First, an IIR bandpass filter with a variable center angular frequency using adaptive Q-factor control and two adaptive control signal generators are realized by the parallel block structure. Secondly, a new algorithm for adaptive Q-factor control with parallel block structure is proposed to improve the convergence characteristic. In addition, the steady-state performance of the filter is stabilized by using the variable step size parameter in adaptive control of the center frequency and the speed up of the convergence rate is achieved by adopting a normalized gradient algorithm for adaptive control. Finally, simulation results are given to demonstrate the convergence performance.
Tomoya TAKATANI Tsuyoki NISHIKAWA Hiroshi SARUWATARI Kiyohiro SHIKANO
We newly propose a novel blind separation framework for Single-Input Multiple-Output (SIMO)-model-based acoustic signals using an extended ICA algorithm, SIMO-ICA. The SIMO-ICA consists of multiple ICAs and a fidelity controller, and each ICA runs in parallel under the fidelity control of the entire separation system. The SIMO-ICA can separate the mixed signals, not into monaural source signals but into SIMO-model-based signals from independent sources as they are at the microphones. Thus, the separated signals of SIMO-ICA can maintain the spatial qualities of each sound source. In order to evaluate its effectiveness, separation experiments are carried out under both nonreverberant and reverberant conditions. The experimental results reveal that the signal separation performance of the proposed SIMO-ICA is the same as that of the conventional ICA-based method, and that the spatial quality of the separated sound in SIMO-ICA is remarkably superior to that of the conventional method, particularly for the fidelity of the sound reproduction.
Shoji YAMAMOTO Shuichi ICHIKAWA Hiroshi YAMAMOTO
Subgraph isomorphism problems have various important applications, while generally being NP-complete. Though Ullmann and Konishi proposed the custom circuit designs to accelerate subgraph isomorphism problem, they require many hardware resources for large problems. This study describes the design of data-dependent circuits for subgraph isomorphism problem with evaluation results on an actual FPGA platform. Data-dependent circuits are logic circuits specialized in specific input data. Such circuits are smaller and faster than the original circuit, although it is not reusable and involves circuit generation for each input. In the present study, the circuits were implemented on Xilinx XC2V3000 FPGA, and they successfully operated at a clock frequency 25 MHz. In the case of graphs with 16 vertices, the average execution time is about 7.0% of the software executed on an up-to-date microprocessor (Athlon XP 2600+ of 2.1 GHz clock). Even if the circuit generation time is included, data-dependent circuits are about 14.4 times faster than the software (for random graphs with 16 vertices). This performance advantage becomes larger for larger graphs. Two algorithms (Ullmann's and Konishi's) were examined, and the data-dependent approach was found to be equally effective for both algorithms. We also examined two types of input graph sets, and found that the data-dependent approach shows advantage in both cases.
Ryo MUKAI Hiroshi SAWADA Shoko ARAKI Shoji MAKINO
This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.
An analog neurochip for independent component analysis (ICA) is designed with on-line learning capability. Due to the limited dynamic range of analog device, the nonholonomic ICA algorithm is adopted. In order to accommodate the offsets due to device mismatches, a modified algorithm is developed with 2-quadrant multipliers and self-adjusting biases. Performance of the developed system was demonstrated by Monte-Carlo simulation.
Seshasayi PILLALAMARRI Sumit GHOSH
A principal attraction of ATM networks, in both wired and wireless realizations, is that the key quality of service (QoS) parameters of every call, including end-to-end delay, jitter, and loss are guaranteed by the network when appropriate cell-level traffic controls are imposed at the user network interface (UNI) on a per call basis, utilizing the peak cell rate (PCR) and the sustainable cell rate (SCR) values for the multimedia--voice, video, and data, traffic sources. There are three practical difficulties with these guarantees. First, while PCR and SCR values are, in general, difficult to obtain for traffic sources, the typical user-provided parameter is a combination of the PCR, SCR, and the maximum burstiness over the entire duration of the traffic. Second, the difficulty in accurately defining PCR arises from the requirement that the smallest time interval must be specified over which the PCR is computed which, in the limit, will approach zero or the network's resolution of time. Third, the literature does not contain any reference to a scientific principle underlying these guarantees. Under these circumstances, the issue of providing QoS guarantees in the real world, through traffic controls applied on a per call basis, is rendered uncertain. This paper adopts a radically different, high level approach to the issue of QoS guarantees. It aims at uncovering through systematic experimentation a relationship, if any exists, between the key high level user traffic characteristics and the resulting QoS measures in a realistic operational environment. It may be observed that while each user is solely interested in the QoS of his/her own traffic, the network provider cares for two factors: (1) Maximize the link utilization in the network since links constitute a significant investment, and (2) ensure the QoS guarantees for every user traffic, thereby maintaining customer satisfaction. Based on the observations, this paper proposes a two-phase strategy. Under the first phase, the average "link utilization" computed over all the links in a network is maintained within a range, specified by the underlying network provider, through high level call admission control, i.e. by limiting the volume of the incident traffic on the network, at any time. The second phase is based on the hypothesis that the number of traffic sources, their nature--audio, video, or data, and the bandwidth distribution of the source traffic, admitted subject to a specific chosen value of "link utilization" in the network, will exert a unique influence on the cumulative delay distribution at the buffers of the representative nodes and, hence, on the QoS guarantees of each call. The underlying thinking is as follows. The cumulative buffer delay distribution, at any given node and at any time instant, will clearly reflect the cumulative effect of the traffic distributions of the multiple connections that are currently active on the input links. Any bounds imposed on the cumulative buffer delay distribution at the nodes of the network will also dominate the QoS bounds of each of the constituent user traffic. Thus, for each individual traffic source, the buffer delay distributions at the nodes of the network, obtained for different traffic distributions, may serve as its QoS measure. If the hypothesis is proven true, in essence, the number of traffic sources and their bandwidth distribution will serve asa practically realizable high level traffic control in providing realistic QoS guarantees for every call. To verify the correctness of the hypothesis, an experiment is designed that consists of a representative ATM network, traffic sources that are characterized through representative and realistic user-provided parameters, and a given set of input traffic volumes appropriate for a network provider approved link utilization measure. The key source traffic parameters include the number of sources that are incident on the network and the constituent links at any given time, the bandwidth requirement of the sources, and their nature. For each call, the constituent cells are generated stochastically, utilizing the typical user-provided parameter as an estimate of the bandwidth requirement. Extensive simulations reveal that, for a given link utilization level held uniform throughout the network, while the QoS metrics--end-to-end cell delay, jitter, and loss, are superior in the presence of many calls each with low bandwidth requirement, they are significantly worse when the network carries fewer calls of very high bandwidths. The findings demonstrate the feasibility of guaranteeing QoS for each and every call through high level traffic controls. As for practicality, call durations are relatively long, ranging from ms to even minutes, thereby enabling network management to exercise realistic controls over them, even in a geographically widely dispersed ATM network. In contrast, current traffic controls that act on ATM cells at the UNI face formidable challenge from high bandwidth traffic where cell lifetimes may be extremely short, in the range of µs. The findings also underscore two additional important contributions of this paper. First, the network provider may collect data on the high level user traffic characteristics, compute the corresponding average link utilization in the network, and measure the cumulative buffer delay distributions at the nodes, in an operational network. The provider may then determine, based on all relevant criteria, a range of input and system parameters over which the network may be permitted to operate, the intersection of all of which may yield a realistic network operating point (NOP). During subsequent operation of the network, the network provider may guide and maintain the network at a desired NOP by exercising control over the input and system parameters including link utilization, call admittance based on the requested bandwidth, etc. Second, the finding constitutes a vulnerability of ATM networks which a perpetrator may exploit to launch a performance attack.
Eero AHO Jarno VANNE Kimmo KUUSILINNA Timo D. HAMALAINEN
Parallel memories increase memory bandwidth with several memory modules working in parallel and can be used to feed a processor with only necessary data. The Configurable Parallel Memory Architecture (CPMA) enables a multitude of access formats and module assignment functions to be used within a single hardware implementation, which has not been possible in prior embedded parallel memory systems. This paper focuses on address computation in CPMA, which is implemented using several configurable computation units in parallel. One unit is dedicated for each type of access formats and module assignment functions that the implementation supports. Timing and area estimates are given for a 0.25-micron CMOS process. The utilized resources are shown to be linearly proportional to the number of memory modules.
Fan CHAN Jiannong CAO Alvin T.S. CHAN Minyi GUO
Many parallel applications involve different independent tasks with their own data. Using the MPMD model, programmers can have a modular view and simplified structure of the parallel programs. Although MPI supports both SPMD and MPMD models for programming, MPI libraries do not provide an efficient way for task communication for the MPMD model. We have developed a programming environment, called ClusterGOP, for building and developing parallel applications. Based on the graph-oriented programming (GOP) model, ClusterGOP provides higher-level abstractions for message-passing parallel programming with the support of software tools for developing and running parallel applications. In this paper, we describe how ClusterGOP supports programming of MPMD parallel applications on top of MPI. We discuss the issues of implementing the MPMD model in ClusterGOP using MPI and evaluate the performance by using example applications.
Mohammad Azizur RAHMAN Shigenobu SASAKI Jie ZHOU Shogo MURAMATSU Hisakazu KIKUCHI
Performance of selective Rake (SRake) receiver is evaluated for direct sequence ultra wideband (DS-UWB) communications considering an independent Rayleigh channel having exponentially decaying power delay profile (PDP). BEP performances are shown. The results obtained are compared with similar results in a channel having flat PDP. Assumption of a flat PDP is found to predict the optimum spreading bandwidth to be lower and sub-optimum operating performance beyond optimum spreading bandwidth to be severely worse than that is achievable in a channel having exponentially decaying PDP by employing an SRake receiver having fixed number of combined paths. Optimum spreading bandwidth for SRake in a channel having exponentially decaying PDP is shown to be much larger than the one in a channel having flat PDP; that is specifically a good-news for UWB communications. Effects of partial band interference are also investigated. Interference is found to be less effective in exponentially decaying PDP.
Huabing ZHU Tony K.Y. CHAN Lizhe WANG Reginald C. JEGATHESE
This paper presents a prototype of a distributed 3D rendering system in a hierarchical Grid environment. 3D rendering with massive data sets is a computationally intensive task. In order to make full use of computational resources on Grids, a hierarchical system architecture is designed to run over multiple clusters. This architecture involves both sort-first and sort-last parallel rendering algorithms to achieve excellent scalability, rendering performance and load balance.
In this paper we apply a parallel adaptive solution algorithm to simulate nanoscale double-gate metal-oxide-semiconductor field effect transistors (MOSFETs) on a personal computer (PC)-based Linux cluster with the message passing interface (MPI) libraries. Based on a posteriori error estimation, the triangular mesh generation, the adaptive finite volume method, the monotone iterative method, and the parallel domain decomposition algorithm, a set of two-dimensional quantum correction hydrodynamic (HD) equations is solved numerically on our constructed cluster system. This parallel adaptive simulation methodology with 1-irregular mesh was successfully developed and applied to deep-submicron semiconductor device simulation in our recent work. A 10 nm n-type double-gate MOSFET is simulated with the developed parallel adaptive simulator. In terms of physical quantities and refined adaptive mesh, simulation results demonstrate very good accuracy and computational efficiency. Benchmark results, such as load-balancing, speedup, and parallel efficiency are achieved and exhibit excellent parallel performance. On a 16 nodes PC-based Linux cluster, the maximum difference among CPUs is less than 6%. A 12.8 times speedup and 80% parallel efficiency are simultaneously attained with respect to different simulation cases.
Juan PIERNAS Toni CORTES Jose M. GARCIA
DualFS is a next-generation journaling file system which has the same consistency guaranties as traditional journaling file systems but better performance. This paper introduces three new enhancements which significantly improve DualFS performance during normal operation, and presents different experimental results which compare DualFS and other traditional file systems, namely, Ext2, Ext3, XFS, JFS, and ReiserFS. The experiments carried out prove, for the first time, that a new file system design based on separation of data and metadata can significantly improve file systems' performance without requiring several storage devices.