Yueguang BIAN Youzheng WANG Jing WANG
In this letter, we propose a new modification to the belief propagation (BP) decoding algorithm for finite-geometry low-density parity-check (LDPC) codes. The modification introduces feedback into the iterative process, which can break the oscillations of bit log-likelihood ratio (LLR) values. Simulations show that, for a given maximum number of iterations, the "feedback BP" (FBP) algorithm achieves better performance than the conventional belief propagation algorithm.
Raghuvel S. BHUVANESWARAN Yoshiaki KATAYAMA Naohisa TAKAHASHI
A data grid consists of computing and storage resources dispersed across the grid network. Large data sets are replicated at more than one site to improve their availability to other nodes in the grid. Downloading a data set from these replicated locations poses practical difficulties, which motivates a co-allocated download framework that enables parallel download of replicated data from multiple servers. In this paper, we propose a dynamic co-allocation scheme for parallel data transfer in a grid environment that copes with highly inconsistent network and server performance. The model comprises co-allocator, monitor, and control mechanisms. The scheme initially obtains the bandwidth parameter from the monitor module to fix the partition size, and the data transfer tasks are allocated to the servers in duplication. In this way, the data transfer is neither interrupted nor paralyzed, even when a network link is broken or a server crashes. We built our framework on the Globus toolkit, making use of its grid information and GridFTP services. We compared our scheme with existing schemes, and the results show a notable improvement in the overall completion time of data transfer.
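As a rough illustration of fixing partition sizes from monitored bandwidth, the sketch below splits a file into contiguous byte ranges proportional to each server's bandwidth. The function name `partition_by_bandwidth` and the proportional rule are assumptions made for this example; the abstract does not specify the exact partitioning rule, and the duplicated task allocation it mentions is not modeled here.

```python
def partition_by_bandwidth(file_size, bandwidths):
    """Split file_size bytes into contiguous (offset, length) ranges.

    Each server receives a share proportional to its measured bandwidth
    (hypothetical rule, chosen for illustration only).
    """
    total = sum(bandwidths)
    if total <= 0:
        raise ValueError("at least one server must report positive bandwidth")
    ranges = []
    offset = 0
    for i, bw in enumerate(bandwidths):
        if i == len(bandwidths) - 1:
            # The last server takes the remainder so no byte is lost to rounding.
            length = file_size - offset
        else:
            length = int(file_size * bw / total)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    # Three replica servers with bandwidths in arbitrary but equal units.
    print(partition_by_bandwidth(1000, [10, 30, 60]))
```

A dynamic scheme would presumably recompute such ranges as the monitor reports new bandwidth values; this sketch only shows a single static split.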
Junichi FUNASAKA Atsushi KAWANO Kenji ISHIDA
Parallel downloading retrieves different pieces of a file from different servers simultaneously and so is expected to greatly shorten file fetch times. A key requirement is that the different servers must hold the same file. We have already proposed a proxy system that can ensure file freshness and concordance. In this paper, we combine parallel downloading with the proxy server technology in order to download a file quickly and ensure that it is the latest version. Our previous paper on parallel downloading took neither the downloading order of file fragments nor the buffer space requirements into account; this paper corrects those omissions. In order to provide the user with the required file in correct order as a byte stream, the proxy server must reorder the pieces fetched from multiple servers and shuffle in the delayed blocks as soon as possible. Thus, "substitution download" is newly introduced, which requests delayed blocks from other servers to complete downloading earlier. Experiments on substitution download across the Internet clarify the tradeoff between the buffering time and the redundant traffic generated by duplicate requests to multiple servers. As a result, the pseudo-optimum balance is discovered and our method is shown both not to increase downloading time and to limit the buffer space. This network software can be applied to download files smoothly absorbing the difference in performance characteristics among heterogeneous networks.
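The reordering step described above can be sketched as a small buffer that holds out-of-order blocks and releases them as a contiguous byte stream. The class name `ReorderBuffer` and the block-index interface are assumptions for illustration and are not taken from the paper; duplicate blocks, such as those a substitution download might produce, are simply dropped.

```python
class ReorderBuffer:
    """Hold blocks that arrive out of order and release them in sequence."""

    def __init__(self):
        self.next_index = 0   # index of the next block the client expects
        self.pending = {}     # blocks that arrived early, keyed by index

    def receive(self, index, data):
        """Store one block; return the blocks that can now be delivered in order."""
        if index < self.next_index or index in self.pending:
            return []  # duplicate copy of a block already delivered or buffered
        self.pending[index] = data
        ready = []
        while self.next_index in self.pending:
            ready.append(self.pending.pop(self.next_index))
            self.next_index += 1
        return ready


if __name__ == "__main__":
    buf = ReorderBuffer()
    print(buf.receive(1, b"B"))  # block 0 is still missing, nothing is released
    print(buf.receive(0, b"A"))  # releases blocks 0 and 1 together
```

The size of `pending` at any moment corresponds to the buffer space that a delayed block forces the proxy to hold.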
Seongeun EOM Vladimir SHIN Byungha AHN
The watershed transform has been used as a powerful morphological segmentation tool in a variety of image processing applications, because it gives a good segmentation result if a topographical relief and markers are suitably chosen for different types of images. This paper proposes a parallel implementation of the watershed transform on the cellular neural network (CNN) universal machine, called cellular watersheds. Owing to the fine-grained architecture of the CNN, the watershed transform can be parallelized using local information. Our parallel implementation is based on a simulated immersion process. To evaluate our implementation, we have experimented on the CNN universal chip, ACE16k, with synthetic and real images.
Research on parallel processing systems has recently been very active, and many complex topologies have been proposed. A burnt pancake graph is one such topology. In this paper, we prove that a faulty burnt pancake graph with degree n has a fault-free Hamiltonian cycle if the number of faulty elements is n-2 or less, and that it has a fault-free Hamiltonian path between any pair of nonfaulty nodes if the number of faulty elements is n-3 or less.
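For readers unfamiliar with the topology, the sketch below builds the neighbors of a node using the commonly used definition of a burnt pancake graph: nodes are signed permutations of 1..n, and an edge corresponds to reversing a prefix and negating every element in it. This definition is not stated in the abstract and is assumed here; the function names are illustrative.

```python
from itertools import permutations, product


def neighbors(node):
    """Return the n neighbors of a signed permutation given as a tuple."""
    result = []
    for i in range(1, len(node) + 1):
        # Reverse the first i elements and flip the sign of each one.
        flipped = tuple(-x for x in reversed(node[:i]))
        result.append(flipped + node[i:])
    return result


def all_nodes(n):
    """Enumerate every signed permutation of 1..n (n! * 2**n nodes)."""
    nodes = []
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            nodes.append(tuple(s * p for s, p in zip(signs, perm)))
    return nodes


if __name__ == "__main__":
    print(neighbors((1, 2, 3)))
    print(len(all_nodes(3)))
```

Each node has exactly n neighbors, which matches the "degree n" in the statement above.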
Ryusuke MIYAMOTO Jumpei ASHIDA Hiroshi TSUTSUI Yukihiro NAKAMURA
A novel pedestrian tracking scheme based on a particle filter is proposed, which adopts a skeleton model of a pedestrian as the state space model and distance-transformed images for likelihood computation. The 6-stick skeleton model used in the proposed approach is distinctive in representing a pedestrian simply but effectively. Experiments using real sequences provided by PETS show that the proposed approach tracks the target pedestrian adequately with a simple silhouette extraction method consisting only of background subtraction, even when the target moves in such a complicated manner and is so often cluttered by other obstacles that conventional methods cannot track it. Moreover, it is demonstrated that the proposed scheme can track multiple targets in the complex case where their trajectories intersect.
Cheol-Ho SHIN Sangsung CHOI Hanho LEE Jeong-Ki PACK
This paper investigates the design and performance of a 4-parallel MB-OFDM UWB receiver. The performance of the proposed MB-OFDM UWB receiver using a 4-parallel synchronization structure is degraded by 0.25 dB compared with that of a receiver using a 1-parallel synchronization structure at the maximum frequency/sampling clock offset tolerance in an AWGN channel. Considering other impairments, including imperfect synchronization algorithms, the effect of quantization error caused by the 4-parallel synchronization structure is negligible in a multi-path channel environment as well as in an AWGN channel, as shown by simulation results.
Yukinori SATO Ken-ichi SUZUKI Tadao NAKAMURA
High power consumption and slow access of enlarged and multiported register files make it difficult to design high performance superscalar processors. The clustered architecture, where the conventional monolithic register file is partitioned into several smaller register files, is expected to overcome these register file issues. In the clustered architecture, the more a monolithic register file is partitioned, the lower the power and the faster the access of the resulting register files. However, the partitioning causes losses of IPC (instructions per clock cycle) due to communication among register files. Therefore, the degree of partitioning has a strong impact on the trade-off between power consumption and performance. In addition, the organization of partitioned register files also affects the trade-off. In this paper, we investigate appropriate degrees of partitioning and organizations of partitioned register files in a clustered architecture to assess the trade-off. From the results of execution-driven simulation, we find that the organization of register files and the degree of partitioning have a strong impact on the IPC, and that the configuration with non-consistent register files can make use of the partitioned resources more effectively. From the results of register file access time and energy modeling, we find that configurations with the highly partitioned non-consistent register file organization can benefit from the partitioning in terms of operating frequency and access energy of register files. Further, we examine the relationship between IPS (instructions per second) and the product of IPC and operating frequency of register files. The results suggest that highly partitioned non-consistent configurations tend to gain more advantage in performance and power.
Satoshi GOUNAI Tomoaki OHTSUKI Toshinobu KANEKO
Irregular LDPC codes can achieve better error rate performance than regular LDPC codes. However, irregular LDPC codes have higher error floors than regular LDPC codes. The Ordered Statistic Decoding (OSD) algorithm achieves approximate Maximum Likelihood (ML) decoding, and ML decoding is effective in lowering error floors. However, the OSD estimates satisfy the parity check equation of the LDPC code even when the estimates are wrong. A hybrid decoder combining the LLR-BP decoding algorithm and the OSD algorithm therefore cannot lower error floors either, because wrong estimates also satisfy the LDPC parity check equation. We proposed a concatenated code constructed from an inner irregular LDPC code and an outer Cyclic Redundancy Check (CRC). Owing to the CRC, we can detect wrong codewords among the OSD estimates. Our CRC-LDPC code with the hybrid decoder can lower error floors in an AWGN channel. In wireless communications, we cannot neglect the effects of the channel. The OSD algorithm needs to order the bits by reliability, and the Channel State Information (CSI) is used to decide the reliability of each bit. In this paper, we evaluate the Block Error Rate (BLER) of the CRC-LDPC code with the hybrid decoder in a fast fading channel with perfect and imperfect CSI, where 'imperfect CSI' means that the distribution of the channel and the statistical average of the fading amplitudes are known at the receiver. By computer simulation, we show that the CRC-LDPC code with the hybrid decoder achieves lower error floors than the conventional LDPC code with the hybrid decoder in the fast fading channel with perfect and imperfect CSI. We also show that combining error detection with the OSD algorithm is effective not only for lowering the error floor but also for reducing the computational complexity of the OSD algorithm.
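The role of the outer CRC can be sketched as follows: among candidate estimates that all pass the inner parity check, only a candidate whose CRC matches is accepted. This is a minimal illustration that uses the CRC-32 from Python's `zlib` module as a stand-in; the abstract does not say which CRC polynomial is used, and the helper names are hypothetical.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Attach a 4-byte CRC-32 to the payload (stand-in for the outer CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Return True if the trailing 4 bytes match the CRC-32 of the payload."""
    payload, tag = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tag


def first_valid(candidates):
    """Return the first candidate estimate that passes the CRC, or None."""
    for candidate in candidates:
        if crc_ok(candidate):
            return candidate
    return None


if __name__ == "__main__":
    frame = append_crc(b"hello")
    corrupted = bytes([frame[0] ^ 1]) + frame[1:]  # flip one bit
    print(first_valid([corrupted, frame]) == frame)
```

Stopping at the first candidate that passes the check is also one plausible way error detection could reduce the work spent on further candidates, although the abstract does not describe the mechanism.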
Tohru TAINO Tomohiro NISHIHARA Koichi HOSHINO Hiroaki MYOREN Hiromi SATO Hirohiko M. SHIMIZU Susumu TAKADA
A normal-distribution-function-shaped superconducting tunnel junction (NDF-STJ) which consists of Nb/Al-AlOx/Al/Nb has been fabricated as an X-ray detector. Current-voltage characteristics were measured at 0.4 K using three kinds of STJs, which have dispersion parameters σ of 0.25, 0.45 and 0.75. These STJs showed a very low subgap leakage current of about 5 nA. By irradiating with 5.9 keV X-rays, we obtained the spectrum of these NDF-STJs. They showed good energy resolution with small magnetic fields of below 3 mT, which is about one-tenth of those for conventional-shaped STJs.
Jiaqiang LI Ronghong JIN JunPing GENG Yu FAN Wei MAO
In this paper, the Integration of Fractional Gaussian Window transform (IFRGWT) is proposed for parameter estimation of linear FM (LFM) signals; the proposal is based on the integration of the Fractional Fourier transform modified by a Gaussian window. The peak values can be detected by adjusting the standard deviation of the Gaussian function and locating the optimal rotation angles, and the parameters of the signal can then be estimated well. As an application, detection and parameter estimation of multiple LFM signals are investigated at low signal-to-noise ratios (SNRs). The analytic results and simulations clearly demonstrate that the method is effective.
Zongsheng ZHANG Go HASEGAWA Masayuki MURATA
Parallel TCP is one possible approach to increasing the throughput of data transfer in Long Fat Networks (LFNs). Using parallel TCP is something of a black art. As high-speed transport-layer protocols such as HSTCP appear, it is necessary to reinvestigate the performance of parallel TCP, because a choice has to be made among them for the system. In this paper, the performance of parallel TCP is evaluated by mathematical analysis based on a simple dumbbell topology. Packet drop rate and aggregate goodput are used as two metrics to characterize the performance of parallel TCP. Two cases, namely synchronization and non-synchronization, are analyzed in detail when DropTail is deployed on routers. The synchronization case is common when using parallel TCP, but the goodput deteriorates seriously. The non-synchronization case may benefit parallel TCP, but extra mechanisms are required, and it is not easy to implement in the real world. The problem remains even if Random Early Detection (RED) queue management is employed on routers. The analysis results show the difficulty of using parallel TCP in practice.
Shunsuke OKURA Tetsuro OKURA Bogoda A. INDIKA U.K. Kenji TANIGUCHI
This paper describes the design of a random access memory (RAM) bank with a 0.35-µm CMOS process for column-parallel analog/digital converters (ADC) utilized in CMOS imagers. A dynamic latch is utilized that expends neither input DC nor drain current during the monitoring phase. Accuracy analysis of analog/digital conversion error in the RAM bank is discussed to ensure low power consumption of a counter buffer circuit. Moreover, the counter buffer utilizes a combination of NMOS and CMOS buffers to reduce power consumption. Total power consumption of a 10-bit 800-column 40 MHz RAM bank is 2.9 mA for use in an imager.
Takehiro ITO Kazuya GOTO Xiao ZHOU Takao NISHIZEKI
Assume that each vertex of a graph G is assigned a constant number q of nonnegative integer weights, and that q pairs of nonnegative integers l_i and u_i, 1 ≤ i ≤ q, are given. One wishes to partition G into connected components by deleting edges from G so that the total i-th weight of all vertices in each component is at least l_i and at most u_i for each index i, 1 ≤ i ≤ q. The problem of finding such a "uniform" partition is NP-hard for series-parallel graphs, and is strongly NP-hard for general graphs even for q = 1. In this paper we show that the problem and many variants can be solved in pseudo-polynomial time for series-parallel graphs and partial k-trees, that is, graphs with bounded tree-width.
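To make the problem statement concrete, the sketch below checks whether a given set of kept edges yields a "uniform" partition: it groups vertices into connected components with a union-find structure and tests every component's total weights against the bounds. It only verifies a candidate partition and is not the pseudo-polynomial-time algorithm of the paper; the function name and argument layout are assumptions for illustration.

```python
def is_uniform_partition(num_vertices, kept_edges, weights, lower, upper):
    """Check the bounds lower[i] <= total i-th weight <= upper[i] per component.

    kept_edges: edges remaining after deletion, as (a, b) vertex pairs.
    weights[v][i]: the i-th weight of vertex v.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in kept_edges:
        parent[find(a)] = find(b)

    q = len(lower)
    totals = {}
    for v in range(num_vertices):
        sums = totals.setdefault(find(v), [0] * q)
        for i in range(q):
            sums[i] += weights[v][i]

    return all(lower[i] <= sums[i] <= upper[i]
               for sums in totals.values() for i in range(q))


if __name__ == "__main__":
    # Path 0-1-2-3 with one weight per vertex (q = 1); delete edge (1, 2).
    weights = [[1], [2], [3], [4]]
    print(is_uniform_partition(4, [(0, 1), (2, 3)], weights, [3], [7]))
```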
Takeshi UENO Takafumi YAMAJI Tetsuro ITAKURA
This paper describes a 1.2-V, 12-bit, 200-MSample/s current-steering CMOS digital-to-analog (D/A) converter for wireless-communication terminals. To our knowledge, the supply voltage of this converter is the lowest for high-speed applications. To overcome increasing device mismatch in low-voltage operation, we propose an H-shaped, 3-dimensional structure for reducing the influence of voltage drops (IR drops) along power supplies. This technique relaxes mismatch requirements and allows the use of small devices with small parasitics. By using this technique, a low-voltage, high-speed D/A converter was realized. The converter was implemented in a 90-nm CMOS technology. The modulator achieves the intrinsic accuracy of 12 bits and a spurious-free dynamic range (SFDR) above 55 dB over a 100-MHz bandwidth.
Yun WU Hanwen LUO Ming DING Renmao LIU Haibin ZHANG
In this letter, we design a special preamble composed of two OFDM training blocks with different numbers of identical parts. Based on the designed preamble, we propose a method to estimate frequency offset utilizing initial estimates from the two OFDM training blocks. By carefully selecting the numbers of identical parts for the two training blocks, the proposed estimator provides a much larger estimation range than conventional estimators using identical parts. Computer simulations show that the proposed estimator exhibits superior estimation performance while maintaining low computational complexity.
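The range limit of a conventional estimator that uses identical parts can be illustrated with a noise-free sketch. For a training block of N samples made of L identical parts, correlating adjacent parts gives a phase of 2πε/L for a frequency offset ε expressed in subcarrier spacings, so the estimate is unambiguous only for |ε| < L/2. The code below shows this single-block estimator, not the two-block method of the letter; the function names and the test sequence are assumptions.

```python
import cmath
import math


def make_preamble(n, num_parts, cfo):
    """Build n samples of num_parts identical parts, rotated by offset cfo.

    cfo is the frequency offset normalized to the subcarrier spacing.
    """
    m = n // num_parts
    # Arbitrary unit-magnitude sequence used as the repeated part.
    part = [cmath.exp(2j * math.pi * k * k / m) for k in range(m)]
    base = part * num_parts
    return [base[k] * cmath.exp(2j * math.pi * cfo * k / n) for k in range(n)]


def estimate_cfo(received, num_parts):
    """Estimate the normalized offset from the phase between adjacent parts."""
    n = len(received)
    m = n // num_parts
    corr = sum(received[k].conjugate() * received[k + m] for k in range(n - m))
    return num_parts * cmath.phase(corr) / (2 * math.pi)


if __name__ == "__main__":
    # An offset of 1.3 is inside the range for 4 parts but outside it for 2.
    print(estimate_cfo(make_preamble(64, 4, 1.3), 4))
    print(estimate_cfo(make_preamble(64, 2, 1.3), 2))
```

With two parts the offset 1.3 aliases to 1.3 - 2 = -0.7, which is the ambiguity that a larger estimation range is meant to avoid.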
Kazuhiro FUJITA Hideki KAWAGUCHI Shusuke NISHIYAMA Satoshi TOMIOKA Takeaki ENOTO Igor ZAGORODNOV Thomas WEILAND
The authors have been working on particle accelerator wake field analysis using the Time Domain Boundary Element Method (TDBEM). A stable TDBEM scheme was presented, and good agreement with conventional wake field analysis by the FDTD method was obtained. However, the TDBEM scheme still has difficulty with initial value setting in interior region problems for an infinitely long accelerator beam pipe. To avoid this initial value setting, we adopted a numerical model of beam pipes with finite length and wall thickness, treated as open scattering problems. But the use of such finite beam pipe models causes another problem, unwanted scattering fields at the beam pipe edge, and leads to the involvement of interior resonant solutions. This paper presents a modified TDBEM scheme, the Scattered-field Time Domain Boundary Element Method (S-TDBEM), to treat the infinitely long beam pipe in interior region problems. It is shown that the S-TDBEM is able to avoid the excitation of the edge scattering fields and the involvement of numerical instabilities caused by interior resonance, which occur in the conventional TDBEM.
Takayuki WATANABE Yuichi TANJI Hidemasa KUBOTA Hideki ASAI
This paper presents a fast transient simulation method for power distribution networks (PDNs) of the PCB/Package. Because these PDNs are modeled as large-scale linear circuits consisting of a large number of RLC elements, solving them with conventional circuit simulators such as SPICE is costly. Our simulation method is based on the leapfrog algorithm, and can solve RLC circuits of PDNs faster than SPICE. Actual PDNs have frequency-dependent dispersions such as the skin effect of conductors and the dielectric loss. To model these dispersions, a larger number of RLC elements is required, and the circuit structures of these dispersion models are hard to solve using the leapfrog algorithm. This paper shows that the circuit structures of dispersion models can be converted to structures suitable for the leapfrog algorithm. Further, in order to reduce the simulation time, our proposed method exploits parallel computation techniques. Numerical results show that our proposed method using a single processing element (PE) achieves speedups of 20-100 times and 10 times over HSPICE and INDUCTWISE, respectively, at the same level of accuracy. In a large-scale example with frequency-dependent dispersions, our method achieves over 94% parallel efficiency with 5 PEs.
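The leapfrog idea can be shown on the smallest possible circuit, a lossless LC loop: the capacitor voltage is updated at integer time steps and the inductor current at half steps, so each update is explicit and no matrix has to be solved. This is a generic sketch of leapfrog time stepping, not the authors' PDN formulation; the function names and the sign convention (current flows from the capacitor into the inductor) are assumptions.

```python
import math


def leapfrog_lc(inductance, capacitance, v0, dt, steps):
    """Simulate C dv/dt = -i, L di/dt = v with staggered leapfrog updates.

    Starts from v(0) = v0 and i(0) = 0 and returns the voltage at each step.
    The scheme is stable for dt < 2 * sqrt(inductance * capacitance).
    """
    v = v0
    # Advance the current by half a step so that v and i are staggered in time.
    i_half = 0.5 * dt * v0 / inductance
    voltages = [v]
    for _ in range(steps):
        v -= dt * i_half / capacitance   # voltage update uses the half-step current
        i_half += dt * v / inductance    # current update uses the new voltage
        voltages.append(v)
    return voltages


def exact_voltage(inductance, capacitance, v0, t):
    """Closed-form solution v(t) = v0 * cos(t / sqrt(LC)) for comparison."""
    return v0 * math.cos(t / math.sqrt(inductance * capacitance))


if __name__ == "__main__":
    vs = leapfrog_lc(1.0, 1.0, 1.0, 0.01, 628)
    print(vs[-1], exact_voltage(1.0, 1.0, 1.0, 6.28))
```

Because each element is updated only from its immediate neighbors, updates of this kind are naturally local, which is one reason such schemes lend themselves to parallel computation.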
Joong Hyung KWON Duho RHEE Younghoon WHANG Kwang Soon KIM
In this paper, we investigate an efficient user selection and sub-band allocation algorithm in which each user transmits two-step partial CQI to reduce the amount of feedback in multi-user downlink OFDMA systems. Simulation results show that we can greatly reduce the feedback rate at the expense of negligible performance degradation compared to the full CQI feedback schemes or that we can greatly improve the performance with slightly reduced feedback rate compared to conventional partial CQI feedback schemes.
Chester SHU Ka-Lun LEE Mable P. FOK
We report the generation of time- and wavelength-interleaved optical pulses using the principle of sub-harmonic pulse gating in a dispersion-managed fiber cavity. The pulsed source has been applied to the processing of electrical and optical signals including analog-to-digital conversion, wavelength multicast, and serial-to-parallel optical data conversion.