Shotaro IWANAGA Shinji FUKUMA Shin-ichiro MORI
In this paper, a hybrid parallel implementation of inverse matrix computation using the Sherman-Morrison-Woodbury (SMW) formula is proposed. By aggregating the memory bandwidth in the hybrid parallel implementation, the memory-bandwidth bottleneck of the authors' previous multicore implementation is eliminated. A speedup of more than 8 times is also achieved with a dual-core, 8-node implementation, which yields more than 20 simulation steps per second, or near real-time performance.
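For readers unfamiliar with the SMW formula, the following is a minimal sketch of its rank-1 special case (the Sherman-Morrison update), which refreshes a known inverse after a rank-1 change instead of inverting from scratch. It is not the authors' parallel implementation; the helper names and the 2x2 example matrix are assumptions chosen for illustration.

```python
from fractions import Fraction

def mat_vec(M, v):
    """Return the product M v for a square matrix M and a vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def vec_mat(v, M):
    """Return the product v^T M as a list."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1} (rank-1 case of the SMW formula)."""
    Au = mat_vec(A_inv, u)                       # A^{-1} u
    vA = vec_mat(v, A_inv)                       # v^T A^{-1}
    denom = 1 + sum(vi * ai for vi, ai in zip(v, Au))
    if denom == 0:
        raise ValueError("the rank-1 update makes the matrix singular")
    n = len(A_inv)
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]

# Hypothetical example: A = diag(2, 4), so A^{-1} = diag(1/2, 1/4).
A_inv = [[Fraction(1, 2), Fraction(0)], [Fraction(0), Fraction(1, 4)]]
u = [Fraction(1), Fraction(0)]
v = [Fraction(0), Fraction(1)]

# A + u v^T = [[2, 1], [0, 4]]; its inverse is obtained without re-inverting.
B_inv = sherman_morrison(A_inv, u, v)
```

Exact fractions are used here only so the result can be checked by hand; a practical solver would use floating-point arrays.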
Misako KOTANI Shingo KAWAMOTO Motohiko ISAKA
The granular gain of low-dimensional lattices based on binary linear codes is estimated using a quantization algorithm that is equivalent to soft-decision decoding of the underlying code. It is shown that a substantial portion of the ultimate granular gain is achieved even in limited dimensions.
Michiko INOUE Akira TAKETANI Tomokazu YONEDA Hideo FUJIWARA
Nano-scale VLSI design is facing the problem of increased test data volume. Small delay defects are becoming possible sources of test escapes, so high delay test quality, and therefore a greater volume of test data, is required. The increased test data volume requires more tester memory and test application time, both of which inflate test cost. Test pattern ordering gives a practical solution to reduce test cost, where test patterns are ordered so that more defects can be detected as early as possible. In this paper, we propose a test pattern ordering method based on SDQL (Statistical Delay Quality Level), which is a measure of delay test quality considering small delay defects. Our proposed method orders test patterns so that SDQL shrinks fast, which means more delay defects can be detected as early as possible. The proposed method efficiently orders test patterns with minimal usage of time-consuming timing-aware fault simulation. Experimental results demonstrate that our method can obtain a test pattern ordering within a reasonable time, and also suggest how to prepare test sets suitable as inputs to test pattern ordering.
In this paper, a new systolic array structure for the extended QR decomposition based recursive least-square (QRD-RLS) equalizer is proposed. The fact that the vectoring and rotation mode coordinate rotation digital computer (CORDIC) processors rotate in the same direction is used to show that the hardware complexity of the systolic array can be reduced. Furthermore, since the vectoring and rotation mode CORDIC processors in the proposed structure rotate simultaneously, operation time is also reduced. The performance of the proposed equalizer is analyzed by observing the flatness of the product of the frequency responses of the unknown channel and the proposed equalizer. Simulation results through hardware description language (HDL) coding and synthesis show that the chip implementation area can be reduced by 23.8%.
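The observation that vectoring and rotation mode CORDIC processors rotate in the same direction can be illustrated with a textbook floating-point CORDIC. This is a minimal sketch, not the proposed systolic array: the iteration count, function names, and the idea of passing direction bits from one function to the other are assumptions made for the illustration.

```python
import math

N_ITER = 24  # assumed number of micro-rotations
# Aggregate CORDIC gain accumulated over N_ITER micro-rotations.
GAIN = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER))

def cordic_vectoring(x, y):
    """Rotate (x, y) onto the positive x-axis (assumes x > 0).

    Returns the vector magnitude and the list of direction bits used.
    """
    dirs = []
    for i in range(N_ITER):
        d = -1 if y >= 0 else 1              # always rotate toward y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return x / GAIN, dirs

def cordic_rotation(x, y, dirs):
    """Apply the same micro-rotations, in the same directions, to (x, y)."""
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN

# Vectoring mode annihilates the y component of (3, 4) ...
magnitude, dirs = cordic_vectoring(3.0, 4.0)
# ... and rotation mode applies the identical rotation to another vector.
rx, ry = cordic_rotation(-4.0, 3.0, dirs)
```

Because the rotation cell only needs the direction bits, it can in principle follow the vectoring cell step by step rather than waiting for a computed angle, which is the kind of sharing the abstract alludes to.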
Xiongxin ZHAO Xiao PENG Zhixiang CHEN Dajiang ZHOU Satoshi GOTO
Structured quasi-cyclic low-density parity-check (QC-LDPC) codes have been adopted in many wireless communication standards, such as WiMAX, Wi-Fi and WPAN. To fully support variable code rate (multi-rate) and variable code length (multi-length) implementations for universal applications, the partial-parallel layered LDPC decoder architecture is straightforward and widely used in decoder design. In this paper, we propose a highly parallel LDPC decoder architecture for the WiMAX system with a dedicated ASIC design. In contrast to the block-by-block decoding schedule of most partial-parallel layered architectures, all the messages within each layer are updated simultaneously in the proposed fully-parallel layered decoder architecture. Meanwhile, the message updating is performed in bit-serial style to reduce hardware complexity. A 6-bit implementation is adopted in the decoder chip, since simulations demonstrate that 6-bit quantization is the best trade-off between performance and complexity. Moreover, a two-layer concurrent processing technique is proposed to further increase the parallelism for low code rates. Implementation results show that the decoder chip saves 22.2% of the storage bits and takes only 24 to 48 clock cycles per iteration for all the code rates defined in the WiMAX standard. It occupies 3.36 mm² in the SMIC 65 nm CMOS process, and realizes 1056 Mbps throughput at 1.2 V, 110 MHz and 10 iterations with 115 mW power consumption, which implies a power efficiency of 10.9 pJ/bit/iteration. The power efficiency is improved by 63.6% in a normalized comparison with the state-of-the-art WiMAX LDPC decoder.
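The quoted power efficiency follows directly from the reported power, throughput, and iteration count. The short check below reproduces that arithmetic; the function name is an assumption introduced for illustration.

```python
def energy_per_bit_per_iteration(power_w, throughput_bps, iterations):
    """Energy in joules spent per decoded bit per decoding iteration."""
    return power_w / (throughput_bps * iterations)

# Figures taken from the abstract: 115 mW, 1056 Mbps, 10 iterations.
energy_j = energy_per_bit_per_iteration(115e-3, 1056e6, 10)
energy_pj = energy_j * 1e12  # convert joules to picojoules
print(f"{energy_pj:.1f} pJ/bit/iteration")
```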
Yong WANG Jian-hua GE Jun HU Bo AI
An accurate and rapid synchronization scheme is a prerequisite for achieving high-quality multimedia transmission for wireless handheld terminals, e.g., in the China multimedia mobile broadcasting (CMMB) system. In this paper, an efficient orthogonal frequency division multiplexing (OFDM) timing synchronization scheme, which is robust to the doubly selective fading channel, is proposed for the CMMB system. TS timing is derived by performing an inverse sliding correlation (ISC) between the segmented Sync sequences in the Beacon, which possesses the inverse conjugate symmetry (ICS) characteristic. The ISC can provide sufficient correlative gain even in ultra-low signal-to-noise ratio (SNR) scenarios. Moreover, a fast fine symbol timing method based on the auto-correlation property of the Sync sequence is also presented. Using a detection strategy for the significant channel taps, specific information about the channel profile can be obtained. The advantages of the proposed timing scheme over traditional ones are demonstrated through both theoretical analysis and numerical simulations.
Dong-Jun KIM Tae-Hak LEE Jun-Ho CHOI Young-Sik KIM
In this letter, a novel ultra-wideband circular quasi-fractal monopole antenna with a six-petaled lotus pattern is presented. The CPW-fed technique and quasi-fractal concept are used to achieve ultra-wideband characteristics. The size of the proposed antenna is 4250 mm² with a lotus diameter of 19.8 mm. The proposed antenna exhibits ultra-wideband characteristics from 2.65 to 12.72 GHz, which corresponds to a fractional bandwidth of 131%. The measured radiation pattern of the proposed antenna is nearly omnidirectional.
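The 131% figure is consistent with the usual definition of fractional bandwidth, the bandwidth divided by the center frequency. A short check, with the function name assumed for illustration:

```python
def fractional_bandwidth(f_low, f_high):
    """Bandwidth divided by the arithmetic center frequency."""
    return 2 * (f_high - f_low) / (f_high + f_low)

# Band edges from the abstract, in GHz.
fbw = fractional_bandwidth(2.65, 12.72)
print(f"{fbw * 100:.0f}%")
```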
Jungo GOTO Osamu NAKAMURA Kazunari YOKOMAKURA Yasuhiro HAMAGUCHI Shinsuke IBI Seiichi SAMPEI
This paper proposes a spectrum-overlapped resource management (SORM) technique where each user equipment (UE) can ideally obtain the frequency selection diversity gain under multi-user environments. In the SORM technique for cellular systems, under the assumption that a soft canceller with minimum mean square error (SC/MMSE) turbo equalizer is adopted, an evolved node B (eNB) accepts overlapped frequency resource allocation. As a result, each UE can use the frequency bins having the highest channel gain. However, SORM becomes non-orthogonal access when the frequency bins having high channel gain for different UEs are partially identical. In this case, the inter-user interference (IUI) caused by overlapping spectra among UEs is eventually canceled by the SC/MMSE turbo equalizer. Therefore, SORM can achieve better performance than orthogonal access such as FDMA when the IUI is completely canceled. This paper demonstrates, by extrinsic information transfer (EXIT) analysis, that SORM has the potential to improve transmission performance. Moreover, this paper evaluates the block error rate (BLER) performance of SORM and FDMA, and shows that SORM outperforms FDMA.
Takayuki NOZAKI Kenta KASAI Kohichi SAKANIWA
In this paper, we compare the decoding error rates in the error floors for non-binary low-density parity-check (LDPC) codes over general linear groups with those for non-binary LDPC codes over finite fields transmitted through the q-ary memoryless symmetric channels under belief propagation decoding. To analyze non-binary LDPC codes defined over both the general linear group GL(m, F_2) and the finite field F_{2^m}, we investigate non-binary LDPC codes defined over GL(m_3, F_{2^{m_4}}). We propose a method to lower the error floors for non-binary LDPC codes. In this analysis, we see that the non-binary LDPC codes constructed by our proposed method and defined over the general linear group have the same decoding performance in the error floors as those defined over the finite field. The non-binary LDPC codes defined over the general linear group have more choices of edge labels that satisfy the condition for the optimization.
Zhonghao ZHANG Chongbin XU Li PING
In this paper, we present a transmission scheme for a multiple-input multiple-output (MIMO) quasi-static fading channel with imperfect channel state information at the transmitter (CSIT). In this scheme, we develop a precoder structure to exploit the available CSIT and apply spatial coupling for further performance enhancement. We derive an analytical evaluation method based on extrinsic information transfer (EXIT) functions, which facilitates our precoder design. Furthermore, we observe an area property indicating that, for a spatially coupled system, the iterative receiver can perform error-free decoding even if the original uncoupled system has multiple fixed points in its EXIT chart. This observation implies that spatial coupling is useful for alleviating the uncertainty in CSIT, which causes difficulty in designing LDPC codes based on the EXIT curve matching technique. Numerical results are presented, showing excellent performance of the proposed scheme in MIMO fading channels with imperfect CSIT.
Permutation polynomial based interleavers over integer rings, in particular those based on quadratic permutation polynomials, have been widely studied. In this letter, higher degree permutation polynomials are considered for interleavers, and permutation polynomials superior to quadratic permutation polynomials are found for some lengths.
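As background, a polynomial f defines an interleaver of length N when x -> f(x) mod N is a bijection on the integers 0..N-1. The brute-force check below is a minimal sketch of that definition, not the search method of the letter; the function names and the cubic example are assumptions, while 3x + 10x² modulo 40 is the quadratic used for block length 40 in the LTE turbo code.

```python
def is_permutation_polynomial(coeffs, n):
    """True if x -> sum(coeffs[k] * x**k) mod n is a bijection on 0..n-1.

    coeffs[k] is the coefficient of x**k.
    """
    images = {
        sum(c * pow(x, k, n) for k, c in enumerate(coeffs)) % n
        for x in range(n)
    }
    return len(images) == n

def interleave(symbols, coeffs):
    """Read symbols in the order given by the permutation polynomial."""
    n = len(symbols)
    if not is_permutation_polynomial(coeffs, n):
        raise ValueError("polynomial does not permute Z_n")
    order = [sum(c * pow(x, k, n) for k, c in enumerate(coeffs)) % n
             for x in range(n)]
    return [symbols[i] for i in order]

qpp_ok = is_permutation_polynomial([0, 3, 10], 40)   # 3x + 10x^2 mod 40
not_pp = is_permutation_polynomial([0, 1, 1], 8)     # x + x^2 is always even
cubic_ok = is_permutation_polynomial([0, 0, 0, 1], 5)  # x^3 mod 5
shuffled = interleave(list("abcde"), [0, 0, 0, 1])
```

The exhaustive check costs O(N) evaluations per candidate, which is adequate for illustration but says nothing about how good the resulting interleaver is for a given code.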
Jong-Kwan LEE Kyu-Man LEE JaeSung LIM
In this letter, we propose a fast dynamic slot assignment (F-DSA) protocol to reduce timeslot access delay of a newly arrived node in ad hoc networks. As there is no central coordinator, a newly arrived node needs separate negotiation with all the neighboring nodes for assigning slots to itself. Thus, it may result in network join delay and this becomes an obstacle for nodes to dynamically join and leave networks. In order to deal with this issue better, F-DSA simplifies the slot assignment process. It provides frequent opportunities to assign slots by using mini-slots to share control packets in a short time. Numerical analysis and extensive simulation show that F-DSA can significantly reduce the timeslot access delay compared with other existing slot assignment protocols. In addition, we investigate the effect of the mini-slot overhead on the performance.
Lianjun DENG Teruo KAWAMURA Hidekazu TAOKA Mamoru SAWAHASHI
This paper proposes applying intra-subframe frequency hopping (FH) to closed-loop (CL) type transmit diversity using codebook based precoding for a shared channel carrying user traffic data in discrete Fourier transform (DFT)-precoded Orthogonal Frequency Division Multiple Access (OFDMA). We present two types of precoding schemes associated with intra-subframe FH, where a 1-ms subframe comprises 2 slots: individual precoding vector selection for each of the 2 slots from reduced precoding codebooks, and common precoding vector selection for the 2 slots. We investigate the effect of intra-subframe FH on the codebook based transmit diversity in terms of the average block error rate (BLER) performance while maintaining the same number of feedback bits required for notification of the selected precoding vector as that for the conventional CL transmit diversity without FH. Computer simulation results show that the codebook based transmit diversity with intra-subframe FH is very effective in decreasing the required average received signal-to-noise power ratio (SNR) when the fading maximum Doppler frequency, fD, is higher than approximately 50 Hz for both 2- and 4-antenna transmission in the DFT-precoded OFDMA.
Shogo MORI Gosuke OHASHI Yoshifumi SHIMODAIRA
This study examines the robustness of image quality factors in various types of environment illumination using a parameter design in the field of quality engineering. Experimental results revealed that image quality factors are influenced by environment illuminations in the following order: minimum luminance, maximum luminance and gamma.
Many learning machines such as normal mixtures and layered neural networks are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. The conventional statistical asymptotic theory cannot be applied to such learning machines because the likelihood function cannot be approximated by any normal distribution. Recently, a new statistical theory has been established based on algebraic geometry, and it was clarified that the generalization and training errors are determined by two birational invariants, the real log canonical threshold and the singular fluctuation. However, their concrete values are left unknown. In the present paper, we propose a new concept, the quasi-regular case in statistical learning theory. A quasi-regular case is not a regular case but a singular case; however, it has the same property as a regular case. In fact, we prove that, in a quasi-regular case, the two birational invariants are equal to each other, so that the symmetry of the generalization and training errors holds. Moreover, the concrete values of the two birational invariants are explicitly obtained; hence the quasi-regular case is useful for studying statistical learning theory.
Yousuke SANO Yusuke OHWATARI Nobuhiko MIKI Yuta SAGAE Yukihiko OKUMURA Yasutaka OGAWA Takeo OHGANE Toshihiko NISHIMURA
This paper investigates the dominant impact on the interference rejection combining (IRC) receiver due to the downlink reference signal (RS) based covariance matrix estimation scheme. When the transmission modes using the cell-specific RS (CRS) in LTE/LTE-Advanced are assumed, the property of the non-precoded CRS is different from that of the data signals. This difference poses two problems to the IRC receiver. First, it results in different levels of accuracy for the RS based covariance matrix estimation. Second, assuming the case where the CRS from the interfering cell collides with the desired data signals of the serving cell, the IRC receiver cannot perfectly suppress this CRS interference. The results of simulations assuming two transmitter and receiver antenna branches show that the impact of the CRS-to-CRS collision among cells is greater than that for the CRS interference on the desired data signals especially in closed-loop multiple-input multiple-output (MIMO) systems, from the viewpoint of the output signal-to-interference-plus-noise power ratio (SINR). However, the IRC receiver improves the user throughput by more than 20% compared to the conventional maximal ratio combining (MRC) receiver under the simulation assumptions made in this paper even when the CRS-to-CRS collision is assumed. Furthermore, the results verify the observations made in regard to the impact of inter-cell interference of the CRS for various average received signal-to-noise power ratio (SNR) and signal-to-interference power ratio (SIR) environments.
Changwoo MIN Hyung Kook JUN Won Tae KIM Young Ik EOM
A concurrent FIFO queue is a widely used fundamental data structure for parallelizing software. In this letter, we introduce a novel concurrent FIFO queue algorithm for multicore architecture. We achieve better scalability by reducing contention among concurrent threads, and improve performance by optimizing cache-line usage. Experimental results on a server with eight cores show that our algorithm outperforms state-of-the-art algorithms by a factor of two.
Kazuyoshi SAKAMOTO Yasushi ITOH
L-band SiGe HBT frequency-tunable differential amplifiers with dual-bandpass or dual-bandstop responses have been developed for the next generation adaptive and/or reconfigurable wireless radios. Varactor-loaded dual-band resonators comprised of series and parallel LC circuits are employed in the output circuit of differential amplifiers for realizing dual-bandpass responses as well as the series feedback circuit for dual-bandstop responses. The varactor-loaded series and parallel LC resonator can provide a wider frequency separation between dual-band frequencies than the stacked LC resonator. With the use of the varactor-loaded dual-band resonator in the design of the low-noise SiGe HBT differential amplifier with dual-bandpass responses, the lower-band frequency can be varied from 0.58 to 0.77 GHz with a fixed upper-band frequency of 1.54 GHz. Meanwhile, the upper-band frequency can be varied from 1.1 to 1.5 GHz for a fixed lower-band frequency of 0.57 GHz. The dual-band gain was 6.4 to 13.3 dB over the whole frequency band. In addition, with the use of the varactor-loaded dual-band resonator in the design of the low-noise differential amplifier with dual-bandstop responses, the lower bandstop frequency can be varied from 0.38 to 0.68 GHz with an upper bandstop frequency from 1.05 to 1.12 GHz. Meanwhile, the upper bandstop frequency can be varied from 0.69 to 1.02 GHz for a lower bandstop frequency of 0.38 GHz. The maximal dual-band rejection of gain was 14.4 dB. The varactor-loaded dual-band resonator presented in this paper is expected to greatly contribute to realizing the next generation adaptive and/or reconfigurable wireless transceivers.
Kazutoshi KODAMA Tetsuya IIZUKA Toru NAKURA Kunihiro ASADA
This paper proposes a high frequency resolution Digitally-Controlled Oscillator (DCO) using a single-period control bit switching scheme. The proposed scheme switches the tuning word of the DCO within a single period for fine frequency tuning. An LC type DCO is implemented to realize the proposed scheme, and is fabricated using a standard 65 nm CMOS technology. The measurement results show that the implemented DCO improves the frequency resolution from 560 kHz to 180 kHz without phase noise degradation, with an additional area of 200 µm².
Fitzgerald Sungkyung PARK Nikolaus KLEMMER
A fractional-N phase-locked loop (PLL) is designed for the DigRF interface. The digital part of the PLL mainly consists of a dual-mode phase frequency detector (PFD), a digital counter, and a digital delta-sigma modulator (DSM). The PFD can operate on either 52 MHz or 26 MHz reference frequencies, depending on its use of only the rising edge or both the rising and the falling edges of the reference clock. The interface between the counter and the DSM is designed to give enough timing margin in terms of the signal round-trip delay. The circuitry is implemented using a 90-nm CMOS process technology with a 1.2-V supply, draining 1 mA.
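To illustrate how a digital delta-sigma modulator yields a fractional division ratio, the following is a minimal sketch of a first-order accumulator modulator. The order, word width, and integer divide value of the actual design are not given in the abstract, so `frac_word`, `modulus`, and `n_int` below are hypothetical values.

```python
def first_order_dsm(frac_word, modulus, n_cycles):
    """Return the carry-out bit stream of an accumulator-based modulator.

    Over a whole number of periods the mean of the bits is frac_word / modulus.
    """
    acc = 0
    bits = []
    for _ in range(n_cycles):
        acc += frac_word
        if acc >= modulus:       # accumulator overflow produces a carry
            acc -= modulus
            bits.append(1)
        else:
            bits.append(0)
    return bits

# Hypothetical settings: fractional part 1/4 and integer divide value 70.
n_int = 70
bits = first_order_dsm(frac_word=1, modulus=4, n_cycles=8)
# The divider alternates between n_int and n_int + 1 according to the bits.
ratios = [n_int + b for b in bits]
mean_ratio = sum(ratios) / len(ratios)
```

The counter divides by `n_int` or `n_int + 1` on each reference cycle, so the average ratio settles at `n_int + frac_word / modulus`; higher-order modulators shape the resulting quantization noise differently.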