Taeko MATSUNAGA Shinji KIMURA Yusuke MATSUNAGA
Multi-operand adders, which calculate the sum of more than two operands, usually consist of a compressor tree, which reduces the number of operands to two without any carry propagation, and a carry-propagate adder for the remaining two operands in ASIC implementations. The former part is usually realized using full adders or (3;2) counters, as in Wallace trees, in ASICs, while adder trees or dedicated hardware are used in FPGAs. In this paper, an approach to realizing compressor trees on FPGAs is proposed. On an FPGA with m-input LUTs, any counter with up to m inputs can be realized with one LUT per output. Our approach utilizes generalized parallel counters (GPCs) with up to m inputs and synthesizes high-performance compressor trees by setting intermediate height limits in the compression process, as in Dadda multipliers. Experimental results show that the number of GPCs is reduced by up to 22% compared to the existing heuristic. Its effectiveness in reducing delay is also shown against existing approaches on Altera's Stratix III.
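As a rough illustration of the statement that a counter with up to m inputs fits in one LUT per output, the following sketch models a GPC behaviourally and enumerates the truth table each output LUT would hold. The function names `gpc` and `lut_truth_tables` and the (1,5;3) example shape are assumptions made for this illustration; the sketch does not reproduce the synthesis heuristic of the paper.

```python
from itertools import product

def gpc(columns, out_bits):
    """Behavioural model of a generalized parallel counter (GPC).

    columns[i] is the list of input bits that carry weight 2**i.
    Returns out_bits output bits, least significant bit first.
    """
    total = sum(sum(bits) << i for i, bits in enumerate(columns))
    assert total < (1 << out_bits), "output width too small for this shape"
    return [(total >> k) & 1 for k in range(out_bits)]

def lut_truth_tables(shape, out_bits):
    """Enumerate one truth table per output bit.

    shape[i] is the number of inputs in column i, so the GPC has
    m = sum(shape) inputs and each table has 2**m entries, which is
    what a single m-input LUT can store.
    """
    m = sum(shape)
    tables = [[] for _ in range(out_bits)]
    for bits in product((0, 1), repeat=m):
        cols, pos = [], 0
        for k in shape:
            cols.append(list(bits[pos:pos + k]))
            pos += k
        for table, bit in zip(tables, gpc(cols, out_bits)):
            table.append(bit)
    return tables

# A (3;2) counter is a full adder: three weight-1 bits in, sum and carry out.
full_adder = gpc([[1, 1, 0]], 2)

# A (1,5;3) GPC: five weight-1 bits and one weight-2 bit, six inputs in total.
tables_15_3 = lut_truth_tables([5, 1], 3)
```

With six inputs, each of the three output tables has 64 entries, so on a device with 6-input LUTs this shape would need three LUTs under the one-LUT-per-output assumption.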
Yanzhao MA Hongyi WANG Guican CHEN
This paper presents a step-up/step-down DC-DC converter using a digital dither technique to achieve high efficiency and small output voltage ripple for portable electronic devices. The proposed control method minimizes not only the switching loss, by operating like a pure buck or boost converter, but also the conduction loss, by reducing the average inductor current even when four switches are used. Digital dither control is introduced to implement a buffer region for smooth transition between buck and boost modes. A minimum-ripple dither with a higher fundamental frequency is adopted to decrease the output voltage ripple. A window delay-line analog-to-digital converter (ADC) with delay calibration is implemented to digitize the control voltage. The step-up/step-down DC-DC converter has been designed in a standard 0.5 µm CMOS process. The output voltage is regulated over an input voltage range of 2.5 V to 5.5 V, and the output voltage ripple is reduced to less than 25 mV during the mode transition. The peak power efficiency is 96%, and the maximum load current can reach 800 mA.
Takahiro MATSUMOTO Shinya MATSUFUJI Tetsuya KOJIMA Udaya PARAMPALLI
This paper presents a method of generating sets of orthogonal and zero-correlation zone (ZCZ) periodic real-valued sequences of period 2n, n ≥ 1. The sequences admit a fast correlation algorithm and the sets of sequences achieve the upper bound on family size. A periodic orthogonal sequence is a sequence whose periodic autocorrelation function has zero sidelobes, and an orthogonal set is a set of orthogonal sequences whose mutual periodic cross-correlation function at zero shift is zero. Similarly, a ZCZ set is a set of sequences with a zero-correlation zone. In this paper, we derive the real-valued periodic orthogonal sequences of period 2n from a real-valued Huffman sequence of length 2ν+1, ν being a positive integer and ν ≥ n, whose aperiodic autocorrelation function has zero sidelobes except possibly at the left and right shift-ends. The orthogonal and ZCZ sets of real-valued periodic orthogonal sequences are useful in various systems, such as synchronous code division multiple access (CDMA) systems, quasi-synchronous CDMA systems and digital watermarking.
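The two definitions above (zero autocorrelation sidelobes, and zero cross-correlation at zero shift) can be checked numerically. The sketch below is a minimal illustration only: the helper names are hypothetical, and the length-4 example sequence is a textbook perfect sequence, not one produced by the Huffman-based construction of the paper.

```python
def periodic_correlation(x, y, shift):
    """Periodic correlation of two equal-length real sequences at one shift."""
    n = len(x)
    return sum(x[i] * y[(i + shift) % n] for i in range(n))

def is_orthogonal_sequence(x, tol=1e-9):
    """True if every periodic autocorrelation sidelobe is zero."""
    return all(abs(periodic_correlation(x, x, s)) <= tol
               for s in range(1, len(x)))

def is_orthogonal_set(seqs, tol=1e-9):
    """True if each member is an orthogonal sequence and every pair of
    distinct members has zero periodic cross-correlation at zero shift."""
    if not all(is_orthogonal_sequence(x, tol) for x in seqs):
        return False
    return all(abs(periodic_correlation(seqs[a], seqs[b], 0)) <= tol
               for a in range(len(seqs)) for b in range(a + 1, len(seqs)))

# Example: a length-4 sequence and its cyclic shifts.
base = [1, 1, 1, -1]
shifts = [base[k:] + base[:k] for k in range(len(base))]
peak = periodic_correlation(base, base, 0)
orthogonal_set = is_orthogonal_set(shifts)
```

The cyclic shifts of an orthogonal sequence form an orthogonal set because their cross-correlation at zero shift equals an autocorrelation sidelobe of the base sequence.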
Hiroaki KONOURA Yukio MITSUYAMA Masanori HASHIMOTO Takao ONOYE
PMOS stress (ON) probability has a strong impact on circuit timing degradation due to the NBTI effect. This paper evaluates how the granularity of stress probability calculation affects NBTI prediction using a state-of-the-art long-term prediction model. Experimental evaluations show that the stress probability should be estimated at the transistor level to accurately predict the increase in delay, especially when the circuit operation and/or inputs are highly biased. We then devise and evaluate two methods of annotating stress probability to gate-level timing analysis; one guarantees the pessimism desirable for timing analysis and the other aims to obtain a result close to transistor-level timing analysis. Experimental results show that gate-level timing analysis with transistor-level stress probability calculation estimates the increase in delay with 12.6% error.
Ji-Hun EO Sang-Hun KIM Young-Chan JANG
A 200 kS/s 10-bit successive approximation (SA) analog-to-digital converter (ADC) with a rail-to-rail input signal is proposed for acquiring biosignals such as EEG and MEG signals. A split-capacitor-based digital-to-analog converter (SC-DAC) is used to reduce the power consumption and chip area. The SC-DAC's linearity is improved by using dummy capacitors and a small bootstrapped analog switch with a constant on-resistance, without increasing its area. A time-domain comparator with a replica circuit for clock feed-through noise compensation is designed by using a highly differential digital architecture involving a small area. Its area is about 50% less than that of a conventional time-domain comparator. The proposed SA ADC is implemented by using a 0.18-µm 1-poly 6-metal CMOS process with a 1 V supply. The measured DNL and INL are +0.44/-0.4 LSB and +0.71/-0.62 LSB, respectively. The SNDR is 55.43 dB for a 99.01 kHz analog input signal at a sampling rate of 200 kS/s. The power consumption and core area are 5 µW and 0.126 mm², respectively. The FoM is 47 fJ/conversion-step.
Ryusuke MIYAMOTO Hiroki SUGANO
Pedestrian detection from visual images, which is used for driver assistance or video surveillance, is a challenging problem of recent interest. Co-occurrence histograms of oriented gradients (CoHOG) is a powerful feature descriptor for pedestrian detection and achieves the highest detection accuracy. However, its computational cost is too large for real-time calculation on state-of-the-art processors. In this paper, to obtain an optimal parallel implementation for an NVIDIA GPU, several kinds of parallelism in CoHOG-based detection are presented and evaluated for their suitability for implementation. The experimental results show that the detection process can be performed at 16.5 fps on QVGA images on an NVIDIA Tesla C1060 by the optimized parallel implementation. Our evaluation shows that the optimal strategy of parallel implementation for an NVIDIA GPU is different from that for an FPGA. We discuss the reason and show the advantages of each device. To show the scalability and portability of the GPU implementation, the same object code is executed on other NVIDIA GPUs. The experimental results show that a GTX570 can perform CoHOG-based pedestrian detection at 21.3 fps on QVGA images.
Impulsive noise interference is a significant problem for the Integrated Services Digital Broadcasting for Terrestrial (ISDB-T) receivers due to its effect on the orthogonal frequency division multiplexing (OFDM) signal. In this paper, an adaptive scheme to suppress the effect of impulsive noise is proposed. The impact of impulsive noise can be detected by using the guard band in the frequency domain; furthermore, the position information of the impulsive noise, including burst duration, instantaneous power and arrival time, can be estimated as well. Then a time-domain window function with adaptive parameters, which are determined from the estimated information of the impulsive noise and the carrier-to-noise ratio (CNR), is employed to suppress the impulsive interference. Simulation results confirm the validity of the proposed scheme, which improves the bit error rate (BER) performance of ISDB-T receivers in both the AWGN channel and the Rayleigh fading channel.
This paper proposes new scheduling algorithms for best effort (BE) traffic classification in business femtocell networks. The purpose of traffic classification is to provide differentiated services to BE users depending on their traffic classes, and the concept of traffic classification is called Inter User Best Effort (IUBE) in the CDMA2000 1x Evolution Data Optimized (EVDO) standard. Traffic differentiation is achieved by introducing Grade of Service (GoS) as a quality of service (QoS) parameter into the scheduler's decision metric (DM). The new scheduling algorithms are called QoS Round Robin (QoS-RR), QoS Proportionally Fair (QoS-PF), QoS maximum data rate control (DRC) (QoS-maxDRC), QoS average DRC (QoS-aveDRC), QoS exponent DRC (QoS-expDRC), and QoS maxDRC-PF (QoS-maxDRC-PF). Two different femtocell throughput experiments are performed using real femtocell devices in order to collect real DRC values. The first experiment examines 4, 8, 12 and 16 IUBE users, while the second experiment examines 4 IUBE + 2 Voice over IP (VoIP), 8 IUBE + 2 VoIP, 12 IUBE + 2 VoIP, and 16 IUBE + 2 VoIP users. Average sector throughput, IUBE traffic differentiation, and VoIP delay bound error values are investigated to compare the performance of the proposed scheduling algorithms. In conclusion, the QoS-maxDRC-PF scheduler is proposed for the business femtocell environment.
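The abstract does not give the decision metrics themselves, so the following is only a sketch of the general idea of placing a GoS term in a proportionally fair metric. The multiplicative weight `gos_weight`, the metric form, the averaging time constant and all field names are assumptions made for illustration, not the definitions used for QoS-PF in the paper.

```python
def decision_metric(user):
    """Hypothetical metric: GoS weight times requested rate (DRC) divided
    by the user's smoothed served rate, as in a proportionally fair rule."""
    return user["gos_weight"] * user["drc"] / max(user["avg_rate"], 1e-9)

def schedule_slot(users, time_constant=100.0):
    """Serve the user with the largest metric and update every average."""
    chosen = max(range(len(users)), key=lambda i: decision_metric(users[i]))
    for i, user in enumerate(users):
        served = user["drc"] if i == chosen else 0.0
        user["avg_rate"] += (served - user["avg_rate"]) / time_constant
    return chosen

def simulate(users, slots):
    """Count how many slots each user receives."""
    counts = [0] * len(users)
    for _ in range(slots):
        counts[schedule_slot(users)] += 1
    return counts

# Two users with identical channel quality but different (assumed) GoS weights.
users = [
    {"drc": 10.0, "avg_rate": 1.0, "gos_weight": 2.0},
    {"drc": 10.0, "avg_rate": 1.0, "gos_weight": 1.0},
]
counts = simulate(users, 3000)
```

Under these assumptions, users with equal DRC values receive slots roughly in proportion to their weights, which is one simple way traffic classes could be differentiated.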
Sayed Jalal ZAHABI Mohammadali KHOSRAVIFARD Ali A. TADAION T. Aaron GULLIVER
This letter considers the problem of detecting an offset quadrature phase shift keying (O-QPSK) modulated signal in colored Gaussian noise. The generalized likelihood ratio test (GLRT) is employed for detection. By deriving the GLRT, it is shown that the assumption of colored Gaussian noise results in a more complicated problem than with the white noise assumption that was previously examined in the literature. An efficient solution for the detection maximization problem is proposed, based on which the GLRT is implemented. Performance results are presented to illustrate the detector performance.
Bum-Soo KWON Tae-Jin JUNG Chang-Hong SHIN Kyun-Kyung LEE
A novel algorithm is presented for estimating the 3-D location (azimuth angle, elevation angle, and range) of multiple sources with a uniform circular array (UCA). Based on its centrosymmetric property, a UCA is divided into two subarrays. The steering vectors for these subarrays then yield a 2-D direction of arrival (DOA)-related rotational invariance property in the signal subspace, which enables 2-D DOA estimations using a generalized-ESPRIT algorithm. Based on the estimated 2-D DOAs, a range estimation can then be obtained for each source by defining the 1-D MUSIC spectrum. Despite its low computational complexity, the proposed algorithm can almost match the performance of the benchmark estimator 3-D MUSIC.
Kon-Woo KWON Kwang-Hyun BAEK Jeong Woo LEE
We propose a high-speed and low-complexity architecture for the very large-scale integration (VLSI) implementation of the maximum a posteriori probability (MAP) algorithm suited to the double binary turbo decoder. For this purpose, equation manipulations on the conventional Linear-Log-MAP algorithm and architectural optimization are proposed. It is shown by synthesized simulations that the proposed architecture improves speed, area and power compared with the state-of-the-art Linear-Log-MAP architecture. It is also observed that the proposed architecture shows good overall performance in terms of error correction capability as well as decoder hardware's speed, complexity and throughput.
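For readers unfamiliar with the Linear-Log-MAP algorithm mentioned above, the sketch below contrasts the exact Jacobian logarithm (the max* operation of Log-MAP) with a linear approximation of its correction term. The slope and knee constants are values commonly quoted in the literature and are assumed here; the equation manipulations and architecture proposed in the paper are not reproduced.

```python
import math

def max_star(a, b):
    """Exact Jacobian logarithm: ln(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_star_linear(a, b, slope=0.24904, knee=2.5068):
    """Linear-Log-MAP style approximation.

    The correction term ln(1 + exp(-|a - b|)) is replaced by a straight
    line that is clipped to zero once |a - b| exceeds the knee.
    The slope and knee values are assumptions taken from common usage.
    """
    return max(a, b) + max(0.0, slope * (knee - abs(a - b)))

# Largest deviation of the approximation over a sweep of |a - b|.
worst_error = max(
    abs(max_star(0.0, d / 10.0) - max_star_linear(0.0, d / 10.0))
    for d in range(0, 80)
)
```

The approximation needs only a subtraction, a comparison and one constant multiplication per max* operation, which is why it is attractive for hardware.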
The doubly constrained robust Capon beamformer (DCRCB), which employs a spherical uncertainty set of the steering vector together with the constant norm constraint, can provide robustness against arbitrary array imperfections. However, its performance can be greatly degraded when the uncertainty bound of the spherical set is not properly selected. In this paper, combining the DCRCB and the weight-vector-norm-constrained beamformer (WVNCB), we suggest a new robust adaptive beamforming method which allows us to overcome the performance degradation due to improper selection of the uncertainty bound. In WVNCB, the weight vector norm is constrained not to exceed a threshold. Both WVNCB and DCRCB belong to the class of diagonal loading methods. The diagonal loading range of WVNCB, which does not consider negative loading, is extended to match that of DCRCB, which can have a negative loading level as well as a positive one. In contrast to the conventional DCRCB with a fixed uncertainty bound, the bound in the proposed method varies such that the weight vector norm constraint is satisfied. Simulation results show that the proposed beamformer outperforms both DCRCB and WVNCB, being far less sensitive to the uncertainty bound than DCRCB.
Hironori UCHIKAWA Kenta KASAI Kohichi SAKANIWA
We consider spatially-coupled protograph-based LDPC codes for the three-terminal erasure relay channel. It is observed that the BP threshold, the maximal erasure probability of the channel for which the decoding error probability converges to zero, of spatially-coupled codes, in particular the spatially-coupled MacKay-Neal code, is close to the theoretical limit for the relay channel. Empirical results suggest that spatially-coupled protograph-based LDPC codes have great potential to achieve the theoretical limit of a general relay channel.
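To make the notion of a BP threshold concrete, the sketch below runs the standard density evolution recursion for an uncoupled regular (dv, dc) LDPC ensemble on a plain binary erasure channel and finds the threshold by bisection. This is a deliberately simpler setting than the spatially-coupled protograph codes and the relay channel treated in the letter; the function names, iteration limits and tolerances are assumptions chosen for illustration.

```python
def bec_de_converges(eps, dv=3, dc=6, max_iter=20000, tol=1e-8):
    """Density evolution on the BEC for a regular (dv, dc) ensemble.

    x is the erasure probability of a variable-to-check message.
    Returns True if x is driven below tol within max_iter iterations.
    """
    x = eps
    for _ in range(max_iter):
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
        if x < tol:
            return True
    return False

def bp_threshold(dv=3, dc=6, steps=20):
    """Largest channel erasure probability for which the recursion
    converges to zero, located by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if bec_de_converges(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo

threshold_36 = bp_threshold(3, 6)
```

For the rate-1/2 (3,6) ensemble the computed value sits clearly below the Shannon limit of 0.5, which is the gap that spatial coupling is known to narrow.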
Yuanwang YANG Jingye CAI Haiyan JIN
In this letter, an improved triple-tunable frequency synthesizer structure is proposed that achieves both high frequency resolution and fast switching speed without degrading the spurious signal (spur) level. Based on this structure, a high-performance millimeter-wave frequency synthesizer with low spurs, low phase noise, and fast switching speed is developed. This synthesizer, driven by the direct digital synthesizer (DDS) AD9956, can adjust the output of the DDS and the frequency division ratios of two variable frequency dividers (VFDs) to move the spurious components outside the loop bandwidth of the phase-locked loop (PLL). Moreover, the ADF4252-based microwave PLL can further suppress the phase noise. Experimental results from the implemented synthesizer show that remarkable performance improvements have been achieved.
Yusuke KOZAWA Hiromasa HABUCHI
In this paper, N-CSK (N parallel Codes Shift Keying) using modified pseudo orthogonal M-sequence sets (MPOMSs) is proposed to realize a parallel combinatory spread spectrum (PC/SS) communication system for optical communications. Moreover, the upper bound of the data transmission rate and the bit error rate (BER) performance of this N-CSK system using chip-level detection are evaluated through theoretical analysis, taking into account scintillation, background noise, avalanche photo-diode (APD) noise, thermal noise, and signal-dependent noise. It is shown that the upper bound of the data transmission rate of the proposed system is better than those of OOK/CDM and SIK/CDM. Moreover, the upper bound of the data transmission rate of the proposed system can reach about 1.5 [bit/chip] when the code length of the MPOMS is 64 [chip].
Takafumi HAYASHI Takao MAEDA Shinya MATSUFUJI Satoshi OKAWA
The present paper introduces a novel construction of ternary sequences having a zero-correlation zone. The cross-correlation function and the side-lobes of the auto-correlation function of the proposed sequence set are zero for phase shifts within the zero-correlation zone. The proposed sequence set consists of more than one subset, each having the same number of members. The correlation function of sequences from a pair of different subsets, referred to as the inter-subset correlation function, has a wider zero-correlation zone than the correlation function of sequences of the same subset (intra-subset correlation function). The wide inter-subset zero-correlation zone enables performance improvement in applications of the proposed sequence set. The proposed sequence set has a zero-correlation zone for periodic, aperiodic, and odd correlation functions.
Fagen LI Jiang DENG Tsuyoshi TAKAGI
Authenticated encryption schemes are very useful for private and authenticated communication. In 2010, Rasslan and Youssef showed that Hwang et al.'s authenticated encryption scheme is not secure by presenting a message forgery attack. However, Rasslan and Youssef did not show how to solve the security issue. In this letter, we give an improvement of Hwang et al.'s scheme. The improved scheme not only solves the security issue of the original scheme, but also maintains its efficiency.
Fanxin ZENG Xiaoping ZENG Zhenyu ZHANG Guixin XUAN
The approximately synchronized code-division multiple-access (CDMA) communication system, using QAM sequences with a zero correlation zone (ZCZ) as its spreading sequences, can not only remove the multiple access interference (MAI) and multi-path interference (MPI) synchronously, but also has a higher transmission data rate than one using traditional ZCZ sequences with the same sequence length. In this letter, six families of 16-QAM sequences with ZCZ are presented, based on Gray mapping and the known binary ZCZ sequences. When the binary ZCZ sequences employed in this letter reach the theoretical bound on binary ZCZ sequences, and their family size is a multiple of 4 or 2, two of the resultant six 16-QAM sequence sets satisfy the bound referred to above as well.
Akira SOGAMI Arata KAWAMURA Youji IIGUNI
We have previously proposed a howling canceller which cancels howling by using a cascade notch filter designed from the distance between a loudspeaker and a microphone. This method utilizes a pilot signal to estimate the distance. In this paper, we introduce two methods into the distance-based howling canceller to improve speech quality. The first is an adaptive cascade notch filter which adaptively adjusts the nulls to eliminate howling while preserving speech components. The second is a silent pilot signal whose frequencies lie in the ultrasonic band, so that it is inaudible during transmission. We implement the proposed howling canceller on a DSP to evaluate its capability. The experimental results show that the proposed howling canceller improves speech quality in comparison to the conventional one.
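A cascade notch filter of the kind referred to above can be illustrated with a chain of fixed second-order IIR notch sections. In the sketch below, the section form, the pole radius `r`, the sampling rate and the notch frequencies are all assumptions chosen for illustration; the distance-based design rule and the adaptive null adjustment of the paper are not reproduced.

```python
import cmath
import math

def notch_response(freq_hz, notch_hz, fs_hz, r=0.95):
    """Frequency response of one second-order notch section.

    Zeros sit on the unit circle at the notch frequency and poles sit at
    radius r on the same angle, so the null is narrow when r is near 1.
    """
    w = 2.0 * math.pi * freq_hz / fs_hz
    w0 = 2.0 * math.pi * notch_hz / fs_hz
    z1 = cmath.exp(-1j * w)  # z^-1 evaluated on the unit circle
    num = 1.0 - 2.0 * math.cos(w0) * z1 + z1 * z1
    den = 1.0 - 2.0 * r * math.cos(w0) * z1 + r * r * z1 * z1
    return num / den

def cascade_response(freq_hz, notch_freqs_hz, fs_hz, r=0.95):
    """Response of several notch sections connected in series."""
    h = 1.0 + 0.0j
    for f0 in notch_freqs_hz:
        h *= notch_response(freq_hz, f0, fs_hz, r)
    return h

# Hypothetical howling frequencies at 1 kHz and 2.5 kHz, sampled at 16 kHz.
gain_at_howl = abs(cascade_response(1000.0, [1000.0, 2500.0], 16000.0))
gain_at_speech = abs(cascade_response(300.0, [1000.0, 2500.0], 16000.0))
```

The gain is essentially zero at each notch frequency and close to unity away from the notches, which is the trade-off between howling suppression and speech preservation that the adaptive version tunes.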
This work first investigates two existing check node unit (CNU) architectures for LDPC decoding, the self-message-excluded CNU (SME-CNU) and two-minimum CNU (TM-CNU) architectures, and analyzes their area and timing complexities based on various realization approaches. Compared to the TM-CNU architecture, the SME-CNU architecture is faster but has much higher complexity for comparison operations. To overcome this problem, this work proposes a novel systematic optimization algorithm for the comparison operations required by SME-CNU architectures. The algorithm can automatically synthesize an optimized fast comparison operation that guarantees the shortest comparison delay and a minimized total number of 2-input comparators. High speed is achieved by adopting parallel divide-and-conquer comparison operations, while the required comparators are minimized by developing a novel set construction algorithm that maximizes shareable comparison operations. As a result, the proposed design significantly reduces the required number of comparison operations compared to conventional SME-CNU architectures, under the condition that both designs have the same speed performance. In addition, our preliminary hardware simulations show that the proposed design has hardware complexity comparable to that of low-complexity TM-CNU architectures.
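The difference between the two CNU styles can be seen in a behavioural model of the min-sum magnitude computation. The sketch below is a software illustration under simplifying assumptions (magnitudes only, sign processing omitted, hypothetical function names); it does not model the comparator sharing or the set construction algorithm proposed in this work.

```python
def sme_cnu(mags):
    """Self-message-excluded style: for each edge, take the minimum over
    all the other incoming magnitudes directly."""
    return [min(mags[:i] + mags[i + 1:]) for i in range(len(mags))]

def tm_cnu(mags):
    """Two-minimum style: find the smallest value, the second smallest
    value and the position of the smallest once, then select per edge."""
    min1 = min2 = float("inf")
    idx = -1
    for i, m in enumerate(mags):
        if m < min1:
            min2, min1, idx = min1, m, i
        elif m < min2:
            min2 = m
    # The edge that supplied min1 receives min2; every other edge gets min1.
    return [min2 if i == idx else min1 for i in range(len(mags))]

incoming = [3, 1, 4, 1, 5, 9, 2, 6]
out_sme = sme_cnu(incoming)
out_tm = tm_cnu(incoming)
```

Both functions return the same outputs; the SME form evaluates a separate minimum for every edge, which can proceed in parallel but repeats comparisons, while the TM form shares a single search across all edges.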