Chin-Long WEY Ping-Chang JUI Gang-Neng SUNG
This study presents efficient algorithms for performing multiply-by-3 (3N) and divide-by-3 (N/3) operations using only additions and subtractions, respectively. No multiplications or divisions are needed. Full adders (FAs) and full subtractors (FSs) can be used to realize the 3N and N/3 operations, respectively. For fast hardware implementation, this paper introduces two basic cells, UCA and UCS, for the 3N and N/3 operations, respectively. For the 3N operation, UCA-based ripple carry adder (RCA) and carry lookahead adder (CLA) designs are proposed, and their speed is estimated from the delay data of a standard cell library in the TSMC 0.18 µm CMOS process. Results show that the 16-bit UCA-based RCA is about 3 times faster than the conventional FA-based RCA and even 25% faster than the FA-based CLA. The proposed 16-bit and 64-bit UCA-based CLAs are 62% and 36% faster than the conventional FA-based CLAs, respectively. For the N/3 operation, a ripple borrow subtractor (RBS) is also presented. The 16-bit UCS-based RBS is about 15.5% faster than the 16-bit FS-based RBS.
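The idea of replacing multiplication and division by 3 with one addition or one subtraction can be sketched bit by bit. The Python sketch below is an illustration under stated assumptions, not the UCA/UCS cells of the paper: it uses the identities 3N = N + 2N and, for an exact multiple N = 3Q, Q = N - 2Q. The function names `times3_ripple` and `div3_exact_ripple` and the `width` parameter are hypothetical, and how the paper treats values of N that are not multiples of 3 is not stated in the abstract.

```python
def times3_ripple(n: int, width: int) -> int:
    """Compute 3*n as n + (n << 1) with a ripple of full-adder steps.

    Assumes 0 <= n < 2**width. Bit i of 2n is bit i-1 of n, so each
    full-adder step adds n[i], n[i-1], and the incoming carry.
    """
    result = 0
    carry = 0
    prev_bit = 0  # bit i-1 of n, which is bit i of 2n
    for i in range(width + 2):  # 3n needs at most width + 2 bits
        a = (n >> i) & 1 if i < width else 0
        b = prev_bit
        s = a ^ b ^ carry
        carry = (a & b) | (a & carry) | (b & carry)
        result |= s << i
        prev_bit = a
    return result


def div3_exact_ripple(n: int, width: int) -> int:
    """Compute n // 3 for an exact multiple of 3 via Q = N - 2Q.

    Assumes 0 <= n < 2**width and n % 3 == 0. Bit i of 2Q is bit i-1
    of Q, which is already known, so each full-subtractor step computes
    q[i] = n[i] - q[i-1] - borrow.
    """
    q = 0
    borrow = 0
    prev_q = 0  # bit i-1 of the quotient, which is bit i of 2Q
    for i in range(width):
        a = (n >> i) & 1
        d = a ^ prev_q ^ borrow
        borrow = ((1 - a) & prev_q) | ((1 - a) & borrow) | (prev_q & borrow)
        q |= d << i
        prev_q = d
    return q
```

In both loops the second operand of each bit step is a neighbouring bit of a value that is already available, which is why no multiplier or divider is needed.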
The throughput rate of Viterbi decoding (VD) is not limited by the speed of functional units when look-ahead computation techniques are used. The disadvantages of the look-ahead computation in VD are the hardware complexity and the decode latency. In this paper, implementation methods of the look-ahead ACS computation are proposed to improve the hardware efficiency and reduce the latency where the hardware efficiency and the latency can be balanced with a single parameter.
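Look-ahead computation for the add-compare-select (ACS) recursion is commonly described as a matrix product in min-plus algebra: the branch metrics of several trellis steps are combined first, so that the state metrics are updated only once per combined step. The Python sketch below shows that general idea for a two-step look-ahead. It is not the implementation method proposed in the paper, and the names `acs_step` and `combine_steps` and the example metrics are hypothetical.

```python
INF = float("inf")  # marks a transition that does not exist in the trellis


def acs_step(metrics, branch):
    """One ACS step: for each next state s, add the branch metric to every
    predecessor's state metric, compare, and select the minimum."""
    n = len(metrics)
    return [min(metrics[p] + branch[p][s] for p in range(n)) for s in range(n)]


def combine_steps(branch1, branch2):
    """Min-plus product of two branch-metric matrices.

    Entry [p][s] is the best two-step path metric from state p to state s.
    It does not depend on the state metrics, so it can be computed outside
    the ACS feedback loop.
    """
    n = len(branch1)
    return [
        [min(branch1[p][k] + branch2[k][s] for k in range(n)) for s in range(n)]
        for p in range(n)
    ]


# Hypothetical 2-state example: two single steps versus one look-ahead step.
state_metrics = [0, 5]
b1 = [[1, 3], [2, INF]]
b2 = [[0, 4], [1, 2]]
two_single_steps = acs_step(acs_step(state_metrics, b1), b2)
one_lookahead_step = acs_step(state_metrics, combine_steps(b1, b2))
```

The combined matrix grows in cost with the look-ahead depth, which is the hardware complexity and latency trade-off the abstract refers to.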
Hisashi IWAMOTO Yuji YANO Yasuto KURODA Koji YAMAMOTO Kazunari INOUE Ikuo OKA
Ternary content addressable memory (TCAM) is a popular LSI for use in high-throughput forwarding engines on routers. However, the unique structure applied in TCAM consumes huge amounts of power, which restricts its ability to handle large lookup tables in IP routers. In this paper, we propose a commodity-memory based hardware architecture for the forwarding information base (FIB) application that solves the substantial problems of power and density. The proposed architecture is examined with a test chip fabricated in 40 nm embedded DRAM (eDRAM) technology. The measurements verify that its power is far lower than that of conventional TCAM-based designs, and its energy metric reaches 0.01 fJ/bit/search. The power consumption is almost 0.5 W at 250 Msps and 8M entries.
This paper presents an efficient approach for logarithmic and anti-logarithmic converters which can be used in the arithmetic unit of hybrid number system processors and logarithm/exponent function generators in DSP applications. By employing the novel quasi-symmetrical difference method with only simple shift-add logic and a look-up table, the proposed approach reduces the hardware area and improves the conversion speed significantly while achieving similar accuracy compared with the previous methods. The implementation results in both FPGA and 0.18-µm CMOS technology are also presented and discussed.
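As background for shift-based logarithmic conversion, the classic Mitchell approximation writes N = 2^k (1 + m) with 0 <= m < 1 and takes log2(N) ≈ k + m, which needs only a leading-one detection and a shift. The Python sketch below shows that well-known approximation in fixed point. It is not the quasi-symmetrical difference method of the paper, and the function name `mitchell_log2` and the `frac_bits` parameter are hypothetical.

```python
def mitchell_log2(n: int, frac_bits: int = 8) -> int:
    """Fixed-point Mitchell approximation of log2(n).

    Returns an integer whose low `frac_bits` bits hold the fractional part.
    The integer part k is the position of the leading one; the fraction is
    the remaining bits of n, aligned by a shift.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1       # position of the leading one
    mantissa = n - (1 << k)      # n = 2**k + mantissa
    if k >= frac_bits:
        frac = mantissa >> (k - frac_bits)
    else:
        frac = mantissa << (frac_bits - k)
    return (k << frac_bits) | frac
```

For example, `mitchell_log2(12)` encodes 3.5 while the exact value is about 3.585. Correcting this kind of residual error is what a small look-up table or extra shift-add terms are typically used for.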
Junjie WU Jianyu YANG Yulin HUANG Haiguang YANG Lingjiang KONG
With appropriate geometry configurations, bistatic Synthetic Aperture Radar (SAR) can break through the limitations of monostatic SAR for forward-looking imaging. Thanks to such a capability, bistatic forward-looking SAR (BFSAR) has extensive potential applications. This paper develops a frequency-domain imaging algorithm for translational invariant BFSAR. The algorithm uses the method of Legendre polynomial expansion to compute the two-dimensional point target reference spectrum, and this spectrum is used to perform the range cell migration correction (RCMC), secondary range compression and azimuth compression. In particular, the Doppler-centroid and bistatic-range dependent interpolation for residual RCMC is presented in detail. In addition, a method that combines the ambiguity and resolution theories to determine the forward-looking imaging swath is also presented in this paper.
Ryuichi FUJIMOTO Mizuki MOTOYOSHI Kyoya TAKANO Minoru FUJISHIMA
The design and measured results of a 120 GHz/140 GHz dual-channel OOK (ON-OFF Keying) receiver are presented in this paper. Because a signal with a very wide bandwidth is difficult to process in a single-channel receiver, a dual-channel configuration with channel selection is adopted in the proposed receiver. The proposed receiver is fabricated using 65 nm CMOS technology. Measured data rates of 3.0 and 3.6 Gbps, minimum sensitivities of -25.6 and -27.1 dBm, and communication distances of 0.30 and 0.38 m are achieved in the 120- and 140-GHz receivers, respectively. The correct channel selection is achieved in the 120-GHz receiver. These results indicate the possibility of a CMOS multiband receiver operating at over 100 GHz for low-power high-speed proximity wireless communication systems.
Shuang BAI Tetsuya MATSUMOTO Yoshinori TAKEUCHI Hiroaki KUDO Noboru OHNISHI
Bag of visual words is a promising approach to object categorization. However, in this framework, ambiguity exists in patch encoding by visual words, due to information loss caused by vector quantization. In this paper, we propose to incorporate patch-level contextual information into bag of visual words to reduce this ambiguity. To achieve this goal, we construct a hierarchical codebook in which visual words in the upper hierarchy contain contextual information of visual words in the lower hierarchy. In the proposed method, from each sample point we extract patches of different scales, all of which are described by the SIFT descriptor. Then, we build the hierarchical codebook in which visual words created from coarse scale patches are put in the upper hierarchy, while visual words created from fine scale patches are put in the lower hierarchy. At the same time, by employing the correspondence among these extracted patches, visual words in different hierarchies are associated with each other. After that, we design a method to assign patch pairs, whose patches are extracted from the same sample point, to the constructed codebook. Furthermore, to utilize image information effectively, we implement the proposed method based on two sets of features which are extracted through different sampling strategies and fuse them using a probabilistic approach. Finally, we evaluate the proposed method on the Caltech 101 and Caltech 256 datasets. Experimental results demonstrate the effectiveness of the proposed method.
Se-yong RO Lin-bo LUO Jong-wha CHONG
Image warping is usually used to perform real-time geometric transformation of the images captured by the CMOS image sensor of a video camera. Several existing look-up table (LUT)-based algorithms achieve real-time performance; however, the size of the LUT is still large, and it has to be stored in off-chip memory. To reduce the latency and bandwidth caused by the use of off-chip memory, this paper proposes an improved LUT (ILUT) scheme that compresses the LUT to the point that it can be stored in on-chip memory. First, a one-step transformation is adopted instead of using several on-line calculation stages. The memory size of the LUT is then reduced by utilizing the similarity of neighboring coordinates, as well as the symmetric characteristic of video camera images. Moreover, an elaborate pipeline hardware structure, cooperating with a novel 25-point interpolation algorithm, is proposed to accelerate the system and further reduce memory usage. The proposed system is implemented on a field-programmable gate array (FPGA)-based platform. Two different examples show that the proposed ILUT achieves real-time performance with small memory usage and low system requirements.
In this paper, a block-constrained trellis coded vector quantization (BC-TCVQ) algorithm is combined with an algebraic codebook to produce an algebraic trellis vector code (ATVC) to be used in ACELP coding. ATVC expands the set of allowed algebraic codebook pulse positions, and the trellis branches are labeled with these subsets. The Viterbi algorithm is used to select the excitation codevector. A fast codebook search method using an efficient non-exhaustive search technique is also proposed to reduce the complexity of the ATVC search procedure while maintaining the quality of the reconstructed speech. The ATVC block code is used as the fixed codebook of AMR-NB (12.2 kbps), which reduces the computational complexity compared to the conventional algebraic codebook.
Junghwan KIM Minkyu PARK Sangchul HAN Jinsoo KIM
Prefix caching improves the performance of IP lookup by exploiting spatial and temporal locality of IP references. However, it cannot cache non-leaf prefixes, so several prefix expansion schemes have been proposed to handle those prefixes. Such schemes have the drawback of requiring modification of the routing table or incurring a severe miss penalty. We propose an efficient prefix expansion scheme which achieves good performance without imposing an additional burden on the lookup scheme. In the proposed scheme, a non-leaf prefix is expanded to the length of the longest immediate descendant prefix when it is cached. Evaluation results show that our scheme achieves a very low miss ratio even though it does not increase the size of the routing table or the cache miss penalty.
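The expansion rule stated in the abstract can be sketched as follows. This Python sketch is one reading of that rule under stated assumptions, not the authors' implementation: prefixes and addresses are modeled as bit strings, the routing table is a plain set searched linearly, and the helper names `longest_match`, `immediate_descendants`, and `prefix_to_cache` are hypothetical.

```python
def longest_match(table, addr_bits):
    """Return the longest prefix in `table` that matches the address, or None."""
    best = None
    for prefix in table:
        if addr_bits.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return best


def immediate_descendants(table, prefix):
    """Prefixes below `prefix` with no other table prefix in between."""
    below = [p for p in table if p != prefix and p.startswith(prefix)]
    return [d for d in below if not any(o != d and d.startswith(o) for o in below)]


def prefix_to_cache(table, addr_bits):
    """Prefix to insert into the cache after looking up `addr_bits`.

    A leaf prefix is cached as is. A non-leaf prefix is expanded, along the
    looked-up address, to the length of its longest immediate descendant.
    """
    match = longest_match(table, addr_bits)
    if match is None:
        return None
    children = immediate_descendants(table, match)
    if not children:
        return match
    expanded_length = max(len(c) for c in children)
    return addr_bits[:expanded_length]


# Hypothetical routing table: "1" is a non-leaf prefix.
table = {"1", "10", "1101", "11010"}
cached = prefix_to_cache(table, "11100000")  # matches "1", expanded to 4 bits
```

Under this reading, the expanded prefix cannot overlap any more specific table entry, because the address already failed to match every immediate descendant, so a later cache hit on it returns the same next hop as a full lookup.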
Gang WANG Yaping LIN Rui LI Jinguo LI Xin YAO Peng LIU
High-speed IP address lookup with fast prefix update is essential for designing wire-speed packet forwarding routers. The developments of optical fiber and 100 Gbps interface technologies have placed IP address lookup as the major bottleneck of high performance networks. In this paper, we propose a novel structure named Compressed Multi-way Prefix Tree (CMPT) based on B+ tree to perform dynamic and scalable high-speed IP address lookup. Our contributions are to design a practical structure for high-speed IP address lookup suitable for both IPv4 and IPv6 addresses, and to develop efficient algorithms for dynamic prefix insertion and deletion. By investigating the relationships among routing prefixes, we arrange independent prefixes as the search indexes on internal nodes of CMPT, and by leveraging a nested prefix compression technique, we encode all the routing prefixes on the leaf nodes. For any IP address, the longest prefix matching can be made at leaf nodes without backtracking. For a forwarding table with u independent prefixes, CMPT requires O(log_m u) search time and O(m log_m u) dynamic insert and delete time. Performance evaluations using real life IPv4 forwarding tables show promising gains in lookup and dynamic update speeds compared with the existing B-tree structures.
In this paper, an improved hybrid LUT-based architecture for low-error and efficient fixed-width squarer circuits is presented, in which LUT-based and conventional logic circuits are employed together to achieve a good trade-off between hardware complexity and performance. By exploiting mathematical identities and the hybrid architecture, the mean error and mean square error of the proposed squarer are reduced by up to 40% compared with the best previous method presented in the literature. Moreover, the proposed method can improve the speed and reduce the area of the squarer circuit. The implementation and chip measurement results in 0.18-µm CMOS technology are also presented and discussed.
In this paper, we present an algorithm for reducing the transmit normalization factor by perturbing the transmit signal in a Multi-User Multiple Input Multiple Output (MU-MIMO) system which uses the channel inverse matrix as its precoding matrix. A base station must normalize unnormalized transmit signals due to the limitation of the constant transmit power. This paper defines the norm of the unnormalized transmit signal as the transmit normalization factor used to normalize the transmit signal. Recalling that the transmit normalization factor consists of a combination of the singular values from the channel inverse matrix, we provide a codebook that successively reduces the coefficients of these singular values. Through computer simulations, the proposed algorithm is compared to sphere encoding in terms of the Bit Error Rate (BER) and the outage probability in a MU-MIMO signal environment. Sphere encoding is known to be an optimal solution amongst the perturbation methods that reduce the transmit normalization factor [1]. This work demonstrates that the proposed algorithm has very good performance, comparable to that of sphere encoding, while its computational load is nearly 200 times less. Since the codebook in our algorithm depends only on the given channel, the difference in the computational complexity becomes even greater when the channel state is not changed, because the codebook can be reused. Furthermore, the codebook is robust to the maximum Doppler shift.
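The transmit normalization factor defined in the abstract, the norm of the unnormalized channel-inversion signal, can be illustrated with a small numeric sketch. The Python code below assumes a 2x2 channel, a perturbed symbol vector of the form s + tau * l with small integer offsets l (the form commonly used in vector perturbation), and an exhaustive search over those offsets. It is neither the proposed codebook algorithm nor sphere encoding, and all function names and example values are hypothetical.

```python
import itertools
import math


def inverse_2x2(h):
    """Inverse of a 2x2 matrix given as nested lists (real or complex)."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def normalization_factor(h, s):
    """Norm of the unnormalized transmit signal x = H^{-1} s."""
    inv = inverse_2x2(h)
    x = [inv[r][0] * s[0] + inv[r][1] * s[1] for r in range(2)]
    return math.sqrt(sum(abs(v) ** 2 for v in x))


def brute_force_perturbation(h, s, tau, offsets=(-1, 0, 1)):
    """Try every offset pair l and keep the one minimizing ||H^{-1}(s + tau*l)||.

    Only real offsets are tried here for brevity; this exhaustive search is
    an illustration and scales poorly with the number of users.
    """
    best = None
    for l in itertools.product(offsets, repeat=2):
        candidate = [s[i] + tau * l[i] for i in range(2)]
        gamma = normalization_factor(h, candidate)
        if best is None or gamma < best[0]:
            best = (gamma, l)
    return best


# Hypothetical ill-conditioned channel, where plain inversion is costly.
h = [[1.0, 0.9], [0.9, 1.0]]
s = [1.0, -1.0]
gamma_plain = normalization_factor(h, s)
gamma_perturbed, offset = brute_force_perturbation(h, s, tau=4.0)
```

Because the zero offset is among the candidates, the perturbed factor is never larger than the unperturbed one. The cost of the search is what lower-complexity perturbation methods aim to avoid.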
Kazuya ZAITSU Koji YAMAMOTO Yasuto KURODA Kazunari INOUE Shingo ATA Ikuo OKA
Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limit its ability to deploy large amounts of capacity in IP routers. In this paper, we propose a new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed FPS-RAM hardware with the intent of maintaining the same search performance and physical user interface as TCAM because our objective is to replace the TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and has dramatically reduced the costs and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
Ryuichi FUJIMOTO Mizuki MOTOYOSHI Kyoya TAKANO Uroschanit YODPRASIT Minoru FUJISHIMA
The design and measured results of a 120-GHz transmitter and receiver chipset are described in this paper. A simple on-off keying (OOK) modulation is adopted for low power consumption. The proposed transmitter and receiver are fabricated using 65-nm CMOS technology. The current consumptions of the transmitter and receiver are 19.2 mA and 48.2 mA, respectively. A 9-Gbps PRBS is successfully transferred from the transmitter to the receiver with a bit error rate of less than 10^-9.
Heewan PARK Byungsik YOON Sangwon KANG Andreas SPANIAS
A new codebook mapping algorithm for artificial bandwidth extension (ABE) is introduced in this paper. We design a wideband line spectrum pair (LSP) codebook which is coupled with the same index as the LSP codebook of a narrowband speech codec. The received narrowband LSP codebook indices are used to directly induce wideband LSP codewords. Thus, the proposed scheme eliminates codebook search processing to estimate the wideband spectrum envelope. We apply the proposed scheme to bandwidth extension in adaptive multi-rate (AMR) compressed domain. Its performance is assessed via the perceptual evaluation of speech quality (PESQ), informal listening tests, and weighted million operations per second (WMOPS) calculations.
In Digital Library (DL) applications, digital book clustering is an important and urgent research task. However, it is difficult to conduct effectively because of the great length of digital books. To cluster digital books correctly, a novel method based on a probabilistic topic model is proposed. First, we build a topic model named LDAC. The main goal of LDAC topic modeling is to effectively extract topics from digital books. Subsequently, Gibbs sampling is applied for parameter inference. Once the model parameters are learned, each book is assigned to the cluster which maximizes the posterior probability. Experimental results demonstrate that our approach based on LDAC achieves significant improvement compared to the related methods.
Sungho JEON Junghyun KIM Jaekwon LEE Young-Woo SUH Jong-Soo SEO
In this paper, we propose a power amplifier linearization technique combined with iterative noise cancelation. This method alleviates the effect of added noise, which prevents the predistorter (PD) from estimating the exact characteristics of the power amplifier (PA). To iteratively cancel the noise added in the feedback signal, the output signal of the power amplifier without noise is reconstructed by applying the inverse characteristics of the PD to the predistorted signals. The noise can be revealed by subtracting the reconstructed signals from the feedback signals. Simulation results based on the mean-square error (MSE) and power spectral density (PSD) criteria are presented to evaluate PD performance. The results show that the iterative noise cancelation significantly enhances the MSE performance, which leads to an improvement of the out-of-band power suppression. The performance of the proposed technique is verified by computer simulation and hardware test results.
Tianlong SONG Qing CHANG Wei QI
To improve simulation precision, the signal model of navigation satellite signal simulators is described, and the generation mechanism and evaluation criteria of an important error source in baseband signal generation, phase jitter, are then studied. An improved baseband signal generator based on a dual-ROM look-up table structure is designed by applying a newly established concept, the virtual sampling rate. Pre-storage of typical baseband signal data and sampling rate conversion adaptive to Doppler frequency shifts are adopted to achieve high-precision simulation of baseband signals. Performance analysis of the proposed baseband signal generator demonstrates that it can successfully suppress phase jitter and has better spectral performance, generating high-precision baseband signals, which paves the way to improving the overall precision of navigation satellite signal simulators.
Wenchao LI Jianyu YANG Yulin HUANG Lingjiang KONG
For Doppler parameter estimation of forward-looking SAR, the third-order Doppler parameter cannot be neglected. In this paper, the azimuth signal of the transmitter-fixed bistatic forward-looking SAR is modeled as a cubic polynomial phase signal (CPPS) and as multiple time-overlapped CPPSs, and the modified cubic phase function is presented to estimate the third-order Doppler parameter. By combining the cubic phase function (CPF) with the Radon transform, the method can give an accurate estimation of the third-order Doppler parameter. Simulations validate the effectiveness of the algorithm.
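For reference, the standard cubic phase function that such methods build on can be written out as below. The symbols $s(n)$, $A$, $a_k$, and $\Omega$ are notation assumed here rather than taken from the paper, and the modified CPF of the paper may differ in detail.

```latex
% Cubic polynomial phase signal (CPPS) model
s(n) = A \exp\!\big( j\,(a_0 + a_1 n + a_2 n^2 + a_3 n^3) \big)

% Standard cubic phase function
\mathrm{CPF}(n, \Omega) = \sum_{m} s(n+m)\, s(n-m)\, e^{-j \Omega m^2}

% The product of the two shifted samples has a phase that is quadratic in m
s(n+m)\, s(n-m) = A^2 \exp\!\big( j\,2\,(a_0 + a_1 n + a_2 n^2 + a_3 n^3) \big)
                  \exp\!\big( j\,(2 a_2 + 6 a_3 n)\, m^2 \big)

% so |CPF(n, \Omega)| peaks at
\Omega = 2\,(a_2 + 3 a_3 n)
```

The peak location is linear in $n$ with slope $6 a_3$, so a single CPPS traces a straight line in the $(n, \Omega)$ plane. This is consistent with pairing the CPF with a Radon transform, which detects lines, to estimate the third-order coefficient.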