Sha SHEN Weiwei SHEN Yibo FAN Xiaoyang ZENG
This paper describes a unified VLSI architecture which can be applied to various types of transforms used in MPEG-2/4, H.264, VC-1, AVS and the emerging new video coding standard named HEVC (High Efficiency Video Coding). A novel design named configurable butterfly array (CBA) is also proposed to support both the forward transform and the inverse transform in this unified architecture. Hadamard transform or 4/8-point DCT/IDCT are used in traditional video coding standards while 16/32-point DCT/IDCT are newly introduced in HEVC. The proposed architecture can support all these transform types in a unified architecture. Two levels (architecture level and block level) of hardware sharing are adopted in this design. In the architecture level, the forward transform can share the hardware resource with the inverse transform. In the block level, the hardware for smaller size transform can be recursively reused by larger size transform. The multiplications of 4 or 8-point transform are implemented with Multiplierless MCM (Multiple Constant Multiplication). In order to reduce the hardware overhead, the multiplications of 16/32 point DCT are implemented with ICM (input-muxed constant multipliers) instead of MCM or regular multipliers. The proposed design is 51% more area efficient than previous work. To the author's knowledge, this is the first published work to support both forward and inverse 4/8/16/32-point integer transform for HEVC standard in a unified architecture.
Hamid JABBAR Sungju LEE Kyeon HUR Taikyeong JEONG
For a development of energy harvesting system, the fact of radio waves and ambient RF (Radio Frequency) sources, including passive devices along with novel circuits, are very closely related to mobile charging devices and energy storage system. The use of schottky diode and voltage multiplier circuits to express on the ambient RF sources surrounding the system is one way that has seen a sudden rise in use for energy harvesting. Practically speaking, the RF and ambient sources can be provided by active and passive devices such as inductors, capacitors, diode, etc. In this paper, we present a schottky based voltage multiplier circuits for mobile charging device which integrate the power generation module with radio wave generation module. We also discuss that multi-stage schematic, e.g., three-stage schottky diode based voltage multiplier circuits, for a continuing effort on energy harvesting system.
Sun-Mi PARK Ku-Young CHANG Dowon HONG Changho SEO
In this paper, we derive a fast polynomial basis multiplier for GF(2m) defined by pentanomials xm+xk3+xk2+xk1+1 with 1 ≤ k1 < k2 < k3 ≤ m/2 using the presented method by Park and Chang. The proposed multiplier has the time delay TA+(2+⌈log2(m-1)⌉) TX or TA+(3+⌈log2(m-1)⌉) TX which is the lowest one compared with known multipliers for pentanomials except for special types, where TA and TX denote the delays of one AND gate and one XOR gate, respectively. On the other hand, its space complexity is very slightly greater than the best known results.
Shunsuke ONO Takamichi MIYATA Isao YAMADA Katsunori YAMAOKA
Solving image recovery problems requires the use of some efficient regularizations based on a priori information with respect to the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique which performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) for approximating its global minimizer efficiently. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.
Sangyeop LEE Norifumi KANEMARU Sho IKEDA Tatsuya KAMIMURA Satoru TANOI Hiroyuki ITO Noboru ISHIHARA Kazuya MASU
This paper proposes a low-phase-noise ring-VCO-based frequency multiplier with a new subharmonic direct injection locking technique that only uses a time-delay cell and four MOS transistors. Since the proposed technique behaves as an exclusive OR and can double the reference signal frequency, it increases phase correction points and achieves low phase noise characteristic across the wide output frequency range. The frequency multiplier was fabricated by using 65 nm Si CMOS process. Measured 1-MHz-offset phase noise at 6.34 GHz with reference signals of 528 MHz was -119 dBc/Hz.
Chong JIN Dimitris PAVLIDIS Laurence CONSIDINE
The design, fabrication and characterization of GaN based varactor diodes are presented. MOCVD was used for layer growth and the DC characteristic of 4 µm diameter diodes showed a turn-on voltage of 0.5 V, a breakdown voltage of 21 V and a modulation ratio of 1.63. High frequency characterization allowed obtaining the diode equivalent circuit and observed the bias dependence of the series resistance. The diode cutoff frequency was 900 GHz. A large-signal model was developed for the diode and the device power performance was evaluated. A power of 7.2 dBm with an efficiency of 16.6% was predicted for 47 GHz to 94 GHz doubling.
High performance, low area multipliers are highly desired for modern and future DSP systems due to the increasing demand of high speed DSP applications. In this paper, we present an efficient architecture for an LUT-based truncated multiplier and its application in RGB to YCbCr color space conversion which can be used for digital TV, image and video processing systems. By employing an improved split LUT-based architecture and LUT optimization method, the proposed multiplier can reduce the value of area-delay product by up to 52% compared with other constant multiplier methods. The FPGA implementation of a color space conversion application employing the proposed multiplier also results in significant reduction of area-delay product of up to 48%.
Hong-Yi HUANG Shiun-Dian JAN Yang CHOU Cheng-Yu CHEN
The charge-redistribution low-swing differential logic (CLDL) circuits are presented in this work. It can implement a complex function in a single gate. The CLDL circuits utilizes the charge-redistribution and reduced-swing schemes to reduce the power dissipation and enhance the operation speed. In addition, a pipeline structure is formed by a series connection structure controlled by a true-single-phase clock, thereby achieving high-speed operation. The CLDL circuits perform more than 25% speedup and 31% in power-delay product compared to other differential circuits with true-single-phase clock. A pipelined multiplier-accumulator (MAC) using CLDL structure is fabricated in 0.35 µm single-poly four-metal CMOS process. The test chip is successfully verified to operate at 900-MHz.
Fahad QURESHI Oscar GUSTAFSSON
In this work we consider optimized twiddle factor multipliers based on shift-and-add-multiplication. We propose a low-complexity structure for twiddle factors with a resolution of 32 points. Furthermore, we propose a slightly modified version of a previously reported multiplier for a resolution of 16 points with lower round-off noise. For completeness we also include results on optimal coefficients for eight-points resolution. We perform finite word length analysis for both coefficients and round-off errors and derive optimized coefficients with minimum complexity for varying requirements.
Li-Rong WANG Ming-Hsien TU Shyh-Jye JOU Chung-Len LEE
This paper presents a well-structured modified Booth encoding (MBE) multiplier which is applied in the design of a reconfigurable multiply-accumulator (MAC) core. The multiplier adopts an improved Booth encoder and selector to achieve an extra-row-removal and uses a hybrid approach in the two's complementation circuit to reduce the area and improve the speed. The multiplier is used to form a 32-bit reconfigurable MAC core which can be flexibly configured to execute one 3232, two 1616 or four 88 signed multiply-accumulation. Experimentally, when implemented with a 130 nm CMOS single-Vt standard cell library, the multiplier achieved a 15.8% area saving and 11.7% power saving over the classical design, and the reconfigurable MAC achieved a 4.2% area and a 7.4% power saving over the MAC design published so far if implemented with a mixed-Vt standard cell library.
In this paper, a new DA (Distributed Arithmetic) filter implementation technique is presented. Contrary to the conventional DA technique using ROM, the proposed implementation technique avoids the use of ROM since it does not need the combinations of filter coefficients. Furthermore, by using the Radix-16 modified Booth algorithm, implementation complexity of the proposed structure can be reduced. Through the HDL coding and synthesis, it was shown that 41.6% of implementation area can be reduced.
Yimeng ZHANG Leona OKAMURA Tsutomu YOSHIHARA
A novel charge-recovery logic structure called Pulse Boost Logic (PBL) is proposed in this paper. PBL is a high-speed low-energy-dissipation charge-recovery logic with dual-rail evaluation tree structure. It is driven by 2-phase non-overlap clock, and requires no DC power supply. PBL belongs to boost logic family, which includes boost logic, enhanced boost logic and subthreshold boost logic. In this paper, PBL has been compared with other charge-recovery logic technologies. To demonstrate the performance of PBL structure, a 4-bit pipeline multiplier is designed and fabricated with 0.18 µm CMOS process technology. The simulation results indicate that the 4-bit multiplier can work at a frequency of 1.8 GHz, while the measurement of test chip is at operation frequency of 161 MHz, and the power dissipation at 161 MHz is 772 µW.
Ryosuke NAKAMOTO Sakae SAKURABA Alexandre MARTINS Takeshi ONOMI Shigeo SATO Koji NAKAJIMA
We have designed and implemented a 4-bit Carry Look-ahead Adder (CLA) and 4-bit parallel multipliers to be used for the Fast Fourier Transform (FFT) system with the estimated clock frequency of 20 GHz. Through some high frequency functional tests, we have confirmed that the operation of the CLA has been successful. Through some low speed tests, we have also confirmed that the operation of multiplication has been successful. In addition, we have designed a 4-bit multiplier with a Booth encoder and with a 2-point-4-bit butterfly circuit.
We investigate binary sequence pairs with two-level correlation in terms of their corresponding cyclic difference pairs (CDPs). We define multipliers of a cyclic difference pair and present an existence theorem for multipliers, which could be applied to check the existence/nonexistence of certain hypothetical cyclic difference pairs. Then, we focus on the ideal case where all the out-of-phase correlation coefficients are zero. It is known that such an ideal binary sequence pair exists for length υ = 4u for every u ≥ 1. Using the techniques developed here on the theory of multipliers of a CDP and some exhaustive search, we are able to determine that, for lengths υ ≤ 30, (1) there does not exist "any other" ideal/ binary sequence pair and (2) every example in this range is equivalent to the one of length υ = 4u above. We conjecture that if there is a binary sequence pair with an ideal two-level correlation then its in-phase correlation must be 4. This implies so called the circulant Hadamard matrix conjecture.
Nobutaka KITO Kensuke HANAI Naofumi TAKAGI
A C-testable 4-2 adder tree for an easily testable high-speed multiplier is proposed, and a recursive method for test generation is shown. By using the specific patterns that we call 'alternately inverted patterns,' the adder tree, as well as partial product generators, can be tested with 14 patterns regardless of its operand size under the cell fault model. The test patterns are easily fed through the partial product generators. The hardware overhead of the 4-2 adder tree with partial product generators for a 64-bit multiplier is about 15%. By using a previously proposed easily testable adder as the final adder, we can obtain an easily testable high-speed multiplier.
Werner PROST Dudu ZHANG Benjamin MUNSTERMANN Tobias FELDENGUT Ralf GEITMANN Artur POLOCZEK Franz-Josef TEGUDE
A unipolar n-n heterostrucuture diode is developed in the InP material system. The electronic barrier is formed by a saw tooth type of conduction band bending which consists of a quaternary In0.52(AlyGa1-y)0.48As layer with 0 < y < ymax. This barrier is lattice matched for all y to InP and is embedded between two n+-InGaAs layers. By varying the maximum Al-content from ymax,1 = 0.7 to ymax,2 = 1 a variable barrier height is formed which enables a diode-type I-V characteristic by epitaxial design with an adjustable current density within 3 orders of magnitude. The high current density of the diode with the lower barrier height (ymax,1 = 0.7) makes it suitable for high frequency applications at low signal levels. RF measurements reveal a speed index of 52 ps/V at VD = 0.15 V. The device is investigated for RF-to-DC power conversion in UHF RFID transponders with low-amplitude RF signals.
Shuijiong WU Peilin LIU Yiqing HUANG Qin LIU Takeshi IKENAGA
H.264/AVC encoder employs rate control to adaptively adjust quantization parameter (QP) to enable coded video to be transmitted over a constant bit-rate (CBR) channel. In this topic, bit allocation is crucial since it is directly related with actual bit generation and the coding quality. Meanwhile, the rate-distortion-optimization (RDO) based mode-decision technique also affects performance a lot for the strong relation among mode, bits, and quality. This paper presents a multi-stage rate control scheme for R-D optimized H.264/AVC encoders under CBR video transmission. To enhance the precision of the complexity estimation and bit allocation, a frequency-domain parameter named mean-absolute-transform-difference (MATD) is adopted to represent frame and macroblock (MB) residual complexity. Second, the MATD ratio is utilized to enhance the accuracy of frame layer bit prediction. Then, by considering the bit usage status of whole sequence, a measurement combining forward and backward bit analysis is proposed to adjust the Lagrange multiplier λMODE on frame layer to optimize the mode decision for all MBs within the current frame. On the next stage, bits are allocated on MB layer by proposed remaining complexity analysis. Computed QP is further adjusted according to predicted MB texture bits. Simulation results show the PSNR improvement is up to 1.13 dB by using our algorithm, and the stress of output buffer control is also largely released compared with the recommended rate control in H.264/AVC reference software JM13.2.
Yong-Eun KIM Kyung-Ju CHO Jin-Gyun CHUNG Xinming HUANG
This paper presents an error compensation method for fixed-width group canonic signed digit (GCSD) multipliers that receive a W-bit input and generate a W-bit product. To efficiently compensate for the truncation error, the encoded signals from the GCSD multiplier are used for the generation of the error compensation bias. By Synopsys simulations, it is shown that the proposed method leads to up to 84% reduction in power consumption and up to 78% reduction in area compared with the fixed-width modified Booth multipliers.
Yong-Eun KIM Kyung-Ju CHO Jin-Gyun CHUNG Xinming HUANG
An efficient multiplier design method for predetermined coefficient groups is presented based on the variation of canonic signed digit (CSD) encoding and partial product sharing. By applications to radix-24 FFT structure and the pulse-shaping filter design used in CDMA, it is shown that the proposed method significantly reduces the area, propagation delay and power consumption compared with previous methods.
Zhangcai HUANG Minglu JIANG Yasuaki INOUE
Analog multipliers are one of the most important building blocks in analog signal processing circuits. The performance with high linearity and wide input range is usually required for analog four-quadrant multipliers in most applications. Therefore, a highly linear and wide input range four-quadrant CMOS analog multiplier using active feedback is proposed in this paper. Firstly, a novel configuration of four-quadrant multiplier cell is presented. Its input dynamic range and linearity are improved significantly by adding two resistors compared with the conventional structure. Then based on the proposed multiplier cell configuration, a four-quadrant CMOS analog multiplier with active feedback technique is implemented by two operational amplifiers. Because of both the proposed multiplier cell and active feedback technique, the proposed multiplier achieves a much wider input range with higher linearity than conventional structures. The proposed multiplier was fabricated by a 0.6 µm CMOS process. Experimental results show that the input range of the proposed multiplier can be up to 5.6Vpp with 0.159% linearity error on VX and 4.8Vpp with 0.51% linearity error on VY for 2.5V power supply voltages, respectively.