Luca FANUCCI Sergio SAPONARA Massimiliano MELANI Pierangelo TERRENI
With reference to video motion estimation in the framework of the new H.264/AVC video coding standard, this paper presents algorithmic and architectural solutions for the implementation of context-aware coprocessors in real-time, low-power embedded systems. A low-complexity context-aware controller is added to a conventional Full Search (FS) motion estimation engine. While the FS coprocessor is working, the context-aware controller extracts information on the input signal statistics from the intermediate processing results and uses it to automatically configure the coprocessor's search area size and number of reference frames, so that unnecessary computations and memory accesses are avoided. The achieved complexity saving factor ranges from 2.2 to 25 depending on the input signal, while motion estimation accuracy is left unaltered. The increased efficiency is exploited both for (i) processing time reduction in the case of software implementation on a programmable platform and (ii) power consumption reduction in the case of dedicated hardware implementation in CMOS technology.
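The idea of shrinking the search window from observed motion statistics can be illustrated with a minimal sketch. It is not the controller described in the paper: the `adaptive_range` rule (largest recent vector plus a margin), the function names and the toy frames are all assumptions made for illustration, and reference-frame selection is not modelled.

```python
import random

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and its displaced candidate."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n) for x in range(n)
    )

def full_search(cur, ref, bx, by, n, r):
    """Exhaustive search over displacements in [-r, r]^2.

    Returns (dx, dy, best_sad, candidates_tested).
    """
    h, w = len(ref), len(ref[0])
    best = None
    tested = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            tested += 1
            # Ties on SAD are broken in favour of the shorter vector.
            key = (sad(cur, ref, bx, by, dx, dy, n), abs(dx) + abs(dy))
            if best is None or key < best[0]:
                best = (key, dx, dy)
    return best[1], best[2], best[0][0], tested

def adaptive_range(prev_vectors, r_min=1, r_max=8, margin=1):
    """Hypothetical rule: window = largest recent vector component plus a margin."""
    if not prev_vectors:
        return r_max
    peak = max(max(abs(dx), abs(dy)) for dx, dy in prev_vectors)
    return max(r_min, min(r_max, peak + margin))

# Toy frames: the current frame is the reference shifted left by one pixel.
rng = random.Random(0)
SIZE, N = 16, 4
ref = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
cur = [[ref[y][min(x + 1, SIZE - 1)] for x in range(SIZE)] for y in range(SIZE)]

wide = full_search(cur, ref, 6, 6, N, 8)         # conventional full-size window
r_next = adaptive_range([wide[:2]])              # window suggested by the observed vector
narrow = full_search(cur, ref, 6, 6, N, r_next)  # same block, reduced window
print(wide, r_next, narrow)
```

In this toy case the reduced window finds the same vector while testing far fewer candidates; the saving obtained here says nothing about the factors reported in the abstract.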
Sergio SAPONARA Pierluigi NUZZO Claudio NANI Geert VAN DER PLAS Luca FANUCCI
Time-interleaved (TI) analog-to-digital converters (ADCs) are frequently advocated as a power-efficient solution to realize the high sampling rates required in single-chip transceivers for emerging communication schemes: ultra-wideband, fast serial links, cognitive radio and software-defined radio. However, the combined effects of multiple distortion sources due to channel mismatches (bandwidth, offset, gain and timing) severely affect the system performance and power consumption of a TI ADC and need to be accounted for from the earliest design phases. In this paper, system-level design of TI ADCs is addressed through a platform-based methodology, enabling effective investigation of different speed/resolution scenarios as well as the impact of parallelism on accuracy, yield, sampling rate, area and power consumption. Design space exploration of a TI successive approximation ADC is performed top-down via Monte Carlo simulations, by exploiting behavioral models built bottom-up after characterizing feasible implementations of the main building blocks in a 90-nm 1-V CMOS process. As a result, two implementations of the TI ADC are proposed that are capable of providing an outstanding figure-of-merit below 0.15 pJ/conversion-step.
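A rough sketch of how a Monte Carlo analysis of channel mismatch might look is given below. It is a generic behavioral model, not the platform-based models of the paper: the mismatch standard deviations, the 38 dB yield target and the function name `ti_adc_sndr` are illustrative assumptions, and bandwidth mismatch is not modelled.

```python
import math
import random

def ti_adc_sndr(m, bits, s_off, s_gain, s_skew, rng, n=1024, cycles=67):
    """SNDR in dB for one random mismatch draw of an m-channel TI ADC.

    Offsets are in full-scale units, gain errors are relative, and timing
    skews are in sample periods. Sampling is coherent (cycles periods in n samples).
    """
    offs = [rng.gauss(0.0, s_off) for _ in range(m)]
    gains = [1.0 + rng.gauss(0.0, s_gain) for _ in range(m)]
    skews = [rng.gauss(0.0, s_skew) for _ in range(m)]
    lsb = 2.0 / (1 << bits)
    w = 2.0 * math.pi * cycles / n
    half = 1 << (bits - 1)
    out = []
    for k in range(n):
        ch = k % m  # channels sample in round-robin order
        v = gains[ch] * 0.9 * math.sin(w * (k + skews[ch])) + offs[ch]
        code = max(-half, min(half - 1, round(v / lsb)))  # ideal quantizer with clipping
        out.append(code * lsb)
    # Project onto the fundamental; everything else except DC is noise and distortion.
    a = 2.0 / n * sum(o * math.sin(w * k) for k, o in enumerate(out))
    b = 2.0 / n * sum(o * math.cos(w * k) for k, o in enumerate(out))
    p_sig = (a * a + b * b) / 2.0
    mean = sum(out) / n
    p_tot = sum(o * o for o in out) / n
    p_nad = max(p_tot - mean * mean - p_sig, 1e-30)
    return 10.0 * math.log10(p_sig / p_nad)

rng = random.Random(1)
ideal = ti_adc_sndr(4, 8, 0.0, 0.0, 0.0, rng)  # quantization noise only
trials = [ti_adc_sndr(4, 8, 0.005, 0.01, 0.01, rng) for _ in range(100)]
mean_sndr = sum(trials) / len(trials)
target = 38.0  # assumed specification, in dB
yield_est = sum(s >= target for s in trials) / len(trials)
print(round(ideal, 1), round(mean_sndr, 1), yield_est)
```

Sweeping `m`, `bits` and the mismatch spreads in such a loop gives a crude picture of how parallelism trades off against accuracy and yield; area and power would need separate models.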
Nicola E. L'INSALATA Sergio SAPONARA Luca FANUCCI Pierangelo TERRENI
This work presents an FFT/IFFT core compiler particularly suited for the VLSI implementation of OFDM communication systems. The tool employs an architecture template based on the pipelined cascade principle. The generated cores support run-time programmable length and transform type selection, enabling seamless integration into multi-mode and multi-standard terminals. A distinctive feature of the tool is its accuracy-driven configuration engine, which automatically profiles the internal arithmetic and generates a core with minimum operand bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communication standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results of the generated macrocells are presented for two deep sub-micron standard-cell libraries (65 and 90 nm) and commercially available FPGA devices. When compared with other tools for automatic FFT core generation, the proposed environment produces macrocells with lower circuit complexity, expressed as gate count and RAM/ROM bits, while keeping the same system-level performance in terms of throughput, transform size and numerical accuracy.
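The closed-loop, accuracy-driven search can be pictured with a small sketch that covers only a plain fixed-point model. It assumes a recursive radix-2 FFT with a 1/2 scaling per stage and rounding of every twiddle and butterfly output; the tool's actual architecture, its block floating-point models and its search strategy are not reproduced, and all names here are hypothetical.

```python
import cmath
import math
import random

def quantize(z, frac_bits):
    """Round real and imaginary parts to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)

def fft_scaled(x, frac_bits=None):
    """Radix-2 DIT FFT scaled by 1/2 per stage (result is FFT/N).

    With frac_bits set, inputs, twiddles and butterfly outputs are rounded.
    """
    q = (lambda z: z) if frac_bits is None else (lambda z: quantize(z, frac_bits))
    n = len(x)
    if n == 1:
        return [q(x[0])]
    even = fft_scaled(x[0::2], frac_bits)
    odd = fft_scaled(x[1::2], frac_bits)
    out = [0j] * n
    for k in range(n // 2):
        t = q(cmath.exp(-2j * cmath.pi * k / n)) * odd[k]
        out[k] = q((even[k] + t) / 2)
        out[k + n // 2] = q((even[k] - t) / 2)
    return out

def sqnr_db(x, frac_bits):
    """Signal-to-quantization-noise ratio against the floating-point reference."""
    ref = fft_scaled(x)
    fix = fft_scaled(x, frac_bits)
    sig = sum(abs(r) ** 2 for r in ref)
    err = sum(abs(r - f) ** 2 for r, f in zip(ref, fix))
    return float("inf") if err == 0 else 10 * math.log10(sig / err)

def min_frac_bits(x, budget_db, lo=2, hi=32):
    """Smallest word length in [lo, hi] whose SQNR meets the accuracy budget."""
    for bits in range(lo, hi + 1):
        if sqnr_db(x, bits) >= budget_db:
            return bits
    raise ValueError("budget not reachable in the searched range")

rng = random.Random(2)
x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(64)]
bits_40 = min_frac_bits(x, 40.0)
bits_70 = min_frac_bits(x, 70.0)
print(bits_40, bits_70)
```

A tighter accuracy budget can only push the selected word length up, which is the trade-off between numerical accuracy and circuit complexity that the abstract refers to.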
In this letter, a low-complexity and low-power realization of the 2D discrete cosine transform and its inverse (DCT/IDCT) is presented. A VLSI circuit based on the Chen algorithm with the distributed arithmetic approach is described. Furthermore, low-power design techniques based on clock gating and data-driven switching activity reduction are used to decrease the circuit power consumption. To this end, input signal statistics have been extracted from H.263/MPEG verification models. Finally, circuit performance is compared to known software solutions and dedicated full-custom ones.
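Distributed arithmetic replaces the multipliers of an inner product with a lookup table addressed by one bit from each input, accumulated bit-serially. The sketch below shows the principle for a single 4-input inner product with integer coefficients; the coefficient scaling (12 bits), the 8-bit input width and the function names are assumptions, and the Chen factorization of the DCT itself is not reproduced.

```python
import math

def build_lut(coeffs):
    """Entry addr holds the sum of the coefficients whose index bit is set in addr."""
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << len(coeffs))
    ]

def da_dot(lut, xs, bits):
    """Inner product of the LUT coefficients with two's complement inputs xs.

    One LUT access per bit plane; the sign-bit plane is subtracted.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i  # bit b of every input forms the address
        term = lut[addr] << b
        acc += -term if b == bits - 1 else term
    return acc

# Cosine coefficients of one 8-point DCT basis function, scaled to integers.
coeffs = [round(4096 * math.cos((2 * i + 1) * math.pi / 16)) for i in range(4)]
lut = build_lut(coeffs)
xs = [-128, 127, 5, -7]  # 8-bit signed samples
da_result = da_dot(lut, xs, 8)
direct = sum(c * x for c, x in zip(coeffs, xs))
print(da_result, direct)
```

The two results agree exactly because the bit-plane sum is just a reordering of the same multiply-accumulate; in hardware the table is a small ROM and the loop is a shift-and-add accumulator.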
Luca FANUCCI Sergio SAPONARA Alexander MORELLO
Several IP cells are available on the market to implement 8051-compliant microcontrollers in embedded systems. Yet they frequently lack features that have become key in such systems, such as power optimization. This paper aims at lowering the power consumption of an 8051 IP core while keeping its performance unaltered, through Register Transfer Level techniques such as clustered clock gating, operand isolation and state encoding. This approach preserves the high reusability and technology independence of the IP, as it only consists of modifications to the VHDL source code. A total power reduction of about 40% is achieved, with limited area overhead.
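As a small illustration of why state encoding matters for power, the sketch below counts state-register bit toggles over one pass through a cyclic 8-state sequence under binary and Gray encodings. The state sequence is an assumed example, not taken from the 8051 core, and the actual work operates on VHDL rather than Python.

```python
def toggles(codes):
    """Total bit flips when stepping through codes cyclically (last wraps to first)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:] + codes[:1]))

binary = list(range(8))                  # natural binary encoding of states 0..7
gray = [s ^ (s >> 1) for s in range(8)]  # reflected Gray code for the same states

binary_toggles = toggles(binary)
gray_toggles = toggles(gray)
print(binary_toggles, gray_toggles)
```

Fewer register toggles per state transition mean less switching activity, which is the quantity that dynamic power scales with.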