Yasuhiro TAKEI Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Michitaka KAMEYAMA
Heterogeneous multi-core architectures that combine CPUs and accelerators attract much attention because they can achieve power-efficient computing in areas ranging from low-power embedded processing to high-performance computing. Since the optimal architecture differs from application to application, finding the most suitable accelerator is very important. In this paper, we propose an FPGA-based heterogeneous multi-core platform with custom accelerators for power-efficient computing. Using the proposed platform, we evaluate several applications and accelerators to identify key requirements of the applications and properties of the accelerators. Such an evaluation is very important for selecting and optimizing the most suitable accelerator according to the requirements of an application, so as to achieve the best performance.
Hongwei ZHU Ilie I. LUICAN Florin BALASA Dhiraj K. PRADHAN
In real-time data-dominated communication and multimedia processing applications, a multi-layer memory hierarchy is typically used to enhance system performance and to reduce energy consumption. Dynamic energy can be saved by accessing frequently used data from smaller on-chip memories rather than from large background memories. This paper focuses on reducing the dynamic energy consumption in the memory subsystem of multidimensional signal processing systems, starting from the high-level algorithmic specification of the application. The paper presents a formal model which identifies the parts of arrays that are accessed more intensely, also taking into account the relative lifetimes of the signals. Tested on a two-layer memory hierarchy, this model led to savings of dynamic energy from 40% to over 70% relative to the energy used by flat memory designs.
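The energy argument in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's formal model: the region names, access counts, and per-access energy values are hypothetical, chosen only to show why moving intensely accessed data on chip lowers dynamic energy, and the cost of copying data between layers is ignored.

```python
def flat_energy(access_counts, e_background):
    """Dynamic energy when every access goes to the large background memory."""
    return sum(access_counts.values()) * e_background


def two_layer_energy(access_counts, on_chip, e_on_chip, e_background):
    """Dynamic energy when the regions listed in `on_chip` are served by a
    smaller on-chip memory and all other regions by the background memory."""
    total = 0
    for name, count in access_counts.items():
        cost = e_on_chip if name in on_chip else e_background
        total += count * cost
    return total


# Hypothetical numbers in arbitrary energy units (not taken from the paper).
access_counts = {"hot_region": 600, "cold_region": 400}
E_ON_CHIP, E_BACKGROUND = 2, 10

flat = flat_energy(access_counts, E_BACKGROUND)
layered = two_layer_energy(access_counts, {"hot_region"}, E_ON_CHIP, E_BACKGROUND)
saving = 1 - layered / flat  # fraction of dynamic energy saved
print(flat, layered, saving)
```

The saving depends entirely on the assumed access distribution and energy ratio; the 40% to 70% figures quoted in the abstract come from the paper's own experiments, not from this toy calculation.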
Hongwei ZHU Ilie I. LUICAN Florin BALASA
In real-time multimedia processing systems, a very large part of the power consumption is due to data storage and data transfer. Moreover, the area cost is often largely dominated by the memory modules. In deriving a memory architecture optimized for area and/or power, memory size computation is an important step in exploring the possible algorithmic specifications of multimedia applications. This paper presents a novel non-scalar approach for computing exactly the memory size in real-time multimedia algorithms. This methodology uses both algebraic techniques specific to the data-flow analysis used in modern compilers and more recent advances in the theory of polyhedra. In contrast to previous works, which are all estimation methods only, this approach performs exact memory computations even for applications that are significantly large in terms of code size, number of scalars, and number of array references.
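To make "memory size" concrete, here is a small Python sketch of the quantity being computed: the peak number of array elements that are simultaneously alive. It uses brute-force scalar enumeration of an execution trace, which is the kind of element-by-element analysis a non-scalar approach is meant to avoid, so it should not be read as the paper's method. The trace format and the example loop are assumptions made for illustration.

```python
def max_live_elements(trace):
    """Return the peak number of simultaneously live elements.

    `trace` is a list of (op, element) pairs in execution order, where op is
    "w" (write) or "r" (read). An element is live from its first write up to
    and including its last read (or only at its write if it is never read).
    """
    first_write, last_read = {}, {}
    for t, (op, elem) in enumerate(trace):
        if op == "w":
            first_write.setdefault(elem, t)
        else:
            last_read[elem] = t

    events = []
    for elem, start in first_write.items():
        end = last_read.get(elem, start)
        events.append((start, 1))     # storage needed from `start`
        events.append((end + 1, -1))  # storage released after last use
    events.sort()  # at equal times, releases (-1) sort before allocations

    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def example_trace(n):
    """Hypothetical loop: for i in range(n): write A[i]; if i >= 2: read A[i-2]."""
    trace = []
    for i in range(n):
        trace.append(("w", ("A", i)))
        if i >= 2:
            trace.append(("r", ("A", i - 2)))
    return trace


print(max_live_elements(example_trace(6)))  # prints 3
```

Enumeration of this kind grows with the number of scalars in the trace, which is why it does not scale to the large applications the abstract mentions.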
Hiroe IWASAKI Jiro NAGANUMA Makoto ENDO Takeshi OGURA
This paper proposes a very small on-chip multimedia real-time OS for embedded system LSIs, and demonstrates its usefulness in MPEG-2 multimedia applications. The real-time OS, which provides a conditional cyclic task with suspend and resume for hardware (HW) / software (SW) interaction in embedded system LSIs, implements a minimum set of task, interrupt, and semaphore management functions on the basis of an analysis of embedded software requirements. It requires only about 2.5 Kbytes of memory at run time, reduces the redundant execution steps of conventional cyclic tasks to about 1/2 for HW/SW interactions, and provides sufficient real-time performance, as shown by implementing two typical embedded software programs for practical multimedia system LSIs: an MPEG-2 system protocol LSI and an MPEG-2 video encoder LSI. This on-chip multimedia real-time OS with 2.5 Kbytes of memory will be acceptable for future multimedia embedded system LSIs.
This paper describes a new design method for multiply-adders able to process a large quantity of multimedia data. I propose a (signed digits)(unsigned digits) fixed-point multiply-add/subtract unit. The unit eliminates the problems caused by the critical one-bit drop in arithmetic precision peculiar to the conventional (signed digits)(signed digits) fixed-point multiply scheme. Because the accumulation terms and the add/subtract operation terms of the multiplication results are input simultaneously to 7-3 counters and counted in carry-save form, carries are propagated faster than in the conventional method.
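As background for the 7-3 counters mentioned above, the following Python sketch is a behavioral model of a generic 7-3 counter and of carry-save compression built from it. It is an assumption-laden illustration, not the proposed unit: it handles unsigned operands only and does not model the signed/unsigned digit scheme or the add/subtract terms. The function names are hypothetical.

```python
def counter_7_3(bits):
    """Count seven equal-weight input bits; return the count as three bits
    (weight 1, weight 2, weight 4)."""
    if len(bits) != 7 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected exactly seven bits, each 0 or 1")
    total = sum(bits)
    return total & 1, (total >> 1) & 1, (total >> 2) & 1


def compress_7_to_3(operands, width):
    """Reduce seven unsigned `width`-bit operands to three rows whose sum
    equals the sum of the operands. Each column is counted independently,
    so no carry ripples from one column to the next at this stage."""
    if len(operands) != 7:
        raise ValueError("expected exactly seven operands")
    rows = [0, 0, 0]
    for col in range(width):
        column_bits = [(x >> col) & 1 for x in operands]
        s0, s1, s2 = counter_7_3(column_bits)
        rows[0] |= s0 << col        # weight 2**col
        rows[1] |= s1 << (col + 1)  # weight 2**(col + 1)
        rows[2] |= s2 << (col + 2)  # weight 2**(col + 2)
    return rows


operands = [3, 5, 7, 9, 11, 13, 15]
rows = compress_7_to_3(operands, width=4)
print(sum(rows) == sum(operands))  # prints True
```

Only the final addition of the remaining rows needs a carry-propagating adder, which is the general reason carry-save counting shortens the critical path.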