1-6hit |
Yuichiro MURACHI Koji HAMANO Tetsuro MATSUNO Junichi MIYAKOSHI Masayuki MIYAMA Masahiko YOSHIMOTO
This paper describes a 95 mW MPEG2 MP@HL motion estimation processor core for portable and high-resolution video applications such as that in an HD camcorder. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 19201080@30 fps resolution video. The search range is 12864 pixels. The ME core integrates 2.25 M transistors in 3.1 mm3.1 mm using 0.18-micron technology.
Yuichiro MURACHI Yuki FUKUYAMA Ryo YAMAMOTO Junichi MIYAKOSHI Hiroshi KAWAGUCHI Hajime ISHIHARA Masayuki MIYAMA Yoshio MATSUDA Masahiko YOSHIMOTO
This paper describes an optical-flow processor core for real-time video recognition. The processor is based on the Pyramidal Lucas and Kanade (PLK) algorithm. It features a smaller chip area, higher pixel rate, and higher accuracy than conventional optical-flow processors. Introduction of search range limitation and the Carman filter to the original PLK algorithm improve the optical-flow accuracy, and reduce the processor hardware cost. Furthermore, window interleaving and window overlap methods reduces the necessary clock frequency of the processor by 70%, allowing low-power characteristics. We first verified the PLK algorithm and architecture with a proto-typed FPGA implementation. Then, we designed a VLSI processor that can handle a VGA 30-fps image sequence at a clock frequency of 332 MHz. The core size and power consumption are estimated at 3.503.00 mm2 and 600 mW, respectively, in a 90-nm process technology.
Junichi MIYAKOSHI Yuichiro MURACHI Tomokazu ISHIHARA Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
For super-parallel video processing, we proposed a power- and area-efficient SRAM core architecture with a segmentation-free access, which means accessibility to arbitrary consecutive pixels, and horizontal/vertical access. To achieve these flexible accesses, a spirally-connected local-wordline select signal and multi-selection scheme in wordlines are proposed, so that extra X-decoders in the conventional multi-division SRAM can be eliminated. Consequently, the proposed SRAM reduces a power and area by 57-60% and 60%, respectively, when it is applied to a 128 parallel architecture. The proposed 160-kbit SRAM with 16-read ports (2-read port SRAM with eight-parallel architecture) is implemented to a search window buffer for an H.264 motion estimation processor core which dissipates 800 µW for QCIF 15-fps in a 130-nm technology.
Junichi MIYAKOSHI Yuichiro MURACHI Tetsuro MATSUNO Masaki HAMAMOTO Takahiro IINUMA Tomokazu ISHIHARA Hiroshi KAWAGUCHI Masayuki MIYAMA Masahiko YOSHIMOTO
We propose a sub-mW H.264 baseline-profile motion estimation processor for portable video applications. It features a VLSI-oriented block partitioning strategy and low-power SIMD/systolic-array datapath architecture, where the datapath can be switched between an SIMD and systolic array depending on processing flow. The processor supports all the seven kinds of block modes, and can handle three reference frames for a CIF (352288) 30-fps to QCIF (176144) 15-fps sequences with a quarter-pixel accuracy. It integrates 3.3 million transistors, and occupies 2.83.1 mm2 in a 130-nm CMOS technology. The proposed processor achieves a power of 800 µW in a QCIF 15-fps sequence with one reference picture.
Junichi MIYAKOSHI Yuichiro MURACHI Koji HAMANO Tetsuro MATSUNO Masayuki MIYAMA Masahiko YOSHIMOTO
This paper proposes a low-power systolic array architecture for a block-matching motion estimation processor IP for portable and high-resolution video applications. The architecture features a ring-connected processing element (PE) array to reduce both computation cycles and memory access cycles at the same time, allowing lower power characteristics. The feature of low memory access cycles allows concurrent operation of a half-pel processing unit with no extra cache. Furthermore, the architecture allows various summation schemes for absolute difference values. For that reason, it is applicable to various video coding modes such as the adaptive field/frame mode in MPEG2 and multiple macroblock mode in H.264. When the architecture is introduced to a design of a MPEG2 MP@HL motion estimation processor VLSI, the power consumption of the VLSI is reduced by 45-73% in comparison to cases with conventional architectures for motion estimation.
Yuichiro MURACHI Junichi MIYAKOSHI Masaki HAMAMOTO Takahiro IINUMA Tomokazu ISHIHARA Fang YIN Jangchung LEE Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
We describe a sub 100-mW H.264 MP@L4.1 integer-pel motion estimation processor core for low power video encoder. It supports macro block adaptive frame field (MBAFF) encoding and bi-directional prediction for a resolution of 19201080 pixels at 30 fps. The proposed processor features a novel hierarchical algorithm, reconfigurable ring-connected systolic array architecture and segmentation-free, rectangle-access search window buffer. The hierarchical algorithm consists of a fine search and a coarse search. A complementary recursive cross search is newly introduced in the coarse search. The fine search is adaptively carried out, based on an image analysis result obtained by the coarse search. The proposed systolic array architecture minimizes the amount of transferred data, and lowers computation cycles for the coarse and fine searches. In addition, we propose a novel search window buffer SRAM that has instantaneous accessibility to a rectangular area with arbitrary location. The processor core has been designed with a 90 nm CMOS design rule. Core size is 2.52.5 mm2. One core supports one-reference-frame and dissipates 48 mW at 1 V. Two core configuration consumes 96 mW for two-reference-frame search.