Motion estimation is a major part of the video coding, which traces the motion of moving objects in video sequences. Among various motion estimation algorithms, the Hierarchical Block-Matching Algorithm (HBMA) that is a multilayered motion estimation algorithm is attractive in motion-compensated interpolation when accurate motion estimation is required. However, parallel processing of HBMA is necessary since the high computational complexity of HBMA prevents it from operating in real-time. Further, the repeated updates of vectors naturally lead to pipelined processing. In this paper, we present a pipelined architecture for HBMA. We investigate the data dependency of HBMA and the requirements of the pipeline to operate synchronously. Each pipeline stage of the proposed architecture consists of a systolic array for the block-matching algorithm, a bilinear interpolator, and a latch mechanism. The latch mechanism mainly resolves the data dependency and arranges the data flow in a synchronous way. The proposed architecture achieves nearly linear speedup without additional hardware cost over a non-pipelined one. It requires the clock of 2.70 ns to process a large size of frame (e.q. HDTV) in real-time, which is about to be available under the current VLSI technology.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Hyung Chul KIM, Seung Ryoul MAENG, Jung Wan CHO, "A Design of Pipelined Architecture for Hierarchical Block-Matching Algorithm" in IEICE TRANSACTIONS on Information,
vol. E78-D, no. 5, pp. 586-595, May 1995, doi: .
Abstract: Motion estimation is a major part of the video coding, which traces the motion of moving objects in video sequences. Among various motion estimation algorithms, the Hierarchical Block-Matching Algorithm (HBMA) that is a multilayered motion estimation algorithm is attractive in motion-compensated interpolation when accurate motion estimation is required. However, parallel processing of HBMA is necessary since the high computational complexity of HBMA prevents it from operating in real-time. Further, the repeated updates of vectors naturally lead to pipelined processing. In this paper, we present a pipelined architecture for HBMA. We investigate the data dependency of HBMA and the requirements of the pipeline to operate synchronously. Each pipeline stage of the proposed architecture consists of a systolic array for the block-matching algorithm, a bilinear interpolator, and a latch mechanism. The latch mechanism mainly resolves the data dependency and arranges the data flow in a synchronous way. The proposed architecture achieves nearly linear speedup without additional hardware cost over a non-pipelined one. It requires the clock of 2.70 ns to process a large size of frame (e.q. HDTV) in real-time, which is about to be available under the current VLSI technology.
URL: https://global.ieice.org/en_transactions/information/10.1587/e78-d_5_586/_p
Copy
@ARTICLE{e78-d_5_586,
author={Hyung Chul KIM, Seung Ryoul MAENG, Jung Wan CHO, },
journal={IEICE TRANSACTIONS on Information},
title={A Design of Pipelined Architecture for Hierarchical Block-Matching Algorithm},
year={1995},
volume={E78-D},
number={5},
pages={586-595},
abstract={Motion estimation is a major part of the video coding, which traces the motion of moving objects in video sequences. Among various motion estimation algorithms, the Hierarchical Block-Matching Algorithm (HBMA) that is a multilayered motion estimation algorithm is attractive in motion-compensated interpolation when accurate motion estimation is required. However, parallel processing of HBMA is necessary since the high computational complexity of HBMA prevents it from operating in real-time. Further, the repeated updates of vectors naturally lead to pipelined processing. In this paper, we present a pipelined architecture for HBMA. We investigate the data dependency of HBMA and the requirements of the pipeline to operate synchronously. Each pipeline stage of the proposed architecture consists of a systolic array for the block-matching algorithm, a bilinear interpolator, and a latch mechanism. The latch mechanism mainly resolves the data dependency and arranges the data flow in a synchronous way. The proposed architecture achieves nearly linear speedup without additional hardware cost over a non-pipelined one. It requires the clock of 2.70 ns to process a large size of frame (e.q. HDTV) in real-time, which is about to be available under the current VLSI technology.},
keywords={},
doi={},
ISSN={},
month={May},}
Copy
TY - JOUR
TI - A Design of Pipelined Architecture for Hierarchical Block-Matching Algorithm
T2 - IEICE TRANSACTIONS on Information
SP - 586
EP - 595
AU - Hyung Chul KIM
AU - Seung Ryoul MAENG
AU - Jung Wan CHO
PY - 1995
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E78-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 1995
AB - Motion estimation is a major part of the video coding, which traces the motion of moving objects in video sequences. Among various motion estimation algorithms, the Hierarchical Block-Matching Algorithm (HBMA) that is a multilayered motion estimation algorithm is attractive in motion-compensated interpolation when accurate motion estimation is required. However, parallel processing of HBMA is necessary since the high computational complexity of HBMA prevents it from operating in real-time. Further, the repeated updates of vectors naturally lead to pipelined processing. In this paper, we present a pipelined architecture for HBMA. We investigate the data dependency of HBMA and the requirements of the pipeline to operate synchronously. Each pipeline stage of the proposed architecture consists of a systolic array for the block-matching algorithm, a bilinear interpolator, and a latch mechanism. The latch mechanism mainly resolves the data dependency and arranges the data flow in a synchronous way. The proposed architecture achieves nearly linear speedup without additional hardware cost over a non-pipelined one. It requires the clock of 2.70 ns to process a large size of frame (e.q. HDTV) in real-time, which is about to be available under the current VLSI technology.
ER -