The search functionality is under construction.

Keyword Search Result

[Keyword] h.264(137hit)

21-40hit(137hit)

  • Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip

    Hung K. NGUYEN  Peng CAO  Xue-Xiang WANG  Jun YANG  Longxing SHI  Min ZHU  Leibo LIU  Shaojun WEI  

     
    PAPER-Computer System

      Vol:
    E96-D No:3
      Page(s):
    601-615

    REMUS-II (REconfigurable MUltimedia System 2) is a coarse-grained dynamically reconfigurable computing system for multimedia and communication baseband processing. This paper proposes a real-time H.264 baseline profile encoder on REMUS-II. First, we propose an overall flow for mapping algorithms onto the REMUS-II platform and illustrate it by implementing the H.264 encoder. Second, parallel and pipelining techniques are used to fully exploit the abundant computing resources of REMUS-II, thus increasing total computing throughput and addressing the high computational complexity of the H.264 encoder. In addition, data-reuse schemes are used to increase the data-reuse ratio and therefore reduce the required data bandwidth. Third, we propose a scheduling scheme to manage run-time reconfiguration of the system. The scheduling scheme is also responsible for synchronizing data communication between tasks and handling conflicts between hardware resources. Experimental results show that the REMUS-MB (REMUS-II version for mobile applications) system can run a real-time H.264/AVC baseline profile encoder. The encoder can encode CIF@30 fps video sequences with two reference frames and a maximum search range of [-16,15]. The implementation can therefore be applied to handheld devices targeted at mobile multimedia applications. The REMUS-MB platform is designed and synthesized using TSMC 65 nm low power technology. The die size of REMUS-MB is 13.97 mm². REMUS-MB consumes, on average, about 100 mW while working at 166 MHz. To our knowledge, this is the first implementation in the literature of the H.264 encoding algorithm on a coarse-grained dynamically reconfigurable computing system.

  • A Low Complexity H.264/AVC Deblocking Filter with Simplified Filtering Boundary Strength Decision

    Luong Pham VAN  Hoyoung LEE  Jaehwan KIM  Byeungwoo JEON  

     
    PAPER-Digital Signal Processing

      Vol:
    E96-A No:2
      Page(s):
    562-572

    Blocking artifacts are introduced in many block-based coding systems, and their reduction can significantly improve the subjective quality of compressed video. H.264/AVC uses an in-loop deblocking filter to remove the blocking artifacts. For inter-predicted blocks, the filter considers coding conditions such as coded block pattern (CBP), motion vector, and macroblock type in its adaptive deblocking filtering; however, it considers little for intra-coded blocks. In this paper, we utilize the human visual system (HVS) characteristic and the local characteristic of image blocks to modify the boundary strength (BS) of the intra-deblocking filter in order to improve the subjective quality and also to reduce the complexity of filtering intra coded slices. In addition, we propose a low-complexity deblocking method which utilizes the correlation between vertical and horizontal boundaries of a block in inter coded slices. Experimental results show that our proposed method achieves not only significant gain in the subjective quality but also some PSNR gain, and reduces the computational complexity of the deblocking filter by 36.23% on average.

  • Improved B-Picture Coding Scheme for Next Generation Video Compression

    Seung-Jin BAEK  Seung-Won JUNG  Hahyun LEE  Hui Yong KIM  Sung-Jea KO  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E95-D No:9
      Page(s):
    2318-2326

    In this paper, an improved B-picture coding algorithm based on the symmetric bi-directional motion estimation (ME) is proposed. In addition to the block match error between blocks in the forward and backward reference frames, the proposed method exploits the previously-reconstructed template regions in the current and reference frames for bi-directional ME. The side match error between the predicted target block and its template is also employed in order to alleviate block discontinuities. To efficiently perform ME, an initial motion vector (MV) is adaptively derived by exploiting temporal correlations. Experimental results show that the number of generated bits is reduced by up to 9.31% when the proposed algorithm is employed as a new macroblock (MB) coding mode for the H.264/AVC standard.

  • Encoder-Unconstrained User Interactive Partial Decoding Scheme

    Chen LIU  Xin JIN  Tianruo ZHANG  Satoshi GOTO  

     
    PAPER-Coding & Processing

      Vol:
    E95-A No:8
      Page(s):
    1288-1296

    High-definition (HD) videos have become increasingly popular on portable devices in recent years. Due to the resolution mismatch between HD video sources and the relatively low-resolution screens of portable devices, HD videos are usually fully decoded and then down-sampled (FDDS) for display, which not only increases the cost in both computational power and memory bandwidth, but also loses the details of the video content. In this paper, an encoder-unconstrained partial decoding scheme for H.264/AVC is presented to solve the problem by decoding only the region related to the object of interest (OOI), which is defined by users. A simplified compression domain tracking method is utilized to ensure that the OOI is located in the center of the display area. The decoded partial area (DPA) adaptation, the reference block relocation (RBR) and co-located temporal Intra prediction (CTIP) methods are proposed to improve the visual quality for the DPA with low complexity. The simulation results show that the proposed partial decoding scheme provides an average of 50.16% decoding time reduction compared with the full decoding process. The displayed region also presents the original HD granularity of the OOI. The proposed partial decoding scheme is especially useful for displaying HD video on devices for which battery life is a crucial factor.

  • Recent Advances on Scalable Video Coding

    Kazuya HAYASE  Hiroshi FUJII  Yukihiro BANDOH  Hirohisa JOZAWA  

     
    INVITED PAPER

      Vol:
    E95-A No:8
      Page(s):
    1230-1239

    Scalable video coding offers efficient video transmission to a variety of display devices over heterogeneous and error-prone networks. Scalable video coding has been strenuously researched in recent years and state-of-the-art international coding with scalability has been standardized as SVC, which is an extension of H.264/AVC. This paper summarizes the recent advanced research that has been done for improving the quality and reducing the complexity of scalable video coding (including SVC), as well as for improving the quality assessment techniques. It is intended to give researchers a critical, technical overview of what is required to develop more efficient scalable video coding in the future.

  • A Direct Inter-Mode Selection Algorithm for P-Frames in Fast H.264/AVC Transcoding

    Bin SONG  Haixiao LIU  Hao QIN  Jie QIN  

     
    PAPER-Multimedia Systems for Communications

      Vol:
    E95-B No:6
      Page(s):
    2101-2108

    A direct inter-mode selection algorithm for P-frames in fast homogeneous H.264/AVC bit-rate reduction transcoding is proposed in this paper. To achieve the direct inter-mode selection, we first develop a low-complexity distortion estimation method for fast transcoding, in which the distortion is directly calculated from the decoded residual together with the reference frames. We also present a linear estimation method to approximate the coding rate. With the estimated distortion and rate, the rate-distortion cost can be easily computed in the transcoder. In our algorithm, a method based on the normalized rate difference of P-frames (RP) is used to detect high motion scenes. To achieve fast transcoding, the rate-distortion optimized (RDO) mode decision is performed only for the P-frames with RP larger than a threshold; meanwhile, the average cost of each inter-mode (ACM) is calculated. Then, when transcoding the subsequent frames, the optimal coding mode can be directly selected using the estimated cost and the ACM threshold. Experiments show that the proposed method can significantly simplify the complex RDO mode decision, and achieve transcoding time reductions of up to 62% with small loss of rate-distortion performance.

  • An H.264/AVC High422 Profile and MPEG-2 422 Profile Encoder LSI for HDTV Broadcasting Infrastructures

    Koyo NITTA  Hiroe IWASAKI  Takayuki ONISHI  Takashi SANO  Atsushi SAGATA  Yasuyuki NAKAJIMA  Minoru INAMORI  Ryuichi TANIDA  Atsushi SHIMIZU  Ken NAKAMURA  Mitsuo IKEDA  Jiro NAGANUMA  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    432-440

    An H.264/AVC encoder LSI (named “SARA”) that supports High422 profile, as well as 422 profile of MPEG-2, has been developed for HDTV broadcasting infrastructures. It contains three motion estimation and compensation (ME/MC) engines with wide search ranges of -217.75 to +199.75 horizontally, -109.75 to +145.75 vertically, which can utilize almost all H.264/AVC ME/MC coding tools, such as multiple reference frame, variable block size, quarter-pel prediction, macroblock adaptive field/frame prediction (MBAFF), spatial/temporal direct mode, and weighted prediction. Our evaluations show that it can encode fast moving scenes with 1.2 dB to 1.7 dB higher than the JM. It was successfully fabricated in a 90-nm technology, and integrates 140 million transistors.

  • An 8×8/4×4 Adaptive Hadamard Transform Based FME VLSI Architecture for 4K×2K H.264/AVC Encoder

    Yibo FAN  Jialiang LIU  Dexue ZHANG  Xiaoyang ZENG  Xinhua CHEN  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    447-455

    Fidelity Range Extension (FRExt) (i.e. High Profile) was added to the H.264/AVC recommendation in the second version. One of the features included in FRExt is the Adaptive Block-size Transform (ABT). In order to conform to the FRExt, a Fractional Motion Estimation (FME) architecture is proposed to support the 8×8/4×4 adaptive Hadamard Transform (8×8/4×4 AHT). The 8×8/4×4 AHT circuit contributes to higher throughput and encoding performance. In order to increase the utilization of SATD (Sum of Absolute Transformed Difference) Generator (SG) in unit time, the proposed architecture employs two 8-pel interpolators (IP) to time-share one SG. These two IPs can work in turn to provide the available data continuously to the SG, which increases the data throughput and significantly reduces the cycles that are needed to process one Macroblock. Furthermore, this architecture also exploits the linear feature of Hadamard Transform to generate the quarter-pel SATD. This method could help to shorten the long datapath in the second-step of two-iteration FME algorithm. Finally, experimental results show that this architecture could be used in the applications requiring different performances by adjusting the supported modes and operation frequency. It can support the real-time encoding of the seven-mode 4K×2K@24 fps or six-mode 4K×2K@30 fps video sequences.
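
The linearity property mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's architecture: the function names, the unnormalized SATD, and the sample blocks are assumptions made for illustration. It only shows that the 4×4 Hadamard transform of a sum of residual blocks equals the sum of their transforms, which is the kind of property that allows transformed data to be combined instead of recomputed.

```python
# 4x4 Hadamard matrix (symmetric, entries +1/-1).
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul4(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def hadamard4x4(block):
    """2-D Hadamard transform H4 * X * H4 (H4 equals its own transpose)."""
    return matmul4(matmul4(H4, block), H4)

def residual(cur, ref):
    """Element-wise difference between the current and reference block."""
    return [[c - r for c, r in zip(cr, rr)] for cr, rr in zip(cur, ref)]

def add(a, b):
    """Element-wise sum of two 4x4 blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def satd4x4(cur, ref):
    """Unnormalized SATD: sum of absolute transformed differences."""
    return sum(abs(v) for row in hadamard4x4(residual(cur, ref)) for v in row)

# Hypothetical sample data, chosen only to exercise the functions.
cur = [[(3 * i + 5 * j) % 17 for j in range(4)] for i in range(4)]
ref_a = [[(7 * i + j) % 13 for j in range(4)] for i in range(4)]
ref_b = [[(i * j + 2) % 11 for j in range(4)] for i in range(4)]

res_a = residual(cur, ref_a)
res_b = residual(cur, ref_b)

# Linearity: transform of the sum equals the sum of the transforms.
linearity_holds = (
    hadamard4x4(add(res_a, res_b))
    == add(hadamard4x4(res_a), hadamard4x4(res_b))
)
print(linearity_holds)
```

Note that H.264 sub-pel interpolation includes rounding, so a hardware design relying on this property would have to account for the rounding error; the sketch ignores it.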

  • A 64 Cycles/MB, Luma-Chroma Parallelized H.264/AVC Deblocking Filter for 4K×2K Applications

    Weiwei SHEN  Yibo FAN  Xiaoyang ZENG  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    441-446

    In this paper, a high-throughput deblocking filter is presented for the H.264/AVC standard, catering to video applications with 4K×2K (4096×2304) ultra-definition resolution. In order to strengthen the parallelism without simply increasing the area, we propose a luma-chroma parallel method. Meanwhile, this work reduces the number of processing cycles, the amount of external memory traffic and the working frequency, by using triple four-stage pipeline filters and a luma-chroma interlaced sequence. Furthermore, it eliminates most unnecessary off-chip memory bandwidth with a highly reusable memory scheme, and adopts a “slide window” buffer scheme. As a result, our design can support 4K×2K at 30 fps applications at the working frequency of only 70.8 MHz.
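
The quoted working frequency can be cross-checked with simple arithmetic, sketched below. The sketch assumes that the resolution reads as 4096×2304, that every 16×16 macroblock costs exactly the 64 cycles given in the title, and that there is no other overhead; the function name is hypothetical.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples

def required_frequency_hz(width, height, fps, cycles_per_mb):
    """Clock rate needed to process every macroblock of every frame in real time."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return mbs_per_frame * fps * cycles_per_mb

freq = required_frequency_hz(4096, 2304, 30, 64)
print(f"{freq / 1e6:.1f} MHz")  # 70.8 MHz
```

Under these assumptions, 36,864 macroblocks per frame at 30 fps and 64 cycles each give about 70.8 MHz, which matches the figure in the abstract.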

  • Data Flow Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications

    Xinning LIU  Chen MEI  Peng CAO  Min ZHU  Longxing SHI  

     
    PAPER-Design Methodology

      Vol:
    E95-D No:2
      Page(s):
    374-382

    This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically coarse grain reconfigurable architecture. REMUS-II consists of a µPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Units), which are used to speed up control-intensive tasks and data-intensive tasks respectively. The parallel computing capability and flexibility of REMUS-II make it an excellent candidate for processing multimedia applications, which require a large number of memory accesses. In this paper, we specifically optimize the data flow to deal with those performance-hazardous and energy-hungry memory accesses in order to meet the bandwidth requirement of parallel computing. The RPU internal memory can work in multiple modes, such as 2D-access mode and transformation mode, according to different multimedia access patterns. This novel design can improve the performance by up to 26% compared to traditional on-chip memory. Meanwhile, the block buffer is implemented to optimize the off-chip data flow by reducing off-chip memory accesses, which are reduced by up to 43% compared to direct DDR access. Based on RTL simulation, REMUS-II can achieve 1080p@30 fps of H.264 High Profile@Level 4 and High Level MPEG2 at 200 MHz clock frequency. REMUS-II is implemented in 23.7 mm² of silicon on a TSMC 65 nm logic process with a 400 MHz maximum working frequency.

  • Optimal Bit Allocation with Priority Layer Dropping for H.264 Scalable Video

    Junghyun HAN  Jitae SHIN  Sang-Hyo KIM  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E95-B No:2
      Page(s):
    684-688

    This letter proposes a practical algorithm for video transmission of the scalable extension of H.264/AVC (SVC) over limited bit-rate and varying channel signal-to-noise ratio (SNR). The proposal consists of SVC source-layer dropping and layered FEC using LDPC codes to maximize the video quality. The experimental results show that the proposed method realizes better video quality than the compared unequal error protection (UEP) without source-layer dropping. This implies that the dropping of a certain number of source-layers and using the resultant bit-budget for channel coding is more effective than the other UEP case which uses all possible source-layers.

  • High-Quality P2P Video Streaming System Considering the Cooperation of Constitution Information and Delivery Status

    Yohei OKAMOTO  Yosuke TANIGAWA  Hideki TODE  

     
    PAPER

      Vol:
    E94-B No:10
      Page(s):
    2732-2740

    Recently, video streaming services using P2P (Peer-to-Peer) have attracted attention as a way to solve the problem of load concentration on servers and to reduce large latency. Many P2P streaming systems, like Coolstreaming, however, take a complicated approach to control playback timing strictly. This leads to less churn resiliency and less adaptability to fluctuation of network traffic. Therefore, we focus on a simple and robust approach to realize “pseudo” streaming with high quality, which is based on BitTorrent. In the existing methods with the simple approach, peers download pieces just closer to playback timing to decrease the playback discontinuity. However, these methods do not consider the constitution of video structure in a sophisticated manner. A P2P streaming system must consider several important metrics for high-quality and fair distribution. Therefore, in this paper, we propose a new P2P video streaming system considering the cooperation of three important metrics: video structure, playback timing, and piece dispersion on the network. In this system, users vary three piece selections to suit the delivery status. Specifically, users preferentially download pieces which affect the video quality, which are closer to playback timing, and which improve the delivery efficiency. Moreover, we show the effectiveness of the proposed method by computer simulation.

  • New Error Resilience Technique Using Adaptive FMO and Intra Refresh for H.264 Video Transmission

    Tien HUU VU  Supavadee ARAMVITH  Yoshikazu MIYANAGA  

     
    PAPER-Digital Signal Processing

      Vol:
    E94-A No:8
      Page(s):
    1647-1655

    In this paper, we propose an error resilience scheme for wireless video coding based on adaptive flexible macroblock ordering (FMO) and intra refresh. An FMO explicit map is generated frame-by-frame by using prior information. This information involves estimated locations of guard and burst sections in the channel and estimated effect of error propagation (EEP) from the previous frame to the current frame. In addition, the role of the current frame in propagating an error to the next frame is also considered. A suitable intra refresh rate which is adaptive to the channel state is used to reduce the dependence between frames and thus can stop the EEP. Experimental results show that the proposed method gains some improvements in terms of peak signal-to-noise ratio (PSNR) as compared with some other methods that have not considered the channel condition and the error propagation in generating an FMO map.

  • Fast H.264/AVC DIRECT Mode Decision Based on Mode Selection and Predicted Rate-Distortion Cost

    Xiaocong JIN  Jun SUN  Yiqing HUANG  Jia SU  Takeshi IKENAGA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E94-D No:8
      Page(s):
    1653-1662

    Different encoding modes for variable block size are available in the H.264/AVC standard in order to offer better coding quality. However, this also introduces huge computation time due to the exhaustive check for all modes. In this paper, a fast spatial DIRECT mode decision method for profiles supporting B frame encoding (main profile, high profile, etc.) in H.264/AVC is proposed. Statistical analysis on multiple video sequences is carried out, and the strong relationship of mode selection and rate-distortion (RD) cost between the current DIRECT macroblock (MB) and the co-located MBs is observed. With the check of mode condition, predicted RD cost threshold and dynamic parameter update model, the complex mode decision process can be terminated at an early stage even for small QP cases. Simulation results demonstrate the proposed method can achieve much better performance than the original exhaustive rate-distortion optimization (RDO) based mode decision algorithm by reducing up to 56.8% of encoding time for IBPBP picture group and up to 67.8% of encoding time for IBBPBBP picture group while incurring only negligible bit increment and quality degradation.

  • A Spatially Adaptive Gradient-Projection Algorithm to Remove Coding Artifacts of H.264

    Kwon-Yul CHOI  Min-Cheol HONG  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E94-D No:5
      Page(s):
    1073-1081

    In this paper, we propose a spatially adaptive gradient-projection algorithm for the H.264 video coding standard to remove coding artifacts using local statistics. A hybrid method combining a new weighted constrained least squares (WCLS) approach and the projection onto convex sets (POCS) approach is introduced, where weighting components are determined on the basis of the human visual system (HVS) and projection set is defined by the difference between adjacent pixels and the quantization index (QI). A new visual function is defined to determine the weighting matrices controlling the degree of global smoothness, and a projection set is used to obtain a solution satisfying local smoothing constraints, so that the coding artifacts such as blocking and ringing artifacts can be simultaneously removed. The experimental results show the capability and efficiency of the proposed algorithm.

  • Multiple Region-of-Interest Based H.264 Encoder with a Detection Architecture in Macroblock Level Pipelining

    Tianruo ZHANG  Chen LIU  Minghui WANG  Satoshi GOTO  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    401-410

    This paper proposes a region-of-interest (ROI) based H.264 encoder and the VLSI architecture of the ROI detection algorithm. In an ROI based video coding system, the pre-processing unit that detects ROIs should introduce only low computational complexity overhead due to the low power requirement. The Macroblocks (MBs) in ROIs are detected sequentially in the same order as H.264 encoding to satisfy the MB level pipelining of the ROI detector and the H.264 encoder. ROI detection is performed in a novel estimation-and-verification process with an ROI contour template. The proposed architecture can be configured to detect either a single ROI or multiple ROIs in each frame, and the throughput of single detection mode is 5.5 times that of multiple detection mode. 98.01% and 97.89% of MBs in ROIs can be detected in single and multiple detection modes respectively. The hardware cost of the proposed architecture is only 4.68 k gates. Detection speed is 753 fps for CIF format video at the operation frequency of 200 MHz in multiple detection mode with power consumption of 0.47 mW. Compared with previous fast ROI detection algorithms for video coding applications, the proposed architecture obtains more accurate and smaller ROIs. Therefore, more efficient ROI based computation complexity and compression efficiency optimization can be implemented in an H.264 encoder.

  • Highly Parallel and Fully Reused H.264/AVC High Profile Intra Predictor Generation Engine for Super Hi-Vision 4k×4k@60 fps

    Yiqing HUANG  Xiaocong JIN  Jin ZHOU  Jia SU  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    428-438

    One high profile intra predictor generation engine is proposed in this paper. Firstly, hardware level algorithm optimization for intra 8×8 (I8MB) mode is introduced. The original candidate pixels for generating prediction samples of I8MB are replaced with boundary pixels of intra 4×4 (I4MB) blocks. Based on this adoption, full data reuse between predictors of I4MB and filtered samples of I8MB can be achieved with almost no quality loss. Secondly, one lossless two-4×4-block based parallel predictor generation flow is proposed. The original predictor generation flow is optimized from 16 stages to 10 stages for I4MB and Intra 16×16 (I16MB), which saves 37.5% processing cycles. For I8MB, similar methodology with different processing order of 4×4 scaled blocks is introduced. Thirdly, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed in this paper. Except for the DC (direct current) and plane modes, full data reuse among all intra modes of high profile can be achieved. Fourthly, for DC mode, one combined predictor generation process is introduced and predictor generation of I16MB's DC mode is merged into the process of I4MB's DC mode. Moreover, by configuring proposed hardwired engines, predictor generation of I16MB's plane mode and chrominance plane mode can be accomplished with only 50% cycles of original design. In total, when compared with the original full-mode design and the latest dynamic mode reused design, the proposed predictor generation engine can achieve 89.5% and 73.2% saving of processing cycles, respectively. Synthesized by TSMC 0.18 µm technology under worst work conditions (1.62 V, 125°C), with 380 MHz and 37.2 k gates, the proposed design can handle real-time high profile intra predictor generation of Super Hi-Vision 4k×4k@60 fps. The maximum work frequency of our design under worst condition is 468 MHz.

  • Cache Based Motion Compensation Architecture for Quad-HD H.264/AVC Video Decoder

    Jinjia ZHOU  Dajiang ZHOU  Gang HE  Satoshi GOTO  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    439-447

    In this paper, we present a cache based motion compensation (MC) architecture for Quad-HD H.264/AVC video decoder. With the significantly increased throughput requirement, VLSI design for MC is greatly challenged by the huge area cost and power consumption. Moreover, the long memory system latency leads to a performance drop of the MC pipeline. To solve these problems, three optimization schemes are proposed in this work. Firstly, a high-performance interpolator based on Horizontal-Vertical Expansion and Luma-Chroma Parallelism (HVE-LCP) is proposed to efficiently increase the processing throughput to more than 4 times that of previous designs. Secondly, an efficient cache memory organization scheme (4S×4) is adopted to improve the on-chip memory utilization, which contributes to memory area saving of 25% and memory power saving of 39% to 49%. Finally, by employing a Split Task Queue (STQ) architecture, the cache system is capable of tolerating much longer latency of the memory system. Consequently, the cache idle time is reduced by 90%, which contributes to reducing the overall processing time by 24% to 40%. When implemented with SMIC 90 nm process, this design costs a logic gate count and on-chip memory of 108.8 k and 3.1 kB respectively. The proposed MC architecture can support real-time processing of 3840×2160@60 fps with less than 166 MHz.

  • An H.264/AVC Decoder with Reduced External Memory Access for Motion Compensation

    Jaesun KIM  Younghoon KIM  Hyuk-Jae LEE  

     
    PAPER-Computer System

      Vol:
    E94-D No:4
      Page(s):
    798-808

    The excessive memory access required to perform motion compensation when decoding compressed video is one of the main limitations in improving the performance of an H.264/AVC decoder. This paper proposes an H.264/AVC decoder that employs three techniques to reduce external memory access events: efficient distribution of reference frame data, on-chip cache memory, and frame memory recompression. The distribution of reference frame data is optimized to reduce the number of row activations during SDRAM access. The novel cache organization is proposed to simplify tag comparisons and ease the access to consecutive 4×4 blocks. A recompression algorithm is modified to improve compression efficiency by using unused storage space in neighboring blocks as well as the correlation with the neighboring pixels stored in the cache. Experimental results show that the three techniques together reduce external memory access time by an average of 90%, which is 16% better than the improvements achieved by previous work. Efficiency of the frame memory recompression algorithm is improved with a 32×32 cache, resulting in a PSNR improvement of 0.371 dB. The H.264/AVC decoder with the three techniques is fabricated and implemented as an ASIC using 0.18 µm technology.

  • Optimized 2-D SAD Tree Architecture of Integer Motion Estimation for H.264/AVC

    Yibo FAN  Xiaoyang ZENG  Satoshi GOTO  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    411-418

    Integer Motion Estimation (IME) accounts for much of the computation in an H.264/AVC video encoder. The 2-D SAD tree IME architecture provides very high performance for the encoder, and it has been used by many video codec designs. This paper proposes an optimized hardware design of 2-D SAD tree IME. Firstly, a new hardware architecture is proposed to reduce on-chip memory size. Secondly, a new search pattern is proposed to fully use memory bandwidth and reduce external memory access. Thirdly, the data-path is redesigned, and the performance is greatly improved. In order to compare with other IME designs, an IME design supporting D1 size, 30 fps with search range [32, 32] is implemented. The hardware cost of this design includes 118 KGates and 8 Kb SRAM; the maximum clock frequency is 200 MHz. Compared to the original 2-D SAD tree IME, our design saves 87.5% on-chip memory, and achieves 3 times the performance of the original one. Our design provides a new way to design a low cost and high performance IME for H.264/AVC encoder.
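
For readers unfamiliar with the SAD reuse that tree-style IME designs build on, a minimal software sketch follows. It is not the architecture of this paper: the function names and the random test data are assumptions. It computes the sixteen 4×4 SADs of one 16×16 macroblock once and sums them to obtain the SADs of all 41 H.264 variable-block-size partitions.

```python
import random

def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def merge(grid, rows, cols):
    """Sum 4x4 SADs into partitions spanning rows x cols sub-blocks."""
    return [
        sum(grid[r + i][c + j] for i in range(rows) for j in range(cols))
        for r in range(0, 4, rows)
        for c in range(0, 4, cols)
    ]

# Partition name (width x height in pixels) -> (rows, cols) in 4x4 sub-blocks.
SHAPES = {
    "4x4": (1, 1), "8x4": (1, 2), "4x8": (2, 1), "8x8": (2, 2),
    "16x8": (2, 4), "8x16": (4, 2), "16x16": (4, 4),
}

def all_partition_sads(cur, ref):
    """Derive the SADs of every partition from the shared 4x4 SADs."""
    grid = sad_4x4_grid(cur, ref)
    return {name: merge(grid, r, c) for name, (r, c) in SHAPES.items()}

# Hypothetical pixel data for one macroblock and one candidate position.
rng = random.Random(0)
cur = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
ref = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]

sads = all_partition_sads(cur, ref)
print(sum(len(v) for v in sads.values()))  # 41
```

The point of the reuse is that the per-pixel absolute differences are computed once per candidate position, and every larger partition is obtained by additions only.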
