Xiantao JIANG Tian SONG Takashi SHIMAMOTO Wen SHI Lisheng WANG
The next-generation high efficiency video coding (HEVC) standard achieves high performance by extending the encoding block size to 64×64. It provides several parallel tools to improve the efficiency of the encoder and decoder. However, owing to the dependence between the current prediction block and its surrounding blocks, parallel processing at the CU level and sub-CU level is hard to achieve. In this paper, focusing on spatial motion vector prediction (SMVP) and temporal motion vector prediction (TMVP), parallel improvements to the spatio-temporal prediction algorithms are presented that remove the dependency between prediction coding units and neighboring coding units. With this proposal, motion estimation can conveniently be processed in parallel, which suits different parallel platforms such as multi-core platforms and compute unified device architecture (CUDA). Simulation results based on the HM12.0 test model for different test sequences demonstrate that the proposed algorithm can improve advanced motion vector prediction with only a 0.01% BD-rate increase, which is better than previous work, and the BDPSNR is almost the same as that of the HEVC reference software.
Wenjun ZHAO Takao ONOYE Tian SONG
In this paper, a specific hardware architecture is proposed for the Fast Mode Decision (FMD) algorithms presented in our previous work. The architecture is designed as an embedded mode dispatch module. With this module, unnecessary modes can be skipped or the mode decision process can be terminated in advance. To maintain high compatibility, the FMD algorithms are designed together as a single module that can be easily embedded into a common video codec for H.265/HEVC. The input and output interfaces between the proposed module and the other parts of the codec are based on a simple but effective protocol. Hardware synthesis results on an FPGA demonstrate that the proposed architecture achieves a maximum frequency of about 193 MHz while consuming less than 1% of the total resources. Moreover, the proposed module can improve the overall throughput.
Takafumi KATAYAMA Tian SONG Wen SHI Gen FUJITA Xiantao JIANG Takashi SHIMAMOTO
Scalable high efficiency video coding (SHVC) can provide variable video quality according to the terminal device. However, the new techniques that SHVC introduces on top of high efficiency video coding (HEVC) increase its computational complexity. In this paper, a hardware-oriented low-complexity algorithm is proposed. The proposal has two key points. First, the coding unit depth is determined by analyzing the boundary correlation between coding units before the encoding process starts. Second, the redundant calculation of R-D optimization is reduced by adaptively using the information of the neighboring coding units and the co-located units in the base layer. The simulation results show that the proposed algorithm achieves over 62% computational complexity reduction compared to the original SHM11.0. Compared with other related work, over 11% time saving is achieved without PSNR loss. Furthermore, the proposed algorithm is hardware friendly and can be implemented in a small area.
Yizhong LIU Tian SONG Takashi SHIMAMOTO
In this paper, we propose a high-throughput binary arithmetic coding architecture for CABAC (Context Adaptive Binary Arithmetic Coding), which is one of the entropy coding tools used in the H.264/AVC main and high profiles. The full set of CABAC encoding functions, including binarization, context model selection, arithmetic encoding and bit generation, is implemented in this proposal. Binarization and context model selection are implemented in a proposed binarizer, in which a FIFO is used to pack the binarization results and output 4 bins per clock. Arithmetic encoding and bit generation are implemented in a four-stage pipeline with an encoding ability of 4 bins/clock. To improve the processing speed, the context variable access and update for the 4 bins are parallelized and the pipeline path is balanced. In addition, because of the outstanding bits issue, a bit packing and generation strategy for parallel processing of 4 bins is proposed. Implemented in Verilog HDL and synthesized with Synopsys Design Compiler using 90 nm libraries, the design operates at a clock frequency of 250 MHz and occupies about 58 K standard cells, 3.2 Kbits of register files and 27.6 Kbits of ROM. A throughput of 1000 M bins per second can be achieved for HDTV applications.
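The throughput figure follows directly from the two numbers quoted in the abstract above (4 bins/clock at 250 MHz). The short check below makes that arithmetic explicit; the function name is hypothetical, and the result is a peak rate that assumes the pipeline never stalls.

```python
# Figures quoted in the abstract.
BINS_PER_CLOCK = 4
CLOCK_HZ = 250_000_000  # 250 MHz


def peak_bin_rate(bins_per_clock: int, clock_hz: int) -> int:
    """Peak bins per second, assuming one full group of bins every clock."""
    return bins_per_clock * clock_hz


rate = peak_bin_rate(BINS_PER_CLOCK, CLOCK_HZ)
print(rate // 1_000_000, "M bins/s")  # prints "1000 M bins/s"
```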
Xiantao JIANG Tian SONG Wen SHI Takafumi KATAYAMA Takashi SHIMAMOTO Lisheng WANG
In this work, a high-efficiency coding unit (CU) size decision algorithm is proposed for high efficiency video coding (HEVC) inter coding. The decision to split or not split a CU is modeled as a binary classification problem based on a probabilistic graphical model (PGM). The method incorporates two sub-methods: a CU size termination decision and a CU size skip decision. It targets the trade-off between encoding efficiency and encoding complexity. For high-resolution applications in particular, simulation results demonstrate that the proposed algorithm reduces encoding time by 53.62%-57.54%, while the BD-rate increase is only 1.27%-1.65%, compared to the HEVC software model.
Yizhong LIU Tian SONG Yiqi ZHUANG Takashi SHIMAMOTO Xiang LI
This paper proposes a novel greedy algorithm, called Creditability-Estimation based Matching Pursuit (CEMP), for compressed sensing signal recovery. As shown for Stagewise Orthogonal Matching Pursuit (StOMP), the matched filter coefficients that correspond to the actual support set of the original sparse signal and those that do not follow two different Gaussian distributions. The selection of each support point is therefore interpreted as a hypothesis test, and the preliminarily selected support set consists of the rejected atoms. A hard threshold, controlled by an input parameter, implements the rejection. Because a Type I error may occur during hypothesis testing, not all of the rejected atoms are creditable as true support points. The creditability of each preliminarily selected support point is evaluated by a built-in mechanism, and the several most creditable points are adaptively selected into the final support set without any extra external control parameter. Moreover, the proposed CEMP does not require the sparsity level as an a priori control parameter. To verify the performance of the proposed algorithm, Gaussian and Pulse Amplitude Modulation sparse signals are measured in noiseless and noisy cases, and compressed sensing recovery experiments are carried out with several greedy algorithms including CEMP. The simulation results show that the proposed CEMP achieves the best overall recovery accuracy and robustness. In addition, a compressed sensing image recovery experiment shows that CEMP recovers the image with the highest Peak Signal to Noise Ratio (PSNR) and the best visual quality.
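The hard-threshold rejection step described above can be illustrated with a minimal sketch. It assumes the StOMP-style rule, in which a matched filter coefficient is kept when its magnitude exceeds t·σ with σ = ||r||₂/√n for a residual r of n measurements. The function names and the toy matrix are hypothetical, and the creditability evaluation that distinguishes CEMP is not shown because the abstract does not specify it.

```python
import math


def matched_filter(phi, r):
    """Correlate each column (atom) of phi with the residual r."""
    n_rows, n_cols = len(phi), len(phi[0])
    return [sum(phi[i][j] * r[i] for i in range(n_rows)) for j in range(n_cols)]


def preliminary_support(phi, r, t):
    """Indices of atoms whose coefficient magnitude exceeds t * sigma.

    sigma = ||r||_2 / sqrt(n) is the StOMP-style formal noise level;
    using it here is an assumption, not the paper's stated rule.
    """
    n = len(r)
    sigma = math.sqrt(sum(v * v for v in r)) / math.sqrt(n)
    coeffs = matched_filter(phi, r)
    return [j for j, c in enumerate(coeffs) if abs(c) > t * sigma]


# Toy example: 2 measurements, 3 atoms (values are illustrative only).
phi = [[1.0, 0.0, 0.1],
       [0.0, 1.0, 0.1]]
r = [2.0, 0.0]
print(preliminary_support(phi, r, t=1.0))
```

A larger t rejects fewer atoms into the preliminary support set, which lowers the chance of a Type I error at the cost of possibly missing true support points.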
Xiantao JIANG Tian SONG Wen SHI Takashi SHIMAMOTO Lisheng WANG
The purpose of this work is to reduce redundant coding processes while balancing encoding complexity against coding efficiency in HEVC, especially for high-resolution applications. To this end, a CU depth prediction algorithm is proposed for the motion estimation process of HEVC. First, an efficient CTU depth prediction algorithm is proposed to reduce redundant depths. Then, a CU size termination and skip algorithm is proposed based on the neighboring block depth and motion consistency. Finally, the overall algorithm, which delivers excellent complexity reduction for high-resolution applications, is proposed. Moreover, the proposed method achieves steady performance and significantly reduces the encoding time under different configurations and quantization parameters. The simulation results demonstrate that, in the random access (RA) case, the average time saving is about 56% with only 0.79% BD-bitrate loss for high-resolution sequences, which is better than previous state-of-the-art work.
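One way to picture depth prediction from neighboring blocks is to restrict the depth search range of the current CTU to a window around the depths of already coded neighbors. The sketch below is a hypothetical illustration of that idea only: the function name, the choice of neighbors and the margin of 1 are assumptions, and the abstract does not state the actual prediction rule.

```python
MAX_DEPTH = 3  # HEVC CU depths 0..3 correspond to 64x64 down to 8x8


def predict_depth_range(neighbor_depths, margin=1):
    """Return (lowest, highest) depth to test for the current CTU.

    neighbor_depths: depths of already coded neighbors (e.g. left, upper,
    co-located); which neighbors to use is an assumption of this sketch.
    With no neighbors available, the full depth range is searched.
    """
    if not neighbor_depths:
        return (0, MAX_DEPTH)
    lowest = max(0, min(neighbor_depths) - margin)
    highest = min(MAX_DEPTH, max(neighbor_depths) + margin)
    return (lowest, highest)


print(predict_depth_range([0, 0, 0]))  # prints "(0, 1)"
```

In this example, depths 2 and 3 would be skipped for a CTU whose neighbors were all coded at depth 0, which is the kind of redundant depth the abstract aims to avoid.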