Qi TENG Guowei TENG Xiang LI Ran MA Ping AN Zhenglong YANG
The latest versatile video coding (VVC) introduces some novel techniques such as quadtree with nested multi-type tree (QTMT), multiple transform selection (MTS) and multiple reference line (MRL). These tools improve compression efficiency compared with the previous standard H.265/HEVC, but they suffer from very high computational complexity. One of the most time-consuming parts of VVC intra coding is the coding tree unit (CTU) structure decision. In this paper, we propose a low-complexity multi-type tree (MT) pruning method for VVC intra coding. This method consists of lookahead search and MT pruning. The lookahead search process is performed to derive the approximate rate-distortion (RD) cost of each MT node at depth 2 or 3. Subsequently, the improbable MT nodes are pruned by different strategies under different cost errors. These strategies are designed according to the priority of the node. Experimental results show that the overall proposed algorithm can achieve 47.15% time saving with only 0.93% Bjøntegaard delta bit rate (BDBR) increase over natural scene sequences, and 45.39% time saving with 1.55% BDBR increase over screen content sequences, compared with the VVC reference software VTM 10.0. Such results demonstrate that our method achieves a good trade-off between computational complexity and compression quality compared to recent methods.
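The pruning idea can be illustrated with a generic cost-threshold filter. This is only a minimal sketch under stated assumptions: the abstract does not give the actual pruning strategies, cost-error thresholds, or node priorities, so the function name `prune_mt_candidates`, the `margin` parameter, and the sample split-mode labels are all hypothetical.

```python
from typing import Dict, List


def prune_mt_candidates(approx_costs: Dict[str, float], margin: float = 0.05) -> List[str]:
    """Keep split modes whose approximate RD cost is close to the best one.

    approx_costs maps a (hypothetical) MT split-mode label to the approximate
    RD cost obtained from a lookahead search. A mode survives if its cost is
    within `margin` (relative) of the minimum cost; the rest are pruned.
    """
    if not approx_costs:
        return []
    best = min(approx_costs.values())
    threshold = best * (1.0 + margin)
    survivors = [mode for mode, cost in approx_costs.items() if cost <= threshold]
    # Return survivors ordered from cheapest to most expensive.
    return sorted(survivors, key=approx_costs.get)


if __name__ == "__main__":
    # Illustrative costs only; not taken from the paper.
    costs = {"BT_H": 100.0, "BT_V": 103.0, "TT_H": 120.0, "TT_V": 106.0}
    print(prune_mt_candidates(costs))
```

A single relative margin is the simplest possible rule; the paper describes several strategies selected by cost error and node priority, which this sketch does not attempt to reproduce.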
Lili WEI Zhenglong YANG Zhenming WANG Guozhong WANG
Since HEVC intra rate control has no prior information to rely on for coding, it is difficult to obtain the optimal λ for every coding tree unit (CTU). In this paper, a convolutional neural network (CNN) based intra rate control is proposed. Firstly, a CNN with two output channels in its last layer is used to predict the key parameters of the CTU R-λ curve. To train the CNN well, a combined loss function is built and the balance factor γ is explored to achieve the minimum loss. Secondly, the initial CTU λ is calculated from the CNN predictions and the allocated bits per pixel (bpp). According to the rate distortion optimization (RDO) of a frame, a spatial equation is derived between the CTU λ and the frame λ. Lastly, the CTU clipping function is used to obtain the optimal CTU λ for the intra rate control. The experimental results show that the proposed algorithm improves the intra rate control performance significantly with good rate control accuracy.
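As a rough illustration of the second and last steps, the sketch below assumes the hyperbolic R-λ model commonly used in HEVC rate control, λ = α · bpp^β, with α and β standing in for the two parameters the CNN predicts. The abstract does not state the exact model or the clipping range, so the function names, the clipping ratio, and the sample values are assumptions.

```python
def initial_ctu_lambda(alpha: float, beta: float, bpp: float) -> float:
    """Assumed R-lambda model: lambda = alpha * bpp ** beta (beta is usually negative)."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return alpha * bpp ** beta


def clip_ctu_lambda(ctu_lambda: float, frame_lambda: float, ratio: float = 2.0) -> float:
    """Hypothetical clipping: keep the CTU lambda within [frame/ratio, frame*ratio]."""
    low, high = frame_lambda / ratio, frame_lambda * ratio
    return max(low, min(high, ctu_lambda))


if __name__ == "__main__":
    # Illustrative parameter values only; not taken from the paper.
    lam = initial_ctu_lambda(alpha=3.2, beta=-1.367, bpp=0.1)
    print(clip_ctu_lambda(lam, frame_lambda=50.0))
```

Clipping against the frame λ keeps neighbouring CTUs from diverging too far in quality; the actual bounds in the paper may be derived differently.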
Guowei TENG Hao LI Zhenglong YANG
This paper proposes a temporal domain difference based secondary background modeling algorithm for surveillance video coding. The proposed algorithm has three key technical contributions, as follows. Firstly, the LDBCBR (Long Distance Block Composed Background Reference) algorithm is proposed, which exploits IBBS (interval of background blocks searching) to weaken the temporal correlation of the foreground. Secondly, both BCBR (Block Composed Background Reference) and LDBCBR are exploited at the same time to generate the temporary background reference frame. The secondary modeling algorithm utilizes the temporary background blocks generated by BCBR and LDBCBR to get the final background frame. Thirdly, the background reference frame is monitored after it is generated: background blocks are updated immediately when they change significantly, the modeling period is shortened in areas where the foreground moves frequently, and the stable background is checked regularly. The proposed algorithm is implemented on the IEEE 1857 platform, and the experimental results demonstrate a significant improvement in coding efficiency. On surveillance test sequences recommended by the China AVS (Advanced Audio Video Standard) working group, our method achieves BD-Rate gains of 6.81% and 27.30% compared with BCBR and the baseline profile, respectively.
Zhenglong YANG Guozhong WANG Guowei TENG
Although HEVC rate control can achieve high coding efficiency, it still does not fully utilize the special characteristics of surveillance videos, which typically have a moving foreground and relatively static background. For surveillance videos, it is usually necessary to provide a better coding quality of the moving foreground. In this paper, a foreground-background CTU λ separate decision scheme is proposed. First, low-complexity pixel-based segmentation is presented to obtain the foreground and the background. Second, the rate distortion (RD) characteristics of the foreground and the background are explored. With the rate distortion optimization (RDO) process, the average CTU λ value of the foreground or the background should be equal to the frame λ. Then, a separate optimal CTU λ decision is proposed with a separate λ clipping method. Finally, a separate updating process is used to obtain reasonable parameters for the foreground and the background. The experimental results show that the quality of the foreground is improved by 0.30 dB in the random access configuration and 0.45 dB in the low delay configuration without degradation of either the rate control accuracy or whole frame quality.
Manlin XIAO Zhibo DUAN Zhenglong YANG
Based on the TLS-ESPRIT algorithm, this paper proposes a weighted spatial smoothing DOA estimation algorithm to address the problem that the conventional TLS-ESPRIT algorithm fails to estimate the direction of arrival (DOA) when the sources are coherent. The proposed method divides the receiving array into several subarrays with a special structural feature. Then, utilizing these subarrays, this paper constructs a new weighted covariance matrix to estimate the DOA based on TLS-ESPRIT. The auto-correlation and cross-correlation information of the subarrays is fully exploited in the proposed algorithm, improving the orthogonality between the signal subspace and the noise subspace so that the DOA of coherent sources can be estimated accurately. The simulations show that, with coherent sources, the proposed algorithm is superior to the conventional spatial smoothing algorithms under different signal-to-noise ratios (SNR) and snapshot numbers.
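For context, the conventional forward spatial smoothing that the paper compares against can be written as below. This is the textbook form for a uniform linear array, not the paper's weighted matrix; the symbols M (array size), m (subarray size), L (number of subarrays) and K (number of coherent sources) are notation introduced here.

```latex
% Forward spatial smoothing for an M-element uniform linear array,
% split into L = M - m + 1 overlapping subarrays of m elements each.
\mathbf{x}_l(t) = \bigl[x_l(t),\, x_{l+1}(t),\, \dots,\, x_{l+m-1}(t)\bigr]^{T},
\qquad l = 1, \dots, L
\\[4pt]
\mathbf{R}_l = E\!\left[\mathbf{x}_l(t)\,\mathbf{x}_l^{H}(t)\right],
\qquad
\mathbf{R}_{\mathrm{SS}} = \frac{1}{L}\sum_{l=1}^{L}\mathbf{R}_l
% With K coherent sources, the smoothed signal covariance regains rank K
% when L \ge K and m > K, so subspace methods such as ESPRIT apply again.
```

The conventional average uses only the auto-correlation matrices of the subarrays; according to the abstract, the proposed weighted matrix additionally draws on cross-correlation terms between subarrays, whose exact weighting is not given here.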
Weiwei QI Shubin ZHENG Liming LI Zhenglong YANG
Bolts in the bogie box of metro vehicles are fasteners that are critical to the bogie box structure. Detecting loosened bolts effectively at an early stage can prevent bolt loss and accidents. Recently, detection methods based on machine vision have been developed for bolt loosening. However, traditional image processing and machine learning methods have high miss and false detection rates for bolts because of their small size and the complex background. To address this problem, a loosened-bolt detection method based on deep learning is proposed. The proposed method cascades two stages in a coarse-to-fine manner: a localization stage, in which the Single Shot Multibox Detector (SSD) and an improved SSD sequentially localize the bogie box and the bolts, and a semantic segmentation stage, in which the U-shaped Network (U-Net) detects the looseness of the bolts. The accuracy and effectiveness of the proposed method are verified with images captured from Shanghai Metro Line 9. The results show that the proposed method achieves higher accuracy in detecting bolt loosening, which helps ensure the stable operation of metro vehicles.