Author Search Result

[Author] Mengmeng ZHANG (18 hits)

Hits 1-18
  • Fast Intra Coding Algorithm for HEVC Based on Decision Tree

    Jia QIN  Huihui BAI  Mengmeng ZHANG  Yao ZHAO  

     
    LETTER-Image

      Vol:
    E100-A No:5
      Page(s):
    1274-1278

    High Efficiency Video Coding (HEVC) is the latest coding standard. Compared with Advanced Video Coding (H.264/AVC), HEVC offers about a 50% bitrate reduction at the same reconstructed video quality. However, this new coding standard leads to enormous computational complexity, which makes it difficult to encode video in real time. Therefore, in this paper, aiming at the high complexity of intra coding in HEVC, a new fast coding unit (CU) splitting algorithm is proposed based on the decision tree. The decision tree, a machine learning method, can be designed to determine the size of CUs adaptively. Here, two significant features, the Just Noticeable Difference (JND) values and the coding bits of each CU, are extracted to train the decision tree according to their relationships with the CUs' partitions. The experimental results show that the proposed algorithm saves about 34% of the coding time on average, with only a small increase in BD-rate under the “All_Intra” setting, compared with the HEVC reference software.

  • Standard-Compliant Multiple Description Image Coding Based on Convolutional Neural Networks

    Ting ZHANG  Huihui BAI  Mengmeng ZHANG  Yao ZHAO  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2018/07/19
      Vol:
    E101-D No:10
      Page(s):
    2543-2546

    Multiple description (MD) coding is an attractive framework for robust information transmission over non-prioritized and unpredictable networks. In this paper, a novel MD image coding scheme is proposed based on convolutional neural networks (CNNs), which aims to improve the reconstructed quality of side and central decoders. To this end, a given image is first encoded into two independent descriptions by sub-sampling. Such a design makes the proposed method compatible with the existing image coding standards. At the decoder, in order to achieve high-quality side and central image reconstruction, three CNNs, including two side decoder sub-networks and one central decoder sub-network, are integrated into an end-to-end reconstruction framework. Experimental results show the improvement achieved by the proposed scheme in terms of both peak signal-to-noise ratio values and subjective quality. The proposed method demonstrates better rate-distortion performance for both the central and side decoders.
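
As a rough illustration of the sub-sampling step described above, the following minimal sketch splits an image into two descriptions by taking alternate columns. The column-wise split, the function names, and the column-repetition side decoder are assumptions made for illustration only; the abstract does not specify the sub-sampling pattern, and the paper reconstructs with CNNs rather than simple repetition.

```python
def split_descriptions(image):
    """Split an image (list of rows) into two descriptions of alternate columns."""
    even = [row[0::2] for row in image]
    odd = [row[1::2] for row in image]
    return even, odd


def central_merge(even, odd):
    """Central decoding stand-in: interleave both descriptions column by column."""
    merged = []
    for row_even, row_odd in zip(even, odd):
        row = []
        for i in range(len(row_even) + len(row_odd)):
            source = row_even if i % 2 == 0 else row_odd
            row.append(source[i // 2])
        merged.append(row)
    return merged


def side_upsample(description, width):
    """Side decoding stand-in (NOT the paper's CNN): repeat each column twice."""
    return [
        [row[min(i // 2, len(row) - 1)] for i in range(width)]
        for row in description
    ]


# Hypothetical 4x6 test image with distinct pixel values.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
even, odd = split_descriptions(image)
central = central_merge(even, odd)          # both descriptions received
side = side_upsample(even, len(image[0]))   # only one description received
```

When both descriptions arrive, the interleaving recovers the image exactly; when only one arrives, the side reconstruction is an approximation, which is the gap the learned side decoders are meant to narrow.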

  • Mining Approximate Primary Functional Dependency on Web Tables

    Siyu CHEN  Ning WANG  Mengmeng ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2018/11/29
      Vol:
    E102-D No:3
      Page(s):
    650-654

    We propose to discover approximate primary functional dependencies (aPFDs) for web tables, which focus on the determination relationship between primary attributes and non-primary attributes and are more helpful for entity column detection and topic discovery on web tables. Based on association rules and information theory, we propose the metrics Conf and InfoGain to evaluate PFDs. By quantifying the strength of PFDs and designing pruning strategies to eliminate false positives, our method effectively selects minimal non-trivial approximate PFDs and scales to large tables. Comprehensive experimental results on real web datasets show that our method significantly outperforms previous work in both effectiveness and efficiency.
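
To make the idea of scoring an approximate dependency concrete, here is a minimal sketch of one common confidence-style measure: the largest fraction of rows that can be kept so that the dependency holds exactly. This is an assumption for illustration and is not necessarily the paper's Conf metric; the function name and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def fd_confidence(rows, lhs, rhs):
    """Fraction of rows consistent with lhs -> rhs after keeping, for each
    lhs value, only its most frequent rhs value."""
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    kept = sum(max(counter.values()) for counter in groups.values())
    return kept / len(rows)


# Hypothetical web-table rows; the third row violates city -> country.
rows = [
    {"city": "Paris", "country": "France"},
    {"city": "Paris", "country": "France"},
    {"city": "Paris", "country": "USA"},
    {"city": "Tokyo", "country": "Japan"},
]
city_to_country = fd_confidence(rows, "city", "country")
country_to_city = fd_confidence(rows, "country", "city")
```

A score of 1.0 means the dependency holds exactly on the table; lower scores indicate how many rows would have to be dropped for it to hold.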

  • A Fast Chroma Intra-Prediction Mode Decision Algorithm Based on Texture Characteristics for VVC

    Zhi LIU  Yifan SU  Shuzhong YANG  Mengmeng ZHANG  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2021/02/05
      Vol:
    E104-D No:5
      Page(s):
    781-784

    Cross-component linear model (CCLM) chromaticity prediction is a new technique introduced in Versatile Video Coding (VVC), which utilizes the reconstructed luminance component to predict the chromaticity parts and can improve the coding performance. However, it increases the coding complexity. In this paper, the acceleration of the chroma intra-prediction process is studied based on texture characteristics. Firstly, two observations are drawn from experimental statistics of the process. One is that the choice of the chroma intra-prediction candidate modes is closely related to the texture complexity of the coding unit (CU), and the other is that whether the direct mode (DM) is selected is closely related to the texture similarity between the current chromaticity CU and the corresponding luminance CU. Secondly, a fast chroma intra-prediction mode decision algorithm is proposed based on these observations. A modified metric named sum modulus difference (SMD) is introduced to measure the texture complexity of the CU and guide the filtering of irrelevant candidate modes. Meanwhile, the structural similarity index measurement (SSIM) is adopted to help decide whether the DM is selected. The experimental results show that, compared with the reference model VTM8.0, the proposed algorithm reduces the coding time by 12.92% on average and increases the BD-rate of the Y, U, and V components by only 0.05%, 0.32%, and 0.29%, respectively.

  • Fast Intra Prediction and CU Partition Algorithm for Virtual Reality 360 Degree Video Coding

    Zhi LIU  Cai XU  Mengmeng ZHANG  Wen YUE  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2018/12/18
      Vol:
    E102-D No:3
      Page(s):
    666-669

    Virtual Reality (VR) 360 degree video has ultra-high definition, so reducing the coding complexity becomes a key consideration in coding algorithm design. In this paper, a novel candidate mode pruning process is introduced between Rough Mode Decision and Most Probable Mode based on the statistical analysis of the intra-coding parameters used in VR 360 degree video coding under the Cubemap projection (CMP) format. In addition, updated coding bits thresholds for VR 360 degree video are designed in the proposed algorithm. The experimental results show that the proposed algorithm saves 38.73% and 23.70% of the average coding time at the cost of only 1.4% and 2.1% Bjøntegaard delta rate increase in All-Intra mode and Random-Access mode, respectively.

  • Fast Coding-Mode Selection and CU-Depth Prediction Algorithm Based on Text-Block Recognition for Screen Content Coding

    Mengmeng ZHANG  Ang ZHU  Zhi LIU  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2016/07/12
      Vol:
    E99-D No:10
      Page(s):
    2651-2655

    As an important extension of high-efficiency video coding (HEVC), screen content coding (SCC) includes various new coding modes, such as Intra Block Copy (IBC), Palette-based coding (Palette), and Adaptive Color Transform (ACT). These new tools have improved screen content encoding performance. This paper proposes a novel fast algorithm that classifies Coding Units (CUs) as text CUs or non-text CUs. For text CUs, the Intra mode is skipped in the compression process, whereas for non-text CUs, the IBC mode is skipped. The current CU depth range is then predicted according to the depth level of its adjacent left CU. Compared with the reference software HM16.7+SCM5.4, the proposed algorithm reduces encoding time by 23% on average, with an increase of approximately 0.44% in Bjøntegaard delta bit rate and a negligible peak signal-to-noise ratio loss.

  • An Efficient Multimodal Aggregation Network for Video-Text Retrieval

    Zhi LIU  Fangyuan ZHAO  Mengmeng ZHANG  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2022/06/27
      Vol:
    E105-D No:10
      Page(s):
    1825-1828

    In the video-text retrieval task, the mainstream framework consists of three parts: a video encoder, a text encoder, and similarity calculation. MMT (Multi-modal Transformer) achieves remarkable performance on this task; however, it suffers from insufficient training data. In this paper, an efficient multimodal aggregation network for video-text retrieval is proposed. Unlike prior work that uses MMT to fuse video features, the proposed network introduces NetVLAD, which has fewer parameters and can be trained on small datasets. In addition, since the function of CLIP (Contrastive Language-Image Pre-training) can be considered as learning language models from visual supervision, it is introduced as the text encoder in the proposed network to avoid overfitting. Meanwhile, in order to make full use of the pre-trained model, a two-step training scheme is designed. Experiments show that the proposed model achieves competitive results compared with the latest work.

  • New VVC Chroma Prediction Modes Based on Coloring with Inter-Channel Correlation

    Zhi LIU  Jia CAO  Xiaohan GUAN  Mengmeng ZHANG  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2022/06/27
      Vol:
    E105-D No:10
      Page(s):
    1821-1824

    Inter-channel correlation is one of the redundancies that need to be eliminated in video coding. In the latest video coding standard H.266/VVC, the DM (Direct Mode) and CCLM (Cross-component Linear Model) modes have been introduced to reduce the similarity between luminance and chroma. However, inter-channel correlation is still observed. In this paper, a new inter-channel prediction algorithm is proposed, which utilizes the coloring principle to predict chroma pixels. From the coloring perspective, for most natural content video frames, the three components Y, U, and V always demonstrate similar coloring patterns. Therefore, the U and V components can be predicted using the coloring pattern of the Y component. In the proposed algorithm, correlation coefficients are obtained in a lightweight way to describe the coloring relationship between the current pixel and the reference pixel in the Y component, and are used to predict chroma pixels. The optimal position for the reference samples is also designed. Based on the selected position of the reference samples, two new chroma prediction modes are defined. Experimental results show that, compared with VTM 12.1, the proposed algorithm achieves an average of -0.92% and -0.96% BD-rate improvement for the U and V components under the All Intra (AI) configuration. At the same time, the increase in encoding and decoding time is negligible.

  • Fast CU Splitting in HEVC Intra Coding for Screen Content Coding

    Mengmeng ZHANG  Yang ZHANG  Huihui BAI  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E98-D No:2
      Page(s):
    467-470

    The high efficiency video coding (HEVC) standard has significantly improved compression performance for many applications, including remote desktop and desktop sharing. Screen content video coding is widely used in applications with a high demand for real-time performance. HEVC usually introduces great computational complexity, which makes fast algorithms necessary to offset the limited computing power of HEVC encoders. In this study, a statistical analysis of several screen content sequences is first performed to better account for the completely different statistics of natural images and videos. Second, a fast coding unit (CU) splitting method is proposed, which aims to reduce HEVC intra coding computational complexity, especially in screen content coding. In the proposed scheme, CU size decision is made by checking the smoothness of the luminance values in every coding tree unit. Experiments demonstrate that in HEVC range extension standard, the proposed scheme can save an average of 29% computational complexity with 0.9% Bjøntegaard Delta rate (BD-rate) increase compared with HM13.0+RExt6.0 anchor for screen content sequences. For default HEVC, the proposed scheme can reduce encoding time by an average of 38% with negligible loss of coding efficiency.

  • A Fast Intra Mode Decision Algorithm in VVC Based on Feature Cross for Screen Content Videos

    Zhi LIU  Siyuan ZHANG  Xiaohan GUAN  Mengmeng ZHANG  

     
    LETTER-Coding Theory

      Publicized:
    2023/07/24
      Vol:
    E107-A No:1
      Page(s):
    178-181

    In previous machine-learning-based fast intra mode decision algorithms for screen content videos, feature design is a key task, and it is always difficult to obtain distinguishable features. In this paper, the idea of feature interaction is introduced into fast video coding, and a fast intra mode decision algorithm based on feature cross is proposed for screen content videos. The numeric features and category features are designed based on the characteristics of screen content videos, and the adaptive factorization network (AFN) is improved and adopted to perform feature interaction on the designed features and output distinguishable features. The experimental results show that for the AI (All Intra) configuration, compared with standard VVC/H.266, the coding time is reduced by 29.64% and the BD-rate is increased by only 1.65%.

  • A Fast Multi-Type Tree Decision Algorithm for VVC Based on Pixel Difference of Sub-Blocks

    Zhi LIU  Mengjun DONG  Mengmeng ZHANG  

     
    LETTER-Coding Theory

      Publicized:
    2020/03/02
      Vol:
    E103-A No:6
      Page(s):
    856-859

    In the upcoming video coding standard VVC (Versatile Video Coding, H.266), a new coding block structure named quadtree nested multi-type trees (MTT) has been proposed. Compared with the quadtree structure defined in HEVC (High Efficiency Video Coding), the partition structure of MTT can achieve better coding performance. Since the splitting scheme of a CU (Coding Unit) needs to be calculated recursively, the computational complexity is significantly increased. To reduce computational complexity while maintaining compression performance, a fast multi-type tree decision algorithm is proposed. In this paper, the application of binary and ternary trees in the horizontal or vertical direction is found to be closely related to the characteristics of the CU, and a metric named pixel difference of sub-blocks (SBPD) is defined to measure the characteristics of the CU under different splitting types. By comparing the SBPD of horizontal and vertical sub-blocks, the selection of binary and ternary trees can be decided in advance, so as to skip some redundant splitting modes. Experimental results show that, compared with the original reference software VTM 4.0, the proposed algorithm saves 27% of the time on average while the BD-rate increases by only 0.55%.
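
The decision rule described above can be sketched roughly as follows. The abstract does not give the SBPD formula, so this sketch assumes a simple stand-in: the absolute difference between the mean pixel values of the two halves of a block. The function names and the `ratio` threshold are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sub_block_difference(block, direction):
    """Assumed stand-in for SBPD: absolute difference between the mean pixel
    values of the two half blocks produced by a split in `direction`."""
    height, width = len(block), len(block[0])
    if direction == "horizontal":  # horizontal split: top and bottom halves
        first = [p for row in block[: height // 2] for p in row]
        second = [p for row in block[height // 2 :] for p in row]
    else:  # vertical split: left and right halves
        first = [p for row in block for p in row[: width // 2]]
        second = [p for row in block for p in row[width // 2 :]]
    return abs(mean(first) - mean(second))


def splits_to_skip(block, ratio=2.0):
    """Return the split directions that look redundant for this block."""
    horizontal = sub_block_difference(block, "horizontal")
    vertical = sub_block_difference(block, "vertical")
    if horizontal > ratio * vertical:
        return {"vertical"}    # content changes across rows, not columns
    if vertical > ratio * horizontal:
        return {"horizontal"}  # content changes across columns, not rows
    return set()               # no clear preference: test both directions


# Hypothetical 4x4 blocks: a top/bottom edge, a left/right edge, a flat block.
top_bottom = [[0] * 4, [0] * 4, [100] * 4, [100] * 4]
left_right = [[0, 0, 100, 100] for _ in range(4)]
flat = [[50] * 4 for _ in range(4)]
```

The point of the early decision is that an encoder can skip the rate-distortion checks for the directions returned by `splits_to_skip`, trading a small coding loss for time.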

  • Multiple Description Video Coding Using Inter- and Intra-Description Correlation at Macro Block Level

    Huihui BAI  Mengmeng ZHANG  Anhong WANG  Meiqin LIU  Yao ZHAO  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E97-D No:2
      Page(s):
    384-387

    A novel standard-compliant multiple description (MD) video codec is proposed in this paper, which aims to achieve effective redundancy allocation using inter- and intra-description correlation. The inter-description correlation at macro block (MB) level is applied to produce side information of different modes, which helps improve side decoding quality. Furthermore, the intra-description correlation at MB level is exploited to design the adaptive skip mode for higher compression efficiency. The experimental results show better rate-distortion performance for both side and central decoding compared with other relevant MDC schemes.

  • A Novel Fast Intra Prediction Scheme for Depth-Map in 3D High Efficiency Video Coding

    Mengmeng ZHANG  Shenghui QIU  Huihui BAI  

     
    LETTER-Coding Theory

      Vol:
    E97-A No:7
      Page(s):
    1635-1639

    The development of 3D High Efficiency Video Coding (3D-HEVC) has resulted in a growing interest in the compression of depth-maps. To achieve better intra prediction performance, the Depth Modeling Mode (DMM) technique is employed as an intra prediction technique for depth-maps. However, the complexity and computation load have dramatically increased with the application of DMM. Therefore, in view of the limited colors in depth-maps, this paper presents a novel fast intra coding scheme based on Base Colors and Index Map (BCIM) to reduce the complexity of DMM effectively. Furthermore, the index map is remapped, and the Base Colors are coded by predictive coding in BCIM to improve compression efficiency. Compared with the intra prediction coding in DMM, the experimental results illustrate that the proposed scheme provides a decrease of approximately 51.2% in the intra prediction time. Meanwhile, the BD-rate increase is only 0.83% for the virtual intermediate views generated by Depth-Image-Based Rendering.

  • Just Noticeable Difference Based Fast Coding Unit Partition in HEVC Intra Coding

    Meng ZHANG  Huihui BAI  Meiqin LIU  Anhong WANG  Mengmeng ZHANG  Yao ZHAO  

     
    LETTER-Image

      Vol:
    E97-A No:12
      Page(s):
    2680-2683

    As an ongoing video compression standard, High Efficiency Video Coding (HEVC) has achieved better rate distortion performance than H.264, but it also leads to enormous encoding complexity. In this paper, we propose a novel fast coding unit partition algorithm in the intra prediction of HEVC. Firstly, instead of the time-consuming rate distortion optimization for coding mode decision, just-noticeable-difference (JND) values can be exploited to partition the coding unit according to human visual system characteristics. Furthermore, coding bits in HEVC can also be considered as assisted information to refine the partition results. Compared with HEVC test model HM10.1, the experimental results show that the fast intra mode decision algorithm provides over 28% encoding time saving on average with comparable rate distortion performance.

  • Improving Sliced Wasserstein Distance with Geometric Median for Knowledge Distillation (Open Access)

    Hongyun LU  Mengmeng ZHANG  Hongyuan JING  Zhi LIU  

     
    LETTER-Fundamentals of Information Systems

      Publicized:
    2024/03/08
      Vol:
    E107-D No:7
      Page(s):
    890-893

    Currently, the most advanced knowledge distillation models use a metric learning approach based on probability distributions. However, the correlation between supervised probability distributions is typically geometric and implicit, causing inefficiency and an inability to capture structural feature representations among different tasks. To overcome this problem, we propose a knowledge distillation loss using the robust sliced Wasserstein distance with geometric median (GMSW) to estimate the differences between the teacher and student representations. Due to the intuitive geometric properties of GMSW, the student model can effectively learn to align its hidden states with those of the teacher model, thereby establishing a robust correlation among implicit features. In experiments, our method outperforms state-of-the-art models in both high-resource and low-resource settings.

  • An Improved SAO Scheme for Screen Content Coding

    Mengmeng ZHANG  Chuan ZHOU  Jizheng XU  

     
    LETTER-Image

      Vol:
    E99-A No:7
      Page(s):
    1499-1502

    The High Efficiency Video Coding (HEVC) standard defines two in-loop filters to improve the objective and subjective quality of the reconstructed frames. Through analyzing the effectiveness of the in-loop filters, it is noted that the band offset (BO) process achieves much larger coding gains for text regions, which mostly employ the intra block copy (IntraBC) prediction mode. The IntraBC prediction process in HEVC is performed by using the already reconstructed region for block matching, which is similar to motion compensation. If the BO process is applied after one coding tree unit (CTU) is encoded, the distortion between the original samples and the reconstructed samples copied by IntraBC prediction is further reduced; this is simple to operate and yields good coding efficiency. Experimental results show that the proposed scheme achieves up to 3.4% BD-rate reduction in All-intra (AI) for screen content sequences with no increase in encoding or decoding time.

  • Edge-Based Adaptive Sampling for Image Block Compressive Sensing

    Lijing MA  Huihui BAI  Mengmeng ZHANG  Yao ZHAO  

     
    LETTER-Image

      Vol:
    E99-A No:11
      Page(s):
    2095-2098

    In this paper, a novel adaptive sampling scheme for block compressive sensing is proposed for natural images. In view of the image content, the edge proportion in a block can be used to represent its sparsity. Furthermore, according to the edge proportion, the sampling rate can be adaptively allocated for better compressive sensing recovery. Because an image contains many blocks, recording the measurement ratio of each block may incur a large overhead cost. Therefore, the K-means method is applied to classify the blocks into clusters, and a single measurement ratio is allocated to each cluster. In addition, we design an iterative termination condition to reduce the time consumed in the iterations of compressive sensing recovery. The experimental results show that, compared with the corresponding methods, the proposed scheme obtains a better reconstructed image at the same sampling rate.
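
The allocation idea above can be sketched as follows: compute an edge proportion per block, cluster the proportions with a one-dimensional K-means, and give every block in a cluster the same measurement ratio. This is a minimal sketch under stated assumptions, not the paper's procedure: the linear mapping from cluster center to ratio, the `low` and `high` bounds, the deterministic initialization, and the assumption that blocks with more edges receive more measurements are all hypothetical.

```python
def edge_proportion(edge_block):
    """Fraction of pixels marked as edges in a binary edge map of one block."""
    pixels = [p for row in edge_block for p in row]
    return sum(1 for p in pixels if p) / len(pixels)


def kmeans_1d(values, k, iterations=20):
    """Plain 1-D K-means with a deterministic, evenly spread initialization."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign(current):
        return [min(range(k), key=lambda c: abs(v - current[c])) for v in values]

    for _ in range(iterations):
        labels = assign(centers)
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign(centers), centers


def allocate_rates(edge_proportions, k=2, low=0.1, high=0.5):
    """One measurement ratio per cluster, scaled linearly between low and high
    (assumption: blocks with more edges receive more measurements)."""
    labels, centers = kmeans_1d(edge_proportions, k)
    smallest, span = min(centers), max(centers) - min(centers)
    cluster_rates = [
        low if span == 0 else low + (high - low) * (c - smallest) / span
        for c in centers
    ]
    return [cluster_rates[label] for label in labels]


# Hypothetical edge proportions for six blocks: three smooth, three textured.
proportions = [0.02, 0.05, 0.03, 0.40, 0.45, 0.50]
rates = allocate_rates(proportions, k=2)
```

Only one ratio per cluster has to be signalled, plus a cluster index per block, which is the overhead saving the abstract refers to.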

  • Fast Algorithm Based on Rough LCU Minimum Depth Prediction and Early CU Partition Termination for HEVC Intra Coding

    Mengmeng ZHANG  Heng ZHANG  Zhi LIU  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:2
      Page(s):
    634-638

    The new generation video standard, i.e., High-efficiency Video Coding (HEVC), shows a significantly improved efficiency relative to the last standard, i.e., H.264. However, the quad tree structured coding units (CUs), which are adopted in HEVC to improve compression efficiency, cause high computational complexity. In this study, a novel fast algorithm is proposed for CU partition in intra coding to reduce the computational complexity. A rough minimum depth prediction method for the largest CU and an early termination method for CU partition based on the total coding bits of the current CU are employed. Many approaches have been proposed to reduce the encoding complexity of HEVC, but these methods do not use the total coding bits of the current CU as the main basis for judging CU complexity. Compared with the reference software HM16.6, the proposed algorithm reduces encoding time by 45% on average, with an increase of approximately 1.1% in Bjøntegaard delta bit rate and a negligible peak signal-to-noise ratio loss.