
Author Search Result

[Author] Zifen HE (9 hits)

1-9 of 9 hits
  • Large Displacement Dynamic Scene Segmentation through Multiscale Saliency Flow

    Yinhui ZHANG  Zifen HE  

     
    PAPER-Pattern Recognition
    Publicized: 2016/03/30  Vol: E99-D No:7  Page(s): 1871-1876

    Most unsupervised video segmentation algorithms have difficulty extracting objects from dynamic real-world scenes with large displacements, because the foreground hypothesis is often initialized without any explicit mutual constraint on top-down spatio-temporal coherency, even though such a constraint may be imposed on the segmentation objective. To handle such situations, we propose a multiscale saliency flow (MSF) model that jointly learns foreground and background features from multiscale salient evidence, allowing temporally coherent top-down information in one frame to be propagated throughout the remaining frames. In particular, top-down evidence is detected by combining the saliency signature over a range of higher scales of the approximation coefficients in the wavelet domain. Saliency flow is then estimated by Gaussian kernel correlation of non-maximum-suppressed multiscale evidence, characterized by HOG descriptors in a high-dimensional feature space. We build the proposed MSF model in accordance with the primary object hypothesis, jointly integrating temporally consistent constraints on the saliency maps estimated at multiple scales into the objective. We demonstrate the effectiveness of the proposed multiscale saliency flow for segmenting dynamic real-world scenes with large displacements caused by uniform sampling of video sequences.
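
    As a rough sketch of the saliency-flow estimation step described above (not the authors' code; the Gaussian kernel correlation of HOG descriptors is shown for two candidate patches, with scikit-image's hog and the parameter values as illustrative assumptions):

        import numpy as np
        from skimage.feature import hog

        def gaussian_correlation(f1, f2, sigma=1.0):
            # RBF-kernel correlation between two feature vectors
            return np.exp(-np.sum((f1 - f2) ** 2) / (2.0 * sigma ** 2))

        def saliency_flow_score(patch_t, patch_t1, sigma=1.0):
            # HOG descriptors of candidate salient patches in consecutive frames;
            # a high correlation suggests the same salient evidence moved here.
            h_t = hog(patch_t, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            h_t1 = hog(patch_t1, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            return gaussian_correlation(h_t, h_t1, sigma)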

  • Robust Segmentation of Highly Dynamic Scene with Missing Data

    Yinhui ZHANG  Zifen HE  Changyu LIU  

     
    LETTER-Pattern Recognition
    Publicized: 2014/09/29  Vol: E98-D No:1  Page(s): 201-205

    Segmenting foreground objects from highly dynamic scenes with missing data is very challenging. We present a novel unsupervised segmentation approach that can cope with extensive scene dynamics as well as a substantial amount of missing data. To make this possible, we first exploit convex optimization of total variation for images with missing data for which a depletion mask is available. Inpainting depleted images using total variation facilitates detecting ambiguous objects in highly dynamic images, because it tends to yield regions of object instances with improved grayscale contrast. We then use a conditional random field to integrate both appearance and motion knowledge of the foreground objects. Our approach segments foreground object instances while inpainting the highly dynamic scene, with varying amounts of missing data, in a coupled way. We demonstrate this on a very challenging dataset from the UCSD Highly Dynamic Scene Benchmarks (HDSB), compare our method with two state-of-the-art unsupervised image sequence segmentation algorithms, and provide quantitative and qualitative performance comparisons.
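
    The total-variation inpainting step can be pictured with a simple gradient-flow sketch (an illustrative stand-in for the paper's convex solver; here mask == 1 marks observed pixels and lam is an assumed step size):

        import numpy as np

        def tv_inpaint(img, mask, n_iter=500, lam=0.2, eps=1e-3):
            # Smoothed total-variation flow; observed pixels are re-imposed
            # every iteration so only the depleted pixels evolve.
            u = img.astype(float).copy()
            for _ in range(n_iter):
                gx = np.roll(u, -1, axis=1) - u
                gy = np.roll(u, -1, axis=0) - u
                mag = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
                px, py = gx / mag, gy / mag
                div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
                u += lam * div
                u[mask == 1] = img[mask == 1]
            return u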

  • Fusion on the Wavelet Coefficients Scale-Related for Double Encryption Holographic Halftone Watermark Hidden Technology

    Zifen HE  Yinhui ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining
    Publicized: 2015/03/27  Vol: E98-D No:7  Page(s): 1391-1395

    We present a new framework for embedding holographic halftone watermarking data into images by fusing scale-related wavelet coefficients. The halftone watermark image is obtained by the error-diffusion method and converted into a Fresnel hologram, which serves as the initial key. After encryption, the watermark image, scrambled by the Arnold transform, is embedded into the host image during the halftoning process. We characterize the multi-scale representation of the original image using the discrete wavelet transform. The boundary information of the target image is fused through the correlation of wavelet coefficients across transform layers to increase the pixel resolution scale. We apply this inter-scale fusion to obtain fine-scale fusion coefficients that take into account both the detail and the approximation information of the image. Using the proposed method, the watermark can be embedded into the host image and recovered despite the halftoning operation. Experimental results show that the proposed approach provides better security and robustness against JPEG compression and other attacks than previous alternatives.
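
    For illustration, the Arnold-transform scrambling can be sketched as follows (a hypothetical minimal implementation for a square N x N watermark image; the map sends (x, y) to ((x + y) mod N, (x + 2y) mod N) and is undone by iterating up to its period):

        import numpy as np

        def arnold_scramble(img, iterations=1):
            # Arnold cat map on a square image; pixel positions are permuted.
            n = img.shape[0]
            assert img.shape[0] == img.shape[1], "Arnold map needs a square image"
            out = img
            for _ in range(iterations):
                nxt = np.empty_like(out)
                for x in range(n):
                    for y in range(n):
                        nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
                out = nxt
            return out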

  • Semantic Motion Signature for Segmentation of High Speed Large Displacement Objects

    Yinhui ZHANG  Zifen HE  

     
    LETTER-Image Processing and Video Processing
    Publicized: 2016/10/05  Vol: E100-D No:1  Page(s): 220-224

    This paper presents a novel method for unsupervised segmentation of objects with large displacements in high-speed video sequences. Our general framework introduces a new foreground-object prediction method that finds object hypotheses by encoding both spatial and temporal features via a semantic motion signature scheme. More specifically, temporal cues of object hypotheses are captured by the proposed motion signature, which is derived from a sparse saliency representation imposed on the magnitude of the optical flow field. We integrate semantic scores derived from deep networks with location priors, which allows us to directly estimate the appearance potentials of foreground hypotheses. A unified MRF energy functional is proposed to simultaneously incorporate information from the motion signature and the semantic prediction features. The functional enforces both spatial and temporal consistency and imposes appearance constancy and spatio-temporal smoothness constraints directly on the object hypotheses. It inherently handles the challenges of segmenting ambiguous objects with large displacements in high-speed videos. Our experiments on video object segmentation benchmarks demonstrate the effectiveness of the proposed method for segmenting high-speed objects despite complicated scene dynamics and large displacements.
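
    A toy sketch of such an energy construction (illustrative only: it fuses the two cues by a simple convex combination, assumes per-pixel scores in [0, 1], and uses the optional PyMaxflow package for the graph cut):

        import numpy as np
        import maxflow  # pip install PyMaxflow

        def mrf_segment(motion_sig, semantic_score, alpha=0.5, smooth=2.0, eps=1e-6):
            # Unary term from the fused foreground probability; pairwise
            # Potts term enforcing spatial smoothness on the pixel grid.
            fg = alpha * motion_sig + (1.0 - alpha) * semantic_score
            g = maxflow.Graph[float]()
            nodes = g.add_grid_nodes(fg.shape)
            g.add_grid_edges(nodes, smooth)
            g.add_grid_tedges(nodes, -np.log(fg + eps), -np.log(1.0 - fg + eps))
            g.maxflow()
            return g.get_grid_segments(nodes)  # True where foreground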

  • Video Object Segmentation of Dynamic Scenes with Large Displacements

    Yinhui ZHANG  Zifen HE  

     
    LETTER-Image Processing and Video Processing
    Publicized: 2015/06/17  Vol: E98-D No:9  Page(s): 1719-1723

    Segmenting foreground objects in unconstrained dynamic scenes remains a difficult problem. We present a novel unsupervised segmentation approach that allows robust object segmentation of dynamic scenes with large displacements. To make this possible, we project motion-based foreground region hypotheses, generated via standard optical flow, onto visual saliency regions. The motion hypotheses correspond to inside seeds mapped from the motion boundary. For visual saliency, we generalize the image signature method from images to videos to delineate the saliency mapping of object proposals. Image signatures estimated in the Discrete Cosine Transform (DCT) domain favor regions that stand out to the human visual system. We leverage a Markov random field built on superpixels to impose both spatial and temporal consistency constraints on the motion-saliency combined segments. Projecting salient regions via the image signature with inside mapping seeds facilitates segmenting ambiguous objects from unconstrained dynamic scenes in the presence of large displacements. We demonstrate the performance on fourteen challenging unconstrained dynamic scenes, compare our method with two state-of-the-art unsupervised video segmentation algorithms, and provide quantitative and qualitative performance comparisons.
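
    The image signature itself (Hou et al.'s formulation, which the letter extends to video) reduces to a few lines; a per-frame sketch using SciPy, with the smoothing width as an illustrative parameter:

        import numpy as np
        from scipy.fft import dctn, idctn
        from scipy.ndimage import gaussian_filter

        def image_signature_saliency(gray, sigma=3.0):
            # Keep only the signs of the DCT coefficients, reconstruct,
            # square, and smooth; sparse stand-out regions survive.
            recon = idctn(np.sign(dctn(gray, norm="ortho")), norm="ortho")
            sal = gaussian_filter(recon ** 2, sigma)
            return sal / (sal.max() + 1e-12)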

  • Spatial-Temporal Aggregated Shuffle Attention for Video Instance Segmentation of Traffic Scene

    Chongren ZHAO  Yinhui ZHANG  Zifen HE  Yunnan DENG  Ying HUANG  Guangchen CHEN  

     
    PAPER-Image Processing and Video Processing
    Publicized: 2022/11/24  Vol: E106-D No:2  Page(s): 240-251

    Aiming at the problems of dispersed and dislocated spatial focus regions in feature pyramid networks and of insufficient feature-dependency acquisition in both the spatial and channel dimensions, this paper proposes spatial-temporal aggregated shuffle attention for video instance segmentation (STASA-VIS). First, a mixed subsampling (MS) module is designed to embed activating features from low-level target areas of the feature pyramid into the high-level ones, so as to aggregate spatial information on the target area. Taking advantage of the coherent information in video frames, STASA-VIS takes the first of every five video frames as a keyframe, propagates the keyframe feature maps of the pyramid layers forward in the time domain, and fuses them with the mixed-subsampled features of the non-keyframes to achieve time-domain consistent feature aggregation. Finally, STASA-VIS embeds shuffle attention in the backbone to capture pixel-level pairwise relationships and dimensional dependencies among the channels while reducing computation. Experimental results show that the segmentation accuracy of STASA-VIS reaches 41.2% and the test speed reaches 34 FPS, which is better than state-of-the-art one-stage video instance segmentation (VIS) methods in accuracy and achieves real-time segmentation.
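
    The keyframe propagation can be pictured with a simplified sketch (the plain blending weight beta is an illustrative assumption; in the paper the keyframe maps are fused with the mixed-subsampled non-keyframe features inside the network):

        def propagate_keyframe_features(frame_feats, interval=5, beta=0.5):
            # The first frame of every `interval` frames is a keyframe; its
            # feature map is carried forward and blended into the others.
            out, key = [], None
            for i, feat in enumerate(frame_feats):
                if i % interval == 0:
                    key = feat
                    out.append(feat)
                else:
                    out.append(beta * key + (1.0 - beta) * feat)
            return out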

  • GECNN for Weakly Supervised Semantic Segmentation of 3D Point Clouds

    Zifen HE  Shouye ZHU  Ying HUANG  Yinhui ZHANG  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2021/09/24  Vol: E104-D No:12  Page(s): 2237-2243

    This paper presents a novel method for weakly supervised semantic segmentation of 3D point clouds using a novel graph and edge convolutional neural network (GECNN), targeting point clouds with only 1% and 10% of points labeled. Our general framework facilitates semantic segmentation by encoding both global- and local-scale features via a parallel graph and edge aggregation scheme. More specifically, global-scale graph structure cues of point clouds are captured by a graph convolutional neural network, propagated from a pairwise affinity representation over the whole graph established in a d-dimensional feature embedding space. We integrate local-scale features derived from a dynamic edge feature aggregation convolutional neural network, which allows us to fuse both global and local cues of 3D point clouds. The proposed GECNN model is trained with a comprehensive objective consisting of incomplete, inexact, self-supervision, and smoothness constraints based on the partially labeled points. The proposed approach enforces global and local consistency constraints directly through the objective losses. It inherently handles the challenges of segmenting sparse 3D point clouds with limited annotations in a large-scale point cloud space. Our experiments on the ShapeNet and S3DIS benchmarks demonstrate the effectiveness of the proposed approach for efficient (within 20 epochs) learning of large-scale point cloud semantics despite very limited labels.
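
    The pairwise affinity representation over the embedding space can be sketched as a Gaussian kernel on pairwise distances (gamma is an illustrative bandwidth, not a value from the paper):

        import numpy as np

        def pairwise_affinity(emb, gamma=1.0):
            # emb: (n_points, d) feature embeddings -> (n, n) affinity matrix
            sq = np.sum(emb ** 2, axis=1, keepdims=True)
            dist2 = np.maximum(sq + sq.T - 2.0 * emb @ emb.T, 0.0)
            return np.exp(-gamma * dist2)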

  • Nonlinear Regression of Saliency Guided Proposals for Unsupervised Segmentation of Dynamic Scenes

    Yinhui ZHANG  Mohamed ABDEL-MOTTALEB  Zifen HE  

     
    PAPER-Image Processing and Video Processing
    Publicized: 2015/11/06  Vol: E99-D No:2  Page(s): 467-474

    This paper proposes an efficient video object segmentation approach that is tolerant to complex scene dynamics. Unlike existing approaches that rely on estimating object-like proposals on an intra-frame basis, the proposed approach builds temporally consistent foreground hypotheses using nonlinear regression of saliency-guided proposals across a video sequence. For this purpose, we first generate salient foreground proposals at the superpixel level by leveraging a saliency signature in the discrete cosine transform domain. We propose a random-forest-based nonlinear regression scheme to learn both appearance and shape features from salient foreground regions in all frames of a sequence. These features make it possible to rank every foreground proposal of a sequence, and we show that the regions with high ranking scores correlate well with semantic foreground objects in dynamic scenes. Subsequently, we utilize a Markov Random Field to integrate both the appearance and the motion coherence of the top-ranked object proposals. A temporal nonlinear regressor for generating salient object support regions significantly improves segmentation performance compared to using only per-frame objectness cues. Extensive experiments on challenging real-world video sequences validate the feasibility and superiority of the proposed approach for dynamic scene segmentation.
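
    The regression-and-ranking step could look like the following scikit-learn sketch (the feature matrices and score targets are hypothetical placeholders, not the paper's data pipeline):

        from sklearn.ensemble import RandomForestRegressor

        def rank_proposals(train_feats, train_scores, test_feats):
            # Learn a nonlinear mapping from appearance/shape features of
            # salient regions to a foreground score, then rank proposals.
            rf = RandomForestRegressor(n_estimators=100, random_state=0)
            rf.fit(train_feats, train_scores)
            return rf.predict(test_feats).argsort()[::-1]  # best first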

  • Digital Halftoning through Approximate Optimization of Scale-Related Perceived Error Metric

    Zifen HE  Yinhui ZHANG  

     
    LETTER-Image Processing and Video Processing
    Publicized: 2015/10/20  Vol: E99-D No:1  Page(s): 305-308

    This work presents an approximate global optimization method for image halftoning that fuses multi-scale information in a tree model. We employ a Gaussian mixture model and a hidden Markov tree to characterize the intra-scale clustering and inter-scale persistence properties of the detail coefficients, respectively. A multiscale perceived error metric and the theory of scale-related perceived error metrics are used to fuse the statistical distributions of the error metric over intra-scale clustering and cross-scale persistence, yielding an energy function. Minimizing this energy via graph cuts produces the halftone image. Experiments demonstrate the superior performance of the new algorithm compared with several existing algorithms under quantitative evaluation.
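
    A fragmentary sketch of the intra-scale characterization only (the hidden Markov tree linking scales and the graph-cut stage are omitted; PyWavelets and scikit-learn, with the wavelet and decomposition depth as illustrative choices):

        import numpy as np
        import pywt
        from sklearn.mixture import GaussianMixture

        def detail_coefficient_gmms(gray, wavelet="db2", levels=3):
            # Fit a two-component GMM to the detail coefficients of each scale,
            # modeling the small/large-coefficient clustering described above.
            coeffs = pywt.wavedec2(gray, wavelet, level=levels)
            gmms = []
            for detail in coeffs[1:]:  # skip the approximation band
                data = np.concatenate([d.ravel() for d in detail]).reshape(-1, 1)
                gmms.append(GaussianMixture(n_components=2, random_state=0).fit(data))
            return gmms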