The search functionality is under construction.

Keyword Search Result

[Keyword] multi-scale feature (6 hits)

1-6hit
  • Multi-Scale Contrastive Learning for Human Pose Estimation (Open Access)

    Wenxia BAO  An LIN  Hua HUANG  Xianjun YANG  Hemu CHEN  

     
    PAPER-Image Recognition, Computer Vision

      Publicized: 2024/06/17
      Vol: E107-D No:10
      Page(s): 1332-1341

    Recent years have seen remarkable progress in human pose estimation. However, manual annotation of keypoints remains tedious and imprecise. To alleviate this problem, this paper proposes a novel method called Multi-Scale Contrastive Learning (MSCL). The method uses a siamese network structure whose upper and lower branches capture different views of the same image. Each branch uses a backbone network to extract image representations, employing multi-scale feature vectors to capture information. These feature vectors are passed through an enhanced feature pyramid for fusion, producing more robust feature representations. They are then further encoded by mapping and prediction heads to predict the feature vector of the other view. Using negative cosine similarity between vectors as the loss function, the backbone network is pre-trained on a large-scale unlabelled dataset, enhancing its capacity to extract visual representations. Finally, transfer learning is performed on a small amount of labelled data for the pose estimation task. Experiments on COCO show significant improvements in Average Precision (AP) of 1.8%, 0.9%, and 1.2% with 1%, 5%, and 10% labelled data, respectively. In addition, the Percentage of Correct Keypoints (PCK) improves by 0.5% on MPII&AIC, outperforming mainstream contrastive learning methods.
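
    The loss named in this abstract, negative cosine similarity between a predicted vector and the other view's feature vector, can be sketched as follows. This is a minimal illustration only: the function names, the plain-list vectors, and the symmetric averaging over both branches are assumptions made here, not details confirmed by the abstract.

```python
import math


def negative_cosine_similarity(p, z):
    """Return -cos(p, z) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def symmetric_loss(p1, z2, p2, z1):
    """Average the loss over both branches (an assumed design choice).

    p1, p2: prediction-head outputs for view 1 and view 2 (hypothetical names).
    z1, z2: mapped feature vectors of each view, treated here as fixed targets.
    """
    return 0.5 * negative_cosine_similarity(p1, z2) + 0.5 * negative_cosine_similarity(p2, z1)
```

    The loss reaches its minimum of -1 when each prediction points in the same direction as the other view's feature vector, regardless of vector length.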

  • Prohibited Item Detection Within X-Ray Security Inspection Images Based on an Improved Cascade Network (Open Access)

    Qingqi ZHANG  Xiaoan BAO  Ren WU  Mitsuru NAKATA  Qi-Wei GE  

     
    PAPER

      Publicized: 2024/01/16
      Vol: E107-A No:5
      Page(s): 813-824

    Automatic detection of prohibited items is vital in helping security staff work more efficiently while improving the public safety index. However, prohibited item detection within X-ray security inspection images is limited by various factors, including the imbalanced distribution of categories, the diversity of prohibited item scales, and overlap between items. In this paper, we propose to leverage the Poisson blending algorithm with the Canny edge operator to alleviate the imbalanced distribution of categories in the X-ray image dataset as much as possible. Based on this, we improve the cascade network to deal with the other two difficulties. To address the scale diversity of prohibited items, we propose the Re-BiFPN feature fusion method, which includes a coordinate attention atrous spatial pyramid pooling (CA-ASPP) module and a recursive connection. The CA-ASPP module can implicitly extract direction-aware and position-aware information from the feature map. The recursive connection feeds the multi-scale feature map processed by the CA-ASPP module to the bottom-up backbone layer for further multi-scale feature extraction. In addition, a Rep-CIoU loss function is designed to address the overlapping problem in X-ray images. Extensive experimental results demonstrate that our method can successfully identify ten types of prohibited items, such as Knives, Scissors, Pressure, etc., and achieves an mAP of 83.4%, which is 3.8% higher than that of the original cascade network. Moreover, our method outperforms other mainstream methods by a significant margin.
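
    The abstract does not define the Rep-CIoU loss, so the sketch below shows only the standard Complete-IoU (CIoU) loss that its name suggests it builds on. This is a swapped-in baseline, not the authors' loss; the function name and the (x1, y1, x2, y2) box layout are assumptions made for illustration.

```python
import math


def ciou_loss(pred, target):
    """Standard CIoU loss (1 - CIoU) for two boxes given as (x1, y1, x2, y2).

    Assumes x2 > x1 and y2 > y1 for both boxes. This is NOT the Rep-CIoU
    loss from the paper, whose definition the abstract does not give.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU: overlap area divided by union area.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union

    # Squared distance between box centres.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan((tx2 - tx1) / (ty2 - ty1)) - math.atan((px2 - px1) / (py2 - py1))
    ) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - (iou - rho2 / c2 - alpha * v)
```

    Unlike plain IoU, this loss still changes with the centre distance when two boxes do not overlap at all, which is why IoU variants of this kind are commonly used for box regression.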

  • Visual Inspection Method for Subway Tunnel Cracks Based on Multi-Kernel Convolution Cascade Enhancement Learning

    Baoxian WANG  Zhihao DONG  Yuzhao WANG  Shoupeng QIN  Zhao TAN  Weigang ZHAO  Wei-Xin REN  Junfang WANG  

     
    PAPER-Image Recognition, Computer Vision

      Publicized: 2023/06/27
      Vol: E106-D No:10
      Page(s): 1715-1722

    As a typical surface defect of tunnel lining structures, cracking disease affects the durability of tunnel structures and poses hidden dangers to tunnel driving safety. Factors such as interference from the complex service environment of the tunnel and the low signal-to-noise ratio of the crack targets themselves have left existing crack recognition methods based on semantic segmentation unable to meet actual engineering needs. Based on this, this paper uses the Unet network as the basic framework for crack identification and proposes a multi-kernel convolution cascade enhancement (MKCE) model to achieve accurate detection and identification of crack diseases. First, to ensure the performance of crack feature extraction, the model changes the main feature extraction network in the basic framework to the ResNet-50 residual network. Compared with the VGG-16 network, this modification can extract richer crack detail features while reducing model parameters. Second, considering that the Unet network cannot effectively perceive multi-scale crack features in the skip connection stage, a multi-kernel convolution cascade enhancement module is proposed by combining a cascaded connection of multi-kernel convolution groups and multi-expansion rate dilated convolution groups. This module achieves a comprehensive perception of local details and the global content of tunnel lining cracks. In addition, to better weaken the effect of tunnel background clutter interference, a convolutional block attention calculation module is further introduced after the multi-kernel convolution cascade enhancement module, which effectively reduces the false alarm rate of crack recognition. The algorithm is tested on a large number of subway tunnel crack image datasets. The experimental results show that, compared with other crack recognition algorithms based on deep learning, the proposed method achieves the best results in terms of accuracy and intersection over union (IoU), which verifies that it has better applicability.
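
    To illustrate why dilated convolutions with several expansion rates help perceive multi-scale features, the following one-dimensional sketch applies the same three-tap kernel at different dilation rates. It is a toy illustration under stated assumptions, not the MKCE module: the function names and the 1-D setting are chosen here for brevity.

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D correlation whose kernel taps are `dilation` samples apart."""
    span = receptive_field(len(kernel), dilation)
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


# The same 3-tap kernel covers 3, 5, and 9 samples at dilation rates 1, 2, and 4,
# so stacking several rates mixes local detail with wider context.
signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
multi_scale = {d: dilated_conv1d(signal, [1, 1, 1], d) for d in (1, 2, 4)}
```

    The number of kernel weights stays fixed while the covered span grows with the dilation rate, which is the property that makes groups of differently dilated convolutions a cheap way to gather context at several scales.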

  • Efficient Multi-Scale Feature Fusion for Image Manipulation Detection

    Yuxue ZHANG  Guorui FENG  

     
    LETTER-Information Network

      Publicized: 2022/02/03
      Vol: E105-D No:5
      Page(s): 1107-1111

    Convolutional neural networks (CNNs) have made extraordinary progress in image classification tasks. However, using a CNN directly to detect image manipulation is less effective. To address this problem, we propose an image filtering layer and a multi-scale feature fusion module that guide the model to perform image manipulation detection more accurately and effectively. A series of experiments shows that our model improves on previous research in image manipulation detection.

  • An Autoencoder Based Background Subtraction for Public Surveillance

    Yue LI  Xiaosheng YU  Haijun CAO  Ming XU  

     
    LETTER-Image

      Publicized: 2021/04/08
      Vol: E104-A No:10
      Page(s): 1445-1449

    An autoencoder is trained to generate the background from the surveillance image by setting the training label to the shuffled input, instead of the input itself as in a traditional autoencoder. Multi-scale features are then extracted by a sparse autoencoder from the surveillance image and the corresponding background to detect the foreground.

  • PSTNet: Crowd Flow Prediction by Pyramidal Spatio-Temporal Network

    Enze YANG  Shuoyan LIU  Yuxin LIU  Kai FANG  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized: 2021/04/12
      Vol: E104-D No:10
      Page(s): 1780-1783

    Crowd flow prediction in high-density urban scenes is involved in a wide range of intelligent transportation and smart city applications, and it has become a significant topic in urban computing. In this letter, a CNN-based framework called Pyramidal Spatio-Temporal Network (PSTNet) is proposed for crowd flow prediction. Spatial encoding is employed for the spatial representation of external factors, while a prior pyramid enhances the feature dependence of spatial scale distances and temporal spans. A post pyramid is then proposed to fuse the heterogeneous spatio-temporal features of multiple scales. Experimental results on TaxiBJ and MobileBJ demonstrate that the proposed PSTNet outperforms state-of-the-art methods.