
Keyword Search Result

[Keyword] computer vision (60 hits)

1-20 of 60 hits

  • A Ranking Information Based Network for Facial Beauty Prediction [Open Access]

    Haochen LYU  Jianjun LI  Yin YE  Chin-Chen CHANG  

     
    PAPER-Artificial Intelligence, Data Mining
    Publicized: 2024/01/26
    Vol: E107-D No:6
    Page(s): 772-780

    The purpose of Facial Beauty Prediction (FBP) is to automatically assess facial attractiveness based on human aesthetics. Most neural-network-based prediction methods do not consider the ranking information inherent in the task. For scoring tasks such as facial beauty prediction, there is abundant ranking information both between images and within images, and using it properly during training can greatly improve model performance. In this paper, we propose a novel end-to-end Convolutional Neural Network (CNN) model based on the ranking information of images, incorporating a Rank Module and an Adaptive Weight Module. We also design pairwise ranking loss functions to fully leverage the ranking information of images. Considering training efficiency and model inference capability, we choose ResNet-50 as the backbone network. We conduct experiments on the SCUT-FBP5500 dataset, and the results show that our model achieves new state-of-the-art performance. Furthermore, ablation experiments show that our approach contributes greatly to improving model performance. Finally, the Rank Module with its corresponding ranking loss is plug-and-play and can be extended to any CNN model and any task with ranking information. Code is available at https://github.com/nehcoah/Rank-Info-Net.
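
    The core idea above, supervising relative order with a pairwise ranking loss alongside the usual score regression, can be sketched in a few lines. The following is a minimal illustration, not the authors' Rank Module; the margin value and class name are assumptions.

```python
import torch
import torch.nn as nn

class PairwiseRankingLoss(nn.Module):
    """Generic pairwise margin ranking loss: if image j is rated higher than
    image i, its predicted score should exceed the other by at least `margin`."""
    def __init__(self, margin: float = 0.1):  # margin value is an assumption
        super().__init__()
        self.margin = margin

    def forward(self, scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # scores, labels: shape (B,) -- predicted and ground-truth beauty scores
        diff_pred = scores.unsqueeze(0) - scores.unsqueeze(1)          # pairwise score differences
        sign = torch.sign(labels.unsqueeze(0) - labels.unsqueeze(1))   # pairwise label order (+1/-1/0)
        # hinge: penalize pairs whose predicted difference does not respect
        # the labeled order by at least `margin`
        loss = torch.clamp(self.margin - sign * diff_pred, min=0.0)
        return loss[sign != 0].mean()

# usage with any backbone that outputs one score per image (e.g. a ResNet-50 head)
scores = torch.randn(8, requires_grad=True)
labels = torch.rand(8) * 5.0
print(PairwiseRankingLoss()(scores, labels))
```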

  • Lightweight and Fast Low-Light Image Enhancement Method Based on PoolFormer

    Xin HU  Jinhua WANG  Sunhan XU  

     
    LETTER-Image Processing and Video Processing
    Publicized: 2023/10/05
    Vol: E107-D No:1
    Page(s): 157-160

    Images captured in low-light environments have low visibility and high noise, which seriously affect subsequent visual tasks such as target detection and face recognition. Low-light image enhancement is therefore of great significance for obtaining high-quality images and remains a challenging problem in computer vision. LLFormer, a low-light enhancement model based on the Vision Transformer, uses axis-based multi-head self-attention and a cross-layer attention fusion mechanism to reduce complexity and extract features, and it enhances images well. However, the attention computation is complex and the number of parameters is large, which limits the practical application of the model. To address this problem, a lightweight module, PoolFormer, is used to replace the attention module with spatial pooling, which increases the parallelism of the network and greatly reduces the number of model parameters. To suppress image noise and improve visual quality, a new loss function is constructed for model optimization. Experimental results show that the proposed method not only reduces the number of parameters by 49%, but also outperforms the baseline model in image detail restoration and noise suppression. On the LOL dataset, the PSNR and SSIM are 24.098 dB and 0.8575, respectively; on the MIT-Adobe FiveK dataset, they are 27.060 dB and 0.9490. The evaluation results on both datasets surpass current mainstream low-light enhancement algorithms.
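
    For reference, a PoolFormer-style token mixer that replaces self-attention with spatial pooling is essentially a parameter-free average-pooling layer; the sketch below illustrates the idea (pool size and tensor shapes are illustrative, not the paper's configuration).

```python
import torch
import torch.nn as nn

class PoolingTokenMixer(nn.Module):
    """PoolFormer-style token mixer: replaces self-attention with simple spatial
    average pooling. Subtracting the input removes the identity component,
    mirroring the residual connection of the surrounding block."""
    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map; no attention weights, no extra parameters
        return self.pool(x) - x

x = torch.randn(1, 64, 32, 32)
print(PoolingTokenMixer()(x).shape)  # torch.Size([1, 64, 32, 32])
```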

  • Computer Vision-Based Tracking of Workers in Construction Sites Based on MDNet

    Wen LIU  Yixiao SHAO  Shihong ZHAI  Zhao YANG  Peishuai CHEN  

     
    PAPER-Smart Industry
    Publicized: 2022/10/20
    Vol: E106-D No:5
    Page(s): 653-661

    Automatic, continuous tracking of objects involved in a construction project is required for tasks such as productivity assessment, unsafe-behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical application is hindered by tracking accuracy limited by the dynamic, complex nature of construction sites (e.g., background clutter, occlusion, and varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach based on multi-domain convolutional neural networks (MDNet) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation learning and 2) online visual tracking. To evaluate its effectiveness and feasibility, the approach is applied to a metro project in Wuhan, China, and the results demonstrate good tracking performance in construction scenarios with complex backgrounds. The average distance error and F-measure for MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention on construction sites.
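
    The multi-domain idea can be illustrated as shared layers plus one per-video classification branch. The sketch below is a simplified stand-in for such an architecture, with made-up layer sizes, not the paper's network.

```python
import torch
import torch.nn as nn

class MultiDomainHead(nn.Module):
    """Sketch of the multi-domain representation-learning idea: layers are shared
    across all training videos (domains), while each domain gets its own binary
    target/background classifier branch."""
    def __init__(self, num_domains: int, feat_dim: int = 512):
        super().__init__()
        self.shared = nn.Sequential(            # stand-in for the shared conv/fc layers
            nn.Linear(feat_dim, 512), nn.ReLU(),
        )
        # one 2-way (target vs. background) branch per training video
        self.branches = nn.ModuleList(
            [nn.Linear(512, 2) for _ in range(num_domains)]
        )

    def forward(self, feats: torch.Tensor, domain: int) -> torch.Tensor:
        return self.branches[domain](self.shared(feats))

# During online tracking, a fresh branch is trained for the new video while the
# shared layers are kept fixed or fine-tuned slowly.
head = MultiDomainHead(num_domains=10)
print(head(torch.randn(4, 512), domain=3).shape)  # torch.Size([4, 2])
```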

  • Learning Multi-Level Features for Improved 3D Reconstruction

    Fairuz SAFWAN MAHAD  Masakazu IWAMURA  Koichi KISE  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2022/12/08
    Vol: E106-D No:3
    Page(s): 381-390

    3D reconstruction methods using neural networks are popular and have been studied extensively. However, the resulting models typically lack detail, reducing the quality of the 3D reconstruction, because the network is not designed to capture the fine details of the object. In this paper, we therefore propose two networks designed to capture both the coarse and fine details of the object, improving the reconstruction of its detailed parts. The first network uses a multi-scale architecture with skip connections to associate and merge features from different levels. The second is a multi-branch deep generative network that separately learns local features, generic features, and intermediate features through three tailored components. In both architectures, the principle is to let the network learn features at different levels so that it can reconstruct both the fine parts and the overall shape of the 3D model. We show that both of our methods outperform state-of-the-art approaches.
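
    A toy version of the first network's multi-scale/skip-connection idea, with assumed channel counts, might look as follows; the decoder that produces the actual 3D output is omitted.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Toy illustration: features are taken from several encoder levels and
    merged via skip connections so the decoder sees both coarse (global shape)
    and fine (detail) information. Channel sizes are assumptions."""
    def __init__(self):
        super().__init__()
        self.level1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.level2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.level3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.level1(x)          # fine details, high resolution
        f2 = self.level2(f1)         # intermediate
        f3 = self.level3(f2)         # coarse, global shape
        # skip connections: upsample coarse maps and concatenate with the finer one
        up = nn.functional.interpolate
        merged = torch.cat(
            [f1,
             up(f2, size=f1.shape[-2:], mode='bilinear', align_corners=False),
             up(f3, size=f1.shape[-2:], mode='bilinear', align_corners=False)],
            dim=1)
        return merged                # fed to a 3D decoder in the full model

print(MultiScaleEncoder()(torch.randn(1, 3, 128, 128)).shape)  # (1, 224, 64, 64)
```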

  • Anomaly Detection Using Spatio-Temporal Context Learned by Video Clip Sorting

    Wen SHAO  Rei KAWAKAMI  Takeshi NAEMURA  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2022/02/08
    Vol: E105-D No:5
    Page(s): 1094-1102

    Previous studies on anomaly detection in videos have trained detectors that perform reconstruction or prediction tasks on normal data, so that frames on which task performance is low are detected as anomalies during testing. This paper proposes a new approach that sorts video clips using a generative network structure. Our approach learns spatial context from appearance and temporal context from the order relationship of frames. Experiments were conducted on four datasets, and we categorized the anomalous sequences by appearance and motion. Evaluations were conducted not only on each full dataset but also on each category. Our method improved detection performance both on anomalies whose appearance differs from normality and on those whose motion does. Moreover, combining our approach with a prediction method further improved precision at high recall.
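
    A minimal sketch of a clip-sorting pretext task, shuffling a few frames and classifying which permutation was applied, is shown below; the encoder, clip length, and sizes are placeholders, not the paper's network.

```python
import itertools
import random
import torch
import torch.nn as nn

N_FRAMES = 4
PERMS = list(itertools.permutations(range(N_FRAMES)))   # 24 classes for N=4

class SortNet(nn.Module):
    """Predict which permutation was applied to a shuffled clip; on anomalous
    clips this self-supervised task tends to fail, signaling an anomaly."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(                    # per-frame appearance encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.classifier = nn.Linear(N_FRAMES * feat_dim, len(PERMS))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, N_FRAMES, 3, H, W), already shuffled
        feats = [self.encoder(frames[:, i]) for i in range(N_FRAMES)]
        return self.classifier(torch.cat(feats, dim=1))  # permutation logits

clip = torch.randn(2, N_FRAMES, 3, 64, 64)
perm = random.randrange(len(PERMS))
shuffled = clip[:, list(PERMS[perm])]
loss = nn.CrossEntropyLoss()(SortNet()(shuffled), torch.tensor([perm, perm]))
print(loss.item())
```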

  • Dual Self-Guided Attention with Sparse Question Networks for Visual Question Answering

    Xiang SHEN  Dezhi HAN  Chin-Chen CHANG  Liang ZONG  

     
    PAPER-Natural Language Processing
    Publicized: 2022/01/06
    Vol: E105-D No:4
    Page(s): 785-796

    Visual Question Answering (VQA) is multi-task research that requires simultaneous processing of vision and text. Recent VQA models employ a co-attention mechanism to model the relationship between the question context and the image. However, the way question features and image regions are modeled forces irrelevant information into the computation, which degrades performance. This paper proposes a novel dual self-guided attention with sparse question networks (DSSQN) to address this issue. The aim is to keep irrelevant information out of the model when modeling the internal dependencies of both the question and the image, while also overcoming the coarse interaction between sparse question features and image features. First, the sparse question self-attention (SQSA) unit in the encoder retains the question features with the highest weights learned by self-attention over the question words. Second, the sparse question features are used to guide attention over image features, yielding fine-grained image features and again preventing irrelevant information from entering the model. A dual self-guided attention (DSGA) unit is designed to improve the modal interaction between questions and images. Third, the parameter δ of the sparse question self-attention is optimized to select question-related object regions. Our experiments on the VQA 2.0 benchmark dataset demonstrate that DSSQN outperforms state-of-the-art methods. For example, the accuracy of our proposed model on test-dev and test-std is 71.03% and 71.37%, respectively. In addition, visualization results show that our model pays more attention to important features than other advanced models. We also hope this work can promote the development of VQA in the field of artificial intelligence (AI).
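
    The sparsification idea behind the SQSA unit can be approximated with generic top-k attention, as sketched below; the paper's actual δ-based selection and gating may differ from this assumption.

```python
import torch
import torch.nn.functional as F

def sparse_self_attention(q, k, v, keep: int = 8):
    """Generic top-k sparse attention: only the `keep` highest-scoring keys per
    query receive non-zero weight, discarding low-weight (irrelevant) tokens."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (B, Lq, Lk)
    kth = scores.topk(keep, dim=-1).values[..., -1:]     # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 14, 64)                       # 14 question tokens
out = sparse_self_attention(q, k, v, keep=4)
print(out.shape)                                         # torch.Size([1, 14, 64])
```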

  • Learning Pyramidal Feature Hierarchy for 3D Reconstruction

    Fairuz Safwan MAHAD  Masakazu IWAMURA  Koichi KISE  

     
    LETTER-Image Recognition, Computer Vision
    Publicized: 2021/11/16
    Vol: E105-D No:2
    Page(s): 446-449

    Neural-network-based three-dimensional (3D) reconstruction methods have produced promising results. However, they do not pay particular attention to reconstructing the detailed parts of objects, because the network is not designed to capture fine details. In this paper, we propose a network designed to capture both the coarse and fine details of objects to improve the reconstruction of their fine parts.

  • Video Inpainting by Frame Alignment with Deformable Convolution

    Yusuke HARA  Xueting WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing
    Publicized: 2021/04/22
    Vol: E104-D No:8
    Page(s): 1349-1358

    Video inpainting is the task of filling in missing regions of a video. In this task, it is important to use information from other frames efficiently and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former performs rough frame-scale alignment, and the latter performs pixel-level fine alignment. Our model depends on neither 3D convolutions, which limit the temporal window, nor troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.
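
    A rough sketch of pixel-level alignment with deformable convolution, using torchvision's DeformConv2d, is shown below; the channel counts are illustrative and the affine pre-alignment stage is omitted, so this is not the paper's exact module.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableAlign(nn.Module):
    """Offsets are predicted from the concatenation of target and (coarsely
    aligned) reference features, then used to warp the reference features."""
    def __init__(self, ch: int = 64, k: int = 3):
        super().__init__()
        self.offset_pred = nn.Conv2d(2 * ch, 2 * k * k, kernel_size=3, padding=1)
        self.deform = DeformConv2d(ch, ch, kernel_size=k, padding=k // 2)

    def forward(self, ref_feat: torch.Tensor, tgt_feat: torch.Tensor) -> torch.Tensor:
        offset = self.offset_pred(torch.cat([ref_feat, tgt_feat], dim=1))
        return self.deform(ref_feat, offset)   # reference features aligned to target

ref, tgt = torch.randn(1, 64, 60, 108), torch.randn(1, 64, 60, 108)
print(DeformableAlign()(ref, tgt).shape)       # torch.Size([1, 64, 60, 108])
```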

  • Entropy Based Illumination-Invariant Foreground Detection

    Karthikeyan PANJAPPAGOUNDER RAJAMANICKAM  Sakthivel PERIYASAMY  

     
    LETTER-Image Recognition, Computer Vision
    Publicized: 2019/04/18
    Vol: E102-D No:7
    Page(s): 1434-1437

    Background subtraction algorithms generate a background model of the monitored scene and compare it with the current video frame to detect foreground objects. In general, most background subtraction algorithms fail to detect foreground objects when the scene illumination changes. An entropy-based background subtraction algorithm is proposed to address this problem. The proposed method adapts to illumination changes by updating the background model according to the difference in entropy between the current frame and the previous frame. This entropy-based background modeling can efficiently handle both sudden and gradual illumination variations. The proposed algorithm is tested on six video sequences and compared with four algorithms to demonstrate its efficiency in terms of F-score, similarity, and frame rate.
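
    An illustrative entropy-driven update rule (not the paper's exact formulation) is sketched below: the background learning rate grows with the entropy change between consecutive frames, so sudden illumination changes refresh the model quickly. The threshold and gain values are assumptions.

```python
import numpy as np

def frame_entropy(gray: np.ndarray) -> float:
    """Shannon entropy of an 8-bit grayscale frame, from its intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def update_background(bg, frame, prev_entropy, base_alpha=0.01, gain=0.05):
    """Running-average background whose update rate depends on the entropy
    difference between the current and previous frames."""
    ent = frame_entropy(frame)
    alpha = min(1.0, base_alpha + gain * abs(ent - prev_entropy))
    bg = (1.0 - alpha) * bg + alpha * frame.astype(np.float64)
    foreground = np.abs(frame.astype(np.float64) - bg) > 30   # fixed threshold, an assumption
    return bg, ent, foreground

bg = np.zeros((240, 320))
prev_ent = 0.0
frame = (np.random.rand(240, 320) * 255).astype(np.uint8)
bg, prev_ent, fg = update_background(bg, frame, prev_ent)
print(prev_ent, fg.mean())
```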

  • Using Temporal Correlation to Optimize Stereo Matching in Video Sequences

    Ming LI  Li SHI  Xudong CHEN  Sidan DU  Yang LI  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2019/03/01
    Vol: E102-D No:6
    Page(s): 1183-1196

    Its large computational complexity makes stereo matching a major challenge in real-time application scenarios. Stereo matching in a video sequence differs slightly from that in a still image because temporal correlation exists among video frames; however, no existing method has exploited the temporal consistency of disparity to accelerate the algorithm. In this work, we propose a scheme called the dynamic disparity range (DDR), which optimizes the matching-cost calculation and cost-aggregation steps by narrowing the disparity search range, and a temporal cost-aggregation path that further optimizes the cost-aggregation step. Based on these schemes, we propose the DDR-SGM and DDR-MCCNN algorithms for stereo matching in video sequences. Evaluation results show that the proposed algorithms significantly reduce the computational complexity with only a very slight loss of accuracy. This demonstrates that the proposed optimizations are effective and that temporal consistency in stereo video is highly useful for either improving accuracy or reducing computational complexity.
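
    The dynamic-disparity-range idea can be sketched as follows: each pixel searches only a small window around the disparity estimated in the previous frame. The toy code below masks a full cost volume for clarity; a real implementation would compute costs only inside the narrowed range, and the cost function and aggregation are heavily simplified.

```python
import numpy as np

def matching_cost(left, right, d):
    """Absolute-difference cost for disparity d (toy cost, for illustration only)."""
    shifted = np.roll(right, d, axis=1)
    return np.abs(left.astype(np.int32) - shifted.astype(np.int32))

def ddr_stereo(left, right, prev_disp, max_disp=64, radius=4):
    """Winner-take-all matching restricted to +/- `radius` around the previous
    frame's disparity for each pixel."""
    costs = np.stack([matching_cost(left, right, d) for d in range(max_disp)], axis=2)
    lo = np.clip(prev_disp - radius, 0, max_disp - 1)
    hi = np.clip(prev_disp + radius, 0, max_disp - 1)
    d_axis = np.arange(max_disp)[None, None, :]
    out_of_range = (d_axis < lo[..., None]) | (d_axis > hi[..., None])
    costs = np.where(out_of_range, np.iinfo(np.int32).max, costs)
    return costs.argmin(axis=2)     # disparity chosen inside the narrowed range

left = np.random.randint(0, 255, (48, 64), dtype=np.uint8)
right = np.roll(left, -5, axis=1)          # simulate a uniform disparity of 5
prev = np.full((48, 64), 5)
print(ddr_stereo(left, right, prev).mean())  # approximately 5
```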

  • Transform Electric Power Curve into Dynamometer Diagram Image Using Deep Recurrent Neural Network

    Junfeng SHI  Wenming MA  Peng SONG  

     
    LETTER-Artificial Intelligence, Data Mining
    Publicized: 2018/05/09
    Vol: E101-D No:8
    Page(s): 2154-2158

    To understand the downhole working conditions of rod-pumped wells, we need to analyze dynamometer diagrams, which are generated from load-sensor and displacement-sensor data. Rod-pumped wells are usually located in places with extreme weather, and these sensors are installed on special oilfield equipment in the open air. Over time, the sensors are prone to producing unstable and incorrect data, and load sensors are too expensive to reinstall frequently. As a result, the dynamometer diagrams sometimes cannot support an accurate diagnosis. In contrast, the electric motor, an indispensable part of every rod-pumped well, has a much longer life and is far less affected by the weather. The electric power curve during a swabbing period also reflects the working conditions underground, but it is much harder to interpret than a dynamometer diagram. This letter presents a novel deep learning architecture that transforms the electric power curve into a dimensionless dynamometer diagram image. We conduct experiments on a real-world dataset, and the results show that our method achieves impressive transformation accuracy.

  • HOG-Based Object Detection Processor Design Using ASIP Methodology

    Shanlin XIAO  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER-VLSI Design Technology and CAD
    Vol: E100-A No:12
    Page(s): 2972-2984

    Object detection is an essential and expensive process in many computer vision systems. It is hard for standard off-the-shelf embedded processors to achieve a balance between performance and power when implementing object detection applications. In this work, we explore an Application Specific Instruction-set Processor (ASIP) for object detection using the Histogram of Oriented Gradients (HOG) feature. Algorithm simplifications are adopted to reduce memory bandwidth requirements and mathematical complexity without losing reliability. In addition, a parallel histogram-generation and on-the-fly Support Vector Machine (SVM) calculation architecture is employed to reduce the required cycle counts. The HOG algorithm on the proposed ASIP is accelerated by a factor of 63 compared with the pure software implementation. The ASIP was synthesized for a standard 90 nm CMOS library, with a silicon area of 1.31 mm² and power consumption of 47.8 mW at 200 MHz. Our object detection processor achieves 42 frames per second (fps) on VGA video. The evaluation and implementation results show that the proposed ASIP is both area-efficient and power-efficient while remaining competitive with commercial CPUs/DSPs. Furthermore, our ASIP exhibits comparable performance even against hard-wired designs.
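
    For context, the software baseline that such an ASIP accelerates is HOG descriptors scored by a linear SVM; a compact sketch using common default parameters (not necessarily the paper's settings) is shown below.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# HOG descriptor for one detection window: 9 orientation bins, 8x8-pixel cells,
# 2x2-cell blocks with L2-Hys normalization (common defaults, an assumption).
def window_descriptor(window: np.ndarray) -> np.ndarray:
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

# toy training data: 64x128 windows (the usual pedestrian-detection size)
pos = [np.random.rand(128, 64) for _ in range(10)]
neg = [np.random.rand(128, 64) for _ in range(10)]
X = np.stack([window_descriptor(w) for w in pos + neg])
y = np.array([1] * 10 + [0] * 10)
svm = LinearSVC().fit(X, y)

# detection = sliding a window over the frame and scoring each descriptor
test = np.random.rand(128, 64)
print(svm.decision_function([window_descriptor(test)]))
```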

  • Bit-Quad-Based Euler Number Computing

    Bin YAO  Lifeng HE  Shiying KANG  Xiao ZHAO  Yuyan CHAO  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2017/06/20
    Vol: E100-D No:9
    Page(s): 2197-2204

    The Euler number of a binary image is an important topological property for pattern recognition, image analysis, and computer vision. A well-known method for computing the Euler number of a binary image counts certain patterns of bit-quads in the image; it has been improved by scanning three rows at a time to process two bit-quads simultaneously. This paper studies the bit-quad-based Euler number computing problem. We show that, for a bit-quad-based Euler number computing algorithm, as the number of bit-quads processed simultaneously increases, the average number of pixels to be checked per bit-quad decreases in theory, but the length of the code implementing the algorithm increases, which makes the algorithm less efficient in practice. Experimental results on various types of images demonstrate that scanning five rows at a time and processing four bit-quads simultaneously is the optimal tradeoff, and that the resulting bit-quad-based Euler number computing algorithm is more efficient than other Euler number computing algorithms.
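
    The underlying bit-quad (Gray) formula counts three families of 2x2 patterns over a zero-padded image; a plain one-quad-at-a-time reference implementation is sketched below. The paper's contribution lies in processing several quads per scan, which this sketch does not attempt.

```python
import numpy as np

def euler_number(img: np.ndarray, connectivity: int = 8) -> int:
    """Bit-quad Euler number: slide a 2x2 window over the zero-padded binary
    image and count quads with one foreground pixel, three foreground pixels,
    and the two diagonal patterns."""
    p = np.pad(img.astype(np.uint8), 1)
    a, b = p[:-1, :-1], p[:-1, 1:]
    c, d = p[1:, :-1], p[1:, 1:]
    s = a + b + c + d
    c1 = int((s == 1).sum())                       # quads with one foreground pixel
    c3 = int((s == 3).sum())                       # quads with three foreground pixels
    cd = int(((s == 2) & (a == d) & (b == c) & (a != b)).sum())  # diagonal quads
    if connectivity == 4:
        return (c1 - c3 + 2 * cd) // 4
    return (c1 - c3 - 2 * cd) // 4                 # 8-connectivity

img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 1
img[3:5, 3:5] = 0                                  # a square ring: one object, one hole
print(euler_number(img))                           # 0 (one component minus one hole)
```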

  • Design of an Application Specific Instruction Set Processor for Real-Time Object Detection Using AdaBoost Algorithm

    Shanlin XIAO  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER
    Vol: E100-A No:7
    Page(s): 1384-1395

    Object detection is at the heart of nearly all computer vision systems. It is hard for standard off-the-shelf embedded processors to meet the trade-offs among performance, power consumption, and flexibility required by object detection applications. Therefore, this paper presents an Application Specific Instruction-set Processor (ASIP) for object detection using an AdaBoost-based learning algorithm with Haar-like features as weak classifiers. Algorithm optimizations are employed to reduce memory bandwidth requirements without losing reliability. In the proposed ASIP, a Single Instruction Multiple Data (SIMD) architecture is adopted to fully exploit the data-level parallelism inherent in the target algorithm. By adding pipeline stages, application-specific hardware components, and custom instructions, the AdaBoost algorithm is accelerated by a factor of 13.7 compared with the optimized pure software implementation. Compared with the ARM946 and TMS320C64+, our ASIP shows 32x and 7x better throughput, 10x and 224x better area efficiency, and 6.8x and 18.8x better power efficiency, respectively. Furthermore, compared with hard-wired designs, the evaluation results show an advantage of the proposed architecture in chip area efficiency while maintaining reliable performance and achieving real-time object detection at 32 fps on VGA video.
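
    The detection rule being accelerated here, a weighted vote of threshold-on-Haar-feature weak classifiers evaluated over an integral image, can be sketched as follows; the weak classifiers in this example are made-up placeholders rather than trained ones.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table so any rectangle sum needs only four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Inclusive rectangle sum from the integral image (r1, c1 is bottom-right)."""
    total = ii[r1, c1]
    if r0 > 0: total -= ii[r0 - 1, c1]
    if c0 > 0: total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect(ii, r, c, h, w):
    """A two-rectangle Haar-like feature: left half minus right half."""
    left = rect_sum(ii, r, c, r + h - 1, c + w // 2 - 1)
    right = rect_sum(ii, r, c + w // 2, r + h - 1, c + w - 1)
    return left - right

def strong_classify(ii, weak_classifiers):
    """AdaBoost decision: weighted vote of threshold-on-feature weak learners."""
    score = sum(alpha * (1 if pol * haar_two_rect(ii, *rect) < pol * thr else 0)
                for alpha, thr, pol, rect in weak_classifiers)
    return score >= 0.5 * sum(alpha for alpha, *_ in weak_classifiers)

window = np.random.rand(24, 24)
ii = integral_image(window)
# (alpha, threshold, polarity, (row, col, height, width)) -- placeholder values
weak = [(0.8, 0.0, 1, (0, 0, 24, 24)), (0.5, 1.5, -1, (4, 4, 12, 16))]
print(strong_classify(ii, weak))
```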

  • Inter-Person Occlusion Handling with Social Interaction for Online Multi-Pedestrian Tracking

    Yuke LI  Weiming SHEN  

     
    PAPER-Image Recognition, Computer Vision
    Publicized: 2016/09/15
    Vol: E99-D No:12
    Page(s): 3165-3171

    Inter-person occlusion handling is a critical issue in the field of tracking and has been extensively researched. Several state-of-the-art methods have been proposed, such as focusing on the appearance of the targets or exploiting knowledge of the scene. In contrast with the approaches in the literature, we propose to address this issue with a social interaction model, which allows us to explore spatio-temporal information pertaining to the targets involved in the occlusion. Our experiments show promising results compared with other methods.

  • A Further Improvement on Bit-Quad-Based Euler Number Computing Algorithm

    Bin YAO  Lifeng HE  Shiying KANG  Xiao ZHAO  Yuyan CHAO  

     
    LETTER-Pattern Recognition
    Publicized: 2015/10/30
    Vol: E99-D No:2
    Page(s): 545-549

    The Euler number is an important topological property of a binary image, and it can be computed by counting certain bit-quads in the image. This paper proposes a further improved bit-quad-based algorithm for computing the Euler number. By scanning image rows two at a time and utilizing information obtained while processing the previous pixels, the average number of pixels to be checked for processing a bit-quad decreases from 2 to 1.5. Experimental results demonstrate that the proposed algorithm significantly outperforms conventional Euler number computing algorithms.

  • A Graph-Theory-Based Algorithm for Euler Number Computing

    Lifeng HE  Bin YAO  Xiao ZHAO  Yun YANG  Yuyan CHAO  Atsushi OHTA  

     
    LETTER-Pattern Recognition
    Publicized: 2014/11/10
    Vol: E98-D No:2
    Page(s): 457-461

    This paper proposes a graph-theory-based Euler number computing algorithm. Based on graph theory and an analysis of the mask's configuration, the Euler number of a binary image is calculated by counting four patterns of the mask. Unlike most conventional Euler number computing algorithms, ours does not need to process the background pixels at all. Experimental results demonstrate that our algorithm is much more efficient than conventional Euler number computing algorithms.

  • An Efficient Two-Scan Labeling Algorithm for Binary Hexagonal Images

    Lifeng HE  Xiao ZHAO  Bin YAO  Yun YANG  Yuyan CHAO  

     
    LETTER-Image Recognition, Computer Vision
    Publicized: 2014/08/27
    Vol: E97-D No:12
    Page(s): 3244-3247

    This paper proposes an efficient two-scan labeling algorithm for binary hexagonal images. Unlike conventional labeling algorithms, which process pixels one by one in the first scan, our algorithm processes pixels two by two. We show that, with our algorithm, a smaller number of pixels needs to be checked. Experimental results demonstrate that our method is more efficient than the algorithm extended directly from the corresponding labeling algorithm for rectangular binary images.

  • An Efficient Strategy for Bit-Quad-Based Euler Number Computing Algorithm

    Bin YAO  Hua WU  Yun YANG  Yuyan CHAO  Atsushi OHTA  Haruki KAWANAKA  Lifeng HE  

     
    LETTER-Pattern Recognition
    Vol: E97-D No:5
    Page(s): 1374-1378

    The Euler number of a binary image is an important topological property for pattern recognition and can be calculated by counting certain bit-quads in the image. This paper proposes an efficient strategy for improving the bit-quad-based Euler number computing algorithm. By using the information obtained while processing the previous bit-quad, the number of pixels that must be checked when processing a bit-quad decreases from 4 to 2. Experiments demonstrate that an algorithm using our strategy significantly outperforms conventional Euler number computing algorithms.

  • Single-View Sketch Based Surface Modeling

    Alexis ANDRE  Suguru SAITO  Masayuki NAKAJIMA  

     
    PAPER-Image Recognition, Computer Vision
    Vol: E92-D No:6
    Page(s): 1304-1311

    We propose a sketch-based modeling system in which all user input is performed from a single viewpoint. The strokes drawn by the user are therefore not restricted to the drawing plane: their orientation in 3D space is determined automatically by the system. The desired surface is reconstructed from a grid made of two groups of similar lines that are considered co-planar. The orientation of the two sets of planes is determined by assuming that, at the intersection of a representative line from each group, those two lines are perpendicular.
