
Keyword Search Results

[Keyword] depth estimation (10 hits)

Results 1-10 of 10
  • 2D Human Skeleton Action Recognition Based on Depth Estimation (Open Access)

    Lei WANG  Shanmin YANG  Jianwei ZHANG  Song GU

    PAPER-Image Recognition, Computer Vision

    Publicized: 2024/02/27
    Vol: E107-D No:7
    Page(s): 869-877

    Human action recognition (HAR) exhibits limited accuracy in video surveillance because monocular cameras capture only 2D information. To address this problem, a depth estimation-based human skeleton action recognition method (SARDE) is proposed in this study, with the aim of transforming 2D human action data into 3D form to uncover action clues hidden in the 2D data. SARDE comprises two tasks, human skeleton action recognition and monocular depth estimation, which are integrated in a multi-task manner during end-to-end training: by sharing parameters, the network exploits the correlation between the two tasks and learns depth features that are effective for action recognition. Graph-structured networks with inception blocks and skip connections are investigated for depth estimation. The experimental results verify the effectiveness and superiority of the proposed method, which reaches state-of-the-art performance on the evaluated datasets.
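
    The following minimal sketch illustrates the hard parameter sharing the abstract describes, not the authors' actual SARDE architecture: one shared backbone feeds both a depth-regression head and an action-classification head, and a joint loss trains everything end to end. All layer sizes, losses, and the class count are assumptions.

    ```python
    import torch
    import torch.nn as nn

    class MultiTaskNet(nn.Module):
        """Shared backbone with one depth head and one action head."""
        def __init__(self, num_actions=60):
            super().__init__()
            # Shared parameters: gradients from both tasks update this backbone.
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            self.depth_head = nn.Conv2d(64, 1, 1)           # per-pixel depth
            self.action_head = nn.Sequential(               # pooled classification
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_actions),
            )

        def forward(self, x):
            feat = self.backbone(x)
            return self.depth_head(feat), self.action_head(feat)

    model = MultiTaskNet()
    frames = torch.randn(2, 3, 224, 224)    # dummy input frames
    depth_gt = torch.rand(2, 1, 224, 224)   # dummy depth targets
    action_gt = torch.randint(0, 60, (2,))  # dummy action labels

    depth_pred, action_logits = model(frames)
    # Joint end-to-end objective: both tasks shape the shared features.
    loss = (nn.functional.l1_loss(depth_pred, depth_gt)
            + nn.functional.cross_entropy(action_logits, action_gt))
    loss.backward()
    ```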

  • Projection-Based Physical Adversarial Attack for Monocular Depth Estimation

    Renya DAIMO  Satoshi ONO

    LETTER

    Publicized: 2022/10/17
    Vol: E106-D No:1
    Page(s): 31-35

    Monocular depth estimation has improved drastically with the development of deep neural networks (DNNs). However, recent studies have revealed that DNNs for monocular depth estimation contain vulnerabilities that can lead to misestimation when perturbations are added to the input. This study investigates whether DNNs for monocular depth estimation are vulnerable to misestimation when patterned light is projected onto an object using a video projector. To this end, this study proposes an evolutionary adversarial attack method with a multi-fidelity evaluation scheme that creates adversarial examples under black-box conditions while suppressing the computational cost. Experiments in both simulated and real scenes showed that the designed light pattern caused a DNN to misestimate objects as if they had moved farther back.
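
    To make the multi-fidelity idea concrete, here is a toy (1+1)-style evolutionary loop under stated assumptions: the depth network is a black-box stand-in, the projector pattern is an 8x8 grid, and "fidelity" is simply rendering resolution. Cheap low-fidelity evaluations screen candidates before the expensive full evaluation, which is the cost-saving mechanism the letter's scheme relies on; none of the specific numbers come from the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def estimate_depth(image):
        # Black-box stand-in for a monocular depth DNN (only queried).
        return float(image.mean())

    def fitness(pattern, fidelity):
        # Render the projected pattern at the requested fidelity and measure
        # how far the depth estimate shifts from the clean scene.
        size = {"low": 16, "high": 128}[fidelity]
        scene = np.full((size, size), 0.5)
        up = pattern.reshape(8, 8).repeat(size // 8, 0).repeat(size // 8, 1)
        projected = np.clip(scene + up, 0.0, 1.0)
        return estimate_depth(projected) - estimate_depth(scene)

    pattern = rng.uniform(-0.1, 0.1, 64)            # 8x8 projector pattern
    best = fitness(pattern, "high")
    for _ in range(200):                            # (1+1)-style evolution
        child = np.clip(pattern + rng.normal(0, 0.02, 64), -0.3, 0.3)
        if fitness(child, "low") <= best:           # cheap screen first
            continue
        f = fitness(child, "high")                  # costly full evaluation
        if f > best:
            pattern, best = child, f
    print("depth shift achieved:", best)
    ```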

  • Smaller Residual Network for Single Image Depth Estimation

    Andi HENDRA  Yasushi KANAZAWA

    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/08/17
    Vol: E104-D No:11
    Page(s): 1992-2001

    We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual blocks followed by a pair of up-sampling layers. The simple decoder network, a stack of five convolution layers, accepts the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning-rate hyperparameter to improve performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This design significantly reduces the number of network parameters while producing a more accurate depth map. The performance of our approach has been evaluated through quantitative and qualitative comparisons with several prior methods on the public NYU and KITTI datasets.
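
    The composite objective described above (pixel-wise plus gradient and structural-similarity terms) might look roughly like the sketch below. The equal weighting, the L1 pixel term, and the simplified single-scale SSIM are assumptions for illustration, not the paper's exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    def gradient_loss(pred, gt):
        # Match image gradients along x and y so depth edges line up.
        dx_p = pred[..., :, 1:] - pred[..., :, :-1]
        dx_g = gt[..., :, 1:] - gt[..., :, :-1]
        dy_p = pred[..., 1:, :] - pred[..., :-1, :]
        dy_g = gt[..., 1:, :] - gt[..., :-1, :]
        return (dx_p - dx_g).abs().mean() + (dy_p - dy_g).abs().mean()

    def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
        # Simplified single-scale SSIM over 3x3 windows.
        mu_p, mu_g = F.avg_pool2d(pred, 3, 1), F.avg_pool2d(gt, 3, 1)
        var_p = F.avg_pool2d(pred * pred, 3, 1) - mu_p ** 2
        var_g = F.avg_pool2d(gt * gt, 3, 1) - mu_g ** 2
        cov = F.avg_pool2d(pred * gt, 3, 1) - mu_p * mu_g
        ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) \
             / ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
        return ((1 - ssim) / 2).clamp(0, 1).mean()

    def depth_loss(pred, gt):
        # Pixel-wise + gradient-direction + structure-similarity terms.
        return F.l1_loss(pred, gt) + gradient_loss(pred, gt) + ssim_loss(pred, gt)

    pred, gt = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    print(depth_loss(pred, gt).item())
    ```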

  • Simultaneous Attack on CNN-Based Monocular Depth Estimation and Optical Flow Estimation

    Koichiro YAMANAKA  Keita TAKAHASHI  Toshiaki FUJII  Ryutaroh MATSUMOTO

    LETTER-Image Recognition, Computer Vision

    Publicized: 2021/02/08
    Vol: E104-D No:5
    Page(s): 785-788

    Thanks to the excellent learning capability of deep convolutional neural networks (CNNs), CNN-based methods have achieved great success in computer vision and image recognition tasks. However, these methods have turned out to have inherent vulnerabilities, which calls for caution about the potential risks of using them in real-world applications such as autonomous driving. To reveal such vulnerabilities, we propose a method that simultaneously attacks monocular depth estimation and optical flow estimation, two artificial-intelligence-based tasks intensively investigated for autonomous driving scenarios. Our method generates a single adversarial patch that fools CNN-based monocular depth estimation and optical flow estimation methods simultaneously, simply by placing the patch in the input images. To the best of our knowledge, this is the first work to achieve simultaneous patch attacks on two or more CNNs developed for different tasks.
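
    A minimal sketch of the joint-patch idea, with two toy convolutional stand-ins in place of real depth and flow CNNs: a single patch is optimized so that pasting it into the input perturbs both networks' outputs at once. Patch size, placement, and the objective are illustrative assumptions, not the letter's exact setup.

    ```python
    import torch
    import torch.nn as nn

    # Toy stand-ins for pretrained depth and optical flow CNNs.
    depth_net = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))
    flow_net = nn.Sequential(nn.Conv2d(3, 2, 3, padding=1))

    patch = torch.zeros(3, 32, 32, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=0.01)
    image = torch.rand(1, 3, 128, 128)

    for _ in range(100):
        attacked = image.clone()
        attacked[:, :, 48:80, 48:80] = patch.sigmoid()   # paste patch in [0, 1]
        # One objective, two victims: maximize the output shift of both nets.
        d_shift = (depth_net(attacked) - depth_net(image)).abs().mean()
        f_shift = (flow_net(attacked) - flow_net(image)).abs().mean()
        loss = -(d_shift + f_shift)
        opt.zero_grad()
        loss.backward()
        opt.step()
    ```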

  • Simultaneous Estimation of Object Region and Depth in Participating Media Using a ToF Camera

    Yuki FUJIMURA  Motoharu SONOGASHIRA  Masaaki IIYAMA

    PAPER-Image Recognition, Computer Vision

    Publicized: 2019/12/03
    Vol: E103-D No:3
    Page(s): 660-673

    Three-dimensional (3D) reconstruction and scene depth estimation from 2-dimensional (2D) images are major tasks in computer vision. However, conventional 3D reconstruction techniques become challenging in participating media such as murky water, fog, or smoke. We have developed a method that uses a continuous-wave time-of-flight (ToF) camera to simultaneously estimate an object region and its depth in participating media. The scattered light observed by the camera is saturated, so it does not depend on the scene depth. In addition, signals bouncing off distant points are negligible due to light attenuation, so the observation at such a point contains only a scattering component. These phenomena enable us to estimate the scattering component in an object region from a background that contains only the scattering component. The problem is formulated as robust estimation in which the object region is treated as outliers, enabling simultaneous estimation of the object region and depth via an iteratively reweighted least squares (IRLS) optimization scheme. We demonstrate the effectiveness of the proposed method on images captured by a ToF camera in real foggy scenes and evaluate its applicability with synthesized data.
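
    The IRLS formulation can be illustrated in one dimension, under toy assumptions: the scattering-only background is modeled as a constant level, and pixels that fit it poorly (the object region) are progressively down-weighted as outliers. The signal, model, and thresholds below are invented for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    signal = np.full(200, 2.0) + rng.normal(0, 0.05, 200)  # scattering background
    signal[80:120] += 1.5                                  # object pixels = outliers

    w = np.ones_like(signal)
    for _ in range(20):
        bg = np.sum(w * signal) / np.sum(w)    # weighted least-squares fit
        r = np.abs(signal - bg)                # residuals against the fit
        w = 1.0 / np.maximum(r, 1e-3)          # large residual -> small weight
    object_mask = np.abs(signal - bg) > 0.5    # poorly fitting pixels = object
    print("background:", round(bg, 3), "object pixels:", int(object_mask.sum()))
    ```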

  • Depth from Defocus Technique Based on Cross Reblurring

    Kazumi TAKEMURA  Toshiyuki YOSHIDA

    PAPER

    Publicized: 2019/07/11
    Vol: E102-D No:11
    Page(s): 2083-2092

    This paper proposes a novel Depth From Defocus (DFD) technique based on the property that two images taken with different focus settings coincide if each is reblurred with the other's focus setting, referred to in this paper as the "cross reblurring" property. Based on this property, the proposed technique estimates a block-wise depth profile for a target object by minimizing the mean squared error between the cross-reblurred images. Unlike existing DFD techniques, the proposed technique is free of lens parameters and independent of point spread function models. A compensation technique for possible pixel misalignment between the images is also proposed to improve the depth estimation accuracy. Experimental results and comparisons with other DFD techniques show the advantages of our technique.
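
    A toy 1-D illustration of the cross-reblurring test follows, assuming Gaussian PSFs and an invented blur-versus-depth model (the actual technique is notably free of such PSF models): two differently focused images are each reblurred with the other setting's hypothesized blur, and the candidate depth minimizing the MSE between them is selected.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def blur_sigma(depth, focus):
        # Invented blur model: defocus grows with distance from the focal depth.
        return 0.5 + abs(depth - focus)

    scene = np.random.default_rng(2).random(256)   # 1-D "scene" block
    true_depth = 3.0
    img1 = gaussian_filter1d(scene, blur_sigma(true_depth, focus=2.0))
    img2 = gaussian_filter1d(scene, blur_sigma(true_depth, focus=4.0))

    best_depth, best_err = None, np.inf
    for d in np.linspace(1.0, 5.0, 41):            # candidate depths
        # Cross-reblur: blur each image with the other setting's blur at depth d.
        a = gaussian_filter1d(img1, blur_sigma(d, focus=4.0))
        b = gaussian_filter1d(img2, blur_sigma(d, focus=2.0))
        err = np.mean((a - b) ** 2)                # MSE between the reblurred pair
        if err < best_err:
            best_depth, best_err = d, err
    print("estimated depth:", best_depth)          # minimized near true_depth
    ```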

  • Fast Single Image De-Hazing Using Characteristics of RGB Channel of Foggy Image

    Dubok PARK  David K. HAN  Changwon JEON  Hanseok KO

    PAPER-Image Processing and Video Processing

    Vol: E96-D No:8
    Page(s): 1793-1799

    Images captured under foggy conditions often exhibit poor contrast and color. This is primarily due to air-light, which degrades image quality exponentially with the depth of fog between the scene and the camera. In this paper, we restore fog-degraded images by first estimating depth using a physical model that characterizes the RGB channels of a single monocular image. The fog effects are then removed by subtracting the estimated irradiance, which is empirically related to the obtained scene depth, from the total irradiance received by the sensor. Effective restoration of the color and contrast of images taken under foggy conditions is demonstrated. In the experiments, we validate the effectiveness of our method against conventional methods.
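
    For context, here is a sketch of the standard atmospheric scattering model that this kind of de-hazing builds on: I = J*t + A*(1 - t), with transmission t = exp(-beta * d). The depth map and air-light are assumed known below, whereas the paper estimates them from RGB-channel characteristics; this is not the paper's specific estimator.

    ```python
    import numpy as np

    def dehaze(foggy, depth, airlight=0.9, beta=1.0, t_min=0.1):
        # I = J*t + A*(1 - t), t = exp(-beta * depth); solve for J.
        t = np.exp(-beta * depth)
        t = np.maximum(t, t_min)[..., None]        # avoid dividing by ~0 far away
        return np.clip((foggy - airlight * (1 - t)) / t, 0.0, 1.0)

    foggy = np.random.default_rng(3).random((4, 4, 3))  # dummy foggy RGB image
    depth = np.linspace(0.1, 2.0, 16).reshape(4, 4)     # dummy depth map
    print(dehaze(foggy, depth).shape)                   # restored image, (4, 4, 3)
    ```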

  • Pedestrian Detection with Sparse Depth Estimation

    Yu WANG  Jien KATO

    PAPER-Image Recognition, Computer Vision

    Vol: E94-D No:8
    Page(s): 1690-1699

    In this paper, we address the pedestrian detection task in outdoor scenes. Because of the complexity of such scenes, commonly used gradient-feature-based detectors do not work well on them. We propose using sparse 3D depth information as an additional cue for detection, in order to achieve a fast improvement in performance. Our method uses a probabilistic model to integrate image-feature-based classification with sparse depth estimation. Benefiting from the depth estimates, we map the prior distribution of actual human height onto the image and probabilistically update the image-feature-based classification result. This paper makes two contributions: 1) a simplified graphical model that efficiently integrates the depth cue into detection; and 2) a sparse depth estimation method that provides fast and reliable depth information. An experiment shows that our method provides a promising improvement over the baseline detector with minimal additional time.
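
    The probabilistic update can be sketched as below: each detection's classifier score is reweighted by how plausible its implied real-world height is under a Gaussian prior on human height, using a pinhole camera relation. The focal length, prior parameters, and candidate values are invented for illustration and are not the paper's graphical model.

    ```python
    import numpy as np

    def height_likelihood(bbox_h_px, depth_m, focal_px=800.0, mu=1.7, sigma=0.15):
        # Pinhole relation: real height ~ pixel height * depth / focal length.
        h_m = bbox_h_px * depth_m / focal_px
        return np.exp(-0.5 * ((h_m - mu) / sigma) ** 2)  # Gaussian height prior

    # (classifier score, bbox height in pixels, sparse depth estimate in meters)
    candidates = [(0.6, 180, 7.5), (0.7, 180, 2.0), (0.5, 90, 15.0)]
    for score, h_px, d in candidates:
        fused = score * height_likelihood(h_px, d)  # probabilistic update
        print(f"score {score:.2f} -> fused {fused:.3f}")
    ```

    The second candidate, whose implied height is about 0.45 m, is suppressed almost entirely despite its high classifier score, which is exactly the kind of false positive the depth cue is meant to remove.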

  • Design and Implementation of a Real-Time Video-Based Rendering System Using a Network Camera Array

    Yuichi TAGUCHI  Keita TAKAHASHI  Takeshi NAEMURA

    PAPER-Image Processing and Video Processing

    Vol: E92-D No:7
    Page(s): 1442-1452

    We present a real-time video-based rendering system that uses a network camera array. Our system consists of 64 commodity network cameras connected to a single PC over gigabit Ethernet. To render a high-quality novel view, the system estimates a view-dependent per-pixel depth map in real time using a layered representation. The rendering algorithm is fully implemented on the GPU, which allows the system to perform the capturing and rendering processes efficiently as a pipeline, using the CPU and GPU independently. With QVGA input video resolution, the system renders free-viewpoint video at up to 30 frames per second, depending on the output video resolution and the number of depth layers. Experimental results show high-quality images synthesized from various scenes.
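
    A rough sketch of per-pixel depth-layer selection in a plane-sweep style, which is one common way to realize the layered representation mentioned above; the cost volume here is random stand-in data, since the real system computes photo-consistency by warping the 64 camera views on the GPU.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    n_layers, h, w = 8, 60, 80
    # cost[l, y, x]: photo-consistency error if pixel (y, x) lies on layer l.
    # (The real system gets this by warping the camera images on the GPU.)
    cost = rng.random((n_layers, h, w))
    layer_depths = np.linspace(1.0, 5.0, n_layers)  # depth of each layer

    best_layer = np.argmin(cost, axis=0)            # winner-take-all per pixel
    depth_map = layer_depths[best_layer]            # view-dependent depth map
    print(depth_map.shape)                          # (60, 80)
    ```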

  • Electromagnetic Scattering Analysis for Crack Depth Estimation

    Hidenori SEKIGUCHI  Hiroshi SHIRAI

    PAPER

    Vol: E86-C No:11
    Page(s): 2224-2229

    A simple non-destructive method for estimating the depth of a crack on a metal surface is proposed. The method is based on our finding that the electromagnetic back-scattering from a narrow trough (crack model) on a ground plane exhibits periodic nulls (dips) as the frequency changes, and that the first dip occurs when the crack depth is nearly one half of the incident wavelength. The dependencies on the crack's aperture and the incident angle have also been studied through rigorous and numerical analyses and incorporated as depth estimation parameters. A simple formula for estimating crack depth has been derived from these studies. Test measurements were made to check the accuracy of the formula, with a time-domain gating process used to isolate the crack-scattering spectra buried in the measured frequency RCS data. The tested cracks have narrow rectangular, tapered, and stair-approximated shapes, and their depths can be measured within 3 percent error by our estimation method.
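
    The key relation reported above supports a back-of-the-envelope estimate: if the first dip appears at frequency f, the crack depth is roughly half the corresponding wavelength, d = c / (2f). The dip frequency in this worked example is invented, not a value from the paper.

    ```python
    C = 299_792_458.0                      # speed of light in m/s

    def crack_depth_from_dip(f_dip_hz):
        wavelength = C / f_dip_hz          # wavelength at the first dip
        return wavelength / 2.0            # depth ~ half the incident wavelength

    f_dip = 10e9                           # hypothetical first dip at 10 GHz
    depth_mm = crack_depth_from_dip(f_dip) * 1e3
    print(f"estimated crack depth: {depth_mm:.1f} mm")   # ~15.0 mm
    ```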