
Keyword Search Result

[Keyword] depth(97hit)

1-20hit(97hit)

  • 2D Human Skeleton Action Recognition Based on Depth Estimation Open Access

    Lei WANG  Shanmin YANG  Jianwei ZHANG  Song GU  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2024/02/27
      Vol:
    E107-D No:7
      Page(s):
    869-877

Human action recognition (HAR) exhibits limited accuracy in video surveillance due to the 2D information captured with monocular cameras. To address this problem, a depth estimation-based human skeleton action recognition method (SARDE) is proposed in this study, with the aim of transforming 2D human action data into 3D format to uncover action clues hidden in the 2D data. SARDE comprises two tasks, i.e., human skeleton action recognition and monocular depth estimation. The two tasks are integrated in a multi-task manner with end-to-end training to fully exploit the correlation between action recognition and depth estimation: by sharing parameters, the network learns depth features effective for human action recognition. In this study, graph-structured networks with inception blocks and skip connections are investigated for depth estimation. The experimental results verify the effectiveness and superiority of the proposed method in skeleton action recognition, showing that it reaches state-of-the-art performance on the datasets.

  • Projection-Based Physical Adversarial Attack for Monocular Depth Estimation

    Renya DAIMO  Satoshi ONO  

     
    LETTER

  Publicized:
    2022/10/17
      Vol:
    E106-D No:1
      Page(s):
    31-35

Monocular depth estimation has improved drastically due to the development of deep neural networks (DNNs). However, recent studies have revealed that DNNs for monocular depth estimation contain vulnerabilities that can lead to misestimation when perturbations are added to the input. This study investigates whether DNNs for monocular depth estimation are vulnerable to misestimation when patterned light is projected onto an object using a video projector. To this end, this study proposes an evolutionary adversarial attack method with a multi-fidelity evaluation scheme that allows creating adversarial examples under black-box conditions while suppressing the computational cost. Experiments in both simulated and real scenes showed that the designed light pattern caused a DNN to misestimate objects as if they had moved to the back.
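The black-box attack loop described above can be sketched as a simple (1+1)-style evolutionary search. A toy stand-in objective replaces the depth-estimation DNN, the multi-fidelity evaluation is omitted, and all names and hyperparameters here are illustrative, not the paper's:

```python
import random

def evolve_pattern(score, dim=16, iters=200, sigma=0.1, seed=0):
    """Minimal (1+1)-style evolutionary search: mutate a candidate light
    pattern and keep the mutant when it raises the black-box score,
    e.g. the depth misestimation it induces. The paper's multi-fidelity
    evaluation scheme is omitted; this sketch queries at one fidelity."""
    rng = random.Random(seed)
    best = [rng.random() for _ in range(dim)]
    best_score = score(best)
    for _ in range(iters):
        # mutate every gene, clipping intensities to the projector range [0, 1]
        cand = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in best]
        s = score(cand)
        if s > best_score:  # keep mutants that fool the model more
            best, best_score = cand, s
    return best, best_score

# Toy stand-in for the black-box DNN error the attack maximizes:
toy_score = lambda p: sum(p) / len(p)
pattern, attacked = evolve_pattern(toy_score)
```

Because only the score of each candidate is used, the same loop applies to any black-box model; the real method adds cheaper low-fidelity evaluations to cut the query cost.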

  • Depth Image Noise Reduction and Super-Resolution by Pixel-Wise Multi-Frame Fusion

    Masahiro MURAYAMA  Toyohiro HIGASHIYAMA  Yuki HARAZONO  Hirotake ISHII  Hiroshi SHIMODA  Shinobu OKIDO  Yasuyoshi TARUTA  

     
    PAPER-Image Processing and Video Processing

  Publicized:
    2022/03/04
      Vol:
    E105-D No:6
      Page(s):
    1211-1224

High-quality depth images are required for stable and accurate computer vision. Depth images captured by depth cameras tend to be noisy, incomplete, and of low resolution. Therefore, increasing the accuracy and resolution of depth images is desirable. We propose a method for reducing noise and filling holes in depth images pixel by pixel, and for increasing their resolution. For each pixel in the target image, the linear space from the focal point of the camera through that pixel to the existing object is divided into equally spaced grids. For each grid, the distance to the object surface is obtained from multiple tracked depth images, whose pixels carry noisy depth values. The coordinates of the correct object surface are then obtained by reducing the random depth noise, and missing values are filled in. The resolution can also be increased by creating new pixels between existing pixels and then applying the same process as that used for noise reduction. Evaluation results have demonstrated that the proposed method requires less GPU memory than the conventional method, reduces noise more accurately, especially around edges, and preserves more object detail. The super-resolution of the proposed method also produced high-resolution depth images with smoother and more accurate edges than the conventional methods.
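The per-pixel fusion along the viewing ray can be sketched as a voting scheme over equally spaced bins. The bin count and depth range below are illustrative; the paper's grid spacing is not given in the abstract:

```python
def fuse_depth_samples(samples, near=0.5, far=5.0, n_bins=64):
    """Fuse noisy depth samples for one pixel: the ray through the pixel
    is divided into equally spaced bins, samples vote for bins, and the
    surface depth is the mean of the samples in the most-voted bin.
    Outliers land in sparse bins and are discarded; holes (None) are
    filled whenever other frames supply valid samples."""
    width = (far - near) / n_bins
    bins = {}
    for d in samples:
        if d is None or not (near <= d < far):  # hole or invalid reading
            continue
        bins.setdefault(int((d - near) / width), []).append(d)
    if not bins:
        return None  # the pixel stays a hole
    votes = max(bins.values(), key=len)
    return sum(votes) / len(votes)

# Noisy samples around a true surface at 2.0 m, with an outlier and a hole:
est = fuse_depth_samples([2.01, 1.98, 2.03, 4.7, None, 2.00])
```

Super-resolution follows the same path: a new pixel between existing ones defines a new ray, fused the same way from the tracked frames.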

  • Smaller Residual Network for Single Image Depth Estimation

    Andi HENDRA  Yasushi KANAZAWA  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2021/08/17
      Vol:
    E104-D No:11
      Page(s):
    1992-2001

We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual blocks followed by a pair of up-sampling layers. The simple decoder network, a stack of five convolution layers, accepts the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning-rate hyperparameter to improve performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting significantly reduces the number of network parameters while simultaneously producing a more accurate depth map. The performance of our approach has been evaluated through both quantitative and qualitative comparisons with several prior methods on the public NYU and KITTI datasets.
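The multi-term training objective can be sketched as pixel-wise L1 plus a gradient term; the structural-similarity term is omitted for brevity, and the 0.5 weight is an assumed value since the abstract does not give the exact formulation:

```python
def depth_loss(pred, gt, w_grad=0.5):
    """Combined loss sketch: pixel-wise L1 plus an L1 penalty on
    horizontal/vertical gradient differences, echoing the gradient-based
    terms used alongside the pixel-wise loss. Inputs are 2-D lists of
    depth values; w_grad is an assumed weight."""
    h, w = len(pred), len(pred[0])
    pix = sum(abs(pred[y][x] - gt[y][x]) for y in range(h) for x in range(w))
    grad = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal gradient difference
                grad += abs((pred[y][x+1] - pred[y][x]) - (gt[y][x+1] - gt[y][x]))
            if y + 1 < h:  # vertical gradient difference
                grad += abs((pred[y+1][x] - pred[y][x]) - (gt[y+1][x] - gt[y][x]))
    return (pix + w_grad * grad) / (h * w)

gt = [[0.0, 1.0], [2.0, 3.0]]
perfect = depth_loss(gt, gt)                         # identical maps
shifted = depth_loss([[0.5, 1.5], [2.5, 3.5]], gt)   # constant offset
```

Note that a constant depth offset is penalized only by the pixel term, since the gradients of the two maps agree.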

  • Effects of Initial Configuration on Attentive Tracking of Moving Objects Whose Depth in 3D Changes

    Anis Ur REHMAN  Ken KIHARA  Sakuichi OHTSUKA  

     
    PAPER-Vision

  Publicized:
    2021/02/25
      Vol:
    E104-A No:9
      Page(s):
    1339-1344

In daily life, people often pay attention to several objects that change position while being observed. In the laboratory, this process is investigated with a phenomenon known as multiple object tracking (MOT), a task that evaluates attentive tracking performance. Recent findings suggest that the attentional set for multiple moving objects whose depth changes in three dimensions from one plane to another is influenced by the initial configuration of the objects. When tracking objects, it is difficult for people to expand their attentional set to multiple depth planes once attention has been focused on a single plane. However, less is known about contracting the attentional set from multiple depth planes to a single plane. In two experiments, we examined tracking accuracy when four targets or four distractors, initially distributed on two planes, came together on one of the planes during an MOT task. The results suggest that people have difficulty changing the depth range of their attention during attentive tracking, and that attentive tracking performance depends on the initial attentional set formed by the configuration prior to tracking.

  • Simultaneous Attack on CNN-Based Monocular Depth Estimation and Optical Flow Estimation

    Koichiro YAMANAKA  Keita TAKAHASHI  Toshiaki FUJII  Ryutaroh MATSUMOTO  

     
    LETTER-Image Recognition, Computer Vision

  Publicized:
    2021/02/08
      Vol:
    E104-D No:5
      Page(s):
    785-788

    Thanks to the excellent learning capability of deep convolutional neural networks (CNNs), CNN-based methods have achieved great success in computer vision and image recognition tasks. However, it has turned out that these methods often have inherent vulnerabilities, which makes us cautious of the potential risks of using them for real-world applications such as autonomous driving. To reveal such vulnerabilities, we propose a method of simultaneously attacking monocular depth estimation and optical flow estimation, both of which are common artificial-intelligence-based tasks that are intensively investigated for autonomous driving scenarios. Our method can generate an adversarial patch that can fool CNN-based monocular depth estimation and optical flow estimation methods simultaneously by simply placing the patch in the input images. To the best of our knowledge, this is the first work to achieve simultaneous patch attacks on two or more CNNs developed for different tasks.

  • Depth Range Control in Visually Equivalent Light Field 3D Open Access

    Munekazu DATE  Shinya SHIMIZU  Hideaki KIMATA  Dan MIKAMI  Yoshinori KUSACHI  

     
    INVITED PAPER-Electronic Displays

  Publicized:
    2020/08/13
      Vol:
    E104-C No:2
      Page(s):
    52-58

3D video content depends on the shooting conditions, i.e., camera positioning. Depth range control in the post-processing stage is not easy, but it is essential because video from arbitrary camera positions must be generated. If light field information can be obtained, video from any viewpoint can be generated exactly, and such post-processing becomes possible. However, a light field holds a huge amount of data, and capturing one is not easy. To compress the data quantity, we proposed the visually equivalent light field (VELF), which exploits the characteristics of human vision. Though a number of cameras are needed, VELF can be captured with a camera array. Since camera interpolation uses linear blending, the calculation is simple enough that we can construct the ray distribution field of VELF by optical interpolation in the VELF3D display, which produces high image quality due to its high pixel-usage efficiency. In this paper, we summarize the relationship between the characteristics of human vision, VELF, and the VELF3D display. We then propose a method to control the depth range of the observed image on the VELF3D display and discuss the effectiveness and limitations of displaying the processed image on it. Our method can be applied to other 3D displays. Since the calculation is just weighted averaging, it is suitable for real-time applications.
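The linear blending used for camera interpolation, and by extension the weighted averaging behind the depth range control, can be sketched as a per-pixel weighted average between two neighboring camera images (the arrays and the parameter t are illustrative):

```python
def blend_views(left, right, t):
    """Linear blending between two neighboring camera images, the simple
    interpolation the VELF construction relies on; t in [0, 1] is the
    virtual viewpoint's position between the two cameras. Inputs are
    2-D lists of per-pixel intensities."""
    return [[(1.0 - t) * l + t * r for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]

# A virtual view halfway between two 1x2 images:
mid = blend_views([[0.0, 100.0]], [[100.0, 0.0]], 0.5)
```

Because each output pixel is just a weighted average, the same operation can be realized optically in the display, which is what keeps the approach real-time.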

  • A Simple Depth-Key-Based Image Composition Considering Object Movement in Depth Direction

    Mami NAGOYA  Tomoaki KIMURA  Hiroyuki TSUJI  

     
    LETTER-Computer Graphics

      Vol:
    E103-A No:12
      Page(s):
    1603-1608

A simple depth-key-based image composition method is proposed that uses two still images with depth information: a background and a foreground object. The proposed method can place the object at various locations in the background, accounting for depth in the 3D world coordinate system. Its main feature is a simple algorithm that achieves depthward movement within the camera plane without requiring awareness of the 3D world coordinate system. Two algorithms based on the pin-hole camera model are proposed (P-OMDD and O-OMDD). As an advantage, neither requires camera calibration before application. Since a single image is used to represent the object, each of the proposed methods has limitations in terms of the fidelity of the composite image: P-OMDD faithfully reproduces the angle at which the object is seen, but the pixels of the hidden surface are missing; conversely, O-OMDD avoids the hidden-surface problem, but the angle of the object is fixed wherever it moves. Several experiments verify that, when using O-OMDD, subjectively natural composite images can be obtained under any object movement, in terms of size and position in the camera plane. Future tasks include handling the change in illumination due to positional changes and the partial loss of objects due to noise in depth images.
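Under the pin-hole camera model, moving an object from depth z0 to z1 scales its projection by z0/z1 about the principal point, which is the geometric core of the depthward movement. A hypothetical sketch (the principal point values are illustrative defaults, not the paper's parameters):

```python
def move_in_depth(u, v, size, z0, z1, cx=320.0, cy=240.0):
    """Pin-hole camera sketch of depthward object movement: an object at
    image position (u, v) with projected size `size` at depth z0, moved
    to depth z1, reprojects scaled by z0/z1 about the principal point
    (cx, cy). Assumed principal point for a 640x480 image."""
    s = z0 / z1
    return cx + (u - cx) * s, cy + (v - cy) * s, size * s

# Doubling the depth halves both the offset from the image center and the size:
u, v, size = move_in_depth(420.0, 240.0, 50.0, 1.0, 2.0)
```

This is why the algorithms need no calibration beyond the pin-hole assumption: only the depth ratio appears, not the focal length itself.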

  • Multi-Layered DP Quantization Algorithm Open Access

    Yukihiro BANDOH  Seishi TAKAMURA  Hideaki KIMATA  

     
    PAPER-Image

      Vol:
    E103-A No:12
      Page(s):
    1552-1561

Designing an optimum quantizer can be treated as the optimization problem of finding the quantization indices that minimize the quantization error. One solution to this problem, DP quantization, is based on dynamic programming. Some applications, such as bit-depth scalable codecs and tone mapping, require the construction of multiple quantizers with different quantization levels, for example, from 12 bit/channel to 10 bit/channel and 8 bit/channel. Unfortunately, the above-mentioned DP quantization optimizes the quantizer for just one quantization level; that is, it is unable to optimize multiple quantizers simultaneously. Therefore, when DP quantization is used to design multiple quantizers, the optimization process contains many redundant computations. This paper proposes an extended DP quantization with a complexity reduction algorithm for the optimal design of multiple quantizers. Experiments show that the proposed algorithm reduces complexity by 20.8% on average compared to conventional DP quantization.
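Single-level DP quantization, the building block the paper extends, can be sketched as the classic dynamic program below: partition the sorted samples into contiguous cells, each reproduced by its mean, minimizing total squared error. The cell/mean formulation is the textbook version; the paper's exact cost is not given in the abstract:

```python
def dp_quantize(values, levels):
    """Optimal scalar quantizer by dynamic programming: split the sorted
    samples into `levels` contiguous cells, each represented by its
    mean, minimizing the total squared quantization error."""
    xs = sorted(values)
    n = len(xs)
    ps = [0.0] * (n + 1)   # prefix sums of values
    ps2 = [0.0] * (n + 1)  # prefix sums of squared values
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x
    def sse(i, j):  # squared error of cell xs[i:j] around its mean, O(1)
        s, s2, m = ps[j] - ps[i], ps2[j] - ps2[i], j - i
        return s2 - s * s / m
    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(levels + 1)]
    cost[0][0] = 0.0
    split = [[0] * (n + 1) for _ in range(levels + 1)]
    for k in range(1, levels + 1):
        for j in range(1, n + 1):
            for i in range(k - 1, j):
                if cost[k - 1][i] == INF:
                    continue
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i
    reps, j = [], n  # backtrack to recover the representative levels
    for k in range(levels, 0, -1):
        i = split[k][j]
        reps.append((ps[j] - ps[i]) / (j - i))
        j = i
    return sorted(reps), cost[levels][n]

reps, err = dp_quantize([0, 1, 2, 10, 11, 12], 2)
```

Running this once per target bit depth repeats most of the table computation, which is exactly the redundancy the proposed multi-layered algorithm removes.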

  • Simultaneous Estimation of Object Region and Depth in Participating Media Using a ToF Camera

    Yuki FUJIMURA  Motoharu SONOGASHIRA  Masaaki IIYAMA  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2019/12/03
      Vol:
    E103-D No:3
      Page(s):
    660-673

Three-dimensional (3D) reconstruction and scene depth estimation from 2-dimensional (2D) images are major tasks in computer vision. However, conventional 3D reconstruction techniques become challenging in participating media such as murky water, fog, or smoke. We have developed a method that uses a continuous-wave time-of-flight (ToF) camera to estimate an object region and its depth in participating media simultaneously. The scattered light observed by the camera is saturated, so it does not depend on the scene depth. In addition, signals bouncing off distant points are negligible due to light attenuation, so the observation of such a point contains only a scattering component. These phenomena enable us to estimate the scattering component in the object region from a background that contains only the scattering component. The problem is formulated as robust estimation in which the object region is treated as outliers, enabling the simultaneous estimation of object region and depth via an iteratively reweighted least squares (IRLS) optimization scheme. We demonstrate the effectiveness of the proposed method using images captured with a ToF camera in real foggy scenes and evaluate its applicability with synthesized data.
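The robust-estimation step can be sketched with a scalar IRLS loop, a stand-in for the paper's per-pixel formulation; the Cauchy-type weight function and the scale constant c are assumed choices, not taken from the paper:

```python
def irls_mean(obs, iters=20, c=1.0):
    """IRLS sketch: robustly estimate the background scattering level,
    treating object-region observations as outliers. Each iteration
    reweights residuals with Cauchy-type weights w = 1 / (1 + (r/c)^2)
    and refits a weighted mean."""
    mu = sum(obs) / len(obs)  # ordinary least-squares initialization
    for _ in range(iters):
        w = [1.0 / (1.0 + ((x - mu) / c) ** 2) for x in obs]
        mu = sum(wi * xi for wi, xi in zip(w, obs)) / sum(w)
    return mu

# Background scattering around 10 with object-region outliers near 30:
bg = irls_mean([10.1, 9.9, 10.0, 10.2, 29.8, 30.1, 9.8])
```

The small final weights mark the outliers, i.e., the object region, which is how the same loop yields both the segmentation and the background estimate.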

  • Posture Recognition Technology Based on Kinect

    Yan LI  Zhijie CHU  Yizhong XIN  

     
    PAPER-Human-computer Interaction

  Publicized:
    2019/12/12
      Vol:
    E103-D No:3
      Page(s):
    621-630

To address the complexity of posture recognition with Kinect, a method using distance characteristics is proposed. First, depth image data were collected with Kinect, and the three-dimensional coordinates of 20 skeleton joints were obtained. Second, according to the contribution of each joint to posture expression, the 60-dimensional Kinect skeleton joint data were transformed into a vector of 24 distance characteristics, normalized according to the human body structure. Third, a static posture recognition method based on the shortest distance and a dynamic posture recognition method based on the minimum accumulative distance with dynamic time warping (DTW) were proposed. The experimental results showed that the recognition rates for static postures, non-cross-subject dynamic postures, and cross-subject dynamic postures were 95.9%, 93.6%, and 89.8%, respectively. Finally, posture selection, Kinect placement, and comparisons with the literature are discussed, providing a reference for Kinect-based posture recognition and interaction design.
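The minimum-accumulative-distance matching with DTW is the classic dynamic program below. Scalar features keep the sketch short; the method itself compares 24-dimensional distance-feature vectors per frame:

```python
def dtw_distance(a, b):
    """Classic dynamic time warping: minimum accumulative distance
    between two sequences, allowing frames to be stretched or
    compressed in time. Cost here is the absolute difference of
    scalar features."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

# A time-warped copy of a gesture matches with zero accumulative distance:
same = dtw_distance([1, 2, 3], [1, 1, 2, 2, 3])
```

Classification then picks the template posture with the smallest DTW distance to the observed sequence.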

  • Cauchy Aperture and Perfect Reconstruction Filters for Extending Depth-of-Field from Focal Stack Open Access

    Akira KUBOTA  Kazuya KODAMA  Asami ITO  

     
    PAPER

  Publicized:
    2019/08/16
      Vol:
    E102-D No:11
      Page(s):
    2093-2100

A pupil function of the aperture in image capturing systems is theoretically derived such that one can perfectly reconstruct the all-in-focus image through linear filtering of the focal stack. The perfect reconstruction filters are also designed based on the derived pupil function. The designed filters are space-invariant; hence the presented method does not require region segmentation. Simulation results using synthetic scenes show the effectiveness of the derived pupil function and filters.

  • Depth from Defocus Technique Based on Cross Reblurring

    Kazumi TAKEMURA  Toshiyuki YOSHIDA  

     
    PAPER

  Publicized:
    2019/07/11
      Vol:
    E102-D No:11
      Page(s):
    2083-2092

This paper proposes a novel Depth From Defocus (DFD) technique based on the property that two images having different focus settings coincide if each is reblurred with the other's focus setting, referred to as the “cross reblurring” property in this paper. Based on this property, the proposed technique estimates a block-wise depth profile for a target object by minimizing the mean squared error between the cross-reblurred images. Unlike existing DFD techniques, the proposed technique is free of lens parameters and independent of point spread function models. A compensation technique for possible pixel misalignment between images is also proposed to improve depth estimation accuracy. The experimental results and comparisons with other DFD techniques show the advantages of our technique.
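The cross-reblurring idea can be sketched in a toy 1-D model: at the correct depth, image 1 reblurred with image 2's blur equals image 2 reblurred with image 1's blur, so the depth minimizing their MSE wins. The box blurs and the depth-to-radius mappings below are assumptions for illustration, not the paper's optics:

```python
def box_blur(sig, r):
    """Simple 1-D box blur with integer radius r (clamped borders)."""
    n = len(sig)
    return [sum(sig[max(0, i - r):min(n, i + r + 1)]) /
            (min(n, i + r + 1) - max(0, i - r)) for i in range(n)]

def estimate_depth(img1, img2, radius1, radius2, depths):
    """Cross-reblurring sketch: for each candidate depth, reblur image 1
    with image 2's blur radius and vice versa; the depth whose
    cross-reblurred pair agrees best (smallest MSE) is returned.
    radius1/radius2 map depth -> blur radius per focus setting."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return min(depths, key=lambda d: mse(box_blur(img1, radius2(d)),
                                         box_blur(img2, radius1(d))))

# Synthesize two observations of a step edge at an assumed true depth of 3:
sharp = [0.0] * 8 + [1.0] * 8
r1 = lambda d: abs(d - 2)  # toy mapping: setting 1 focused at depth 2
r2 = lambda d: abs(d - 5)  # toy mapping: setting 2 focused at depth 5
img1, img2 = box_blur(sharp, r1(3)), box_blur(sharp, r2(3))
est = estimate_depth(img1, img2, r1, r2, depths=range(1, 7))
```

Note that only the two observed images and the candidate radii are used: no lens parameters or PSF model enter the comparison, matching the property the technique exploits.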

  • Automatic and Accurate 3D Measurement Based on RGBD Saliency Detection

    Yibo JIANG  Hui BI  Hui LI  Zhihao XU  

     
    LETTER-Image Recognition, Computer Vision

  Publicized:
    2018/12/21
      Vol:
    E102-D No:3
      Page(s):
    688-689

3D measurement is widely required in modern industries. In this letter, a method based on RGBD saliency detection with depth range adjusting (RGBD-DRA) is proposed for 3D measurement. Using superpixels and prior maps, RGBD saliency detection detects and measures the target object automatically. Meanwhile, the proposed depth range adjusting operates during measurement to further improve measuring accuracy. The experimental results demonstrate that the proposed method is automatic and accurate, with a maximum deviation of 3 mm (3.77%).

  • A Robust Depth Image Based Rendering Scheme for Stereoscopic View Synthesis with Adaptive Domain Transform Based Filtering Framework

    Wei LIU  Yun Qi TANG  Jian Wei DING  Ming Yue CUI  

     
    PAPER-Image Processing and Video Processing

  Publicized:
    2018/08/31
      Vol:
    E101-D No:12
      Page(s):
    3138-3149

Depth image based rendering (DIBR), which renders virtual views from a color image and the corresponding depth map, is one of the key procedures in the 2D-to-3D conversion process. However, several troubling problems, such as depth edge misalignment, disocclusions, and resampling cracks, still exist in current DIBR systems. To solve these problems, we present in this paper a robust depth image based rendering scheme for stereoscopic view synthesis. At the core of the proposed scheme are two depth map filters that share a common domain-transform-based filtering framework. As a first step, one filter of this framework realizes texture-depth boundary alignment and directional disocclusion-reducing smoothing simultaneously. Then, after depth map 3D warping, another adaptive filter is applied to the warped depth maps with the delivered scene gradient structures to further diminish the remaining cracks and noise. Finally, with the optimized depth map of the virtual view, backward texture warping retrieves the final virtual texture view. The proposed scheme yields visually satisfactory results for high-quality 2D-to-3D conversion. Experimental results demonstrate the excellent performance of the proposed approach.

  • A Secure In-Depth File System Concealed by GPS-Based Mounting Authentication for Mobile Devices

    Yong JIN  Masahiko TOMOISHI  Satoshi MATSUURA  Yoshiaki KITAGUCHI  

     
    PAPER-Mobile Application and Web Security

  Publicized:
    2018/08/22
      Vol:
    E101-D No:11
      Page(s):
    2612-2621

Data breaches and data destruction attacks have become critical security threats to the ICT (Information and Communication Technology) infrastructure. Both Internet service providers and users suffer from cyber threats, especially those against confidential data and private information. Human social activities require people to move while carrying confidential data, and data breaches often happen during transportation. Internet connectivity and cryptographic technology have made the use of confidential data much more secure. However, even with the high deployment rate of the Internet infrastructure, concerns about a lack of Internet connectivity lead people to carry data on their mobile devices. In this paper, we describe the main patterns of data breach on mobile devices and propose a secure in-depth file system concealed by GPS-based mounting authentication to mitigate data breaches on mobile devices. In the proposed in-depth file system, data can be stored based on the level of credential with a corresponding authentication policy, and the mounting operation succeeds only at designated locations. We implemented a prototype system using VeraCrypt and Perl and confirmed through evaluations at two locations that the in-depth file system works exactly as expected. The contributions of this paper include the clarification that GPS-based mounting authentication for a file system can reduce the risk of data breach for mobile devices, and the realization of a prototype system.
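The GPS-based mounting authentication can be sketched as a location gate using the haversine great-circle distance. The 100 m radius is an assumed policy value, and this is only the location check, not the VeraCrypt-based prototype itself:

```python
import math

def within_mount_zone(lat, lon, site_lat, site_lon, radius_m=100.0):
    """GPS-based mounting gate sketch: allow mounting only when the
    device's GPS fix lies within radius_m of a designated location.
    Distance is the haversine great-circle distance on a spherical
    Earth; radius_m is an assumed policy value."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat), math.radians(site_lat)
    dp = math.radians(site_lat - lat)
    dl = math.radians(site_lon - lon)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    dist = 2 * r * math.asin(math.sqrt(a))
    return dist <= radius_m

ok = within_mount_zone(35.6810, 139.7670, 35.6812, 139.7671)   # tens of meters away
far = within_mount_zone(35.6810, 139.7670, 35.7000, 139.7670)  # kilometers away
```

In a real deployment this check would be combined with the credential-level authentication policy before the mount command is issued.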

  • Advanced DBS (Direct-Binary Search) Method for Compensating Spatial Chromatic Errors on RGB Digital Holograms in a Wide-Depth Range with Binary Holograms

    Thibault LEPORTIER  Min-Chul PARK  

     
    LETTER-Digital Signal Processing

      Vol:
    E101-A No:5
      Page(s):
    848-849

The direct-binary search method has been used for converting complex holograms into binary format. However, this algorithm is optimized for reconstructing monochromatic digital holograms and is accurate only in a narrow depth range. In this paper, we propose an advanced direct-binary search method to increase the depth of field of 3D scenes reconstructed in RGB by binary holograms.

  • Measurement of Accommodation and Convergence Eye Movement when a Display and 3D Movie Move in the Depth Direction Simultaneously

    Shinya MOCHIDUKI  Yuki YOKOYAMA  Keigo SUKEGAWA  Hiroki SATO  Miyuki SUGANUMA  Mitsuho YAMADA  

     
    PAPER-Image

      Vol:
    E101-A No:2
      Page(s):
    488-498

In this study, we first developed a simultaneous measurement system for accommodation and convergence eye movement and evaluated its precision. Then, using a stuffed animal as the target, whose depth should be relatively easy to perceive, we measured convergence eye movement and accommodation simultaneously while a tablet displaying a 3D movie was moved in the depth direction. When the real depth movement of the display was added to the movement of the 3D image, subjects showed convergence eye movement that corresponded appropriately to the dual change of parallax in the 3D movie and the real display, even when a subject's convergence changed very little. Accommodation also changed appropriately according to the change in depth.

  • Single Image Dehazing Using Invariance Principle

    Mingye JU  Zhenfei GU  Dengyin ZHANG  Jian LIU  

     
    LETTER-Image Processing and Video Processing

  Publicized:
    2017/09/01
      Vol:
    E100-D No:12
      Page(s):
    3068-3072

In this letter, we propose a novel technique to increase the visibility of hazy images. Benefiting from the atmospheric scattering model and the invariance principle for scene structure, we formulate structure constraint equations derived from two simulated inputs obtained by performing gamma correction on the input image. Relying on the inherent boundary constraint of the scattering function, the expected scene albedo can be well restored via these constraint equations. Extensive experimental results verify the power of the proposed dehazing technique.
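Generating the two simulated inputs by gamma correction can be sketched as below; the gamma values are illustrative, not the paper's choices, and the structure-constraint equations built from these inputs are not reproduced here:

```python
def gamma_correct(img, g):
    """Gamma correction on intensities normalized to [0, 1], used to
    derive simulated inputs from a single hazy image; scene structure
    is invariant across the gamma-corrected copies, which is what the
    constraint equations exploit."""
    return [[p ** g for p in row] for row in img]

hazy = [[0.25, 0.5], [0.75, 1.0]]
sim1 = gamma_correct(hazy, 0.5)  # brightened copy
sim2 = gamma_correct(hazy, 2.0)  # darkened copy
```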

  • Depth Map Estimation Using Census Transform for Light Field Cameras

    Takayuki TOMIOKA  Kazu MISHIBA  Yuji OYAMADA  Katsuya KONDO  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2017/08/02
      Vol:
    E100-D No:11
      Page(s):
    2711-2720

Depth estimation for a lens-array type light field camera is a challenging problem because of sensor noise and radiometric distortion, a global brightness change among sub-aperture images caused by the vignetting effect of the micro-lenses. We propose a depth map estimation method that is robust against sensor noise and radiometric distortion. Our method first binarizes the sub-aperture images by applying the census transform. Next, the binarized images are matched by computing majority operations between corresponding bits and summing the Hamming distances. The initial depth obtained by matching is ambiguous because of the extremely short baselines among sub-aperture images, so we refine it in the following steps: we first approximate the initial depth as a set of depth planes, and then optimize the plane-fitting result with an edge-preserving smoothness term. Experiments show that our method outperforms conventional methods.
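The census-transform binarization and Hamming-distance matching can be sketched as follows, using a 3×3 window for brevity; the majority operation across many sub-aperture images is omitted. Because only the intensity ordering within the window matters, a global brightness change leaves the signature untouched:

```python
def census(img, y, x):
    """3x3 census transform at pixel (y, x): each neighbor contributes a
    1 bit if it is darker than the center. Only intensity order matters,
    so the signature is invariant to global brightness changes."""
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | (img[y + dy][x + dx] < img[y][x])
    return bits

def hamming(a, b):
    """Matching cost: number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

base = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[p + 100 for p in row] for row in base]  # radiometric offset
```

Matching by summing these Hamming costs over a window gives the radiometric robustness the abstract describes, since `census(base, 1, 1)` and `census(brighter, 1, 1)` are identical.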
