
Author Search Result

[Author] Hideo SAITO (13 hits)

Showing results 1-13 of 13
  • Simultaneous Object Segmentation and Recognition by Merging CNN Outputs from Uniformly Distributed Multiple Viewpoints

    Yoshikatsu NAKAJIMA  Hideo SAITO  

     
    PAPER-Machine Vision and its Applications

  Publicized:
    2018/02/16
      Vol:
    E101-D No:5
      Page(s):
    1308-1316

    We propose a novel object recognition system that (i) works in real time, reconstructing a segmented 3D map while simultaneously recognizing objects in the scene, (ii) handles a wide variety of objects, including those with smooth surfaces and those with a large number of categories, by utilizing a CNN for feature extraction, and (iii) maintains high accuracy regardless of camera motion by distributing the viewpoints for each object uniformly and aggregating the recognition results from the distributed viewpoints with equal weight. Experiments on the UW RGB-D Dataset and Scenes and on our own scenes, prepared to verify the effectiveness of the Viewpoint-Class-based approach, demonstrate the advantages of our system over current state-of-the-art object recognition approaches.
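
    As a rough illustration of the equal-weight aggregation step, the sketch below averages per-viewpoint CNN softmax outputs for one object; the viewpoint-class keys, array shapes, and function name are our own assumptions, not the paper's code.

```python
import numpy as np

def aggregate_viewpoint_probs(probs_by_viewpoint):
    """Aggregate per-viewpoint CNN class probabilities for one object
    with equal weight. Keys identify discretized viewpoint classes and
    map to the latest softmax vector observed from that viewpoint, so
    each viewpoint contributes once no matter how long the camera
    lingered there."""
    stacked = np.stack(list(probs_by_viewpoint.values()))  # (V, num_classes)
    mean_probs = stacked.mean(axis=0)                      # equal weights
    return int(mean_probs.argmax()), mean_probs

# usage: three viewpoint classes, four object categories
probs = {
    0: np.array([0.7, 0.1, 0.1, 0.1]),
    7: np.array([0.4, 0.3, 0.2, 0.1]),
    12: np.array([0.6, 0.2, 0.1, 0.1]),
}
label, confidence = aggregate_viewpoint_probs(probs)
```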

  • Foldable Augmented Maps

    Sandy MARTEDI  Hideaki UCHIYAMA  Guillermo ENRIQUEZ  Hideo SAITO  Tsutomu MIYASHITA  Takenori HARA  

     
    PAPER-Multimedia Pattern Processing

      Vol:
    E95-D No:1
      Page(s):
    256-266

    This paper presents a folded surface detection and tracking method for augmented maps. We model a folded surface as two connected planes, so a folded surface is detected by iteratively applying a plane detection method to the 2D correspondences between an input image and a reference plane. To compute the exact folding line from the detected planes for visualization purposes, the intersection line of the planes is computed from their positional relationship. After detection, each plane is tracked individually by a frame-by-frame descriptor update method, and virtual geographic data are overlaid on each detected plane. As usage scenarios, several interactions on the folded surface are introduced. Experimental results on the accuracy and performance of folded surface detection demonstrate the effectiveness of our approach.
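
    A minimal sketch of the two-plane detection idea, assuming OpenCV's RANSAC homography estimator as the plane detection method: the first fit explains one half of the fold, and its outliers feed a second fit for the other half. The threshold and array layouts are illustrative.

```python
import numpy as np
import cv2

def detect_two_planes(ref_pts, img_pts, thresh=3.0):
    """Model a folded surface as two connected planes: estimate one
    homography with RANSAC, then fit a second homography to the
    correspondences the first one rejected. `ref_pts` and `img_pts`
    are Nx2 float arrays of matched keypoints between the reference
    map image and the input frame."""
    H1, mask1 = cv2.findHomography(ref_pts, img_pts, cv2.RANSAC, thresh)
    rest = mask1.ravel() == 0                  # outliers of the first plane
    H2, mask2 = cv2.findHomography(ref_pts[rest], img_pts[rest],
                                   cv2.RANSAC, thresh)
    return H1, H2
```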

  • Human Foot Reconstruction from Multiple Camera Images with Foot Shape Database

    Jiahui WANG  Hideo SAITO  Makoto KIMURA  Masaaki MOCHIMARU  Takeo KANADE  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E89-D No:5
      Page(s):
    1732-1742

    Recently, research and development on measuring and modeling the human body have received much attention. Our aim is to reconstruct an accurate shape of a human foot from multiple camera images, which can capture the dynamic behavior of the object. In this paper, a foot-shape database is used for accurate reconstruction of the human foot. Using Principal Component Analysis, the foot shape can be represented with a small set of meaningful variables, which also reduces the dimensionality of the data. Thus, the shape of the object can be recovered efficiently, even when the object is partially occluded in some input views. To demonstrate the proposed method, two kinds of experiments are presented: reconstruction of a human foot in a virtual reality environment with CG multi-camera images, and in the real world with eight CCD cameras. In the experiments, the reconstruction error of our method is around 2 mm on average, while the error of the conventional volume intersection method is more than 4 mm.
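
    The PCA-based recovery under occlusion can be sketched as a least-squares fit of the shape-mode coefficients on the visible points only; the database arrays and all names below are assumptions for illustration.

```python
import numpy as np

def reconstruct_foot(mean_shape, components, observed, observed_idx, n_modes=10):
    """Complete a partially occluded foot shape with a PCA basis from
    the foot-shape database. `mean_shape` (3N,) and `components`
    (3N, K) are the database statistics; `observed` holds the visible
    coordinate values at rows `observed_idx`. Fitting the mode
    coefficients on the visible rows fills in the hidden part."""
    A = components[observed_idx, :n_modes]          # visible rows of the basis
    b = observed - mean_shape[observed_idx]         # centered observations
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares mode fit
    return mean_shape + components[:, :n_modes] @ coeffs
```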

  • Calibration Free Virtual Display System Using Video Projector onto Real Object Surface

    Shinichiro HIROOKA  Hideo SAITO  

     
    PAPER

      Vol:
    E89-D No:1
      Page(s):
    88-97

    In this paper, we propose a novel virtual display system that projects onto a real object surface with a video projector, so that the viewer feels as if digital images were printed on a real surface of arbitrary shape. The system consists of an uncalibrated camera and a video projector connected to the same PC; it creates a virtual object by rendering stored 2D contents onto a white object in the real world via the projector. To register the rendered image on the object surface correctly, we regard the surface as a set of small rectangular regions and perform geometric registration by computing homographies between the projector image plane and each divided region. This homography-based method avoids the camera and projector calibration required by conventional methods. The system performs two processes. First, it acquires the state of the object surface from images capturing the scene while color-coded checker patterns are projected onto it, and generates an undistorted rendered image by computing the homographies. Once the projection image has been generated, the system keeps observing the object surface, updating the rendered image when the surface moves and refining it while the surface is stationary, so the display becomes progressively more accurate. We demonstrate the implemented system under various conditions; for example, it can project contents as if they were printed on the paper surface of a book. We expect this system to enable virtual museums and other industrial applications.
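
    The per-region registration can be sketched as follows, assuming each detected rectangular region already has a homography from content pixels to projector pixels (e.g., recovered from the color-coded checker correspondences); the data structures and names are our own.

```python
import numpy as np
import cv2

def warp_content_per_region(content, regions, proj_size):
    """Render 2D content onto an arbitrarily shaped surface treated as
    small rectangular patches. `regions` is a list of (quad, H) pairs:
    `quad` is a 4x2 array outlining a patch in the content image and
    `H` maps content pixels to projector pixels for that patch.
    `proj_size` is (width, height) of the projector image."""
    out = np.zeros((proj_size[1], proj_size[0], 3), dtype=np.uint8)
    for quad, H in regions:
        mask = np.zeros(content.shape[:2], dtype=np.uint8)
        cv2.fillConvexPoly(mask, quad.astype(np.int32), 255)
        patch = cv2.bitwise_and(content, content, mask=mask)
        out = np.maximum(out, cv2.warpPerspective(patch, H, proj_size))
    return out
```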

  • Line-Based SLAM Using Non-Overlapping Cameras in an Urban Environment

    Atsushi KAWASAKI  Kosuke HARA  Hideo SAITO  

     
    PAPER-Machine Vision and its Applications

  Publicized:
    2018/02/16
      Vol:
    E101-D No:5
      Page(s):
    1232-1242

    We propose a line-based Simultaneous Localization and Mapping (SLAM) method using non-overlapping multiple cameras for vehicles running in an urban environment. It uses corresponding line segments between images taken at different frames and by different cameras. The contribution is a novel line segment matching algorithm that warps segments based on urban structures. This idea significantly improves the accuracy of line segment matching when viewing directions are very different, so that many correspondences between the front-view and rear-view cameras can be found and the accuracy of SLAM improved. Additionally, to further enhance accuracy, we apply a geometric constraint of urban areas to the initial estimation of the 3D line segment map and to the optimization by bundle adjustment, and we improve accuracy further by combining points and lines. The position error stays within 1.5 m over the entire image dataset evaluated in this paper, and the estimation accuracy of our method is comparable to the ground truth captured by RTK-GPS. Our high-accuracy SLAM algorithm can also be applied to generating a road map represented by line segments; in an evaluation of the generated map, a true positive rate exceeding 70% around the vehicle is achieved.
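
    The warping-based matching idea can be illustrated with a toy test that warps one segment's endpoints by a homography induced by an assumed urban plane (road or facade) and compares direction and position; the tolerances and names are illustrative, not the paper's values.

```python
import numpy as np

def match_line_by_warping(seg_a, seg_b, H, angle_tol=np.deg2rad(5.0), dist_tol=10.0):
    """Decide whether segment `seg_a` (front camera) matches `seg_b`
    (rear camera) after warping `seg_a` with homography `H`. Segments
    are 2x2 arrays of endpoint pixel coordinates."""
    warped = np.hstack([seg_a, np.ones((2, 1))]) @ H.T   # warp the endpoints
    warped = warped[:, :2] / warped[:, 2:3]              # dehomogenize
    da, db = warped[1] - warped[0], seg_b[1] - seg_b[0]
    cos = abs(da @ db) / (np.linalg.norm(da) * np.linalg.norm(db) + 1e-9)
    angle = np.arccos(np.clip(cos, 0.0, 1.0))            # direction difference
    dist = np.linalg.norm(warped.mean(axis=0) - seg_b.mean(axis=0))
    return angle < angle_tol and dist < dist_tol
```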

  • Stereo Matching between Three Images by Iterative Refinement in PVS

    Makoto KIMURA  Hideo SAITO  Takeo KANADE  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E86-D No:1
      Page(s):
    89-100

    In the fields of computer vision and computer graphics, Image-Based Rendering (IBR) methods are often used to synthesize images of a real scene. Image synthesis by IBR requires dense, correct matching points between the images, but it requires neither 3D geometry reconstruction nor camera calibration in Euclidean geometry. On the other hand, a reconstructed 3D model can easily reveal occlusions in the images. In this paper, we propose an approach that reconstructs 3D shape in a voxel space named Projective Voxel Space (PVS). Since PVS is defined by projective geometry, it requires only weak calibration. PVS is determined by rectification of the epipolar lines in the three images; the three rectified images are orthogonal projections of the scene in PVS, which makes projection between the 3D space and the image planes simple. In both PVS and Euclidean geometry, a point in one image is the projection of a point on an object surface in the scene, so another image either contains the correct matching point (if unoccluded) or no matching point (if occluded). This serves as a constraint on the search for matching points and object surfaces. Taking advantage of the simplicity of projection in PVS, the correlation values of points in the images are computed and iteratively refined using this constraint, and the shapes of the objects in the scene are finally acquired in PVS. The reconstructed shape in PVS is not similar to the 3D shape in Euclidean geometry; however, it denotes consistent matching points in the three images and also indicates the existence of occluded points. The reconstructed shape in PVS is therefore sufficient for image synthesis by IBR.
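
    As a very schematic illustration of the iterative refinement, the toy sketch below sharpens a correlation volume ray by ray, on the principle that each ray holds either one unoccluded match or none; the damping rule is our assumption, not the paper's update scheme.

```python
import numpy as np

def refine_correlation(C, iters=5, damp=0.5, margin=0.8):
    """Schematic refinement of a correlation volume C[y, x, d], where
    the d axis runs along each viewing ray of one rectified image.
    Each pass keeps values near the per-ray maximum and damps the
    rest, letting a consistent surface emerge across iterations."""
    C = C.astype(float).copy()
    for _ in range(iters):
        peak = C.max(axis=2, keepdims=True)            # best match on each ray
        C = np.where(C >= margin * peak, C, C * damp)  # suppress weaker candidates
    return C
```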

  • Fluorinated Liquid Crystalline Materials for AM-LCD Applications

    Hideo SAITO  Etsuo NAKAGAWA  Tetsuya MATSUSHITA  Fusayuki TAKESHITA  Yasuhiro KUBO  Shuichi MATSUI  Kazutoshi MIYAZAWA  Yasuyuki GOTO  

     
    PAPER

      Vol:
    E79-C No:8
      Page(s):
    1027-1034

    Fluorinated liquid crystal compounds having fluorophenyl, difluorophenyl, and trifluorophenyl moieties combined with ester linkages, 1,2-ethylene linkages, and covalent bonds were prepared, and their physical properties, i.e., mesophases, dielectric and optical anisotropy, viscosity, pretilt angle, and threshold voltage, were examined. Introducing fluorine atom(s) into the molecules decreased the optical anisotropy and threshold voltage, though the nematic temperature range diminished. The investigated compounds were all chemically stable, and by using them, nematic liquid crystalline mixtures with low threshold voltage, low viscosity, large optical anisotropy, and wide nematic ranges, suitable for AM-LCDs, could be obtained.

  • Extraction of Blood Vessels in Retinal Images Using Resampling High-Order Background Estimation

    Sukritta PARIPURANA  Werapon CHIRACHARIT  Kosin CHAMNONGTHAI  Hideo SAITO  

     
    PAPER-Image Recognition, Computer Vision

  Publicized:
    2014/12/12
      Vol:
    E98-D No:3
      Page(s):
    692-703

    In retinal blood vessel extraction through background removal, vessels that appear in areas of high illumination variance in a fundus image are often lost after the background is removed, because the intensity values of the vessel and the background are nearly the same. The estimated background should therefore be robust to changes in illumination intensity. This paper proposes retinal blood vessel extraction using background estimation. The background is estimated by a weighted surface fitting method with a high-degree polynomial, where bright pixels are treated as unwanted data and given zero weight in the weight matrix. To fit a retinal surface with a higher-degree polynomial, the fundus images are reduced in size by different scaling parameters, which lowers the processing time and computational complexity. The estimated background is then removed from the original image, and candidate vessel pixels are extracted using local threshold values. To identify the true vessel region, the candidate vessel pixels are dilated, and the active contour without edges method is applied. The experimental results show that the proposed method outperforms the conventional low-pass filter and the conventional surface fitting method. Moreover, downscaling an image with a scaling parameter of 0.25 before background estimation gives results as good as a non-rescaled image; the correlation value between the two is 0.99. On the DRIVE database, the proposed method achieves a sensitivity of 0.7994, a specificity of 0.9717, an accuracy of 0.9543, an area under the receiver operating characteristic (ROC) curve (AUC) of 0.9676, and a processing time of 1.8320 seconds per image.
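
    A compact sketch of the weighted surface fit, assuming a bivariate polynomial fitted by least squares on a downscaled grayscale image with zero weight on bright pixels; the degree, threshold, and names are illustrative.

```python
import numpy as np
import cv2

def estimate_background(gray, degree=4, bright_thresh=200, scale=0.25):
    """Estimate the retinal background as a bivariate polynomial surface
    fitted by weighted least squares: bright pixels get zero weight,
    the fit runs on a downscaled copy, and the surface is evaluated at
    full resolution."""
    small = cv2.resize(gray, None, fx=scale, fy=scale)
    h, w = small.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x, y = xs.ravel() / w, ys.ravel() / h        # normalized coordinates
    z = small.ravel().astype(float)
    def design(xv, yv):                          # monomials x^i y^j with i + j <= degree
        return np.stack([(xv ** i) * (yv ** j)
                         for i in range(degree + 1)
                         for j in range(degree + 1 - i)], axis=1)
    wgt = (z < bright_thresh).astype(float)      # zero weight for bright pixels
    A = design(x, y)
    coef, *_ = np.linalg.lstsq(A * wgt[:, None], z * wgt, rcond=None)
    Hf, Wf = gray.shape
    ys, xs = np.mgrid[0:Hf, 0:Wf]                # evaluate at full resolution
    return (design(xs.ravel() / Wf, ys.ravel() / Hf) @ coef).reshape(Hf, Wf)
```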

  • Real-Time Counting People in Crowded Areas by Using Local Empirical Templates and Density Ratios

    Dao-Huu HUNG  Gee-Sern HSU  Sheng-Luen CHUNG  Hideo SAITO  

     
    PAPER-Recognition

      Vol:
    E95-D No:7
      Page(s):
    1791-1803

    In this paper, a fast and automated method of counting pedestrians in crowded areas is proposed, with three contributions. First, we propose Local Empirical Templates (LET), which outline the foregrounds typically made by single pedestrians in a scene; LET are extracted by clustering the foregrounds of single pedestrians with similar silhouette features, and this process runs automatically for unknown scenes. Second, comparing the size of a group foreground, made by a group of pedestrians, to that of the appropriate LET captured in the same image patch yields the density ratio. Because of this local scale normalization between sizes, the density ratio has a bound closely related to the number of pedestrians inducing the group foreground. Third, to extract the bounds of density ratios for groups of different numbers of pedestrians, we propose a simulation based on 3D human models in which camera viewpoints and pedestrian proximity are easily manipulated. We collect hundreds of typical occluded-people patterns with distinct degrees of human proximity under a variety of camera viewpoints, build distributions of density ratios with respect to the number of pedestrians from the computed density ratios of these patterns, and extract the density ratio bounds from the distributions in an offline learning phase; the bounds are then used to count pedestrians in online settings. We find that the bounds appear to be invariant to camera viewpoint and human proximity. The performance of the proposed method is evaluated on our collected videos and on the PETS 2009 datasets. For our collected videos at a resolution of 320 × 240, the method runs in real time at a frame rate of around 30 fps with good accuracy, and consumes a small amount of computing resources. On the PETS 2009 datasets, the proposed method achieves results competitive with other methods tested on the same datasets [1],[2].
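
    The online counting step reduces to a table lookup once the density ratio bounds are learned offline; a toy sketch with made-up bounds follows.

```python
def count_pedestrians(group_area, let_area, ratio_bounds):
    """Count pedestrians in a group foreground from the density ratio,
    i.e. the group's foreground size over the size of the local
    empirical template (LET) captured in the same image patch. The
    `ratio_bounds` table, hypothetical here, would come from the
    offline simulation with 3D human models."""
    ratio = group_area / float(let_area)
    for count, (lo, hi) in sorted(ratio_bounds.items()):
        if lo <= ratio <= hi:
            return count
    return None  # ratio falls outside all learned bounds

# usage with made-up bounds: {n_pedestrians: (lower, upper)}
bounds = {1: (0.8, 1.2), 2: (1.5, 2.3), 3: (2.5, 3.4)}
print(count_pedestrians(group_area=5200, let_area=2400, ratio_bounds=bounds))
```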

  • Superimposing Thermal-Infrared Data on 3D Structure Reconstructed by RGB Visual Odometry

    Masahiro YAMAGUCHI  Trong Phuc TRUONG  Shohei MORI  Vincent NOZICK  Hideo SAITO  Shoji YACHIDA  Hideaki SATO  

     
    PAPER-Machine Vision and its Applications

  Publicized:
    2018/02/16
      Vol:
    E101-D No:5
      Page(s):
    1296-1307

    In this paper, we propose a method to generate a three-dimensional (3D) thermal map and RGB + thermal (RGB-T) images of a scene from thermal-infrared and RGB images. The scene images are acquired by moving an RGB camera and a thermal-infrared camera mounted on a stereo rig. Before capturing the scene with these cameras, we estimate their respective intrinsic parameters and their relative pose. We then reconstruct the 3D structure of the scene with Direct Sparse Odometry (DSO) applied to the RGB images. To superimpose thermal information onto each point generated by DSO, we propose a method for estimating the scale of the point cloud consistent with the extrinsic parameters between the cameras, by matching depth images recovered from the RGB camera and the thermal-infrared camera based on mutual information. We also generate RGB-T images using the 3D structure of the scene and Delaunay triangulation. Since we do not rely on depth cameras, our technique is not limited to scenes within their measurement range. To demonstrate the technique, we generate 3D thermal maps and RGB-T images for both indoor and outdoor scenes.
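
    The scale search can be sketched as maximizing histogram-based mutual information over candidate scales, assuming a caller-supplied routine that re-renders the DSO point cloud into the thermal view at each scale; all names and the candidate range are illustrative.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information between two depth images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)      # marginals
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

def estimate_scale(render_depth, depth_thermal, candidates=np.linspace(0.5, 2.0, 151)):
    """Pick the point-cloud scale maximizing mutual information between
    the thermal-side depth image and the DSO depth re-rendered at each
    candidate scale. `render_depth(s)` is an assumed callback that does
    the re-rendering."""
    scores = [mutual_information(render_depth(s), depth_thermal) for s in candidates]
    return candidates[int(np.argmax(scores))]
```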

  • 3D Reconstruction of Skin Surface from Image Sequence

    Takeshi YAMADA  Hideo SAITO  Shinji OZAWA  

     
    PAPER

      Vol:
    E83-D No:7
      Page(s):
    1415-1421

    This paper proposes a new method for reconstructing the shape of a skin surface replica from a shaded image sequence taken under different light source directions. Since the shaded images include shadows caused by surface height fluctuation, as well as specular and inter-reflections, the conventional photometric stereo method cannot reconstruct the surface accurately. In the proposed method, we select measured intensities that are free of specular and inter-reflections and self-shadows, so that accurate normal vectors can be calculated from the selected intensities using the SVD (Singular Value Decomposition) method. Experimental results on real images demonstrate that the proposed method is effective for shape reconstruction from shaded images that include specular and inter-reflections and self-shadows.
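
    A per-pixel sketch of the selective photometric stereo, using a simple mid-range intensity selection as a stand-in for the paper's own selection criterion and solving the Lambertian system via SVD; names and the drop counts are assumptions.

```python
import numpy as np

def normal_from_selected_intensities(I, L, keep=slice(2, -2)):
    """Photometric stereo for one pixel: sort the measurements, drop the
    darkest samples (likely self-shadow) and the brightest (likely
    specular or inter-reflection), then solve I = L n in the least
    squares sense via SVD. `I` is an (m,) intensity vector and `L` an
    (m, 3) matrix of light directions."""
    sel = np.argsort(I)[keep]                  # keep mid-range intensities only
    U, s, Vt = np.linalg.svd(L[sel], full_matrices=False)
    n = Vt.T @ ((U.T @ I[sel]) / s)            # pseudo-inverse solution of L n = I
    albedo = np.linalg.norm(n)
    return n / albedo, albedo                  # unit normal and albedo
```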

  • 3D Reconstruction Based on Epipolar Geometry

    Makoto KIMURA  Hideo SAITO  

     
    PAPER

      Vol:
    E84-D No:12
      Page(s):
    1690-1697

    Recently, it has become popular to synthesize new viewpoint images from sampled viewpoint images of a real scene using computer vision techniques. 3D shape reconstruction in Euclidean space is not necessarily required; dense matching points are basically sufficient to synthesize new viewpoint images. In this paper, we propose a new method for 3D reconstruction from three cameras based on projective geometry. The three input camera images are rectified based on projective geometry so that the vertical and horizontal directions are completely aligned with the epipolar planes between the cameras. This rectification yields a Projective Voxel Space (PVS) whose three axes are aligned with the directions of camera projection, which simplifies projection between the 3D space and the image planes. Taking advantage of PVS, silhouettes of the objects are projected into it so that the search area for matching points can be reduced. The consistency of color values between the images is also evaluated for the final determination of each matching point. The matching points finally acquired describe the surfaces of the objects in PVS, which also include knowledge about occlusion. Finally, images from new viewpoints can be synthesized from the matching points and occlusions. Although the proposed method requires only weak calibration, plausible occlusions are synthesized in the images. In the experiments, images from virtual viewpoints set among the three cameras are synthesized from three real images.
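
    The silhouette restriction is especially cheap in PVS because projection along an axis amounts to dropping one coordinate; a sketch under assumed axis conventions, with boolean silhouette arrays.

```python
import numpy as np

def carve_pvs(sil_x, sil_y, sil_z):
    """Restrict the matching search in PVS with silhouettes. Since the
    three rectified images are orthogonal projections along the PVS
    axes, a voxel projects into each image by dropping one coordinate,
    and the candidate volume is the logical AND of the three
    back-projected silhouettes: sil_x is (ny, nz), sil_y is (nx, nz),
    and sil_z is (nx, ny)."""
    vx = sil_x[None, :, :]   # silhouette seen along the x axis
    vy = sil_y[:, None, :]   # along the y axis
    vz = sil_z[:, :, None]   # along the z axis
    return vx & vy & vz      # broadcasts to the full (nx, ny, nz) volume
```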

  • Automatic Road Area Extraction from Printed Maps Based on Linear Feature Detection

    Sebastien CALLIER  Hideo SAITO  

     
    PAPER-Segmentation

      Vol:
    E95-D No:7
      Page(s):
    1758-1765

    Raster maps are widely available in everyday life and can contain a huge amount of information of many kinds, conveyed by labels, pictograms, or color codes, for example. However, extracting roads from such maps is not easy, because of these overlapping features. In this paper, we focus on an automated method that extracts roads by using linear feature detection to search for seed points with a high probability of belonging to roads. These linear features are lines of pixels of homogeneous color, examined in each direction around each pixel. The seeds are then expanded before deciding whether to keep or discard each extracted element. Because this method is not mainly based on color segmentation, it is also suitable for handwritten maps, for example. The experimental results demonstrate that in most cases our method gives results similar to usual methods without needing any prior data or user input, although it does require some knowledge of the target maps; it also works with handwritten maps drawn following some basic rules, whereas usual methods fail.
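
    A brute-force sketch of the seed detection: a pixel qualifies when some direction through it carries a sufficiently long run of near-homogeneous color; all parameter values are illustrative, and the loop structure favors clarity over speed.

```python
import numpy as np

def road_seed_mask(img, length=15, tol=12, n_dirs=8):
    """Mark pixels likely to lie on roads: a pixel is a seed when, along
    at least one of `n_dirs` directions, the `length` pixels in that
    direction stay within `tol` of its own color (a line of pixels of
    homogeneous color). Image borders wrap via np.roll, which a real
    implementation would mask out."""
    h, w, _ = img.shape
    angles = np.linspace(0, np.pi, n_dirs, endpoint=False)
    seeds = np.zeros((h, w), dtype=bool)
    f = img.astype(int)
    for theta in angles:
        dy, dx = np.sin(theta), np.cos(theta)
        ok = np.ones((h, w), dtype=bool)
        for step in range(1, length + 1):
            sy, sx = int(round(step * dy)), int(round(step * dx))
            shifted = np.roll(f, (-sy, -sx), axis=(0, 1))
            ok &= np.abs(shifted - f).max(axis=2) <= tol   # still homogeneous?
        seeds |= ok
    return seeds
```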