Yusuke HAYASHI Norihiko KAWAI Tomokazu SATO Miyuki OKUMOTO Naokazu YOKOYA
This paper proposes a novel approach to generating stereo video in which the zoom magnification is not constant. Although this has conventionally been achieved mechanically, that approach requires developing a mechanically complex system for each stereo camera system. Instead of a mechanical solution, we employ an approach from the software side: using a pair of zoomed and non-zoomed videos, a part of the non-zoomed video image is cut out and super-resolved to generate stereo video without special hardware. To achieve this, (1) the zoom magnification parameter is automatically determined by using distributions of intensities, and (2) the cutout image is super-resolved by using optically zoomed images as exemplars. The effectiveness of the proposed method is quantitatively and qualitatively validated through experiments.
Kenichiro FUKUSHI Itsuo KUMAZAWA
In this paper, we present a computer vision-based human tracking system with multiple stereo cameras. Many widely used methods, such as the KLT tracker, update the trackers “frame-to-frame,” so that features extracted from one frame are utilized to update their current state. In contrast, we propose a novel optimization technique for the “multi-frame” approach that computes resultant trajectories directly from video sequences, in order to achieve high-level robustness against severe occlusion, which is known to be a challenging problem in computer vision. We developed a heuristic optimization technique to estimate human trajectories, instead of using dynamic programming (DP) or an iterative approach, which makes our method sufficiently computationally efficient to operate in real time. Six video sequences in which one to six people walk in a narrow laboratory space are processed using our system. The results confirm that our system is capable of tracking cluttered scenes in which severe occlusion occurs and people are frequently in close proximity to each other. Moreover, tracking requires only minimal information, rather than full camera images, to be communicated over the network. Hence, commonly used network devices are sufficient for constructing our tracking system.
Jae-woong JEONG Young-cheol PARK Dae-hee YOUN
This paper presents an approximated virtual source imaging system based on crosstalk cancellation with a pair of closely spaced loudspeakers. Utilizing the frequency-dependent relative importance of sound localization cues, the proposed system provides separate approximations for the low- and high-frequency bands. Experimental results show that the system provides good approximations within ±55° in the stereo dipole setup with natural sound quality.
Jangwon LEE Kugjin YUN Doug Young SUH Kyuheon KIM
This letter proposes a new delivery format in order to realize unified transmissions of stereoscopic video contents over a dynamic adaptive streaming scheme. With the proposed delivery format, various forms of stereoscopic video contents regardless of their encoding and composition types can be delivered over the current dynamic adaptive streaming scheme. In addition, the proposed delivery format supports dynamic and efficient switching between 2D and 3D sequences in an interoperable manner for both 2D and 3D digital devices, regardless of their capabilities. This letter describes the designed delivery format and shows dynamic interoperable applications for 2D and 3D mixed contents with the implemented system in order to verify its features and efficiency.
Norimichi UKITA Kazuki MATSUDA
This paper proposes a method for reconstructing accurate 3D surface points. To this end, robust and dense reconstruction with Shape-from-Silhouettes (SfS) and accurate multiview stereo are integrated. Unlike the gradual shape shrinking and/or brute-force large-space search of existing space carving approaches, our method obtains 3D points by SfS and stereo independently, and then selects the correct ones from them. The point selection is achieved in accordance with the spatial consistency and smoothness of 3D point coordinates and normals. The globally optimized points are selected by graph cuts. Experimental results with several subjects containing complex shapes demonstrate that our method outperforms existing approaches and our previous method.
Histogram modification based image enhancement algorithms have been extensively used in 2-D image applications. In this letter, we apply a histogram modification framework to stereoscopic image enhancement. The proposed algorithm estimates the histogram of a stereo image pair without explicitly computing the pixel-wise disparity. Then, the histogram in the occluded regions is estimated and used to determine the target histogram of the stereo image. Experimental results demonstrate the effectiveness of the proposed algorithm.
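As a rough sketch of the enhancement idea above, the following illustrates mapping one view of a stereo pair toward a target histogram; the algorithm's occlusion-aware target estimation is not reproduced here, and all function names and pixel values are illustrative assumptions:

```python
# Minimal histogram-matching sketch (illustrative only; pure Python).

def histogram(pixels, levels=256):
    h = [0] * levels
    for p in pixels:
        h[p] += 1
    return h

def cdf(hist):
    # Cumulative distribution normalized to [0, 1].
    total, acc, out = sum(hist), 0, []
    for v in hist:
        acc += v
        out.append(acc / total)
    return out

def match_to_target(pixels, target_hist, levels=256):
    """Map intensities so the output histogram approximates target_hist."""
    src_cdf = cdf(histogram(pixels, levels))
    tgt_cdf = cdf(target_hist)
    # For each source level, pick the smallest target level whose CDF
    # reaches the source CDF value.
    lut = [next(i for i, v in enumerate(tgt_cdf) if v >= src_cdf[s])
           for s in range(levels)]
    return [lut[p] for p in pixels]

# A joint target built from both views keeps the pair consistent.
left = [10, 10, 20, 200]
right = [12, 10, 22, 198]
enhanced_left = match_to_target(left, histogram(left + right))
```

Matching both views to the same joint target histogram is one simple way to avoid introducing a brightness mismatch between the pair.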
Kei SADAKUNI Takuya INOUE Hirotsugu YAMAMOTO Shiro SUYAMA
Three methods of presenting a three-dimensional (3-D) image – a real object, a protruding stereoscopic display, and the depth-fused 3-D (DFD) display – have different tendencies for the change in perceived depth produced when the visual acuity of the dominant eye is decreased by an occlusion foil. These different tendencies are estimated from the slope and correlation coefficient of the plot of perceived depth difference versus stimulus depth difference. This estimation was derived using the same experimental setup, composed of two displays and a half mirror, for all three 3-D display methods. The perceived depth difference was measured for four subjects using calipers held with two fingers. The slope and correlation coefficient showed almost the same tendencies, as follows. The real object had the smallest decrease among the three 3-D display methods when the dominant eye's visual acuity was decreased, and the protruding stereoscopic display had the largest decrease. The DFD display method had an intermediate decrease between those of the real object and the protruding stereoscopic display. When the dominant eye's visual acuity was high enough, the differences among the three 3-D display methods were small. When its visual acuity was decreased, the differences among the three 3-D display methods increased and became statistically significant.
Our research is focused on examining a stereoscopic quality assessment model for stereoscopic images with disparate quality in the left and right images for glasses-free stereo vision. In this paper, we examine an objective assessment model of 3-D images, considering the difference in image quality between the viewpoints generated by disparity-compensated coding. The overall stereoscopic image quality can be estimated using only predicted values of the left and right 2-D image qualities based on MPEG-7 descriptor information, without using any disparity information. As a result, the stereoscopic still image quality is assessed with high prediction accuracy, with a correlation coefficient of 0.98 and an average error of 0.17.
Jegoon RYU Sei-ichiro KAMATA Alireza AHRARY
In this paper, we propose a novel gait recognition framework - Spherical Space Model with Human Point Clouds (SSM-HPC) - to recognize the front view of human gait. A new gait representation - Marching in Place (MIP) gait - is also introduced, which preserves the spatiotemporal characteristics of individual gait manner. In contrast to previous studies on gait recognition, which usually use human silhouette images from image sequences, this research applies three-dimensional (3D) point cloud data of the human body obtained from a stereo camera. The proposed framework exhibits gait recognition rates superior to those of other gait recognition methods.
Kazuki MATSUDA Norimichi UKITA
This paper proposes a method for reconstructing a smooth and accurate 3D surface. Recent machine vision techniques can reconstruct accurate 3D points and normals of an object. The reconstructed point cloud is used for generating its 3D surface by surface reconstruction. The more accurate the point cloud, the more correct the surface becomes. To improve the surface, we propose how to integrate the advantages of existing techniques for point reconstruction. Specifically, robust and dense reconstruction with Shape-from-Silhouettes (SfS) and accurate stereo reconstruction are integrated. Unlike the gradual shape shrinking of space carving, our method obtains 3D points by SfS and stereo independently and then accepts only the correctly reconstructed points. Experimental results show the improvement achieved by our method.
In this paper, we propose an optimized virtual re-convergence system designed to reduce the visual fatigue caused by binocular stereoscopy. Our unique idea for reducing visual fatigue is to utilize virtual re-convergence based on an optimized disparity map that contains more depth information in the negative disparity area than in the positive area. Accordingly, our system employs a unique search-range scheme, especially for negative disparity exploration. In addition, we use a dedicated method based on the so-called Global-Shift Value (GSV), the total shift value applied to each image in stereoscopy, to converge on the main object that most strongly affects visual fatigue. The experimental result, a subjective assessment by participants, shows that the proposed method makes stereoscopy significantly more comfortable and attractive to view than existing methods.
Nitin SINGHAL Jin Woo YOO Ho Yeol CHOI In Kyu PARK
In this paper, we analyze the key factors underlying the implementation, evaluation, and optimization of image processing and computer vision algorithms on an embedded GPU using the OpenGL ES 2.0 shader model. First, we present the characteristics of the embedded GPU and its inherent advantages compared to an embedded CPU. Additionally, we propose techniques to achieve increased performance with optimized shader design. To show the effectiveness of the proposed techniques, we employ cartoon-style non-photorealistic rendering (NPR), speeded-up robust feature (SURF) detection, and stereo matching as our example algorithms. Performance is evaluated in terms of the execution time and the speed-up achieved in comparison with the implementation on an embedded CPU.
Chenbo SHI Guijin WANG Xiaokang PEI Bei HE Xinggang LIN
In this paper, we propose an interleaving updating framework of disparity and confidence maps (IUFDCM) for stereo matching, which eliminates redundant and interfering information from unreliable pixels. Compared with other propagation algorithms that use matching costs as messages, IUFDCM instead updates the disparity map and the confidence map in an interleaving manner. Based on the Confidence-based Support Window (CSW), the disparity map is updated adaptively to alleviate the effect of input parameters. Unreliable pixels are reassigned based on reliable messages, which makes the result more likely to retain the ground-truth disparity. The confidence map is then updated according to the previous disparity map and the left-right consistency. The top ranks on the Middlebury benchmark at different error thresholds demonstrate that our algorithm is competitive with the best current stereo matching algorithms.
Chenbo SHI Guijin WANG Xiaokang PEI Bei HE Xinggang LIN
This paper addresses stereo matching under scenarios with smooth regions and obviously slanted planes. We explore the flexible handling of color disparity, spatial relations, and the reliability of matching pixels in support windows. Building upon these key ingredients, we present a robust stereo matching algorithm using local plane fitting with a Confidence-based Support Window (CSW). For each CSW, only those pixels with high confidence are employed to estimate the optimal disparity plane. Since RANSAC has been shown to be robust in suppressing the disturbance caused by outliers, we employ it to solve the local plane fitting problem. Compared with state-of-the-art local methods in the computer vision community, our approach achieves better performance and time efficiency on the Middlebury benchmark.
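The RANSAC-based local plane fitting above can be sketched as follows, assuming a disparity plane model d = ax + by + c estimated over high-confidence pixels only; the iteration count and inlier threshold are illustrative assumptions, not values from the paper:

```python
import random

def fit_plane(pts):
    """Exact plane d = a*x + b*y + c through three (x, y, d) points
    via Cramer's rule; returns None for degenerate (collinear) samples."""
    (x1, y1, d1), (x2, y2, d2), (x3, y3, d3) = pts
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-9:
        return None
    a = (d1 * (y2 - y3) - y1 * (d2 - d3) + (d2 * y3 - d3 * y2)) / det
    b = (x1 * (d2 - d3) - d1 * (x2 - x3) + (x2 * d3 - x3 * d2)) / det
    c = (x1 * (y2 * d3 - y3 * d2) - y1 * (x2 * d3 - x3 * d2)
         + d1 * (x2 * y3 - x3 * y2)) / det
    return a, b, c

def ransac_plane(points, iters=200, thresh=1.0, seed=0):
    """points: (x, y, disparity) triples from high-confidence pixels only."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        model = fit_plane(rng.sample(points, 3))
        if model is None:
            continue
        a, b, c = model
        inliers = sum(1 for x, y, d in points
                      if abs(a * x + b * y + c - d) < thresh)
        if inliers > best_inliers:
            best, best_inliers = model, inliers
    return best
```

In practice the inlier threshold would be tied to the disparity precision of the underlying matcher.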
In this paper, we propose a quantitative metric for measuring the degree of visual fatigue in stereoscopy. To the best of our knowledge, this is the first simplified relative quantitative approach describing the visual fatigue value of stereoscopy. Our experimental result shows that a correlation index of more than 98% is obtained between our Simplified Relative Visual Fatigue (SRVF) model and the Mean Opinion Score (MOS).
In this paper, we deal with the pedestrian detection task in outdoor scenes. Because of the complexity of such scenes, generally used gradient-feature-based detectors do not work well on them. We propose using sparse 3D depth information as an additional cue for the detection task, in order to achieve a fast improvement in performance. Our proposed method uses a probabilistic model to integrate image-feature-based classification with sparse depth estimation. Benefiting from the depth estimates, we map the prior distribution of humans' actual height onto the image and probabilistically update the image-feature-based classification result. We make two contributions in this paper: 1) a simplified graphical model that can efficiently integrate the depth cue into detection; and 2) a sparse depth estimation method that provides fast and reliable estimates of depth information. An experiment shows that our method provides a promising enhancement over the baseline detector with minimal additional time.
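The height-prior update can be sketched roughly as follows, assuming a pinhole camera and a Gaussian prior on human height; the mean, standard deviation, and multiplicative fusion rule are illustrative assumptions, not the paper's exact graphical model:

```python
import math

# Illustrative prior on human height (metres); not values from the paper.
MEAN_HEIGHT_M, STD_HEIGHT_M = 1.7, 0.1

def physical_height(box_height_px, depth_m, focal_px):
    # Pinhole model: pixel height scaled by depth over focal length.
    return box_height_px * depth_m / focal_px

def height_likelihood(h_m):
    # Unnormalized Gaussian likelihood of the implied physical height.
    z = (h_m - MEAN_HEIGHT_M) / STD_HEIGHT_M
    return math.exp(-0.5 * z * z)

def fused_score(det_score, box_height_px, depth_m, focal_px):
    """Down-weight detections whose implied physical height is implausible."""
    return det_score * height_likelihood(
        physical_height(box_height_px, depth_m, focal_px))
```

A detection whose bounding box implies a plausible height keeps its score, while a box implying, say, a 0.6 m "pedestrian" is suppressed almost entirely.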
Lili MENG Yao ZHAO Anhong WANG Jeng-Shyang PAN Huihui BAI
A stereo video coding scheme compatible with mono-view processors is presented in this paper. In addition, this paper proposes an adaptive prediction structure that allows different prediction modes to be applied to different groups of pictures (GOPs) according to temporal and inter-view correlations, improving the coding efficiency. Moreover, the advanced video coding standard H.264 is conveniently used to maximize the coding efficiency. Finally, the effectiveness of the proposed scheme is verified by extensive experimental results.
Ryo NAKASHIMA Kei UTSUGI Keita TAKAHASHI Takeshi NAEMURA
We propose a new stereo image retargeting method based on the framework of shift-map image editing. Retargeting is the process of changing the image size according to the target display while preserving as much of the richness of the image as possible, and is often applied to monocular images and videos. Retargeting stereo images poses a new challenge because pixel correspondences between the stereo pair should be preserved to keep the scene's structure. The main contribution of this paper is integrating a stereo correspondence constraint into the retargeting process. Among several retargeting methods, we adopt shift-map image editing because this framework can be extended naturally to stereo images, as we show in this paper. We confirmed the effectiveness of our method through experiments.
Trung Thanh NGO Yuichiro KOJIMA Hajime NAGAHARA Ryusuke SAGAWA Yasuhiro MUKAIGAWA Masahiko YACHIDA Yasushi YAGI
For fast egomotion of a camera, computing feature correspondence and motion parameters by global search becomes highly time-consuming. Therefore, the complexity of the estimation needs to be reduced for real-time applications. In this paper, we propose a compound omnidirectional vision sensor and an algorithm for estimating its fast egomotion. The proposed sensor has both multiple baselines and a large field of view (FOV). Our method uses the multi-baseline stereo vision capability to classify feature points as near or far features. After the classification, we can estimate the camera rotation and translation separately by using random sample consensus (RANSAC) to reduce the computational complexity. The large FOV also improves the robustness, since the translation and rotation are clearly distinguished. To date, there has been no work on combining multi-baseline stereo with large-FOV characteristics for estimation, even though these characteristics are individually important in improving egomotion estimation. Experiments showed that the proposed method is robust and produces reasonable accuracy in real time for fast motion of the sensor.
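The near/far decomposition can be sketched in a planar (yaw plus 2-D translation) simplification; the disparity threshold, the one-parameter rotation, the motion convention, and the RANSAC settings below are all illustrative assumptions rather than the paper's full formulation:

```python
import math
import random

def classify(features, near_disparity=5.0):
    """Split stereo features into near/far by disparity (illustrative threshold)."""
    near = [f for f in features if f["disparity"] >= near_disparity]
    far = [f for f in features if f["disparity"] < near_disparity]
    return near, far

def estimate_rotation(far, iters=50, thresh=0.01, seed=0):
    """Far features move almost purely with camera rotation, so RANSAC on
    the per-feature bearing-angle change recovers the yaw."""
    rng = random.Random(seed)
    deltas = [f["angle1"] - f["angle0"] for f in far]
    best, best_count = 0.0, -1
    for _ in range(iters):
        cand = rng.choice(deltas)  # one-point hypothesis
        inliers = [d for d in deltas if abs(d - cand) < thresh]
        if len(inliers) > best_count:
            best, best_count = sum(inliers) / len(inliers), len(inliers)
    return best

def estimate_translation(near, yaw):
    """Assumed motion model p0 = R(yaw) @ p1 + t: with yaw fixed by the
    far features, the translation follows from the near features alone."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx = ty = 0.0
    for f in near:
        x1, y1 = f["p1"]
        tx += f["p0"][0] - (c * x1 - s * y1)
        ty += f["p0"][1] - (s * x1 + c * y1)
    return tx / len(near), ty / len(near)
```

Splitting the problem this way shrinks each RANSAC search space, which is the source of the computational saving the abstract describes.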
Young Han LEE Deok Su KIM Hong Kook KIM Jongmo SUNG Mi Suk LEE Hyun Joo BAE
In this paper, we propose a bandwidth-scalable stereo audio coding method based on a layered structure. The proposed stereo coding method encodes super-wideband (SWB) stereo signals and is able to decode either wideband (WB) stereo signals or SWB stereo signals, depending on the network congestion. The performance of the proposed stereo coding method is then compared with that of a conventional stereo coding method that separately decodes WB or SWB stereo signals, in terms of subjective quality, algorithmic delay, and computational complexity. Experimental results show that when stereo audio signals sampled at a rate of 32 kHz are compressed to 64 kbit/s, the proposed method provides significantly better audio quality with a 64-sample shorter algorithmic delay, and comparable computational complexity.