Ryutaro OI Takayuki HAMAMOTO Kiyoharu AIZAWA
We have studied an image acquisition system for real-time image-based rendering (IBR). In this area, most conventional systems sacrifice spatial or temporal resolution to handle a large number of input images. However, only a portion of the image data is needed for rendering, and the portion required is determined by the position of the imaginary viewpoint. In this paper, we propose an acquisition system for a real-time image-based rendering system that uses pixel-based random-access image sensors to eliminate the main bottleneck of conventional systems. We have developed a prototype CMOS image sensor with 128 × 128 pixels and verified the prototype chip's selective readout function as well as its sample-and-hold feature.
Qing YU Masashi ANZAWA Sosuke AMANO Kiyoharu AIZAWA
Because food diaries can help people develop healthy eating habits, food image recognition is in high demand to reduce the effort of food recording. Previous studies have tackled this challenging domain with datasets having fixed numbers of samples and classes. In a real-world setting, however, it is impossible to include all foods in the database, because the number of food classes is large and increases continually. In addition, inter-class similarity and intra-class diversity make the recognition difficult. In this paper, we address these problems by using deep convolutional neural network features to build a personalized classifier that incrementally learns the user's data and adapts to the user's eating habits. As a result, we achieved state-of-the-art food image recognition accuracy with personalization over 300 food records per user.
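One simple way to realize such incremental personalization is a nearest-class-mean classifier over the extracted CNN features. The sketch below is only an illustration under that assumption, not the paper's actual classifier; the 4-dimensional "features" and the food labels are made up.

```python
import numpy as np

class IncrementalNCM:
    """Nearest-class-mean classifier over feature vectors.
    Each new record updates a running per-class mean, so classes can be
    added at any time without retraining from scratch."""

    def __init__(self):
        self.means = {}   # class label -> running mean feature
        self.counts = {}  # class label -> number of records seen

    def update(self, feature, label):
        if label not in self.means:
            self.means[label] = np.zeros_like(feature, dtype=float)
            self.counts[label] = 0
        self.counts[label] += 1
        # incremental running-mean update
        self.means[label] += (feature - self.means[label]) / self.counts[label]

    def predict(self, feature):
        return min(self.means, key=lambda c: np.linalg.norm(feature - self.means[c]))

# Toy demo with 4-dimensional stand-in "features" (hypothetical labels).
rng = np.random.default_rng(0)
clf = IncrementalNCM()
c_ramen, c_sushi = np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])
for _ in range(20):
    clf.update(c_ramen + 0.1 * rng.standard_normal(4), "ramen")
    clf.update(c_sushi + 0.1 * rng.standard_normal(4), "sushi")
prediction = clf.predict(c_ramen + 0.05 * rng.standard_normal(4))
```

Because the mean is updated incrementally, each user record is processed once and then discarded, which matches the incremental-learning setting described above.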
Sanghoon SONG Yoonki CHOI Kiyoharu AIZAWA Mitsutoshi HATORI
In land mobile communication, the Constant Modulus Algorithm (CMA) has been studied to reduce multipath fading effects. With this method, the transmitted power is not used efficiently, even though all multipath components carry the same information. To use the received power efficiently, we propose a blind multiple-beam adaptive array. It has three key features. First, we use CMA, which can reduce multipath fading to some extent without a training signal. Second, we use the LMS algorithm, which can capture multipath components separated from the reference signal by a certain delay. Third, we use a fractional delay filter (FDF) and timing error detector (TED) loop, which can detect and compensate for fractional delays. By utilizing the multipath components that CMA suppresses, the proposed technique achieves better performance than a CMA adaptive array.
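As background, the core CMA weight update for a single beam can be sketched as follows. This is a minimal numpy illustration of the stochastic-gradient CM(2,2) update for a uniform linear array; the array geometry, step size, and noise level are assumptions, and the multiple-beam, FDF, and TED parts of the proposal are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 5000                      # array elements, snapshots
theta = np.deg2rad(20.0)            # arrival angle of the constant-modulus source
steer = np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))
s = np.exp(1j * 2 * np.pi * rng.random(N))   # unit-modulus transmitted signal
noise = 0.05 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = np.outer(steer, s) + noise      # received snapshots

w = np.zeros(M, dtype=complex)
w[0] = 0.3                          # deliberately mis-scaled initial weight
mu = 0.005
moduli = []
for n in range(N):
    x = X[:, n]
    y = np.vdot(w, x)               # array output y = w^H x
    # CM(2,2) stochastic gradient step: minimize (|y|^2 - 1)^2,
    # no training signal is needed, only the constant-modulus property.
    w -= mu * (np.abs(y) ** 2 - 1.0) * np.conj(y) * x
    moduli.append(np.abs(y))
```

The output modulus drifts from the mis-scaled initial value toward 1, which is the blind restoration property the abstract relies on.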
This paper introduces our work on a Movie Map, which enables users to explore a given city area using 360° videos. There is a persistent need for visual exploration of cities. Nowadays, we are familiar with Google Street View (GSV), an interactive visual map. Despite its wide use, GSV provides only sparse images of streets, which often confuses users and lowers user satisfaction. Forty years ago, a video-based interactive map, well known as the Aspen Movie Map, was created. A Movie Map uses videos instead of sparse images and can improve the user experience dramatically. However, the Aspen Movie Map was based on analog technology, required enormous effort, and was never built again. We therefore renovate the Movie Map using state-of-the-art technology and build a new Movie Map system with an interface for exploring cities. The system consists of four stages: acquisition, analysis, management, and interaction. After 360° videos are acquired along streets in target areas, the analysis of the videos is almost automatic. Frames of the video are localized on the map, intersections are detected, videos are segmented, and turning views at intersections are synthesized. By connecting the video segments following the specified movement in an area, users can watch a walking view along a street. The interface allows for easy exploration of a target area and can also show virtual billboards in the view.
Takahiro NARUKO Hiroaki AKUTSU Koki TSUBOTA Kiyoharu AIZAWA
We propose a Quality Enhancement via Side bitstream Network (QESN) technique for lossy image compression. The proposed QESN utilizes the network architecture of deep image compression to produce a bitstream that enhances the quality of conventional compression. We also present a loss function that directly optimizes the Bjontegaard delta bit rate (BD-BR) by using a differentiable model of the rate-distortion curve. Experimental results show that QESN improves the bit rate by 16.7% in terms of BD-BR compared with Better Portable Graphics (BPG).
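For reference, the conventional (non-differentiable) BD-BR that the proposed loss targets is computed by fitting log-rate as a polynomial in quality and integrating the gap between the two fitted curves. A minimal sketch with made-up rate-distortion points follows; the codec names and numbers are purely illustrative.

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard delta bit rate (%): fit log-rate as a cubic in PSNR for
    each codec, integrate both fits over the overlapping PSNR range, and
    convert the average log-rate gap back to a percentage."""
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    i_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    i_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_gap = (i_test - i_ref) / (hi - lo)
    return (np.exp(avg_log_gap) - 1.0) * 100.0

# Made-up RD points: the test codec needs half the rate at every quality level,
# so the BD-BR should come out at about -50%.
rates_ref = np.array([100.0, 200.0, 400.0, 800.0])
psnr = np.array([30.0, 33.0, 36.0, 39.0])
saving = bd_rate(rates_ref, psnr, rates_ref / 2.0, psnr)
```

Because this metric involves curve fitting over a set of RD points, it is not directly differentiable per sample, which motivates the differentiable rate-distortion model mentioned in the abstract.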
Kazuki EGASHIRA Atsuyuki MIYAI Qing YU Go IRIE Kiyoharu AIZAWA
We propose a novel classification problem setting in which undesirable classes (UCs) are defined for each class. A UC is a class into which misclassification must be particularly avoided. To address this setting, we propose a framework that reduces the probabilities assigned to the UCs while increasing the probability of the correct class.
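One plausible instantiation of such a framework is a cross-entropy loss augmented with a penalty on the probability mass assigned to the UCs of the true class. The function below is an illustrative sketch only, not the paper's formulation; the logits, labels, and weighting are made up.

```python
import numpy as np

def uc_aware_loss(logits, label, uc_indices, lam=1.0):
    """Cross-entropy on the correct class plus a -log(1 - p) penalty on each
    undesirable class, which pushes UC probabilities toward zero."""
    z = logits - np.max(logits)                 # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    ce = -np.log(p[label])                      # encourage the correct class
    penalty = -np.sum(np.log(1.0 - p[uc_indices] + 1e-12))  # discourage UCs
    return ce + lam * penalty

# The loss grows when an undesirable class (index 1) absorbs probability mass.
low_uc = uc_aware_loss(np.array([2.0, 0.0, 0.0]), label=0, uc_indices=[1])
high_uc = uc_aware_loss(np.array([2.0, 2.0, 0.0]), label=0, uc_indices=[1])
```

The penalty term leaves non-UC confusions untouched, matching the asymmetric goal of avoiding specific misclassifications rather than all of them equally.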
While deep image compression performs better than traditional codecs like JPEG on natural images, it faces a challenge as a learning-based approach: compression performance drastically decreases for out-of-domain images. To investigate this problem, we introduce a novel task that we call universal deep image compression, which involves compressing images in arbitrary domains, such as natural images, line drawings, and comics. Furthermore, we propose a content-adaptive optimization framework to tackle this task. The framework adapts a pre-trained compression model to each target image during testing to address the domain gap between pre-training and testing. For each input image, we insert adapters into the decoder of the model and optimize the latent representation extracted by the encoder together with the adapter parameters in terms of rate-distortion, with the adapter parameters transmitted per image. To evaluate the proposed universal deep image compression, we constructed a benchmark dataset containing uncompressed images of four domains: natural images, line drawings, comics, and vector arts. We compare our proposed method with non-adaptive and existing adaptive compression methods, and the results show that our method outperforms them. Our code and dataset are publicly available at https://github.com/kktsubota/universal-dic.
Hiroshi OHNO Kiyoharu AIZAWA Mitsutoshi HATORI
Fractal image coding using iterated transformations compresses image data by exploiting the self-similarity of an image. Its compression performance has already been discussed in [2] and several other papers. However, the relation between this performance and self-similarity remains unclear. In this paper, we evaluate fractal coding from the perspective of this relationship.
Miwa SAKAI Kiyoharu AIZAWA Mitsutoshi HATORI
An adaptive digital filter with an adaptive sampling phase is proposed. The filter structure uses an adaptive delay device at the filter input. An algorithm is derived that determines the value of the delay and the filter coefficients by minimizing the mean square error (MSE) between the desired signal and the filter output. Computer simulations of the convergence of the proposed adaptive filter with sinusoidal and BPSK-modulated inputs are shown. According to the simulations, the MSE of the proposed adaptive delay algorithm is lower than that of the conventional LMS algorithm.
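The joint adaptation described above can be sketched as follows, assuming a linearly interpolated fractional delay and a finite-difference gradient for the delay update. This is a minimal numpy illustration; the step sizes, tap count, and the 2.3-sample target delay are made up, and the paper's exact algorithm may differ.

```python
import numpy as np

def frac_delayed(x, n, d):
    """Sample x[n - d] for a fractional delay d via linear interpolation."""
    k = int(np.floor(d))
    f = d - k
    return (1.0 - f) * x[n - k] + f * x[n - k - 1]

rng = np.random.default_rng(0)
N, taps = 4000, 4
t = np.arange(N)
x = np.sin(0.1 * t)                   # sinusoidal input
desired = np.sin(0.1 * (t - 2.3))     # target: the input delayed by 2.3 samples

w = np.zeros(taps); w[0] = 1.0        # filter coefficients
d = 0.5                               # adaptive fractional delay
mu_w, mu_d, eps = 0.01, 0.01, 1e-3
errors = []
for n in range(16, N):
    xd = np.array([frac_delayed(x, n - i, d) for i in range(taps)])
    e = desired[n] - w @ xd
    # finite-difference gradient of the filter output w.r.t. the delay
    xd2 = np.array([frac_delayed(x, n - i, d + eps) for i in range(taps)])
    dy_dd = w @ (xd2 - xd) / eps
    w += mu_w * e * xd                # LMS update of the coefficients
    d += mu_d * e * dy_dd             # gradient update of the delay
    d = min(max(d, 0.0), 8.0)         # keep the delay in a safe index range
    errors.append(e)
```

Both the delay and the coefficients are driven by the same error signal, so they jointly minimize the MSE, which is the idea behind placing the adaptive delay device at the input.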
Cha-Keon CHEONG Kiyoharu AIZAWA
This paper addresses a novel scheme for variable-rate error correction coding based on a serially concatenated convolutional code with interleaved puncturing. To obtain a variable coding rate, the bits of the outer coder are punctured with a given puncturing pattern and randomly interleaved. The effect of interleaved puncturing on the overall coding performance is analyzed, and an upper bound on the bit error probability of the proposed coder is derived. Moreover, to evaluate the effectiveness of the proposed scheme, simulation results are presented for an iterative decoding procedure under Rayleigh fading and additive white Gaussian noise channel models.
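The outer-code processing described above can be sketched as follows: a periodic puncturing pattern deletes bits to raise the code rate, and a random interleaver permutes the survivors. This is a toy illustration; the pattern and block length are made up and the convolutional encoders themselves are not reproduced.

```python
import numpy as np

def puncture_and_interleave(coded_bits, pattern, rng):
    """Delete the bits where the repeated puncturing pattern is 0,
    then permute the surviving bits with a random interleaver."""
    mask = np.resize(np.asarray(pattern, dtype=bool), coded_bits.size)
    kept = coded_bits[mask]
    return kept[rng.permutation(kept.size)]

rng = np.random.default_rng(0)
coded = rng.integers(0, 2, size=12)
# Pattern [1, 1, 0]: keep 2 of every 3 coded bits, raising the code rate
# by a factor of 3/2 (e.g., a rate-1/2 outer code becomes rate 3/4).
sent = puncture_and_interleave(coded, [1, 1, 0], rng)
```

Changing the puncturing pattern changes the fraction of bits kept, which is how a single mother code yields a family of coding rates.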
Jiafeng MAO Qing YU Kiyoharu AIZAWA
A well-annotated dataset is crucial to the training of object detectors. However, producing finely annotated datasets for object detection is extremely labor-intensive, so crowdsourcing is often used to create them, which leads to datasets that tend to contain incorrect annotations such as inaccurate localization bounding boxes. In this study, we highlight the problem of object detection with noisy bounding box annotations and show that these noisy annotations are harmful to the performance of deep neural networks. To solve this problem, we propose a framework that allows the network to correct the noisy datasets by alternating refinement. The experimental results demonstrate that our proposed framework can significantly alleviate the influence of noise on model performance.
Nagul COOHAROJANANONE Kiyoharu AIZAWA
In this paper, we present a new color distance measure: the angular distance of cumulative histograms. The proposed measure is robust to light variation. We also apply weight values to DR, DG, and DB according to the hue histogram of the query image. Moreover, we compare the measure to a popular previous measure, the cumulative L1 distance. We show that our method produces more accurate and perceptually relevant results.
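An illustrative reading of the measure, assuming per-channel cumulative RGB histograms compared by vector angle, is sketched below. The bin count and images are made up, and the hue-based weighting of DR, DG, and DB is omitted; the paper's exact formulation may differ.

```python
import numpy as np

def cumulative_hist(channel, bins=32):
    """Normalized cumulative histogram of one color channel in [0, 255]."""
    h, _ = np.histogram(channel, bins=bins, range=(0, 256))
    return np.cumsum(h / h.sum())

def angular_distance(img1, img2, bins=32):
    """Angle (radians) between the cumulative histograms of the two images,
    summed over the R, G, and B channels."""
    total = 0.0
    for c in range(3):
        a = cumulative_hist(img1[..., c], bins)
        b = cumulative_hist(img2[..., c], bins)
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        total += np.arccos(np.clip(cos, -1.0, 1.0))
    return total

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
brighter = np.clip(img * 1.2, 0, 255)                       # illumination change
darker_scene = rng.integers(0, 128, size=(64, 64, 3)).astype(float)  # different content

d_same = angular_distance(img, img)
d_bright = angular_distance(img, brighter)
d_other = angular_distance(img, darker_scene)
```

In this toy example the brightened copy stays much closer to the original than a genuinely different image does, which is the light-variation robustness the abstract claims.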
Koki TSUBOTA Hiroaki AKUTSU Kiyoharu AIZAWA
Image quality assessment (IQA) is a fundamental metric for image processing tasks (e.g., compression). For full-reference IQA, traditional metrics such as PSNR and SSIM have been widely used. Recently, IQAs based on deep neural networks (deep IQAs), such as LPIPS and DISTS, have also been used. Image scaling is known to be inconsistent among deep IQAs: some perform down-scaling as pre-processing, whereas others use the original image size. In this paper, we show that image scale is an influential factor that affects deep IQA performance. We comprehensively evaluate four deep IQAs on the same five datasets, and the experimental results show that image scale significantly influences IQA performance. We find that the most appropriate image scale is often neither the default nor the original size, and that the choice differs depending on the methods and datasets used. We also visualize the stability of the metrics across scales and find that PieAPP is the most stable of the four deep IQAs.
Toshihiko YAMASAKI Takayuki ISHIKAWA Kiyoharu AIZAWA
Recently, cars have been equipped with many sensors for safe driving. We have been working on storing driving-scene video together with such sensor data and detecting changes in street scenery. Detection results can be used for building historical databases of town scenery, automatically updating landmarks on maps, and so forth. To compare images for change detection, the first step is to retrieve images taken at nearly identical locations. Since Global Positioning System (GPS) data inherently contain noise, we cannot rely on GPS data alone for image retrieval. Therefore, we have developed an image retrieval algorithm that employs edge-histogram-based image features in conjunction with hierarchical search. By using edge histograms projected onto the vertical and horizontal axes, the retrieval is robust to image variation due to weather change, clouds, obstacles, and so on. In addition, the matching cost is kept small by limiting the matching candidates through hierarchical search. Experimental results demonstrate that the mean retrieval accuracy improves from 65% to 76% for front-view images and from 34% to 53% for side-view images.
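The projected edge-histogram matching can be sketched as follows, using small random arrays as stand-in grayscale images. This is a minimal illustration; the paper's actual features, normalization, and hierarchical candidate pruning are more elaborate.

```python
import numpy as np

def edge_projection_feature(img):
    """Normalized edge-magnitude profiles of a grayscale image, projected
    onto the horizontal and vertical axes."""
    gx = np.abs(np.diff(img, axis=1))        # horizontal gradients
    gy = np.abs(np.diff(img, axis=0))        # vertical gradients
    col = gx.sum(axis=0)                     # projection onto the horizontal axis
    row = gy.sum(axis=1)                     # projection onto the vertical axis
    return np.concatenate([col / (col.sum() + 1e-9), row / (row.sum() + 1e-9)])

def retrieve(query, database):
    """Index of the database image with the closest feature (L1 distance)."""
    qf = edge_projection_feature(query)
    dists = [np.abs(qf - edge_projection_feature(img)).sum() for img in database]
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
database = [rng.random((32, 32)) for _ in range(5)]
query = database[2] + 0.2                    # same scene, globally brighter
match = retrieve(query, database)
```

Because the features are built from gradients and normalized, a global brightness shift leaves them unchanged, which is one source of the robustness to illumination and weather mentioned above.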
Masayuki TANIMOTO Kohichi SAKANIWA Kiyoharu AIZAWA Kazuyoshi OSHIMA Kiyomi KUMOZAKI Shuji TASAKA Yoichi MAEDA Takeshi MIZUIKE Mikio YAMASHITA Hideaki YAMANAKA Koichiro WAKASUGI Masaaki KATAYAMA
Yoshinori HATORI Shuichi MATSUMOTO Hiroshi KOTERA Kiyoharu AIZAWA Fumitaka ONO Hideo KITAJIMA Taizo KINOSHITA Shigeru KUROE Yutaka TANAKA Hideo HASHIMOTO Mitsuharu YANO Toshiaki WATANABE
Vincent van de LAAR Kiyoharu AIZAWA
This paper describes a scheme to capture a wide-view image using a setup of uncalibrated cameras whose optical axes point in divergent directions. The direction of view of the resulting image can be chosen freely anywhere between the two optical axes. The scheme uses eight-parameter perspective transformations to warp the images; the parameters are obtained with a relative orientation algorithm. The focal lengths and scale factors of the two images are estimated using Powell's multi-dimensional optimization technique. Experiments on real images show the accuracy of the scheme.
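For cameras related by a pure rotation, the eight-parameter perspective transformation that aligns the two views is the homography H = K R K⁻¹, a standard result. The sketch below illustrates this; the intrinsics and pan angle are made up, and the paper's relative-orientation parameter estimation is not reproduced.

```python
import numpy as np

def rotation_homography(K, R):
    """Homography induced by a pure camera rotation: H = K R inv(K),
    normalized so that H[2, 2] = 1 and the eight free parameters are explicit."""
    H = K @ R @ np.linalg.inv(K)
    return H / H[2, 2]

def warp_point(H, x, y):
    """Map pixel (x, y) through the perspective transformation H."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Example: focal length 800 px, principal point (320, 240), 10-degree pan.
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0, np.sin(a)],
              [0,         1, 0        ],
              [-np.sin(a), 0, np.cos(a)]])
H = rotation_homography(K, R)
```

Warping one camera's pixels through such an H re-renders them in the other camera's frame, which is how the divergent views can be composed into a single wide-view image with a freely chosen viewing direction.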
Yasuhiro OHTSUKA Takayuki HAMAMOTO Kiyoharu AIZAWA
We propose a new sampling control scheme for an image sensor array. In contrast to random-access pixel sensors, the proposed sensor can read out spatially variant sampled pixels at high speed without receiving a pixel address for each access. The sensor has a memory array that stores the sampling positions, and the positions can be changed dynamically by rewriting this memory, so any spatially varying sampling pattern can be achieved. A prototype with 64 × 64 pixels was fabricated in a 0.7 µm CMOS process.
Hiroshi HARASHIMA Kiyoharu AIZAWA Takahiro SAITO
This paper reviews recent trends in research on intelligent image coding technology, focusing on model-based analysis-synthesis coding. Intelligent image coding schemes could realize epoch-making ultra-low-rate image transmission and so-called value-added visual telecommunications. To categorize the various image coding systems and examine their potential future applications, an approach that defines generations of image coding technologies is presented. Future-generation coding systems include model-based analysis-synthesis coding and knowledge-based intelligent coding. The latter half of the paper is devoted to the authors' recent work on a model-based analysis-synthesis coding system for facial images.