Takahiro OGAWA Yoshiaki YAMAGUCHI Satoshi ASAMIZU Miki HASEYAMA
This paper presents human-centered video feature selection via mRMR-SCMMCCA (minimum Redundancy and Maximum Relevance-Specific Correlation Maximization Multiset Canonical Correlation Analysis) algorithm for preference extraction. The proposed method derives SCMMCCA, which simultaneously maximizes two kinds of correlations, correlation between video features and users' viewing behavior features and correlation between video features and their corresponding rating scores. By monitoring the derived correlations, the selection of the optimal video features that represent users' individual preference becomes feasible.
Tomoki HIRAMATSU Takahiro OGAWA Miki HASEYAMA
In this paper, a Kalman filter-based method for restoration of video images acquired by an in-vehicle camera in foggy conditions is proposed. In order to realize Kalman filter-based restoration, the proposed method clips local blocks from the target frame by using a sliding window and regards the intensities in each block as elements of the state variable of the Kalman filter. Furthermore, the proposed method designs the following two models for restoration of foggy images. The first one is an observation model, which represents a fog deterioration model. The proposed method automatically determines all parameters of the fog deterioration model from only the foggy images to design the observation model. The second one is a non-linear state transition model, which represents the target frame in the original video image from its previous frame based on motion vectors. By utilizing the observation and state transition models, the correlation between successive frames can be effectively utilized for restoration, and accurate restoration of images obtained in foggy conditions can be achieved. Experimental results show that the proposed method has better performance than that of the traditional method based on the fog deterioration model.
Soh YOSHIDA Takahiro OGAWA Miki HASEYAMA Mitsuji MUNEYASU
Video reranking is an effective way for improving the retrieval performance of text-based video search engines. This paper proposes a graph-based Web video search reranking method with local and global consistency analysis. Generally, the graph-based reranking approach constructs a graph whose nodes and edges respectively correspond to videos and their pairwise similarities. A lot of reranking methods are built based on a scheme which regularizes the smoothness of pairwise relevance scores between adjacent nodes with regard to a user's query. However, since the overall consistency is measured by aggregating only the local consistency over each pair, errors in score estimation increase when noisy samples are included within query-relevant videos' neighbors. To deal with the noisy samples, the proposed method leverages the global consistency of the graph structure, which is different from the conventional methods. Specifically, in order to detect this consistency, the propose method introduces a spectral clustering algorithm which can detect video groups, in which videos have strong semantic correlation, on the graph. Furthermore, a new regularization term, which smooths ranking scores within the same group, is introduced to the reranking framework. Since the score regularization is performed by both local and global aspects simultaneously, the accurate score estimation becomes feasible. Experimental results obtained by applying the proposed method to a real-world video collection show its effectiveness.
Miki HASEYAMA Takahiro OGAWA Sho TAKAHASHI Shuhei NOMURA Masatsugu SHIMOMURA
Biomimetics is a new research field that creates innovation through the collaboration of different existing research fields. However, the collaboration, i.e., the exchange of deep knowledge between different research fields, is difficult for several reasons such as differences in technical terms used in different fields. In order to overcome this problem, we have developed a new retrieval platform, “Biomimetics image retrieval platform,” using a visualization-based image retrieval technique. A biological database contains a large volume of image data, and by taking advantage of these image data, we are able to overcome limitations of text-only information retrieval. By realizing such a retrieval platform that does not depend on technical terms, individual biological databases of various species can be integrated. This will allow not only the use of data for the study of various species by researchers in different biological fields but also access for a wide range of researchers in fields ranging from materials science, mechanical engineering and manufacturing. Therefore, our platform provides a new path bridging different fields and will contribute to the development of biomimetics since it can overcome the limitation of the traditional retrieval platform.
Hirokazu TANAKA Sunmi KIM Takahiro OGAWA Miki HASEYAMA
A new spatial and temporal error concealment method for three-dimensional discrete wavelet transform (3D DWT) video coding is analyzed. 3D DWT video coding employing dispersive grouping (DG) and two-step error concealment is an efficient method in a packet loss channel [20],[21]. In the two-step error concealment method, the interpolations are only spatially applied however, higher efficiency of the interpolation can be expected by utilizing spatial and temporal similarities. In this paper, we propose an enhanced spatial and temporal error concealment method in order to achieve higher error concealment (EC) performance in packet loss networks. In the temporal error concealment method, structural similarity (SSIM) index is employed for inter group of pictures (GOP) EC and minimum mean square error (MMSE) is used for intra GOP EC. Experimental results show that the proposed method can obtain remarkable performance compared with the conventional methods.
Takahiro OGAWA Sho TAKAHASHI Naofumi WADA Akira TANAKA Miki HASEYAMA
Binary sparse representation based on arbitrary quality metrics and its applications are presented in this paper. The novelties of the proposed method are twofold. First, the proposed method newly derives sparse representation for which representation coefficients are binary values, and this enables selection of arbitrary image quality metrics. This new sparse representation can generate quality metric-independent subspaces with simplification of the calculation procedures. Second, visual saliency is used in the proposed method for pooling the quality values obtained for all of the parts within target images. This approach enables visually pleasant approximation of the target images more successfully. By introducing the above two novel approaches, successful image approximation considering human perception becomes feasible. Since the proposed method can provide lower-dimensional subspaces that are obtained by better image quality metrics, realization of several image reconstruction tasks can be expected. Experimental results showed high performance of the proposed method in terms of two image reconstruction tasks, image inpainting and super-resolution.
Marie KATSURAI Takahiro OGAWA Miki HASEYAMA
In this paper, a novel framework for extracting visual feature-based keyword relationships from an image database is proposed. From the characteristic that a set of relevant keywords tends to have common visual features, the keyword relationships in a target image database are extracted by using the following two steps. First, the relationship between each keyword and its corresponding visual features is modeled by using a classifier. This step enables detection of visual features related to each keyword. In the second step, the keyword relationships are extracted from the obtained results. Specifically, in order to measure the relevance between two keywords, the proposed method removes visual features related to one keyword from training images and monitors the performance of the classifier obtained for the other keyword. This measurement is the biggest difference from other conventional methods that focus on only keyword co-occurrences or visual similarities. Results of experiments conducted using an image database showed the effectiveness of the proposed method.
Guang LI Ren TOGO Takahiro OGAWA Miki HASEYAMA
In this study, we propose a novel dataset distillation method based on parameter pruning. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on two benchmark datasets show the superiority of the proposed method.
A new framework for reconstruction of missing textures in digital images is introduced in this paper. The framework is based on a projection onto convex sets (POCS) algorithm including a novel constraint. In the proposed method, a nonlinear eigenspace of each cluster obtained by classification of known textures within the target image is applied to the constraint. The main advantage of this approach is that the eigenspace can approximate the textures classified into the same cluster in the least-squares sense. Furthermore, by monitoring the errors converged by the POCS algorithm, a selection of the optimal cluster to reconstruct the target texture including missing intensities can be achieved. This POCS-based approach provides a solution to the problem in traditional methods of not being able to perform the selection of the optimal cluster due to the missing intensities within the target texture. Consequently, all of the missing textures are successfully reconstructed by the selected cluster's eigenspaces which correctly approximate the same kinds of textures. Experimental results show subjective and quantitative improvement of the proposed reconstruction technique over previously reported reconstruction techniques.
Shigeki TAKAHASHI Takahiro OGAWA Hirokazu TANAKA Miki HASEYAMA
A novel error concealment method using a Kalman filter is presented in this paper. In order to successfully utilize the Kalman filter, its state transition and observation models that are suitable for the video error concealment are newly defined as follows. The state transition model represents the video decoding process by a motion-compensated prediction. Furthermore, the new observation model that represents an image blurring process is defined, and calculation of the Kalman gain becomes possible. The problem of the traditional methods is solved by using the Kalman filter in the proposed method, and accurate reconstruction of corrupted video frames is achieved. Consequently, an effective error concealment method using the Kalman filter is realized. Experimental results showed that the proposed method has better performance than that of traditional methods.
Yasutaka HATAKEYAMA Takahiro OGAWA Hironori IKEDA Miki HASEYAMA
In this paper, we propose a method to estimate the most resource-consuming disease from electronic claim data based on Labeled Latent Dirichlet Allocation (Labeled LDA). The proposed method models each electronic claim from its medical procedures as a mixture of resource-consuming diseases. Thus, the most resource-consuming disease can be automatically estimated by applying Labeled LDA to the electronic claim data. Although our method is composed of a simple scheme, this is the first trial for realizing estimation of the most resource-consuming disease.
Tomoki HIRAMATSU Takahiro OGAWA Miki HASEYAMA
In this paper, an ER (Error-Reduction) algorithm-based method for removal of adherent water drops from images obtained by a rear view camera mounted on a vehicle in rainy conditions is proposed. Since Fourier-domain and object-domain constraints are needed for any ER algorithm-based method, the proposed method introduces the following two novel constraints for the removal of adherent water drops. The first one is the Fourier-domain constraint that utilizes the Fourier transform magnitude of the previous frame in the obtained images as that of the target frame. Noting that images obtained by the rear view camera have the unique characteristics of objects moving like ripples because the rear view camera is generally composed of a fish-eye lens for a wide view angle, the proposed method assumes that the Fourier transform magnitudes of the target frame and the previous frame are the same in the polar coordinate system. The second constraint is the object-domain constraint that utilizes intensities in an area of the target frame to which water drops have adhered. Specifically, the proposed method models a deterioration process of intensities that are corrupted by the water drop adhering to the rear view camera lens. By utilizing these novel constraints, the proposed ER algorithm can remove adherent water drops from images obtained by the rear view camera. Experimental results that verify the performance of the proposed method are represented.
Yasutaka HATAKEYAMA Takahiro OGAWA Satoshi ASAMIZU Miki HASEYAMA
A novel video retrieval method based on Web community extraction using audio and visual features and textual features of video materials is proposed in this paper. In this proposed method, canonical correlation analysis is applied to these three features calculated from video materials and their Web pages, and transformation of each feature into the same variate space is possible. The transformed variates are based on the relationships between visual, audio and textual features of video materials, and the similarity between video materials in the same feature space for each feature can be calculated. Next, the proposed method introduces the obtained similarities of video materials into the link relationship between their Web pages. Furthermore, by performing link analysis of the obtained weighted link relationship, this approach extracts Web communities including similar topics and provides the degree of attribution of video materials in each Web community for each feature. Therefore, by calculating similarities of the degrees of attribution between the Web communities extracted from the three kinds of features, the desired ones are automatically selected. Consequently, by monitoring the degrees of attribution of the obtained Web communities, the proposed method can perform effective video retrieval. Some experimental results obtained by applying the proposed method to video materials obtained from actual Web pages are shown to verify the effectiveness of the proposed method.
Keisuke MAEDA Kazaha HORII Takahiro OGAWA Miki HASEYAMA
A multi-task convolutional neural network leading to high performance and interpretability via attribute estimation is presented in this letter. Our method can provide interpretation of the classification results of CNNs by outputting attributes that explain elements of objects as a judgement reason of CNNs in the middle layer. Furthermore, the proposed network uses the estimated attributes for the following prediction of classes. Consequently, construction of a novel multi-task CNN with improvements in both of the interpretability and classification performance is realized.
Takahiro OGAWA Akira TANAKA Miki HASEYAMA
A Wiener-based inpainting quality prediction method is presented in this paper. The proposed method is the first method that can predict inpainting quality both before and after the intensities have become missing even if their inpainting methods are unknown. Thus, when the target image does not include any missing areas, the proposed method estimates the importance of intensities for all pixels, and then we can know which areas should not be removed. Interestingly, since this measure can be also derived in the same manner for its corrupted image already including missing areas, the expected difficulty in reconstruction of these missing pixels is predicted, i.e., we can know which missing areas can be successfully reconstructed. The proposed method focuses on expected errors derived from the Wiener filter, which enables least-squares reconstruction, to predict the inpainting quality. The greatest advantage of the proposed method is that the same inpainting quality prediction scheme can be used in the above two different situations, and their results have common trends. Experimental results show that the inpainting quality predicted by the proposed method can be successfully used as a universal quality measure.
Kohei TATENO Takahiro OGAWA Miki HASEYAMA
A novel dimensionality reduction method, Fisher Discriminant Locality Preserving Canonical Correlation Analysis (FDLP-CCA), for visualizing Web images is presented in this paper. FDLP-CCA can integrate two modalities and discriminate target items in terms of their semantics by considering unique characteristics of the two modalities. In this paper, we focus on Web images with text uploaded on Social Networking Services for these two modalities. Specifically, text features have high discriminate power in terms of semantics. On the other hand, visual features of images give their perceptual relationships. In order to consider both of the above unique characteristics of these two modalities, FDLP-CCA estimates the correlation between the text and visual features with consideration of the cluster structure based on the text features and the local structures based on the visual features. Thus, FDLP-CCA can integrate the different modalities and provide separated manifolds to organize enhanced compactness within each natural cluster.
Sunmi KIM Hirokazu TANAKA Takahiro OGAWA Miki HASEYAMA
In this paper, we propose a two-step error concealment algorithm based on an error resilient three-dimensional discrete wavelet transform (3-D DWT) video coding scheme. The proposed scheme consists of an error-resilient encoder duplicating the lowest sub-band bit-streams for dispersive grouped frames and an error concealment decoder. The error concealment method of this decoder is decomposed of two steps, the first step is replacement of erroneous coefficients in the lowest sub-band by the duplicated coefficients, and the second step is interpolation of the missing wavelet coefficients by minimum mean square error (MMSE) estimation. The proposed scheme can achieve robust transmission over unreliable channels. Experimental results provide performance comparisons in terms of peak signal-to-noise ratio (PSNR) and demonstrate increased performances compared to state-of-the-art error concealment schemes.
Perceptually optimized missing texture reconstruction via neighboring embedding (NE) is presented in this paper. The proposed method adopts the structural similarity (SSIM) index as a measure for representing texture reconstruction performance of missing areas. This provides a solution to the problem of previously reported methods not being able to perform perceptually optimized reconstruction. Furthermore, in the proposed method, a new scheme for selection of the known nearest neighbor patches for reconstruction of target patches including missing areas is introduced. Specifically, by monitoring the SSIM index observed by the proposed NE-based reconstruction algorithm, selection of known patches optimal for the reconstruction becomes feasible even if target patches include missing pixels. The above novel approaches enable successful reconstruction of missing areas. Experimental results show improvement of the proposed method over previously reported methods.
In this paper, a method for adaptive reconstruction of missing textures based on kernel canonical correlation analysis (CCA) with a new clustering scheme is presented. The proposed method estimates the correlation between two areas, which respectively correspond to a missing area and its neighboring area, from known parts within the target image and realizes reconstruction of the missing texture. In order to obtain this correlation, the kernel CCA is applied to each cluster containing the same kind of textures, and the optimal result is selected for the target missing area. Specifically, a new approach monitoring errors caused in the above kernel CCA-based reconstruction process enables selection of the optimal result. This approach provides a solution to the problem in traditional methods of not being able to perform adaptive reconstruction of the target textures due to missing intensities. Consequently, all of the missing textures are successfully estimated by the optimal cluster's correlation, which provides accurate reconstruction of the same kinds of textures. In addition, the proposed method can obtain the correlation more accurately than our previous works, and more successful reconstruction performance can be expected. Experimental results show impressive improvement of the proposed reconstruction technique over previously reported reconstruction techniques.
Yoshiki ITO Takahiro OGAWA Miki HASEYAMA
A method for accurate estimation of personalized video preference using multiple users' viewing behavior is presented in this paper. The proposed method uses three kinds of features: a video, user's viewing behavior and evaluation scores for the video given by a target user. First, the proposed method applies Supervised Multiview Spectral Embedding (SMSE) to obtain lower-dimensional video features suitable for the following correlation analysis. Next, supervised Multi-View Canonical Correlation Analysis (sMVCCA) is applied to integrate the three kinds of features. Then we can get optimal projections to obtain new visual features, “canonical video features” reflecting the target user's individual preference for a video based on sMVCCA. Furthermore, in our method, we use not only the target user's viewing behavior but also other users' viewing behavior for obtaining the optimal canonical video features of the target user. This unique approach is the biggest contribution of this paper. Finally, by integrating these canonical video features, Support Vector Ordinal Regression with Implicit Constraints (SVORIM) is trained in our method. Consequently, the target user's preference for a video can be estimated by using the trained SVORIM. Experimental results show the effectiveness of our method.