1-8hit |
Shoya OOHARA Mitsuji MUNEYASU Soh YOSHIDA Makoto NAKASHIZUKA
For image restoration, an image prior that is obtained from the morphological gradient has been proposed. In the field of mathematical morphology, the optimization of the structuring element (SE) used for this morphological gradient using a genetic algorithm (GA) has also been proposed. In this paper, we introduce a new image prior that is the sum of the morphological gradients and total variation for an image restoration problem to improve the restoration accuracy. The proposed image prior makes it possible to almost match the fitness to a quantitative evaluation such as the mean square error. It also solves the problem of the artifact due to the unsuitability of the SE for the image. An experiment shows the effectiveness of the proposed image restoration method.
Soh YOSHIDA Mitsuji MUNEYASU Takahiro OGAWA Miki HASEYAMA
In this paper, we address the problem of analyzing topics, included in a social video group, to improve the retrieval performance of videos. Unlike previous methods that focused on an individual visual aspect of videos, the proposed method aims to leverage the “mutual reinforcement” of heterogeneous modalities such as tags and users associated with video on the Internet. To represent multiple types of relationships between each heterogeneous modality, the proposed method constructs three subgraphs: user-tag, video-video, and video-tag graphs. We combine the three types of graphs to obtain a heterogeneous graph. Then the extraction of latent features, i.e., topics, becomes feasible by applying graph-based soft clustering to the heterogeneous graph. By estimating the membership of each grouped cluster for each video, the proposed method defines a new video similarity measure. Since the understanding of video content is enhanced by exploiting latent features obtained from different types of data that complement each other, the performance of visual reranking is improved by the proposed method. Results of experiments on a video dataset that consists of YouTube-8M videos show the effectiveness of the proposed method, which achieves a 24.3% improvement in terms of the mean normalized discounted cumulative gain in a search ranking task compared with the baseline method.
Mitsuji MUNEYASU Nayuta JINDA Yuuya MORITANI Soh YOSHIDA
In this paper, we propose a method of embedding and detecting data in printed images with several formats, such as different resolutions and numbers of blocks, using the camera of a tablet device. To specify the resolution of an image and the number of blocks, invisible markers that are embedded in the amplitude domain of the discrete Fourier transform of the target image are used. The proposed method can increase the variety of images suitable for data embedding.
Masahiro YASUDA Soh YOSHIDA Mitsuji MUNEYASU
Methods that embed data into printed images and retrieve data from printed images captured using the camera of a mobile device have been proposed. Evaluating these methods requires printing and capturing actual embedded images, which is burdensome. In this paper, we propose a method for reducing the workload for evaluating the performance of data embedding algorithms by simulating the degradation caused by printing and capturing images using generative adversarial networks. The proposed method can represent various captured conditions. Experimental results demonstrate that the proposed method achieves the same accuracy as detecting embedded data under actual conditions.
Hojun SHIMOYAMA Soh YOSHIDA Takao FUJITA Mitsuji MUNEYASU
Recent character detectors have been modeled using deep neural networks and have achieved high performance in various tasks, such as text detection in natural scenes and character detection in historical documents. However, existing methods cannot achieve high detection accuracy for wooden slips because of their multi-scale character sizes and aspect ratios, high character density, and close character-to-character distance. In this study, we propose a new U-Net-based character detection and localization framework that learns character regions and boundaries between characters. The proposed method enhances the learning performance of character regions by simultaneously learning the vertical and horizontal boundaries between characters. Furthermore, by adding simple and low-cost post-processing using the learned regions of character boundaries, it is possible to more accurately detect the location of a group of characters in a close neighborhood. In this study, we construct a wooden slip dataset. Experiments demonstrated that the proposed method outperformed existing character detection methods, including state-of-the-art character detection methods for historical documents.
Ryota HIGASHIMOTO Soh YOSHIDA Takashi HORIHATA Mitsuji MUNEYASU
Noisy labels in training data can significantly harm the performance of deep neural networks (DNNs). Recent research on learning with noisy labels uses a property of DNNs called the memorization effect to divide the training data into a set of data with reliable labels and a set of data with unreliable labels. Methods introducing semi-supervised learning strategies discard the unreliable labels and assign pseudo-labels generated from the confident predictions of the model. So far, this semi-supervised strategy has yielded the best results in this field. However, we observe that even when models are trained on balanced data, the distribution of the pseudo-labels can still exhibit an imbalance that is driven by data similarity. Additionally, a data bias is seen that originates from the division of the training data using the semi-supervised method. If we address both types of bias that arise from pseudo-labels, we can avoid the decrease in generalization performance caused by biased noisy pseudo-labels. We propose a learning method with noisy labels that introduces unbiased pseudo-labeling based on causal inference. The proposed method achieves significant accuracy gains in experiments at high noise rates on the standard benchmarks CIFAR-10 and CIFAR-100.
Soh YOSHIDA Takahiro OGAWA Miki HASEYAMA Mitsuji MUNEYASU
Video reranking is an effective way for improving the retrieval performance of text-based video search engines. This paper proposes a graph-based Web video search reranking method with local and global consistency analysis. Generally, the graph-based reranking approach constructs a graph whose nodes and edges respectively correspond to videos and their pairwise similarities. A lot of reranking methods are built based on a scheme which regularizes the smoothness of pairwise relevance scores between adjacent nodes with regard to a user's query. However, since the overall consistency is measured by aggregating only the local consistency over each pair, errors in score estimation increase when noisy samples are included within query-relevant videos' neighbors. To deal with the noisy samples, the proposed method leverages the global consistency of the graph structure, which is different from the conventional methods. Specifically, in order to detect this consistency, the propose method introduces a spectral clustering algorithm which can detect video groups, in which videos have strong semantic correlation, on the graph. Furthermore, a new regularization term, which smooths ranking scores within the same group, is introduced to the reranking framework. Since the score regularization is performed by both local and global aspects simultaneously, the accurate score estimation becomes feasible. Experimental results obtained by applying the proposed method to a real-world video collection show its effectiveness.
Takamasa FUJII Soh YOSHIDA Mitsuji MUNEYASU
In video search reranking, in addition to the well-known semantic gap, the intent gap, which is the gap between the representation of the users' demand and the real search intention, is becoming a major problem restricting the improvement of reranking performance. To address this problem, we propose video search reranking based on a semantic representation by multiple tags. In the proposed method, we use relevance feedback, which the user can interact with by specifying some example videos from the initial search results. We apply the relevance feedback to reduce the gap between the real intent of the users and the video search results. In addition, we focus on the fact that multiple tags are used to represent video contents. By vectorizing multiple tags associated with videos on the basis of the Word2Vec algorithm and calculating the centroid of the tag vector as a collective representation, we can evaluate the semantic similarity between videos by using tag features. We conduct experiments on the YouTube-8M dataset, and the results show that our reranking approach is effective and efficient.