IEICE global.ieice.org Site

Author Search Result

[Author] Shin'ichi SATOH(10hit)

Bayesian Exponential Inverse Document Frequency and Region-of-Interest Effect for Enhancing Instance Search Accuracy
Masaya MURATA Hidehisa NAGANO Kaoru HIRAMATSU Kunio KASHINO Shin'ichi SATOH

PAPER-Image Processing and Video Processing

Publicized: 2016/06/03
Vol: E99-D No: 9
Page(s): 2320-2331
In this paper, we first analyze the discriminative power in the Best Match 25 (BM25) formula and provide a method for calculating it from the Bayesian point of view. The resulting discriminative power is quite similar to the exponential inverse document frequency (EIDF) that we previously proposed [1] but offers stronger theoretical advantages. In our previous paper [1], we proposed the EIDF in the framework of the probabilistic information retrieval (IR) method BM25 to address the instance search task, which is a specific object search for videos using an image query. Although the effectiveness of our EIDF was experimentally demonstrated, we did not consider its theoretical justification and interpretation. We also did not describe the use of region-of-interest (ROI) information, which is supposed to be input to the instance search system together with the original image query showing the instance. Therefore, here, we justify the EIDF by calculating the discriminative power in BM25 from the Bayesian viewpoint. We also investigate the effect of the ROI information on instance search accuracy and propose two search methods that incorporate the ROI effect into the BM25 video ranking function. We validated the proposed methods through a series of experiments using the TREC Video Retrieval Evaluation instance search task dataset.
An Efficient Extraction Method for Closed Loops Using a Graph Search Technique
Shin'ichi SATOH Hiroshi MO Masao SAKAUCHI

LETTER

Vol: E78-A No: 5
Page(s): 583-586
This letter presents a new method to efficiently extract closed loops as primitive symbols in line drawings. Our method uses a graph search technique for efficiency and exhaustiveness, and also incorporates feasibility criteria of symbols. Experiments clearly demonstrated the method's effectiveness.
Query Bootstrapping: A Visual Mining Based Query Expansion
Siriwat KASAMWATTANAROTE Yusuke UCHIDA Shin'ichi SATOH

PAPER-Image Processing and Video Processing

Publicized: 2015/11/10
Vol: E99-D No: 2
Page(s): 454-466
Bag of Visual Words (BoVW) is an effective framework for image retrieval. Query expansion (QE) further boosts retrieval performance by refining a query with relevant visual words found by a geometric consistency check between the query image and highly ranked images retrieved in the first round of retrieval. Since QE checks the pairwise consistency between the query and highly ranked images, its performance may deteriorate when there are slight degradations in the query image. We propose Query Bootstrapping (QB) as a variant of QE to circumvent this problem by using the consistency among highly ranked images instead of pairwise consistency. In so doing, we regard frequently co-occurring visual words in highly ranked images as relevant visual words. Frequent itemset mining (FIM) is used to find such visual words efficiently. However, the FIM-based approach requires sensitive parameters to be fine-tuned, namely, support (min/max-support) and the number of top ranked images (top-k). Here, we propose an adaptive support algorithm that determines both the minimum support and maximum support by referring to the first round's retrieval list. Selecting relevant images by using a geometric consistency check further boosts retrieval performance by removing outlier images from the mining process. An important parameter for the LO-RANSAC algorithm that is used for the geometric consistency check, namely, the inlier threshold, is automatically determined by our algorithm. We further introduce tf-fi-idf on top of tf-idf in order to take into account the frequency of inliers (fi) in the retrieved images. We evaluated the performance of QB in terms of mean average precision (mAP) on three benchmark datasets and found that it gave significant performance boosts of 5.37%, 9.65%, and 8.52% over that of state-of-the-art QE on Oxford 5k, Oxford 105k, and Paris 6k, respectively.
Human Action Recognition from Depth Videos Using Pool of Multiple Projections with Greedy Selection
Chien-Quang LE Sang PHAN Thanh Duc NGO Duy-Dinh LE Shin'ichi SATOH Duc Anh DUONG

PAPER-Pattern Recognition

Publicized: 2016/04/25
Vol: E99-D No: 8
Page(s): 2161-2171
Depth-based action recognition has been attracting the attention of researchers because of the advantages of depth cameras over standard RGB cameras. One of these advantages is that depth data can provide richer information from multiple projections. In particular, multiple projections can be used to extract discriminative motion patterns that would not be discernible from one fixed projection. However, because of high computational costs, recent studies have exploited only a small number of projections, such as front, side, and top. Thus, a large number of projections, which may be useful for discriminating actions, are discarded. In this paper, we propose an efficient method to exploit pools of multiple projections for recognizing actions in depth videos. First, we project 3D data onto multiple 2D planes from different viewpoints sampled on a geodesic dome to obtain a large number of projections. Then, we train and test action classifiers independently for each projection. To reduce the computational cost, we propose a greedy method to select a small yet robust combination of projections. The idea is that the most complementary projections are considered first when searching for the optimal combination. We conducted extensive experiments to verify the effectiveness of our method on three challenging benchmarks: MSR Action 3D, MSR Gesture 3D, and 3D Action Pairs. The experimental results show that our method outperforms other state-of-the-art methods while using a small number of projections.
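The greedy selection idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the selection criterion or the fusion rule, so this sketch sums per-projection class scores and uses accuracy on held-out samples as the criterion. The names `fused_accuracy` and `greedy_select` are hypothetical and are not taken from the paper.

```python
def fused_accuracy(scores, labels, selected):
    """Accuracy when the class scores of the selected projections are summed.

    scores[p][i][c] is the score projection p gives class c for sample i
    (an assumed data layout, not the paper's).
    """
    correct = 0
    for i, label in enumerate(labels):
        n_classes = len(scores[selected[0]][i])
        fused = [sum(scores[p][i][c] for p in selected) for c in range(n_classes)]
        if fused.index(max(fused)) == label:
            correct += 1
    return correct / len(labels)


def greedy_select(scores, labels, max_size):
    """Repeatedly add the projection that most improves fused accuracy."""
    selected, best_acc = [], 0.0
    remaining = set(range(len(scores)))
    while remaining and len(selected) < max_size:
        cand, cand_acc = None, best_acc
        for p in sorted(remaining):
            acc = fused_accuracy(scores, labels, selected + [p])
            if acc > cand_acc:  # strict improvement only
                cand, cand_acc = p, acc
        if cand is None:
            break  # no remaining projection improves the combination
        selected.append(cand)
        remaining.remove(cand)
        best_acc = cand_acc
    return selected, best_acc


# Toy data: projections 0 and 1 are redundant, projection 2 is complementary.
labels = [0, 0, 1, 1]
scores = [
    [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.6, 0.4]],
    [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.6, 0.4]],
    [[0.45, 0.55], [0.45, 0.55], [0.45, 0.55], [0.1, 0.9]],
]
chosen, accuracy = greedy_select(scores, labels, max_size=3)
```

In this toy case the weaker but complementary projection 2 is picked ahead of the redundant projection 1, which is the behavior the abstract describes at a high level.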
Multilevel Thresholding Color Image Segmentation Using a Modified Artificial Bee Colony Algorithm
Sipeng ZHANG Wei JIANG Shin'ichi SATOH

PAPER-Artificial Intelligence, Data Mining

Publicized: 2018/05/09
Vol: E101-D No: 8
Page(s): 2064-2071
In this paper, a multilevel thresholding color image segmentation method is proposed using a modified Artificial Bee Colony (ABC) algorithm. To improve the local search ability of the ABC algorithm, the Krill Herd algorithm is incorporated into its onlooker bees phase. The proposed algorithm is named the Krill herd-inspired modified Artificial Bee Colony algorithm (KABC algorithm). Experimental results verify the robustness of the KABC algorithm, as well as its improvement in optimization accuracy and convergence speed. In this work, the KABC algorithm is used to solve the problem of multilevel thresholding for color image segmentation. To deal with luminance variation, rather than using a gray scale histogram, an HSV space-based pre-processing method is proposed to obtain a 1D feature vector. The KABC algorithm is then applied to find thresholds of the feature vector. Finally, an additional local search around the quasi-optimal solutions is employed to improve segmentation accuracy. In this stage, we use a modified objective function which combines the Structural Similarity Index Measure (SSIM) with Kapur's entropy. The pre-processing method, the global optimization with the KABC algorithm, and the local optimization stage form the whole color image segmentation method. Experimental results show that the proposed method enhances segmentation accuracy.
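As a rough illustration of the Kapur's entropy criterion mentioned in the abstract above, the sketch below scores a set of thresholds on a 1D histogram as the sum of the entropies of the resulting classes. It is only a sketch under stated assumptions: the exhaustive single-threshold search stands in for the KABC optimizer, and the paper's HSV-based feature vector and SSIM-combined objective are not reproduced. The function names are hypothetical.

```python
import math


def kapur_entropy(hist, thresholds):
    """Sum of class entropies for a histogram split at the given thresholds.

    hist: non-negative bin counts.
    thresholds: bin indices at which a new class starts.
    """
    total = float(sum(hist))
    probs = [h / total for h in hist]
    bounds = [0] + sorted(thresholds) + [len(hist)]
    score = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        weight = sum(probs[lo:hi])
        if weight <= 0:
            continue  # an empty class contributes nothing
        for p in probs[lo:hi]:
            if p > 0:
                score -= (p / weight) * math.log(p / weight)
    return score


def best_single_threshold(hist):
    """Exhaustive search over one threshold (a stand-in for a metaheuristic)."""
    return max(range(1, len(hist)), key=lambda t: kapur_entropy(hist, [t]))


uniform_hist = [1, 1, 1, 1]
split_score = kapur_entropy(uniform_hist, [2])
best_t = best_single_threshold(uniform_hist)
```

Exhaustive search becomes impractical as the number of thresholds grows, which is the usual motivation for applying a metaheuristic such as the one the abstract proposes.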
Face Retrieval in Large-Scale News Video Datasets
Thanh Duc NGO Hung Thanh VU Duy-Dinh LE Shin'ichi SATOH

PAPER-Image Recognition, Computer Vision

Vol: E96-D No: 8
Page(s): 1811-1825
Face retrieval in news video has been identified as a challenging task due to the huge variations in the visual appearance of the human face. Although several approaches have been proposed to deal with this problem, their extremely high computational cost limits their scalability to large-scale video datasets that may contain millions of faces of hundreds of characters. In this paper, we introduce approaches for face retrieval that are scalable to such datasets while maintaining performance competitive with state-of-the-art approaches. To utilize the variability of face appearances in video, we use a set of face images called a face-track to represent the appearance of a character in a video shot. Our first proposal is an approach for extracting face-tracks. We use a point tracker to explore the connections between detected faces belonging to the same character and then group them into one face-track. We present techniques to make the approach robust against common problems caused by flash lights, partial occlusions, and scattered appearances of characters in news videos. In the second proposal, we introduce an efficient approach to match face-tracks for retrieval. Instead of using all the faces in the face-tracks to compute their similarity, our approach obtains a representative face for each face-track. The representative face is computed from faces that are sampled from the original face-track. As a result, we significantly reduce the computational cost of face-track matching while taking into account the variability of faces in face-tracks to achieve high matching accuracy. Experiments are conducted on two face-track datasets extracted from real-world news videos, at scales that have never been considered in the literature. One dataset contains 1,497 face-tracks of 41 characters extracted from 370 hours of TRECVID videos. The other dataset provides 5,567 face-tracks of 111 characters observed from a television news program (NHK News 7) over 11 years. We make both datasets publicly accessible to the research community. The experimental results show that our proposed approaches achieved a remarkable balance between accuracy and efficiency.
Efficient Tracking of News Topics Based on Chronological Semantic Structures in a Large-Scale News Video Archive
Ichiro IDE Tomoyoshi KINOSHITA Tomokazu TAKAHASHI Hiroshi MO Norio KATAYAMA Shin'ichi SATOH Hiroshi MURASE

PAPER-Video Processing

Vol: E95-D No: 5
Page(s): 1288-1300
Recent advances in digital storage technology have enabled us to archive a large volume of video data. Thanks to this trend, we have archived more than 1,800 hours of video data from a daily Japanese news show in the last ten years. When considering the effective use of such a large news video archive, we assumed that analysis of its chronological and semantic structure becomes important. We also consider that showing users how news topics develop is more important for helping them understand current affairs than providing a list of relevant news stories, as most current news video retrieval systems do. Therefore, in this paper, we propose a structuring method for a news video archive, together with an interface that visualizes the structure, so that users can efficiently track the development of news topics according to their interest. The proposed news video structure, namely the “topic thread structure”, is obtained as a result of an analysis of the chronological and semantic relations between news stories. Meanwhile, the proposed interface, namely “mediaWalker II”, allows users to track the development of news topics along the topic thread structure, and at the same time watch the video footage corresponding to each news story. Analyses of the topic thread structures obtained by applying the proposed method to actual news video footage revealed interesting and comprehensible relations between news topics in the real world. At the same time, analyses of their size quantified the efficiency of tracking a user's topic-of-interest based on the proposed topic thread structure. We consider this a first step towards facilitating video authoring by users based on existing contents in a large-scale news video archive.
Rephrasing Visual Questions by Specifying the Entropy of the Answer Distribution
Kento TERAO Toru TAMAKI Bisser RAYTCHEV Kazufumi KANEDA Shin'ichi SATOH

PAPER-Image Recognition, Computer Vision

Publicized: 2020/08/20
Vol: E103-D No: 11
Page(s): 2362-2370
Visual question answering (VQA) is the task of answering a visual question, which is a pair of a question and an image. Some visual questions are ambiguous and some are clear, and it may be appropriate to change the ambiguity of questions from situation to situation. However, this issue has not been addressed by any prior work. We propose a novel task, rephrasing questions by controlling their ambiguity. The ambiguity of a visual question is defined as the entropy of the answer distribution predicted by a VQA model. The proposed model rephrases a source question given with an image so that the rephrased question has the ambiguity (or entropy) specified by users. We propose two learning strategies to train the proposed model with the VQA v2 dataset, which has no ambiguity information. We demonstrate the advantage of our approach, which can control the ambiguity of the rephrased questions, and report an interesting observation that it is harder to increase ambiguity than to reduce it.
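The ambiguity measure described in the abstract above, the entropy of a predicted answer distribution, can be sketched as follows. This is a minimal illustration that assumes the VQA model's output is already available as a list of non-negative answer probabilities; the function name `answer_entropy` and the example distributions are hypothetical and not taken from the paper.

```python
import math


def answer_entropy(probs):
    """Shannon entropy (in nats) of a predicted answer distribution."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probs:
        q = p / total  # normalize in case the inputs do not sum to 1
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


# A question with one dominant answer versus one with four equally likely answers.
clear = answer_entropy([0.97, 0.01, 0.01, 0.01])
ambiguous = answer_entropy([0.25, 0.25, 0.25, 0.25])
```

Under this measure a question whose answer distribution is peaked has low entropy (clear), while a flat distribution has high entropy (ambiguous).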
A Multi-Stage Approach to Fast Face Detection
Duy-Dinh LE Shin'ichi SATOH

PAPER-Image Recognition, Computer Vision

Vol: E89-D No: 7
Page(s): 2275-2285
A multi-stage approach that is fast, robust, and easy to train is proposed for a face-detection system. Motivated by the work of Viola and Jones [1], this approach uses a cascade of classifiers to yield a coarse-to-fine strategy that significantly reduces detection time while maintaining a high detection rate. However, it is distinguished from previous work by two features. First, a new stage has been added to detect face candidate regions more quickly by using a larger window size and a larger moving step size. Second, support vector machine (SVM) classifiers are used instead of AdaBoost classifiers in the last stage, and Haar wavelet features selected by the previous stage are reused for the SVM classifiers robustly and efficiently. By combining AdaBoost and SVM classifiers, the final system can achieve both fast and robust detection because most non-face patterns are rejected quickly in earlier layers, while only a small number of promising face patterns are classified robustly in later layers. The proposed multi-stage-based system has been shown to run faster than the original AdaBoost-based system while maintaining comparable accuracy.
Drawing Understanding System Incorporating Rule Generation Support with Man-Machine Interactions
Shin'ichi SATOH Hiroshi MO Masao SAKAUCHI

PAPER

Vol: E77-D No: 7
Page(s): 735-742
The present study describes the use of a state-transition-type drawing understanding framework to construct a multi-purpose drawing understanding system. This new system employs an understanding process that complies with the understanding rules, which are easily obtained by the user. The same set of user-provided rules must be used for the same type of target drawings, but for slightly different ones, fine tuning is required to obtain understanding rules. To overcome this inherent drawback in constructing drawing understanding systems, we extended the system using a newly constructed understanding rule generating support system. The resultant integrated system is based on a man-machine cooperation type interface, and can automatically generate rules from simple user-provided interactions using a graphical user interface (GUI). To obtain efficient rule generation, the system employs an inductive inference method as a learning algorithm. Map-drawing experiments were successfully carried out, and an evaluation based on a rule learning error criterion subsequently revealed an efficient rule generation process.
