
Keyword Search Result

[Keyword] annotation (17 hits)

  • Noisy Localization Annotation Refinement for Object Detection

    Jiafeng MAO  Qing YU  Kiyoharu AIZAWA  

    PAPER-Image Recognition, Computer Vision

    E104-D No:9

    A well-annotated dataset is crucial to the training of object detectors. However, producing finely annotated datasets for object detection is extremely labor-intensive. Crowdsourcing is therefore often used to create datasets, and such datasets tend to contain incorrect annotations, such as inaccurately localized bounding boxes. In this study, we highlight the problem of object detection with noisy bounding box annotations and show that these noisy annotations are harmful to the performance of deep neural networks. To solve this problem, we further propose a framework that allows the network to correct the noisy datasets through alternating refinement. The experimental results demonstrate that our proposed framework can significantly alleviate the influence of noise on model performance.
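The abstract does not spell out the refinement rule, so the following is only a minimal Python sketch of one alternating step under stated assumptions: boxes are `(x1, y1, x2, y2)` arrays, the detector's predictions are given (training is omitted), and the hypothetical `refine_boxes` blends each annotation toward its prediction with weight `alpha` when the two overlap by at least `min_iou`. Intersection-over-union (IoU) measures how far a box has drifted from the true one.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return float(inter / union) if union > 0 else 0.0

def refine_boxes(annotations, predictions, alpha=0.5, min_iou=0.3):
    """Hypothetical refinement step: move each noisy annotation toward the
    detector's prediction when both plausibly cover the same object."""
    refined = []
    for ann, pred in zip(annotations, predictions):
        ann, pred = np.asarray(ann, float), np.asarray(pred, float)
        if iou(ann, pred) >= min_iou:
            refined.append((1 - alpha) * ann + alpha * pred)
        else:
            refined.append(ann)  # prediction disagrees too much: keep the label
    return refined

true_box = np.array([10.0, 10.0, 50.0, 50.0])
noisy_box = true_box + np.array([6.0, -4.0, 5.0, -6.0])   # crowdsourced label
predicted = true_box + np.array([1.0, 0.0, -1.0, 1.0])    # stand-in detector output
refined_box = refine_boxes([noisy_box], [predicted])[0]
print(iou(noisy_box, true_box), iou(refined_box, true_box))
```

In a full alternating scheme, the detector would be retrained on the refined boxes and the step repeated. In this toy case the IoU with the true box rises from about 0.60 to about 0.78.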

  • Collaborative Ontology Development and its Use for Video Annotation in Elderly Care Domain

    Satoshi NISHIMURA  Julio VIZCARRA  Yuichi OOTA  Ken FUKUDA  


    E104-D No:5

    Multimedia data and information management has become an important task with the development of media processing technology. Multimedia is a useful resource for helping people understand complex situations, such as those in the elderly care domain. From a semantic perspective, appropriate annotation benefits several information management tasks, such as the storage, retrieval, and summarization of data. However, metadata annotation for multimedia data remains problematic because metadata is obtained through interpretation that depends on domain-specific knowledge, and annotation requires a well-controlled and comprehensive vocabulary. In this study, we propose a collaborative methodology for developing ontologies and annotations with domain experts. The method includes (1) a classification of knowledge types for the collaborative construction of annotation data, (2) a division of tasks among a team composed of domain experts, ontology engineers, and annotators, and (3) an incremental approach to ontology development. We applied the proposed method to 11 videos in the elderly care domain to confirm its feasibility. We focused on annotating the actions occurring in these videos, and the annotated data are used to support the evaluation of staff skills. The application results show that the content of the ontology increases monotonically during annotation. The number of “action concepts” saturates, and these concepts are reused across the case studies. This demonstrates that the ontology is reusable and can represent various case studies using a small number of “action concepts”. This study concludes by presenting lessons learnt from the case studies.
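To make the incremental approach concrete, here is a minimal Python sketch that treats the ontology as a flat set of action concepts, a simplification, since a real ontology would also carry a class hierarchy. The hypothetical `annotate_incrementally` adds a concept only when an observed action is not already covered and records the ontology size after each video, so the growth curve is monotonic and flattens once concepts start being reused. The action labels are invented for illustration and are not taken from the study.

```python
def annotate_incrementally(case_studies, ontology=None):
    """Annotate each case study's actions against a shared concept set,
    adding a concept only when no existing one covers the action.
    Returns the final ontology and its size after each case study."""
    ontology = set() if ontology is None else set(ontology)
    sizes = []
    for actions in case_studies:
        for action in actions:
            if action not in ontology:  # new concept proposed to the ontology team
                ontology.add(action)
        sizes.append(len(ontology))
    return ontology, sizes

# Hypothetical action labels observed in four care videos
videos = [
    ["assist_standing", "hold_hand", "speak_to_resident"],
    ["hold_hand", "support_walking", "speak_to_resident"],
    ["assist_standing", "support_walking"],
    ["speak_to_resident", "hold_hand"],
]
ontology, sizes = annotate_incrementally(videos)
print(sizes)  # [3, 4, 4, 4]
```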

  • Completion of Missing Labels for Multi-Label Annotation by a Unified Graph Laplacian Regularization

    Jonathan MOJOO  Yu ZHAO  Muthu Subash KAVITHA  Junichi MIYAO  Takio KURITA  

    PAPER-Artificial Intelligence, Data Mining

    E103-D No:10

    Image annotation is becoming enormously important for efficient image retrieval from the web and other large databases. However, the vast semantic information in an image and the complex dependencies among its labels make the task challenging. Determining the semantic similarity between the multiple labels of an image is therefore useful for understanding incomplete label assignments in image retrieval. This work proposes a novel method for multi-label image annotation that unifies two different types of Laplacian regularization terms in a deep convolutional neural network (CNN) for robust annotation performance. The main contribution of this study is the unified Laplacian regularization model, which addresses missing labels efficiently by capturing the contextual similarity between labels both internally and externally through their semantic similarities. Specifically, we generate similarity matrices between labels internally using Hayashi's quantification method type III and externally using the word2vec method. The similarity matrices generated by the two methods are then combined into a Laplacian regularization term, which is incorporated into the new objective function of the deep CNN. The regularization term implemented in this study addresses the multi-label annotation problem and enables the neural network to be trained more effectively. Experimental results on public benchmark datasets reveal that the proposed unified regularization model with a deep CNN produces significantly better results than the baseline CNN without regularization and other state-of-the-art methods for predicting missing labels.
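As a rough illustration of how two label-similarity matrices can be merged into one graph Laplacian penalty, the following Python sketch assumes both matrices are already computed (the quantification-method and word2vec steps are not shown) and mixes them with a hypothetical weight `beta`. The penalty tr(F L F^T) equals the sum over label pairs of s_jk (f_j - f_k)^2, so it is small when semantically similar labels receive similar scores. The similarity values are invented.

```python
import numpy as np

def combined_laplacian(s_internal, s_external, beta=0.5):
    """Graph Laplacian L = D - S of a weighted mix of two label-similarity matrices."""
    s = beta * np.asarray(s_internal, float) + (1 - beta) * np.asarray(s_external, float)
    s = (s + s.T) / 2            # enforce symmetry
    np.fill_diagonal(s, 0.0)     # ignore self-similarity
    return np.diag(s.sum(axis=1)) - s

def laplacian_penalty(scores, lap):
    """tr(F L F^T) for an (images x labels) score matrix F."""
    scores = np.atleast_2d(scores)
    return float(np.trace(scores @ lap @ scores.T))

# Labels: sky, cloud, car (toy similarity values)
s_int = np.array([[0, .8, .1], [.8, 0, .1], [.1, .1, 0]])
s_ext = np.array([[0, .9, .2], [.9, 0, .1], [.2, .1, 0]])
lap = combined_laplacian(s_int, s_ext)
consistent = np.array([[0.9, 0.8, 0.1]])    # sky and cloud scored alike
inconsistent = np.array([[0.9, 0.1, 0.8]])  # cloud missed, car scored high
print(laplacian_penalty(consistent, lap), laplacian_penalty(inconsistent, lap))
```

During training such a term would be added to the multi-label loss with some weight, for example `loss = bce + lam * penalty`; the weight `lam` is likewise an assumption.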

  • Human Pose Annotation Using a Motion Capture System for Loose-Fitting Clothes

    Takuya MATSUMOTO  Kodai SHIMOSATO  Takahiro MAEDA  Tatsuya MURAKAMI  Koji MURAKOSO  Kazuhiko MINO  Norimichi UKITA  


    E103-D No:6

    This paper proposes a framework for automatically annotating the keypoints of a human body in images for learning 2D pose estimation models. Ground-truth annotation for supervised learning is difficult and cumbersome in most machine vision tasks. While considerable contributions from the community provide us with a huge number of pose-annotated images, they mainly focus on people wearing common clothes, whose body keypoints are relatively easy to annotate. This paper, on the other hand, focuses on annotating people wearing loose-fitting clothes (e.g., Japanese kimono) that occlude many body keypoints. To annotate these people automatically and correctly, we reuse the 3D coordinates of keypoints observed without loose-fitting clothes, which can be captured by a motion capture system (MoCap). These 3D keypoints are projected onto an image in which the body pose under loose-fitting clothes is similar to the one captured by the MoCap. Pose similarity between bodies with and without loose-fitting clothes is evaluated using the 3D geometric configurations of MoCap markers that remain visible even with loose-fitting clothes (e.g., markers on the head, wrists, and ankles). Experimental results validate the effectiveness of our proposed framework for human pose estimation.
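The projection step can be sketched with a standard pinhole camera model, assuming calibrated intrinsics `K` and extrinsics `R`, `t` (all values below are invented). The hypothetical `marker_distance` is one simple way to compare configurations of markers visible in both recordings; the paper's actual similarity measure may differ. In practice, the MoCap frame minimizing such a distance would supply the 3D keypoints to project.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Pinhole projection of N x 3 world points to N x 2 pixel coordinates."""
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uvw = cam @ K.T                    # camera -> homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def marker_distance(markers_a, markers_b):
    """Mean distance between corresponding visible markers after removing
    translation; lower means more similar poses (hypothetical measure)."""
    a = markers_a - markers_a.mean(axis=0)
    b = markers_b - markers_b.mean(axis=0)
    return float(np.linalg.norm(a - b, axis=1).mean())

K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 4.0])             # subject 4 m in front of the camera
keypoints = np.array([[0.0, 0.0, 0.0],    # e.g. pelvis
                      [0.2, -0.5, 0.0]])  # e.g. a knee hidden under a kimono
print(project_points(keypoints, K, R, t))  # -> (320, 240) and (370, 115)
```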

  • Incremental Environmental Monitoring for Revealing the Ecology of Endangered Fish (Open Access)

    Yoshinari SHIRAI  Yasue KISHINO  Shin MIZUTANI  Yutaka YANAGISAWA  Takayuki SUYAMA  Takuma OTSUKA  Tadao KITAGAWA  Futoshi NAYA  


    E101-B No:10

    This paper proposes a novel environmental monitoring strategy, incremental environmental monitoring, that enables scientists to reveal the ecology of wild animals in the field. We applied this strategy to the habitat of endangered freshwater fish. Specifically, we designed and implemented a network-based system that uses distributed sensors to continuously monitor and record the habitat of endangered fish. Moreover, we developed a set of analytical tools to exploit a variety of sensor data, including environmental time-series data such as the amount of dissolved oxygen, as well as underwater video capturing the interaction between the fish and their environment. We also describe the current state of monitoring the behavior and habitat of endangered fish and discuss solutions for making such environmental monitoring more efficient in the field.

  • Manifold Kernel Metric Learning for Larger-Scale Image Annotation

    Lihua GUO  

    LETTER-Pattern Recognition

    E98-D No:7

    An appropriate similarity measure between images is one of the key techniques in search-based image annotation models. To capture the nonlinear relationships between visual features and image semantics, many kernel distance metric learning (KML) algorithms have been developed. However, for large-scale image annotation, their metrics cannot explicitly represent the similarity between image semantics, and their algorithms suffer from high computational cost, so they lose efficiency. In this paper, we propose a manifold kernel metric learning (M_KML) algorithm that simultaneously learns the manifold structure and the image annotation metrics. The main merit of M_KML is that its distance metrics are built on the interior manifold structure of the image features, and dimensionality reduction on this manifold structure handles the high-dimensionality challenge faced by KML. Experiments verify the efficiency and effectiveness of our method in comparison with state-of-the-art image annotation approaches.
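M_KML itself is not reproduced here. As background only, this Python sketch shows the search-based annotation setting the abstract refers to: a kernel induces the distance ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y), and the tags of the nearest training images are transferred to the query by voting. The RBF kernel, `gamma`, `k`, and the toy tags are assumptions; a learned metric such as M_KML would replace the fixed kernel distance.

```python
from collections import Counter

import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def kernel_distance(x, y, kernel):
    """Distance between x and y in the kernel's implicit feature space."""
    d2 = kernel(x, x) + kernel(y, y) - 2 * kernel(x, y)
    return float(np.sqrt(max(d2, 0.0)))

def annotate(query, train_feats, train_tags, k=3, n_tags=2, kernel=rbf_kernel):
    """Transfer the most frequent tags among the k nearest training images."""
    dists = [kernel_distance(query, f, kernel) for f in train_feats]
    nearest = np.argsort(dists)[:k]
    votes = Counter(tag for i in nearest for tag in train_tags[i])
    return [tag for tag, _ in votes.most_common(n_tags)]

train_feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
train_tags = [["sky", "sea"], ["sky", "beach"], ["sky", "sea"], ["car", "road"]]
print(annotate(np.array([0.05, 0.05]), train_feats, train_tags))  # ['sky', 'sea']
```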

  • Multi-Stage Automatic NE and PoS Annotation Using Pattern-Based and Statistical-Based Techniques for Thai Corpus Construction

    Nattapong TONGTEP  Thanaruk THEERAMUNKONG  

    PAPER-Natural Language Processing

    E96-D No:10

    Automated or semi-automated annotation is a practical solution for large-scale corpus construction. However, special characteristics of the Thai language, such as the lack of word-boundary and sentence-boundary markers, raise several issues in automatic corpus annotation. This paper presents a multi-stage annotation framework containing two chunking stages and three tagging stages. The two chunking stages are pattern matching-based named entity (NE) extraction and dictionary-based word segmentation, while the three succeeding tagging stages are dictionary-, pattern-, and statistical-based tagging. Applying ambiguity-priority heuristics, NE extraction is performed first on the original text using a set of patterns, applied in order of pattern ambiguity. Next, the remaining text is segmented into words using a dictionary. The obtained chunks are then tagged with named entity types or parts of speech (PoS) using dictionaries, patterns, and statistics. Focusing on reducing human intervention in corpus construction, our experimental results show that the dictionary-based tagging process can assign unique tags to 64.92% of the words, leaving 24.14% unknown words and 10.94% ambiguously tagged words. Subsequently, pattern-based tagging reduces the unknown words to only 13.34%, while statistical-based tagging reduces the ambiguously tagged words to only 3.01%.
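The dictionary-based stages can be sketched in Python as greedy longest matching followed by dictionary lookup. Greedy longest matching is an assumption (the abstract only says the segmentation is dictionary-based), the romanized toy lexicon stands in for a Thai dictionary, and the preceding pattern-based NE extraction is omitted. Runs of characters that match no dictionary word are kept together as unknown chunks, which a later pattern- or statistics-based stage would have to resolve.

```python
def longest_match(text, lexicon):
    """Greedy longest-match segmentation; unmatched characters form unknown chunks."""
    max_len = max(len(w) for w in lexicon)
    words, i, in_unknown = [], 0, False
    while i < len(text):
        match = None
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match:
            words.append(match)
            i += len(match)
            in_unknown = False
        else:
            if in_unknown:
                words[-1] += text[i]   # extend the current unknown chunk
            else:
                words.append(text[i])
            in_unknown = True
            i += 1
    return words

def dictionary_tag(words, lexicon):
    """Split words into uniquely tagged, ambiguous, and unknown groups."""
    unique, ambiguous, unknown = {}, {}, []
    for w in words:
        tags = lexicon.get(w)
        if not tags:
            unknown.append(w)
        elif len(tags) == 1:
            unique[w] = next(iter(tags))
        else:
            ambiguous[w] = set(tags)
    return unique, ambiguous, unknown

lexicon = {"the": {"DET"}, "cat": {"NOUN"}, "sat": {"VERB"}, "on": {"ADP"},
           "mat": {"NOUN"}, "run": {"NOUN", "VERB"}}
words = longest_match("thecatsatonthematxyzrun", lexicon)
unique, ambiguous, unknown = dictionary_tag(words, lexicon)
print(words, ambiguous, unknown)
```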

  • A Novel Framework for Extracting Visual Feature-Based Keyword Relationships from an Image Database

    Marie KATSURAI  Takahiro OGAWA  Miki HASEYAMA  


    E95-A No:5

    In this paper, a novel framework for extracting visual feature-based keyword relationships from an image database is proposed. Based on the observation that relevant keywords tend to share common visual features, the keyword relationships in a target image database are extracted in the following two steps. First, the relationship between each keyword and its corresponding visual features is modeled using a classifier. This step enables the detection of visual features related to each keyword. In the second step, the keyword relationships are extracted from the obtained results. Specifically, to measure the relevance between two keywords, the proposed method removes the visual features related to one keyword from the training images and monitors the performance of the classifier obtained for the other keyword. This measurement is the biggest difference from conventional methods, which focus only on keyword co-occurrences or visual similarities. Results of experiments conducted on an image database showed the effectiveness of the proposed method.
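The feature-removal measurement can be illustrated with a small Python sketch on synthetic data. The nearest-centroid classifier and the class-mean-gap rule for picking a keyword's related features are stand-ins chosen for brevity, not the classifier used in the paper. Here `relevance(X, y_a, y_b)` is the drop in keyword b's training accuracy after the features related to keyword a are removed.

```python
import numpy as np

def related_features(X, y, k=1):
    """Indices of the k features whose class means differ most for a keyword
    (a simple stand-in for 'visual features related to the keyword')."""
    gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(gap)[::-1][:k]

def centroid_accuracy(X, y):
    """Training accuracy of a nearest-centroid classifier."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return float((pred == y).mean())

def relevance(X, y_a, y_b, k=1):
    """Drop in keyword b's accuracy after removing features related to keyword a."""
    keep = np.setdiff1d(np.arange(X.shape[1]), related_features(X, y_a, k))
    return centroid_accuracy(X, y_b) - centroid_accuracy(X[:, keep], y_b)

rng = np.random.default_rng(0)
n = 400
s = rng.integers(0, 2, n)      # visual cue shared by "sea" and "beach"
t = rng.integers(0, 2, n)      # unrelated cue for "car"
X = rng.normal(0.0, 0.1, (n, 4))
X[:, 0] += s
X[:, 2] += t
y_sea, y_beach, y_car = s, s, t
print(relevance(X, y_sea, y_beach), relevance(X, y_car, y_beach))
```

Removing the cue shared by "sea" and "beach" should cost "beach" most of its accuracy, while removing the "car" cue should barely affect it.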

  • Correcting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar

    Yoshihide KATO  Shigeki MATSUBARA  

    LETTER-Natural Language Processing

    E93-D No:9

    This paper proposes a method of correcting annotation errors in a treebank. Using a synchronous grammar, the method transforms parse trees containing annotation errors into trees in which the errors are corrected. The synchronous grammar is automatically induced from the treebank. We report the results of applying our method to the Penn Treebank, which demonstrate that our method corrects syntactic annotation errors with high precision.
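To show what applying a synchronous tree substitution rule looks like, here is a small Python sketch in which trees are nested tuples `(label, child, ...)` and each rule pairs a source elementary tree with a target one sharing variables such as `?a`. The single rule is written by hand for illustration (the paper induces its grammar from the treebank automatically), and the first-match, top-down application strategy is an assumption. The example rule rebrackets a flat NP with a post-modifying PP into the `(NP (NP ...) (PP ...))` structure the Penn Treebank uses.

```python
def match(pattern, tree, bindings):
    """Match a pattern tree against a tree, binding '?' variables to subtrees."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        bindings[pattern] = tree
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    if len(pattern) != len(tree) or pattern[0] != tree[0]:
        return False
    return all(match(p, t, bindings) for p, t in zip(pattern[1:], tree[1:]))

def substitute(pattern, bindings):
    """Instantiate a target pattern with the bound subtrees."""
    if isinstance(pattern, str):
        return bindings.get(pattern, pattern) if pattern.startswith("?") else pattern
    return (pattern[0],) + tuple(substitute(p, bindings) for p in pattern[1:])

def correct(tree, rules):
    """Rewrite a node with the first matching rule, then recurse into its children."""
    if isinstance(tree, str):
        return tree
    for source, target in rules:
        bindings = {}
        if match(source, tree, bindings):
            tree = substitute(target, bindings)
            break
    return (tree[0],) + tuple(correct(child, rules) for child in tree[1:])

# Hand-written rule pair: flat NP with a trailing PP -> NP over (NP PP)
rules = [(("NP", "?a", "?b", ("PP", "?c", "?d")),
          ("NP", ("NP", "?a", "?b"), ("PP", "?c", "?d")))]
flat = ("NP", ("DT", "the"), ("NN", "man"),
        ("PP", ("IN", "with"), ("NP", ("DT", "a"), ("NN", "telescope"))))
print(correct(flat, rules))
```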

  • Novel Confidence Feature Extraction Algorithm Based on Latent Topic Similarity

    Wei CHEN  Gang LIU  Jun GUO  Shinichiro OMACHI  Masako OMACHI  Yujing GUO  

    PAPER-Speech and Hearing

    E93-D No:8

    In speech recognition, confidence annotation adopts a single confidence feature or a combination of different features for classification. These confidence features are always extracted from decoding information. However, it has been shown that about 30% of the knowledge used in human speech understanding is derived mainly from high-level information. Thus, how to extract a high-level confidence feature that is statistically independent of decoding information merits research in speech recognition. In this paper, a novel confidence feature extraction algorithm based on latent topic similarity is proposed. For each recognition result, the topic distribution of each word and that of its context are first obtained using the latent Dirichlet allocation (LDA) topic model, and the proposed word confidence feature is then extracted by determining the similarity between these two topic distributions. The experiments show that the proposed feature adds a new information source to the confidence features with a good complementary effect and, combined with confidence features from decoding information, can effectively improve the performance of confidence annotation.
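Assuming the word and context topic distributions have already been inferred with an LDA model (inference is not shown), the similarity step can be sketched in Python. The abstract does not name the similarity measure; this sketch uses one minus the base-2 Jensen-Shannon divergence, which lies between 0 and 1, and the three-topic distributions are invented.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def topic_confidence(word_topics, context_topics):
    """Similarity in [0, 1]; 1 means the word's topics match its context exactly."""
    return 1.0 - js_divergence(word_topics, context_topics)

context = [0.7, 0.2, 0.1]
on_topic_word = [0.6, 0.3, 0.1]
off_topic_word = [0.05, 0.05, 0.9]
print(topic_confidence(on_topic_word, context), topic_confidence(off_topic_word, context))
```

The resulting value would then be appended to the decoding-based confidence features before classification.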

  • Tag-Annotated Text Search Using Extended Region Algebra

    Katsuya MASUDA  Jun'ichi TSUJII  

    PAPER-Information Retrieval

    E92-D No:12

    This paper presents algorithms for searching text regions by specifying annotation information in tag-annotated text using Region Algebra. The original algebra and its efficient algorithms are extended to handle both nested regions and crossed regions. These extensions are necessary for text search using rich linguistic annotations. We first assign a depth number to every nested tag region to order these regions, and we write efficient algorithms that use the depth number for containment operations that can handle nested tag regions. Next, we introduce variables for attribute values of tags into the algebra to handle annotations in which attributes refer to other tag regions, and we propose an efficient method of handling re-entrancy by incrementally determining the values of the variables. Our algorithms have been implemented in a text search engine for MEDLINE, a large textbase of abstracts in medical science. Experiments on tag-annotated MEDLINE abstracts demonstrate the effectiveness of specifying annotations and the efficiency of our algorithms. The system is made publicly accessible at
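The depth numbering and a containment operation can be sketched in Python as follows. Regions are `(start, end, depth)` triples, the tag-event input format and function names are hypothetical, and `contained_in` uses a sort-plus-prefix-maximum search rather than the paper's own depth-based algorithms; it handles nested and crossed outer regions alike.

```python
import bisect
from typing import NamedTuple

class Region(NamedTuple):
    start: int
    end: int
    depth: int  # number of enclosing regions with the same tag (0 = outermost)

def regions_from_tags(events, tag):
    """Turn (position, 'open'|'close', tag) events into depth-numbered regions."""
    stack, out = [], []
    for pos, kind, name in events:
        if name != tag:
            continue
        if kind == "open":
            stack.append(pos)
        else:
            start = stack.pop()
            out.append(Region(start, pos, len(stack)))
    return sorted(out)

def contained_in(inner, outer):
    """Regions of `inner` that lie inside at least one region of `outer`."""
    outer = sorted(outer)
    starts = [r.start for r in outer]
    prefix_max_end, best = [], -1
    for r in outer:
        best = max(best, r.end)
        prefix_max_end.append(best)
    result = []
    for r in inner:
        k = bisect.bisect_right(starts, r.start)  # outer regions starting at or before r
        if k and prefix_max_end[k - 1] >= r.end:
            result.append(r)
    return result

events = [(0, "open", "s"), (0, "open", "np"), (5, "close", "np"),
          (6, "open", "s"), (6, "open", "np"), (9, "close", "np"),
          (12, "close", "s"), (20, "close", "s")]
sentences = regions_from_tags(events, "s")
noun_phrases = regions_from_tags(events, "np")
embedded = [r for r in sentences if r.depth >= 1]
print(contained_in(noun_phrases, embedded))  # [Region(start=6, end=9, depth=0)]
```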

  • MR-MIL: Manifold Ranking Based Multiple-Instance Learning for Automatic Image Annotation

    Yufeng ZHAO  Yao ZHAO  Zhenfeng ZHU  Jeng-Shyang PAN  


    E91-A No:10

    A novel automatic image annotation (AIA) scheme based on multiple-instance learning (MIL) is proposed. For a given concept, manifold ranking (MR) is first applied to MIL (referred to as MR-MIL) to effectively mine the positive instances (i.e., regions in images) embedded in the positive bags (i.e., images). With the mined positive instances, the semantic model of the concept is built from the probabilistic output of an SVM classifier. The experimental results reveal that high annotation accuracy can be achieved at the region level.
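The manifold ranking step can be illustrated with the standard closed-form ranking F = (I - alpha S)^(-1) y, where S is the symmetrically normalized affinity matrix of the instances and y marks the query instances. This Python sketch assumes the regions are already described by feature vectors and uses a Gaussian affinity with invented `sigma` and `alpha`; how MR-MIL builds the query vector from positive and negative bags is not reproduced. Instances ranked highly from a query region are candidate positive instances.

```python
import numpy as np

def manifold_ranking(X, y, alpha=0.99, sigma=1.0):
    """Closed-form manifold ranking scores F = (I - alpha * S)^-1 y."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian affinities
    np.fill_diagonal(W, 0.0)                # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    return np.linalg.solve(np.eye(len(X)) - alpha * S, y)

# Region features: two clusters; region 0 is the query
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
scores = manifold_ranking(X, y)
print(np.argsort(scores)[::-1])  # regions ordered by ranking score
```

Because alpha < 1 and the eigenvalues of S lie in [-1, 1], the linear system is always solvable.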

  • "Web-Com": Interactive Browser for Web-Based Education

    Kazuki HIRAKI  Tatsuhiro YONEKURA  Susumu SHIBUSAWA  


    E88-D No:5

    We developed a Web-based education system called "Web-Com" that supports both synchronous and asynchronous learning. It consists of an interactive web browser and a voice server. Web-Com provides a multi-layer drawable canvas on which the user can draw annotations. Each layer can be shared with other users in real time via the Internet to enable synchronous learning. In conjunction with the voice server, Web-Com supports voice communication. It can also replay the annotation process in order, which enables asynchronous learning. Finally, a subject experiment was conducted to evaluate the workability of the scheme and explore various issues that arise during learning. The experimental results show that learners can learn fairly interactively with an instructor in a Web-based class using Web-Com's synchronous style.
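The multi-layer canvas and its ordered replay can be modeled with a small data structure, sketched here in Python. The `Stroke` and `Canvas` classes are hypothetical, network sharing and voice are omitted, and replay simply merges the strokes of the selected layers by timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Stroke:
    time: float                    # seconds since the session started
    author: str
    points: List[Tuple[int, int]]

@dataclass
class Canvas:
    layers: Dict[str, List[Stroke]] = field(default_factory=dict)

    def draw(self, layer: str, stroke: Stroke) -> None:
        self.layers.setdefault(layer, []).append(stroke)

    def replay(self, visible=None):
        """Yield (layer, stroke) pairs in drawing order across the chosen layers."""
        chosen = self.layers if visible is None else {k: self.layers[k] for k in visible}
        events = [(s.time, name, s) for name, strokes in chosen.items() for s in strokes]
        for _, name, stroke in sorted(events, key=lambda e: e[0]):
            yield name, stroke

canvas = Canvas()
canvas.draw("instructor", Stroke(2.0, "teacher", [(0, 0), (10, 10)]))
canvas.draw("student-a", Stroke(1.0, "alice", [(5, 5), (6, 8)]))
canvas.draw("instructor", Stroke(3.5, "teacher", [(20, 20), (30, 25)]))
for layer, stroke in canvas.replay():
    print(layer, stroke.time)
```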

  • Efficient Web Browsing with Semantic Annotation: A Case Study of Product Images in E-Commerce Sites

    Jason J. JUNG  Kee-Sung LEE  Seung-Bo PARK  Geun-Sik JO  


    E88-D No:5

    Web browsing follows a depth-first search scheme, so searching the Web for relevant information can be very tedious. In this paper, we propose a personal browsing assistant system based on modeling user intentions. Before the user explicitly requests a page, this system can analyze the prefetched resources from the hyperlinked Webpages and compare them with the estimated user intention, helping the user make better decisions, such as which Webpage to request next. A more important problem is the semantic heterogeneity between Web spaces, which makes locally annotated resources more difficult to understand. We apply semantic annotation, which is a transcoding procedure using a global ontology. As a result, each piece of local metadata can be semantically enriched and efficiently compared. As a testbed for our experiment, we organized three different online clothing stores whose images are annotated with semantically heterogeneous metadata. We simulated virtual customers navigating these cyberspaces, who conducted comparison shopping according to the predefined preferences of the customer models. We have shown that this support for Web browsing is reasonable, and its performance was evaluated by measuring the total size of the browsed hyperspace.
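Semantic annotation as transcoding can be sketched in Python as a per-store mapping from local metadata terms to global ontology concepts, after which prefetched items from different stores become directly comparable with the estimated user intention. The vocabularies, the store names, and the use of Jaccard overlap to score items are all illustrative assumptions.

```python
GLOBAL_ONTOLOGY = {"Shirt", "Trousers", "Red", "Blue", "Cotton"}

# Hypothetical per-store mappings from local metadata terms to global concepts
STORE_MAPPINGS = {
    "store_a": {"tee": "Shirt", "pants": "Trousers", "crimson": "Red"},
    "store_b": {"top": "Shirt", "slacks": "Trousers", "navy": "Blue", "cotton100": "Cotton"},
}

def transcode(store, local_tags):
    """Map a store's local tags to global ontology concepts, dropping unmapped tags."""
    mapping = STORE_MAPPINGS[store]
    return {mapping[tag] for tag in local_tags if mapping.get(tag) in GLOBAL_ONTOLOGY}

def intention_match(item_concepts, intention):
    """Jaccard overlap between an item's concepts and the estimated user intention."""
    union = item_concepts | intention
    return len(item_concepts & intention) / len(union) if union else 0.0

intention = {"Shirt", "Red"}
prefetched = {
    "store_a/item1": transcode("store_a", ["tee", "crimson"]),
    "store_b/item7": transcode("store_b", ["top", "navy"]),
}
ranking = sorted(prefetched, key=lambda page: intention_match(prefetched[page], intention),
                 reverse=True)
print(ranking)  # ['store_a/item1', 'store_b/item7']
```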

  • Automatic Measurement of Pressed/Breathy Phonation at Acoustic Centres of Reliability in Continuous Speech

    Parham MOKHTARI  Nick CAMPBELL  

    PAPER-Speech Synthesis and Prosody

    E86-D No:3

    With the aim of enabling concatenative synthesis of expressive speech, we herein report progress towards developing robust and automatic algorithms for paralinguistic annotation of very large recorded-speech corpora. In particular, we describe a method of combining robust acoustic-prosodic and cepstral analyses to locate centres of acoustic-phonetic reliability in the speech stream, wherein physiologically meaningful parameters related to voice quality can be estimated more reliably. We then report some evaluations of a specific voice-quality parameter known as the glottal Amplitude Quotient (AQ), which was proposed in [2],[6] and is here measured automatically at centres of reliability in continuous speech. Analyses of a large, single-speaker corpus of emotional speech first validate the perceptual importance of the AQ parameter in quantifying the mode of phonation along the pressed-modal-breathy continuum, then reveal some of its phonetic, prosodic, and paralinguistic dependencies.
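The AQ is commonly defined as the ratio of the peak-to-peak amplitude of the glottal flow to the magnitude of the negative peak of its first derivative, which gives a value in seconds. The following Python sketch computes it for one cycle of an already estimated glottal flow signal. Inverse filtering and the detection of centres of reliability are not shown, and the sinusoidal test signal is synthetic.

```python
import numpy as np

def amplitude_quotient(flow, fs):
    """AQ = f_ac / d_peak for one glottal cycle, in seconds."""
    flow = np.asarray(flow, float)
    f_ac = flow.max() - flow.min()          # peak-to-peak (AC) flow amplitude
    derivative = np.gradient(flow) * fs     # flow derivative per second
    d_peak = abs(derivative.min())          # magnitude of the negative peak
    return float(f_ac / d_peak)

fs, f0 = 16000, 100.0
t = np.arange(int(fs / f0)) / fs            # one 10 ms cycle
flow = 1.0 + 0.5 * np.sin(2 * np.pi * f0 * t)
print(amplitude_quotient(flow, fs))
```

For a pure sinusoid of frequency f0, AQ = 1/(pi f0), about 3.18 ms at 100 Hz, and the finite-difference estimate above lands very close to that value.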

  • Organization and Retrieval of Video Data

    Katsumi TANAKA  Yasuo ARIKI  Kuniaki UEHARA  


    E82-D No:1

    This paper focuses on the problem of how to organize and retrieve video data effectively. First, we identify several issues to be solved for this problem. Next, we overview our current research results together with a brief survey of research on video databases. In particular, we describe the following results concerning the organization and retrieval of video data, obtained under the Grant-in-Aid for Scientific Research on Priority Area "Advanced Databases" of the Japanese Ministry of Education: Instance-Based Video Annotation Models, Self-Organization of Video Data, and A Query Model for Fragmentally Indexed Video.

  • A Cooperation Method via Metaphor of Explanation

    Tetsuya YOSHIDA  Koichi HORI  Shinichi NAKASUKA  


    E81-A No:4

    This paper proposes a new method to improve cooperation in concurrent systems within the framework of Multi-Agent Systems (MAS). Since subsystems work concurrently, achieving appropriate cooperation among them is important to improve the effectiveness of the overall system. When subsystems are modeled as agents, their interactions can be handled explicitly, since they can be modeled naturally as communication among agents with intended information. Unlike previous approaches, which provided the syntax of communication protocols without semantics, we focus on the semantics of cooperation in MAS and aim to allow agents to exploit the communicated information for cooperation. This is attempted by using more coarse-grained communication, based on a different perspective on the balance between the formality and richness of communication contents, so that each piece of communication content can convey more meaningful information in application domains. In our approach, agents cooperate with each other by giving feedback based on the metaphor of explanation, which is widely used in human interactions, in contrast to previous approaches that use direct orders given by a leader based on predefined cooperation strategies. Agents show the differences between a proposal and the counter-proposals to it, which are constructed with respect to the proposal and given as feedback in terms easily understood by the receiver. By comparing proposals, agents retrieve information on which parts are agreed and disagreed upon by the relevant agents, and they reflect this analysis in their subsequent behavior. Furthermore, agents annotate communication contents to indicate their degree of importance in decision making, which helps make explanations and feedback more understandable. Our cooperation method was examined through experiments on the design of micro satellites, and the results showed that it was effective to some extent in facilitating cooperation among agents.
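The comparison of a proposal with counter-proposals, weighted by importance annotations, can be sketched in Python as below. Representing proposals as parameter dictionaries, the default importance of 1.0, and the summing rule are all assumptions made for illustration, and the micro-satellite parameters are invented.

```python
def compare(proposal, counter):
    """Return the proposal items a counter-proposal agrees and disagrees with."""
    agreed = {k for k in proposal if counter.get(k) == proposal[k]}
    disagreed = {k for k in proposal if k in counter and counter[k] != proposal[k]}
    return agreed, disagreed

def weighted_disagreement(proposal, counters, importance):
    """Sum, per item, the importance each objecting agent annotated; highest first."""
    score = {}
    for agent, counter in counters.items():
        _, disagreed = compare(proposal, counter)
        for k in disagreed:
            score[k] = score.get(k, 0.0) + importance.get(agent, {}).get(k, 1.0)
    return sorted(score.items(), key=lambda kv: -kv[1])

proposal = {"battery_wh": 40, "panels": 4, "mass_kg": 12}
counters = {
    "power": {"battery_wh": 60, "panels": 4},
    "structure": {"mass_kg": 10, "panels": 4},
}
importance = {"power": {"battery_wh": 0.9}, "structure": {"mass_kg": 0.5}}
print(weighted_disagreement(proposal, counters, importance))
```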