
Author Search Result

[Author] Koji Eguchi (10 hits)

  • Multimedia Topic Models Considering Burstiness of Local Features Open Access

    Yang XIE  Koji EGUCHI  


    E97-D No:4

    A number of studies have been conducted on topic modeling for various types of data, including text and image data. In this paper, we focus on the burstiness of local features when modeling topics within video data. Burstiness is a phenomenon often discussed for text data: if a word is used once in a document, it is more likely to be used again within that document. It is also observed in video data; for example, an object or visual word is more likely to appear repeatedly within the same video. Based on this idea, we propose a new topic model, the Correspondence Dirichlet Compound Multinomial LDA (Corr-DCMLDA), which takes into account the burstiness of the local features in video data. The unknown parameters and latent variables in the model are estimated by collapsed Gibbs sampling, and the hyperparameters are estimated by fixed-point iteration. We demonstrate through experiments on the genre classification of social video data that our model works more effectively than several baselines.
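The burstiness idea described in this abstract can be illustrated with a Pólya urn, whose draws follow the Dirichlet compound multinomial (DCM) distribution. The following is a minimal sketch, not the Corr-DCMLDA model itself; the function names, the vocabulary size, and the pseudo-count value 0.1 are assumptions chosen only for illustration.

```python
import random


def polya_urn_document(alpha, length, rng):
    """Draw `length` word ids from a Polya urn with initial pseudo-counts `alpha`."""
    weights = list(alpha)
    words = []
    for _ in range(length):
        word = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        words.append(word)
        weights[word] += 1.0  # reinforcement: a word that was drawn becomes more likely
    return words


def multinomial_document(probs, length, rng):
    """Draw `length` word ids independently from fixed probabilities (no burstiness)."""
    return rng.choices(range(len(probs)), weights=probs, k=length)


def mean_max_count(sampler, trials, rng):
    """Average, over `trials` documents, of the count of the most frequent word."""
    total = 0
    for _ in range(trials):
        words = sampler(rng)
        total += max(words.count(w) for w in set(words))
    return total / trials


rng = random.Random(0)
vocab_size, length = 10, 20  # hypothetical sizes for the illustration
bursty = mean_max_count(
    lambda r: polya_urn_document([0.1] * vocab_size, length, r), 200, rng)
flat = mean_max_count(
    lambda r: multinomial_document([1.0 / vocab_size] * vocab_size, length, r), 200, rng)
print("mean count of the most frequent word, Polya urn:", bursty)
print("mean count of the most frequent word, multinomial:", flat)
```

With small pseudo-counts, the urn concentrates each document on a few repeated words, whereas the plain multinomial spreads draws evenly; this is the repeated-occurrence effect that the abstract attributes to both words and visual words.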

  • Sequential Bayesian Nonparametric Multimodal Topic Models for Video Data Analysis

    Jianfei XUE  Koji EGUCHI  


    E101-D No:4

    Topic modeling is a well-known method that is widely applied not only to text data mining but also to multimedia data analysis, such as video data analysis. However, existing models cannot adequately handle time dependency and multimodal data modeling for video data, which generally contain image information and speech information. In this paper, we therefore propose a novel topic model, sequential symmetric correspondence hierarchical Dirichlet processes (Seq-Sym-cHDP), extended from sequential conditionally independent hierarchical Dirichlet processes (Seq-CI-HDP) and sequential correspondence hierarchical Dirichlet processes (Seq-cHDP), to improve the multimodal data modeling mechanism by controlling the pivot assignments with a latent variable. An inference scheme for Seq-Sym-cHDP based on a posterior representation sampler is also developed in this work. Finally, we demonstrate through experiments that our model outperforms other baseline models.

  • Hybrid Parallel Inference for Hierarchical Dirichlet Processes Open Access

    Tsukasa OMOTO  Koji EGUCHI  Shotaro TORA  


    E97-D No:4

    The hierarchical Dirichlet process (HDP) can provide a nonparametric prior for a mixture model with grouped data, where mixture components are shared across groups. However, the computational cost is generally very high in terms of both time and space complexity. Therefore, developing a method for fast inference of HDP remains a challenge. In this paper, we assume a symmetric multiprocessing (SMP) cluster, which has been widely used in recent years. To speed up the inference on an SMP cluster, we explore hybrid two-level parallelization of the Chinese restaurant franchise sampling scheme for HDP, focusing especially on the application to topic modeling. The methods we developed, Hybrid-AD-HDP and Hybrid-Diff-AD-HDP, make better use of SMP clusters, resulting in faster HDP inference. While the conventional parallel algorithms with a full message-passing interface do not benefit from using SMP clusters due to higher communication costs, the proposed hybrid parallel algorithms have lower communication costs and make better use of the computational resources.
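The Chinese restaurant franchise mentioned in this abstract is built from Chinese restaurant processes at two levels. As a rough illustration of the building block only, the sketch below seats customers in a single restaurant; it is not the franchise scheme and not the Hybrid-AD-HDP method, and the function name and the `concentration` parameter name are assumptions.

```python
import random


def chinese_restaurant_process(num_customers, concentration, rng):
    """Seat customers one by one and return the list of table sizes.

    Each customer joins an existing table with probability proportional to its
    size, or opens a new table with probability proportional to `concentration`
    (assumed to be strictly positive).
    """
    table_sizes = []
    for _ in range(num_customers):
        # The last weight corresponds to opening a new table.
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
    return table_sizes


sizes = chinese_restaurant_process(100, 1.0, random.Random(0))
print("number of tables:", len(sizes))
print("table sizes:", sizes)
```

The number of tables is not fixed in advance but grows with the data, which is the sense in which the prior is nonparametric.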

  • MPI/OpenMP Hybrid Parallel Inference Methods for Latent Dirichlet Allocation – Approximation and Evaluation

    Shotaro TORA  Koji EGUCHI  

    PAPER-Advanced Search

    E96-D No:5

    Recently, probabilistic topic models have been applied to various types of data, including text, and their effectiveness has been demonstrated. Latent Dirichlet allocation (LDA) is a well-known topic model. Variational Bayesian inference or collapsed Gibbs sampling is often used to estimate parameters in LDA; however, these inference methods incur high computational cost for large-scale data. Therefore, highly efficient technology is needed for this purpose. We use parallel computation technology for efficient collapsed Gibbs sampling inference for LDA. We assume a symmetric multiprocessing (SMP) cluster, which has been widely used in recent years. In prior work on parallel inference for LDA, either MPI or OpenMP has often been used alone. For an SMP cluster, however, it is more suitable to adopt hybrid parallelization that uses message passing for communication between SMP nodes and loop directives for parallelization within each SMP node. We developed an MPI/OpenMP hybrid parallel inference method for LDA, and evaluated the performance of the inference under various settings of an SMP cluster. We further investigated the approximation that controls the inter-node communications, and found that it achieved a noticeable increase in inference speed while maintaining inference accuracy.
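For orientation, the serial collapsed Gibbs sampler for LDA that such parallel methods start from can be sketched as follows. This is a minimal single-process sketch of the standard sampler, not the MPI/OpenMP hybrid method of the paper; the function name, the hyperparameter defaults, and the toy corpus are assumptions.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, sweeps=20, seed=0):
    """Run collapsed Gibbs sampling for LDA on `docs` (lists of word ids)."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional p(z = t | rest), up to a normalising constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights, k=1)[0]
                # Add the token back with its new topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw, n_k


toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3, 3]]  # hypothetical toy corpus
z, n_dk, n_kw, n_k = lda_collapsed_gibbs(toy_docs, num_topics=2, vocab_size=4)
print("topic assignments:", z)
```

Every token update reads and writes the shared count tables `n_kw` and `n_k`, which is why distributing the sampler across nodes requires synchronising those counts and why communication cost matters.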

  • Evaluation Methods for Web Retrieval Tasks Considering Hyperlink Structure

    Koji EGUCHI  Keizo OYAMA  Emi ISHIDA  Noriko KANDO  Kazuko KURIYAMA  


    E86-D No:9

    This paper proposes evaluation methods for measuring the retrieval effectiveness of Web search engine systems, attempting to make them suitable for the real Web environment. With this objective, we constructed 100-gigabyte and 10-gigabyte document sets that were mainly gathered from the '.jp' domain, and conducted an evaluation workshop at the third NTCIR Workshop from 2001 to 2002, where we assessed the retrieval effectiveness of a certain number of Web search engine systems using the common data set. Conventional evaluation workshops assessed the relevance of the retrieved documents, which were submitted by the workshop participants, by considering the contents of individual pages. In contrast, we assessed the relevance of the retrieved pages by considering the relationship between the pages referenced by hyperlinks.

  • Video Data Modeling Using Sequential Correspondence Hierarchical Dirichlet Processes

    Jianfei XUE  Koji EGUCHI  


    E100-D No:1

    Video data mining based on topic models is an emerging technique that has recently become a very popular research topic. In this paper, we present a novel topic model named sequential correspondence hierarchical Dirichlet processes (Seq-cHDP) to learn the hidden structure within video data. The Seq-cHDP model can be regarded as an extended hierarchical Dirichlet processes (HDP) model with two important features: one is the time-dependency mechanism that connects neighboring video frames on the basis of a time-dependent Markovian assumption, and the other is the correspondence mechanism that provides a solution for dealing with multimodal data such as the mixture of visual words and speech words extracted from video files. A cascaded Gibbs sampling method is applied to the inference task of Seq-cHDP. We present a comprehensive evaluation of Seq-cHDP through experiments and demonstrate that Seq-cHDP outperforms other baseline models.

  • Entity Network Prediction Using Multitype Topic Models

    Hitohiro SHIOZAKI  Koji EGUCHI  Takenao OHKAWA  

    PAPER-Knowledge Discovery and Data Mining

    E91-D No:11

    Conveying information about who, what, when and where is a primary purpose of some genres of documents, typically news articles. Statistical models that capture dependencies between named entities and topics can play an important role in handling such information. Although some relationships between who and where should be mentioned in such a document, no statistical topic models explicitly address the textual interactions between a who-entity and a where-entity. This paper presents a statistical model that directly captures the dependencies between an arbitrary number of word types, such as who-entities, where-entities and topics, mentioned in each document. Through experiments on predicting who-entities and the links between them, we show that this multitype topic model performs better at making predictions on entity networks, in which each vertex represents an entity and each edge weight represents how closely the pair of entities at its incident vertices is related. We also demonstrate the scale-free property in the weighted networks of entities extracted from written mentions.

  • FOREWORD Open Access

    Koji Eguchi  


    E102-D No:4

  • Online Inference of Mixed Membership Stochastic Blockmodels for Network Data Streams Open Access

    Tomoki KOBAYASHI  Koji EGUCHI  


    E97-D No:4

    Many kinds of data can be represented as a network or graph. It is crucial to infer the latent structure underlying such a network and to predict unobserved links in the network. The Mixed Membership Stochastic Blockmodel (MMSB) is a promising model for network data. Latent variables and unknown parameters in MMSB have been estimated through Bayesian inference with the entire network; however, it is important to estimate them online for evolving networks. In this paper, we first develop online inference methods for MMSB through sequential Monte Carlo methods, also known as particle filters. We then extend them to time-evolving networks, taking into account the temporal dependency of the network structure. We demonstrate through experiments that the time-dependent particle filter outperforms several baselines in terms of prediction performance in an online condition.

  • Relation Prediction in Multilingual Data Based on Multimodal Relational Topic Models

    Yosuke SAKATA  Koji EGUCHI  


    E100-D No:4

    There are increasing demands for improved analysis of multimodal data that consist of multiple representations, such as multilingual documents and text-annotated images. One promising approach for analyzing such multimodal data is latent topic models. In this paper, we propose conditionally independent generalized relational topic models (CI-gRTM) for predicting unknown relations across multiple different representations of multimodal data. We developed CI-gRTM as a multimodal extension of discriminative relational topic models called generalized relational topic models (gRTM). We demonstrated through experiments with multilingual documents that CI-gRTM can predict both multilingual representations and relations between two different language representations more effectively than several state-of-the-art baseline models that can predict only either multilingual representations or unimodal relations.