
Keyword Search Result

[Keyword] topic(46hit)

21-40hit(46hit)

  • Topic-Based Knowledge Transfer Algorithm for Cross-View Action Recognition

    Changhong CHEN  Shunqing YANG  Zongliang GAN  

     
    LETTER-Pattern Recognition

    Vol: E97-D No:3  Page(s): 614-617

    Cross-view action recognition is a challenging research field for human motion analysis. Appearance-based features are not reliable when the viewpoint changes. In this paper, a new framework is proposed for cross-view action recognition by topic-based knowledge transfer. First, spatio-temporal descriptors are extracted from the action videos and each video is modeled by a bag of visual words (BoVW) based on the codebook constructed by the k-means clustering algorithm. Second, Latent Dirichlet Allocation (LDA) is employed to assign topics for the BoVW representation. The topic distribution of visual words (ToVW) is normalized and taken to be the feature vector. Third, in order to bridge different views, we transform ToVW into bilingual ToVW by constructing bilingual dictionaries, which guarantee that the same action has the same representation from different views. We demonstrate the effectiveness of the proposed algorithm on the IXMAS multi-view dataset.
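
    The bag-of-visual-words step in this abstract can be illustrated with a minimal sketch. It assumes a codebook has already been learned (for example by k-means) and uses made-up two-dimensional descriptors; the function names `nearest_codeword` and `bovw_counts` are hypothetical and are not taken from the paper.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
    )


def bovw_counts(descriptors, codebook):
    """Count how many descriptors fall on each visual word."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    return counts


# Toy data: two visual words and four local descriptors from one video.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [11.0, 10.0]]
histogram = bovw_counts(descriptors, codebook)
```

    A count vector of this kind is the sort of input a topic model such as LDA would consume; the topic inference and the bilingual dictionary step are not shown here.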

  • Confidence Measure Based on Context Consistency Using Word Occurrence Probability and Topic Adaptation for Spoken Term Detection

    Haiyang LI  Tieran ZHENG  Guibin ZHENG  Jiqing HAN  

     
    PAPER-Speech and Hearing

    Vol: E97-D No:3  Page(s): 554-561

    In this paper, we propose a novel confidence measure to improve the performance of spoken term detection (STD). The proposed confidence measure is based on the context consistency between a hypothesized word and its context in a word lattice. The main contribution of this paper is to compute the context consistency by considering the uncertainty in the results of speech recognition and the effect of topic. To measure the uncertainty of the context, we employ the word occurrence probability, which is obtained by combining the overlapping hypotheses in a word posterior lattice. To handle the effect of topic, we propose a method of topic adaptation. The adaptation method first classifies the spoken document according to its topics and then computes the context consistency of the hypothesized word with the topic-specific measure of semantic similarity. Additionally, we apply the topic-specific measure of semantic similarity in two ways: with the information of the top-1 topic, and with the mixture of all topics according to topic classification. The experiments conducted on the Hub-4NE Mandarin database show that both the occurrence probability of context words and the topic adaptation are effective for the confidence measure of STD. The proposed confidence measure performs better than one that ignores the uncertainty of the context or one that uses a non-topic method.

  • Online High-Quality Topic Detection for Bulletin Board Systems

    Jungang XU  Hui LI  Yan ZHAO  Ben HE  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E97-D No:2  Page(s): 255-265

    Even with the recent development of new types of social networking services such as microblogs, Bulletin Board Systems (BBS) remain popular for local communities and vertical discussions. These BBS sites have a high volume of traffic every day, with user discussions on a variety of topics. It is therefore difficult for BBS visitors to find the posts they are interested in among the large number of discussion threads. We explore several main characteristics of BBS, including the organizational flexibility of BBS texts, the high data volume, and the aging characteristic of BBS topics. Based on these characteristics, we propose a novel method of Online Topic Detection (OTD) on BBS, which mainly includes a representative post selection procedure based on a Markov chain model and an efficient topic clustering algorithm with candidate topic set generation based on Aging Theory. Experimental results show that our method improves the performance of OTD in the BBS environment in both detection accuracy and time efficiency. In addition, analysis of the aging characteristic of discussion topics shows that the generation and aging of topics on BBS are very fast, so it is wise to introduce a candidate topic set generation strategy based on Aging Theory into the topic clustering algorithm.

  • Characterizing Web APIs Combining Supervised Topic Model with Ontology

    Yuanbin HAN  Shizhan CHEN  Zhiyong FENG  

     
    LETTER-Data Engineering, Web Information Systems

    Vol: E96-D No:7  Page(s): 1548-1551

    This paper presents a novel topic modeling (TM) approach for discovering meaningful topics for Web APIs, which offers a potential means of dimensionality reduction for efficient and effective classification, retrieval, organization, and management of numerous APIs. We explore the possibility of conducting TM on multi-labeled APIs by combining a supervised TM (known as Labeled LDA) with ontology. Experiments conducted on a real-world API data set show that the proposed method outperforms standard Labeled LDA with an average gain of 7.0% in measuring the quality of the generated topics. In addition, we evaluate the similarity matching between topics generated by our method and by standard Labeled LDA, which demonstrates the significance of incorporating ontology.

  • MPI/OpenMP Hybrid Parallel Inference Methods for Latent Dirichlet Allocation – Approximation and Evaluation

    Shotaro TORA  Koji EGUCHI  

     
    PAPER-Advanced Search

    Vol: E96-D No:5  Page(s): 1006-1015

    Recently, probabilistic topic models have been applied to various types of data, including text, and their effectiveness has been demonstrated. Latent Dirichlet allocation (LDA) is a well-known topic model. Variational Bayesian inference or collapsed Gibbs sampling is often used to estimate parameters in LDA; however, these inference methods incur high computational cost for large-scale data. Therefore, highly efficient technology is needed for this purpose. We use parallel computation technology for efficient collapsed Gibbs sampling inference for LDA. We assume a symmetric multiprocessing (SMP) cluster, which has been widely used in recent years. In prior work on parallel inference for LDA, either MPI or OpenMP has often been used alone. For an SMP cluster, however, it is more suitable to adopt hybrid parallelization that uses message passing for communication between SMP nodes and loop directives for parallelization within each SMP node. We developed an MPI/OpenMP hybrid parallel inference method for LDA, and evaluated the performance of the inference under various settings of an SMP cluster. We further investigated the approximation that controls the inter-node communications, and found that it achieved a noticeable increase in inference speed while maintaining inference accuracy.
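
    For readers unfamiliar with collapsed Gibbs sampling for LDA, the following is a minimal serial sketch of the sampling kernel. It is not the MPI/OpenMP hybrid method of the paper: there is no parallelization or communication, and the function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Serial collapsed Gibbs sampling for LDA over documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]              # tokens per (document, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                            # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Toy corpus: two documents over a four-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3, 3]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=4, iters=20)
```

    Every token update reads and writes the shared topic-word counts, which is why distributing this loop across nodes requires either communication or an approximation of those counts.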

  • Automatic Topic Identification for Idea Summarization in Idea Visualization Programs

    Kobkrit VIRIYAYUDHAKORN  Susumu KUNIFUJI  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E96-D No:1  Page(s): 64-72

    Recent idea visualization programs still lack automatic idea summarization capabilities. This paper presents a knowledge-based method for automatically providing a short piece of English text about a topic to each idea group in idea charts. This automatic topic identification makes use of Yet Another General Ontology (YAGO) and WordNet as its knowledge bases. We propose a novel topic selection method and compare its performance with three existing methods using two experimental datasets constructed using two idea visualization programs, i.e., the KJ Method (Kawakita Jiro Method) and mind-mapping programs. Our proposed topic identification method outperformed the baseline method in terms of both performance and consistency.

  • Topic Extraction for Documents Based on Compressibility Vector

    Nuo ZHANG  Toshinori WATANABE  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E95-D No:10  Page(s): 2438-2446

    Nowadays, a great many e-documents are accessed on the Internet. It would be helpful if those documents, and significant content extracted from them, could be automatically analyzed. Similarity analysis and topic extraction are widely used as document relation analysis techniques. Most of the proposed methods need preprocessing such as stemming and stop-word removal. Those methods require natural language processing (NLP) technology and hence depend on the language features and the dataset. In this study, we propose novel document relation analysis and topic extraction methods based on text compression. Our proposed approaches do not require NLP, and can also automatically evaluate documents. We test our proposal on model documents, URCS, and the Reuters-21578 dataset, for relation analysis and topic extraction. The effectiveness of the proposed methods is shown by the simulations.
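
    The idea of comparing documents through compression, without any NLP preprocessing, can be illustrated with the normalized compression distance (NCD). This is a swapped-in, well-known technique and not the compressibility vector of the paper; the use of `zlib`, the function names, and the sample documents are assumptions for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance; smaller values suggest more shared content."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up sample documents on two different subjects.
doc_finance = "stock markets fell sharply as investors sold shares " * 20
doc_sports = "the midfielder scored twice before the referee ended the match " * 20

same_subject = ncd(doc_finance, doc_finance)
different_subject = ncd(doc_finance, doc_sports)
```

    When two texts share content, compressing their concatenation costs little more than compressing one of them, so the distance stays small; no stemming or stop-word list is involved.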

  • Efficient Tracking of News Topics Based on Chronological Semantic Structures in a Large-Scale News Video Archive

    Ichiro IDE  Tomoyoshi KINOSHITA  Tomokazu TAKAHASHI  Hiroshi MO  Norio KATAYAMA  Shin'ichi SATOH  Hiroshi MURASE  

     
    PAPER-Video Processing

    Vol: E95-D No:5  Page(s): 1288-1300

    Recent advances in digital storage technology have enabled us to archive a large volume of video data. Thanks to this trend, we have archived more than 1,800 hours of video data from a daily Japanese news show in the last ten years. When considering the effective use of such a large news video archive, we assume that analysis of its chronological and semantic structure becomes important. We also consider that showing users the development of news topics helps their understanding of current affairs more than providing a list of relevant news stories, as most current news video retrieval systems do. Therefore, in this paper, we propose a structuring method for a news video archive, together with an interface that visualizes the structure, so that users can efficiently track the development of news topics according to their interest. The proposed news video structure, namely the “topic thread structure”, is obtained as a result of an analysis of the chronological and semantic relation between news stories. Meanwhile, the proposed interface, namely “mediaWalker II”, allows users to track the development of news topics along the topic thread structure, and at the same time watch the video footage corresponding to each news story. Analyses of the topic thread structures obtained by applying the proposed method to actual news video footage revealed interesting and comprehensible relations between news topics in the real world. At the same time, analyses of their size quantified the efficiency of tracking a user's topic-of-interest based on the proposed topic thread structure. We consider this a first step towards facilitating video authoring by users based on existing contents in a large-scale news video archive.

  • Spoken Document Retrieval Leveraging Unsupervised and Supervised Topic Modeling Techniques

    Kuan-Yu CHEN  Hsin-Min WANG  Berlin CHEN  

     
    PAPER-Speech Processing

    Vol: E95-D No:5  Page(s): 1195-1205

    This paper describes the application of two attractive categories of topic modeling techniques to the problem of spoken document retrieval (SDR), viz. document topic model (DTM) and word topic model (WTM). Apart from using the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, imagining a scenario that user query logs along with click-through information of relevant documents can be utilized to build an SDR system. This attempt has the potential to associate relevant documents with queries even if they do not share any of the query words, thereby improving on retrieval quality over the baseline system. Likewise, we also study a novel use of pseudo-supervised training to associate relevant documents with queries through a pseudo-feedback procedure. Moreover, in order to lessen SDR performance degradation caused by imperfect speech recognition, we investigate leveraging different levels of index features for topic modeling, including words, syllable-level units, and their combination. We provide a series of experiments conducted on the TDT (TDT-2 and TDT-3) Chinese SDR collections. The empirical results show that the methods deduced from our proposed modeling framework are very effective when compared with a few existing retrieval approaches.

  • Enhancing Digital Book Clustering by LDAC Model

    Lidong WANG  Yuan JIE  

     
    PAPER

    Vol: E95-D No:4  Page(s): 982-988

    In Digital Library (DL) applications, digital book clustering is an important and urgent research task. However, it is difficult to conduct effectively because of the great length of digital books. To cluster digital books correctly, a novel method based on a probabilistic topic model is proposed. First, we build a topic model named LDAC. The main goal of LDAC topic modeling is to effectively extract topics from digital books. Subsequently, Gibbs sampling is applied for parameter inference. Once the model parameters are learned, each book is assigned to the cluster that maximizes the posterior probability. Experimental results demonstrate that our approach based on LDAC achieves significant improvement compared to the related methods.

  • Visual Knowledge Structure Reasoning with Intelligent Topic Map

    Huimin LU  Boqin FENG  Xi CHEN  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E93-D No:10  Page(s): 2805-2812

    This paper presents a visual knowledge structure reasoning method using the Intelligent Topic Map, which extends the conventional Topic Map in structure and enhances its reasoning functions. The visual knowledge structure reasoning method integrates two types of knowledge reasoning: the knowledge logical relation reasoning and the knowledge structure reasoning. The knowledge logical relation reasoning implements knowledge consistency checking and the implicit association reasoning between knowledge points. We propose a Knowledge Unit Circle Search strategy for the knowledge structure reasoning. It implements the semantic implication extension, the semantic relevant extension and the semantic class belonging confirmation. Moreover, the knowledge structure reasoning results are visualized using the ITM Toolkit. A prototype system of visual knowledge structure reasoning has been implemented and applied to massive knowledge organization, management and service for education.

  • Novel Confidence Feature Extraction Algorithm Based on Latent Topic Similarity

    Wei CHEN  Gang LIU  Jun GUO  Shinichiro OMACHI  Masako OMACHI  Yujing GUO  

     
    PAPER-Speech and Hearing

    Vol: E93-D No:8  Page(s): 2243-2251

    In speech recognition, confidence annotation adopts a single confidence feature or a combination of different features for classification. These confidence features are always extracted from decoding information. However, it has been shown that about 30% of the knowledge used in human speech understanding is derived from high-level information. Thus, how to extract a high-level confidence feature that is statistically independent of decoding information is worth researching in speech recognition. In this paper, a novel confidence feature extraction algorithm based on latent topic similarity is proposed. Each word topic distribution and context topic distribution in one recognition result is first obtained using the latent Dirichlet allocation (LDA) topic model, and then the proposed word confidence feature is extracted by determining the similarities between these two topic distributions. The experiments show that the proposed feature increases the number of information sources of confidence features with a good complementary effect, and can effectively improve the performance of confidence annotation when combined with confidence features from decoding information.
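
    The comparison between a word's topic distribution and its context's topic distribution can be sketched as follows. The abstract does not name the similarity measure, so cosine similarity and simple averaging of the context are assumptions here, as are the function names and the toy distributions (which in practice would come from an LDA model).

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def context_topic_distribution(word_topics, index):
    """Average topic distribution of every word except the one at `index`."""
    num_topics = len(word_topics[index])
    others = [t for i, t in enumerate(word_topics) if i != index]
    if not others:
        return [1.0 / num_topics] * num_topics
    return [sum(t[k] for t in others) / len(others) for k in range(num_topics)]


def topic_confidence(word_topics, index):
    """Similarity between a word's topics and the topics of its context."""
    return cosine(word_topics[index], context_topic_distribution(word_topics, index))


# Toy recognition result: three words, two topics; the third word is off-topic.
word_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
scores = [topic_confidence(word_topics, i) for i in range(len(word_topics))]
```

    In this toy input the off-topic third word receives the lowest score, which is the behavior a topic-based confidence feature is meant to capture.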

  • A Topic-Independent Method for Scoring Student Essay Content

    Ryo NAGATA  Jun-ichi KAKEGAWA  Yukiko YABUTA  

     
    PAPER-Educational Technology

    Vol: E93-D No:2  Page(s): 335-340

    This paper proposes a topic-independent method for automatically scoring essay content. Unlike conventional topic-dependent methods, it predicts the human-assigned score of a given essay without training essays written on the same topic as the target essay. To achieve this, this paper introduces a new measure called MIDF that measures how important and relevant a word is in a given essay. The proposed method predicts the score relying on the distribution of MIDF. Surprisingly, experiments show that the proposed method achieves an accuracy of 0.848 and performs as well as or even better than conventional topic-dependent methods.

  • Novel Topic Maps to RDF/RDF Schema Translation Method

    Shinae SHIN  Dongwon JEONG  Doo-Kwon BAIK  

     
    PAPER-Knowledge Representation

    Vol: E91-D No:11  Page(s): 2626-2637

    We propose an enhanced method for translating Topic Maps to RDF/RDF Schema, to realize the Semantic Web. A critical issue for the Semantic Web is to efficiently and precisely describe Web information resources, i.e., Web metadata. Two representative standards, Topic Maps and RDF, have been used for Web metadata. RDF-based standardization and implementation of the Semantic Web have been actively pursued. Since the Semantic Web must accept and understand all Web information resources that are represented with other methods, Topic Maps-to-RDF translation has become an issue. Even though many Topic Maps-to-RDF translation methods have been devised, they still have several problems (e.g., semantic loss and complex expression). Our translation method provides an improved solution to these problems. This method shows lower semantic loss than the previous methods because it extracts both explicit and implicit semantics. Compared to the previous methods, our method reduces the encoding complexity of the resulting RDF. In addition, in terms of reversibility, the proposed method regenerates all Topic Maps constructs of the original source when the output is reverse translated.

  • Entity Network Prediction Using Multitype Topic Models

    Hitohiro SHIOZAKI  Koji EGUCHI  Takenao OHKAWA  

     
    PAPER-Knowledge Discovery and Data Mining

    Vol: E91-D No:11  Page(s): 2589-2598

    Conveying information about who, what, when and where is a primary purpose of some genres of documents, typically news articles. Statistical models that capture dependencies between named entities and topics can play an important role in handling such information. Although some relationships between who and where should be mentioned in such a document, no statistical topic models explicitly address the textual interactions between a who-entity and a where-entity. This paper presents a statistical model that directly captures the dependencies between an arbitrary number of word types, such as who-entities, where-entities and topics, mentioned in each document. Through experiments on predicting who-entities and the links between them, we show that this multitype topic model performs better at making predictions on entity networks, in which each vertex represents an entity and each edge weight represents how closely the pair of entities at its incident vertices is related. We also demonstrate the scale-free property in the weighted networks of entities extracted from written mentions.

  • 3D Virtual Environment Navigation Aid Techniques for Novice Users Using Topic Map

    Hak-Keun KIM  Teuk-Seob SONG  Yoon-Chul CHOY  Soon-Bum LIM  

     
    PAPER-Fundamentals of Software and Theory of Programs

    Vol: E89-D No:8  Page(s): 2411-2419

    A 3D virtual environment provides a limited amount of information, mainly focusing on visual information. This is the main cause of users losing their sense of direction in the environment. Many studies on developing navigation tools that address this problem have been carried out. In this study, a navigation tool is designed by applying the topic map, one of the technologies for semantic web construction, to a 3D virtual environment. A topic map constructs a semantic link map by defining the connection relations between topics. According to an experiment done to evaluate the proposed navigation tool, the tool was more helpful in finding detailed objects than highly represented objects. Also, it could be seen that providing the surrounding knowledge is effective for object selection by users when the target of the search is not defined.

  • Using Topic Keyword Clusters for Automatic Document Clustering

    Hsi-Cheng CHANG  Chiun-Chieh HSU  

     
    PAPER-Document Clustering

    Vol: E88-D No:8  Page(s): 1852-1860

    Data clustering is a technique for grouping similar data items together for convenient understanding. Conventional data clustering methods, including agglomerative hierarchical clustering and partitional clustering algorithms, frequently perform unsatisfactorily for large text collections, since the computation complexities of the conventional data clustering methods increase very quickly with the number of data items. Poor clustering results degrade intelligent applications such as event tracking and information extraction. This paper presents an unsupervised document clustering method which identifies topic keyword clusters of the text corpus. The proposed method adopts a multi-stage process. First, an aggressive data cleaning approach is employed to reduce the noise in the free text and further identify the topic keywords in the documents. All extracted keywords are then grouped into topic keyword clusters using the k-nearest neighbor approach and the keyword clustering technique. Finally, all documents in the corpus are clustered based on the topic keyword clusters. The proposed method is assessed against conventional data clustering methods on a web news corpus. The experimental results show that the proposed method is an efficient and effective clustering approach.

  • Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching

    Ian R. LANE  Tatsuya KAWAHARA  Tomoko MATSUI  Satoshi NAKAMURA  

     
    PAPER-Spoken Language Systems

    Vol: E88-D No:3  Page(s): 446-454

    An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the inferred topic is automatically detected from the user's utterance, and speech recognition is then performed by applying an appropriate topic-dependent language model. This approach enables users to freely switch between domains while maintaining high recognition accuracy. As topic detection is performed on a single utterance, detection errors may occur and propagate through the system. To improve robustness, a hierarchical back-off mechanism is introduced where detailed topic models are applied when topic detection is confident and wider models that cover multiple topics are applied in cases of uncertainty. The performance of the proposed architecture is evaluated when combined with two topic detection methods: unigram likelihood and SVMs (Support Vector Machines). On the ATR Basic Travel Expression Corpus, both methods provide a significant reduction in WER (9.7% and 10.3%, respectively) compared to a single language model system. Furthermore, recognition accuracy is comparable to performing decoding with all topic-dependent models in parallel, while the required computational cost is much reduced.
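
    The hierarchical back-off idea can be sketched as a walk up a topic tree until the detector is confident enough. The hierarchy, the threshold value, the way node confidence is computed (summing the posteriors of the leaves below a node), and all names below are assumptions for illustration; the abstract does not specify them.

```python
# Hypothetical topic hierarchy: each topic maps to its parent, the root maps to None.
PARENT = {
    "hotel": "accommodation",
    "hostel": "accommodation",
    "airport": "transit",
    "accommodation": "travel",
    "transit": "travel",
    "travel": None,
}


def ancestors(topic, parent):
    """The topic followed by its ancestors, ending at the root."""
    chain = [topic]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def node_confidence(node, leaf_posteriors, parent):
    """Total posterior mass of the leaf topics covered by `node`."""
    return sum(p for leaf, p in leaf_posteriors.items() if node in ancestors(leaf, parent))


def select_model(leaf_posteriors, parent, threshold=0.7):
    """Most specific topic model on the best leaf's path whose confidence meets the threshold."""
    best_leaf = max(leaf_posteriors, key=leaf_posteriors.get)
    path = ancestors(best_leaf, parent)
    for node in path:
        if node_confidence(node, leaf_posteriors, parent) >= threshold:
            return node
    return path[-1]  # fall back to the widest model


confident = select_model({"hotel": 0.9, "hostel": 0.05, "airport": 0.05}, PARENT)
uncertain = select_model({"hotel": 0.45, "hostel": 0.4, "airport": 0.15}, PARENT)
```

    With these toy posteriors, the first call selects the detailed `hotel` model, while the second backs off to the wider `accommodation` model because no single leaf is confident enough.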

  • Language Model Adaptation Based on PLSA of Topics and Speakers for Automatic Transcription of Panel Discussions

    Yuya AKITA  Tatsuya KAWAHARA  

     
    PAPER-Spoken Language Systems

    Vol: E88-D No:3  Page(s): 439-445

    Appropriate language modeling is one of the major issues for automatic transcription of spontaneous speech. We propose an adaptation method for statistical language models based on both topic and speaker characteristics. This approach is applied to automatic transcription of meetings and panel discussions, in which multiple participants speak on a given topic in their own speaking style. The baseline language model is a mixture of two models, which are trained with different corpora covering various topics and speakers, respectively. Then, probabilistic latent semantic analysis (PLSA) is performed on the same respective corpora and the initial ASR result to provide two sets of unigram probabilities conditioned on input speech, with regard to topics and speaker characteristics, respectively. Finally, the baseline model is adapted by scaling N-gram probabilities with these unigram probabilities. For speaker adaptation, we make use of a portion of the Corpus of Spontaneous Japanese (CSJ) in which a large number of speakers gave talks on given topics. Experimental evaluation with real discussions showed that both topic and speaker adaptation reduced test-set perplexity, and in total, an average reduction rate of 8.5% was obtained. Furthermore, improvement in word accuracy was also achieved by the proposed adaptation method.
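
    Scaling N-gram probabilities with adapted unigram probabilities is often done by multiplying each baseline probability by a ratio of adapted to baseline unigram probability and renormalizing. The sketch below uses that common formulation; the exact scaling used in the paper is not given in the abstract, so the ratio form, the exponent `beta`, the function name, and the toy probabilities are all assumptions.

```python
def adapt_ngram(base_next, base_unigram, adapted_unigram, beta=1.0):
    """Rescale next-word probabilities for one history by unigram ratios, then renormalize.

    base_next:       baseline P(w | history) for the candidate words
    base_unigram:    baseline unigram P(w)
    adapted_unigram: adapted unigram P(w), e.g. estimated with PLSA
    beta:            hypothetical exponent controlling the strength of the adaptation
    """
    scaled = {
        w: p * (adapted_unigram[w] / base_unigram[w]) ** beta
        for w, p in base_next.items()
    }
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}


# Toy example: the adapted unigram model favors "budget" for the current discussion.
base_next = {"budget": 0.5, "goal": 0.5}
base_unigram = {"budget": 0.01, "goal": 0.01}
adapted_unigram = {"budget": 0.03, "goal": 0.01}
adapted = adapt_ngram(base_next, base_unigram, adapted_unigram)
```

    Renormalizing over the candidate words keeps the adapted values a valid distribution for the given history; words the adapted unigram model favors gain probability at the expense of the others.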

  • Topic Keyword Identification for Text Summarization Using Lexical Clustering

    Youngjoong KO  Kono KIM  Jungyun SEO  

     
    PAPER

    Vol: E86-D No:9  Page(s): 1695-1701

    Automatic text summarization has the goal of reducing the size of a document while preserving its content. Generally, producing a summary as extracts is achieved by including only sentences which are the most topic-related. DOCUSUM is our summarization system based on a new topic keyword identification method. The process of DOCUSUM is as follows. First, DOCUSUM converts the content words of a document into elements of a context vector space. It then constructs lexical clusters from the context vector space and identifies core clusters. Next, it selects topic keywords from the core clusters. Finally, it generates a summary of the document using the topic keywords. In the experiments on various compression ratios (the compression of 30%, the compression of 10%, and the extraction of the fixed number of sentences: 4 or 8 sentences), DOCUSUM showed better performance than other methods.
