IEICE global.ieice.org Site

Keyword Search Result

[Keyword] topic(46hit)

21-40hit(46hit)

Topic-Based Knowledge Transfer Algorithm for Cross-View Action Recognition
Changhong CHEN Shunqing YANG Zongliang GAN

LETTER-Pattern Recognition

Vol: E97-D, No: 3, Page(s): 614-617
Cross-view action recognition is a challenging research field in human motion analysis. Appearance-based features are not reliable when the viewpoint changes. In this paper, a new framework for cross-view action recognition based on topic-based knowledge transfer is proposed. First, spatio-temporal descriptors are extracted from the action videos, and each video is modeled as a bag of visual words (BoVW) using a codebook constructed by the k-means clustering algorithm. Second, Latent Dirichlet Allocation (LDA) is employed to assign topics to the BoVW representation. The topic distribution of visual words (ToVW) is normalized and used as the feature vector. Third, to bridge different views, we transform ToVW into bilingual ToVW by constructing bilingual dictionaries, which guarantee that the same action has the same representation across views. We demonstrate the effectiveness of the proposed algorithm on the IXMAS multi-view dataset.
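The BoVW step described in this abstract can be illustrated with a minimal sketch. It assumes the codebook (one centroid per visual word) has already been learned by k-means and that descriptors are plain lists of floats; the function names are hypothetical, and the later LDA and bilingual-dictionary stages are not shown.

```python
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Return the index of the centroid closest to `descriptor` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bovw_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest visual word and return the
    normalized word-frequency histogram (hypothetical helper, not the authors' code)."""
    if not descriptors:
        return [0.0] * len(codebook)
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts[k] / len(descriptors) for k in range(len(codebook))]


# Toy 2-D "descriptors" and a two-word codebook, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0]]
video = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, 0.1]]
print(bovw_histogram(video, codebook))  # [0.5, 0.5]
```

The normalized histogram is what a topic model such as LDA would then consume as the document representation of each video.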
Confidence Measure Based on Context Consistency Using Word Occurrence Probability and Topic Adaptation for Spoken Term Detection
Haiyang LI Tieran ZHENG Guibin ZHENG Jiqing HAN

PAPER-Speech and Hearing

Vol: E97-D, No: 3, Page(s): 554-561
In this paper, we propose a novel confidence measure to improve the performance of spoken term detection (STD). The proposed confidence measure is based on the context consistency between a hypothesized word and its context in a word lattice. The main contribution of this paper is to compute the context consistency while taking into account both the uncertainty in the speech recognition results and the effect of topic. To measure the uncertainty of the context, we employ the word occurrence probability, which is obtained by combining the overlapping hypotheses in a word posterior lattice. To handle the effect of topic, we propose a topic adaptation method. The adaptation method first classifies the spoken document by topic and then computes the context consistency of the hypothesized word with a topic-specific measure of semantic similarity. We apply this topic-specific measure in two ways: using the information of the top-1 topic only, and using the mixture of all topics according to the topic classification. Experiments conducted on the Hub-4NE Mandarin database show that both the occurrence probability of context words and the topic adaptation are effective for the STD confidence measure. The proposed confidence measure outperforms measures that ignore the uncertainty of the context or that use a non-topic method.
Online High-Quality Topic Detection for Bulletin Board Systems
Jungang XU Hui LI Yan ZHAO Ben HE

PAPER-Artificial Intelligence, Data Mining

Vol: E97-D, No: 2, Page(s): 255-265
Even with the recent development of new types of social networking services such as microblogs, Bulletin Board Systems (BBS) remain popular for local communities and vertical discussions. These BBS sites carry a high volume of traffic every day, with user discussions on a wide variety of topics, so it is difficult for BBS visitors to find the posts they are interested in among the large number of discussion threads. We explore several main characteristics of BBS, including the organizational flexibility of BBS texts, the high data volume, and the aging characteristic of BBS topics. Based on these characteristics, we propose a novel method of Online Topic Detection (OTD) on BBS, which mainly consists of a representative post selection procedure based on a Markov chain model and an efficient topic clustering algorithm with candidate topic set generation based on Aging Theory. Experimental results show that our method improves OTD performance in the BBS environment in both detection accuracy and time efficiency. In addition, analysis of the aging characteristic of discussion topics shows that topics on BBS emerge and age very quickly, which is why introducing a candidate topic set generation strategy based on Aging Theory into the topic clustering algorithm is worthwhile.
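The abstract does not give the exact Markov chain model, so the sketch below assumes one common formulation for illustration: pairwise post similarities act as transition weights, and posts are ranked by the stationary distribution of a damped random walk (a PageRank/LexRank-style scheme). The similarity matrix, the damping factor of 0.85, and the function names are hypothetical; the Aging Theory component is not modeled.

```python
def stationary_scores(sim, damping=0.85, iters=200, tol=1e-12):
    """Stationary distribution of a damped random walk whose transition
    weights are pairwise post similarities (assumed formulation)."""
    n = len(sim)
    trans = []
    for row in sim:
        total = sum(row)
        # Posts with no similar neighbours jump uniformly (avoids division by zero).
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = [
            (1 - damping) / n + damping * sum(p[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


def representative_posts(sim, k):
    """Indices of the k posts with the highest stationary probability."""
    scores = stationary_scores(sim)
    return sorted(range(len(sim)), key=lambda i: -scores[i])[:k]


# Hypothetical thread: post 0 is similar to every reply, the replies only to post 0.
sim = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(representative_posts(sim, 1))  # [0]
```

The damping term keeps the chain aperiodic and irreducible, so the power iteration converges even for thread graphs like this star-shaped one.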
Characterizing Web APIs Combining Supervised Topic Model with Ontology
Yuanbin HAN Shizhan CHEN Zhiyong FENG

LETTER-Data Engineering, Web Information Systems

Vol: E96-D, No: 7, Page(s): 1548-1551
This paper presents a novel topic modeling (TM) approach for discovering meaningful topics for Web APIs, which offers a potential means of dimensionality reduction for the efficient and effective classification, retrieval, organization, and management of numerous APIs. We explore conducting TM on multi-labeled APIs by combining a supervised TM (known as Labeled LDA) with ontology. Experiments conducted on a real-world API data set show that the proposed method outperforms standard Labeled LDA with an average gain of 7.0% in the measured quality of the generated topics. In addition, we evaluate the similarity matching between the topics generated by our method and by standard Labeled LDA, which demonstrates the significance of incorporating ontology.
MPI/OpenMP Hybrid Parallel Inference Methods for Latent Dirichlet Allocation – Approximation and Evaluation
Shotaro TORA Koji EGUCHI

PAPER-Advanced Search

Vol: E96-D, No: 5, Page(s): 1006-1015
Recently, probabilistic topic models have been applied to various types of data, including text, and their effectiveness has been demonstrated. Latent Dirichlet allocation (LDA) is a well-known topic model. Variational Bayesian inference or collapsed Gibbs sampling is often used to estimate the parameters of LDA; however, these inference methods incur a high computational cost for large-scale data, so highly efficient techniques are needed. We use parallel computation to make collapsed Gibbs sampling inference for LDA efficient. We assume a symmetric multiprocessing (SMP) cluster, a configuration that has been widely used in recent years. In prior work on parallel inference for LDA, either MPI or OpenMP has typically been used alone. For an SMP cluster, however, it is more suitable to adopt hybrid parallelization that uses message passing for communication between SMP nodes and loop directives for parallelization within each SMP node. We developed an MPI/OpenMP hybrid parallel inference method for LDA and evaluated its performance under various settings of an SMP cluster. We further investigated an approximation that controls the inter-node communications and found that it noticeably increased inference speed while maintaining inference accuracy.
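The per-token update of collapsed Gibbs sampling, which the hybrid scheme in this abstract distributes across and within SMP nodes, can be sketched serially. This is the textbook sampler with symmetric priors `alpha` and `beta`, assumed here for illustration; it contains no MPI/OpenMP parallelization and none of the paper's inter-node approximation. Each token's topic is resampled from p(z = k) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ), with the token's own count removed first.

```python
import random


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Serial collapsed Gibbs sampler for LDA; docs are lists of word ids in [0, vocab_size)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]              # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic
    z = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional for this token's topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw


docs = [[0, 1, 2, 0], [3, 4, 3, 4], [0, 2, 1]]
z, n_dk, n_kw = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=5, iters=20)
```

The count tables `n_kw` and `n_k` are the shared state that parallel implementations have to synchronize between workers, which is where communication cost arises.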
Automatic Topic Identification for Idea Summarization in Idea Visualization Programs
Kobkrit VIRIYAYUDHAKORN Susumu KUNIFUJI

PAPER-Artificial Intelligence, Data Mining

Vol: E96-D, No: 1, Page(s): 64-72
Recent idea visualization programs still lack automatic idea summarization capabilities. This paper presents a knowledge-based method for automatically providing a short piece of English text describing the topic of each idea group in idea charts. This automatic topic identification makes use of Yet Another General Ontology (YAGO) and WordNet as its knowledge bases. We propose a novel topic selection method and compare its performance with three existing methods using two experimental datasets constructed with two idea visualization programs, i.e., the KJ Method (Kawakita Jiro Method) and mind-mapping programs. Our proposed topic identification method outperformed the baseline method in terms of both performance and consistency.
Topic Extraction for Documents Based on Compressibility Vector
Nuo ZHANG Toshinori WATANABE

PAPER-Artificial Intelligence, Data Mining

Vol: E95-D, No: 10, Page(s): 2438-2446
Nowadays, a great many e-documents are accessed on the Internet, and it would be helpful if these documents could be analyzed and their significant content extracted automatically. Similarity analysis and topic extraction are widely used as document relation analysis techniques. Most of the proposed methods require preprocessing such as stemming and stop-word removal. These methods therefore rely on natural language processing (NLP) technology and are dependent on language features and on the dataset. In this study, we propose novel document relation analysis and topic extraction methods based on text compression. Our proposed approaches do not require NLP and can also evaluate documents automatically. We evaluate our proposal on model documents and on the URCS and Reuters-21578 datasets for relation analysis and topic extraction. The effectiveness of the proposed methods is demonstrated through simulations.
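The idea of comparing documents by compression, without any NLP preprocessing, can be illustrated with the Normalized Compression Distance (NCD), a well-known compression-based measure swapped in here for illustration; the paper's "compressibility vector" method may be defined differently. The sketch assumes zlib as the compressor and uses hypothetical toy inputs.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


doc_a = b"topic extraction by text compression " * 20
doc_b = b"topic extraction by text compression and similarity " * 20
noise = bytes(random.Random(1).randrange(256) for _ in range(800))
print(ncd(doc_a, doc_b) < ncd(doc_a, noise))  # True
```

Because the compressor exploits whatever repetition exists in the byte stream, the measure works the same way for any language or encoding, which is the property the abstract emphasizes.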
Efficient Tracking of News Topics Based on Chronological Semantic Structures in a Large-Scale News Video Archive
Ichiro IDE Tomoyoshi KINOSHITA Tomokazu TAKAHASHI Hiroshi MO Norio KATAYAMA Shin'ichi SATOH Hiroshi MURASE

PAPER-Video Processing

Vol: E95-D, No: 5, Page(s): 1288-1300
Recent advances in digital storage technology have enabled us to archive large volumes of video data. Thanks to this trend, we have archived more than 1,800 hours of video data from a daily Japanese news show over the last ten years. For the effective use of such a large news video archive, we consider the analysis of its chronological and semantic structure to be important. We also consider that presenting users with the development of news topics helps their understanding of current affairs more than providing a list of relevant news stories, as most current news video retrieval systems do. Therefore, in this paper, we propose a structuring method for a news video archive, together with an interface that visualizes the structure, so that users can efficiently track the development of news topics according to their interests. The proposed news video structure, namely the “topic thread structure”, is obtained by analyzing the chronological and semantic relations between news stories. The proposed interface, namely “mediaWalker II”, allows users to track the development of news topics along the topic thread structure while watching the video footage corresponding to each news story. Analyses of the topic thread structures obtained by applying the proposed method to actual news video footage revealed interesting and comprehensible relations between news topics in the real world. In addition, analyses of their size quantified the efficiency of tracking a user's topic of interest based on the proposed topic thread structure. We consider this a first step toward facilitating video authoring by users based on existing content in a large-scale news video archive.
Spoken Document Retrieval Leveraging Unsupervised and Supervised Topic Modeling Techniques
Kuan-Yu CHEN Hsin-Min WANG Berlin CHEN

PAPER-Speech Processing

Vol: E95-D, No: 5, Page(s): 1195-1205
This paper describes the application of two attractive categories of topic modeling techniques, namely the document topic model (DTM) and the word topic model (WTM), to the problem of spoken document retrieval (SDR). Apart from the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, envisioning a scenario in which user query logs, along with click-through information on relevant documents, can be used to build an SDR system. This approach has the potential to associate relevant documents with queries even if they share none of the query words, thereby improving retrieval quality over the baseline system. We also study a novel use of pseudo-supervised training that associates relevant documents with queries through a pseudo-feedback procedure. Moreover, to lessen the SDR performance degradation caused by imperfect speech recognition, we investigate using different levels of index features for topic modeling, including words, syllable-level units, and their combination. We present a series of experiments conducted on the TDT (TDT-2 and TDT-3) Chinese SDR collections. The empirical results show that the methods derived from our proposed modeling framework are very effective compared with several existing retrieval approaches.
Enhancing Digital Book Clustering by LDAC Model
Lidong WANG Yuan JIE

PAPER

Vol: E95-D, No: 4, Page(s): 982-988
In Digital Library (DL) applications, digital book clustering is an important and urgent research task. However, it is difficult to perform effectively because of the great length of digital books. To cluster digital books correctly, a novel method based on a probabilistic topic model is proposed. First, we build a topic model named LDAC, whose main goal is to extract topics from digital books effectively. Gibbs sampling is then applied for parameter inference. Once the model parameters are learned, each book is assigned to the cluster that maximizes the posterior probability. Experimental results demonstrate that our LDAC-based approach achieves a significant improvement over related methods.
Visual Knowledge Structure Reasoning with Intelligent Topic Map
Huimin LU Boqin FENG Xi CHEN

PAPER-Artificial Intelligence, Data Mining

Vol: E93-D, No: 10, Page(s): 2805-2812
This paper presents a visual knowledge structure reasoning method using the Intelligent Topic Map (ITM), which extends the conventional Topic Map in structure and enhances its reasoning functions. The visual knowledge structure reasoning method integrates two types of knowledge reasoning: knowledge logical relation reasoning and knowledge structure reasoning. Knowledge logical relation reasoning implements knowledge consistency checking and the reasoning of implicit associations between knowledge points. We propose a Knowledge Unit Circle Search strategy for knowledge structure reasoning, which implements semantic implication extension, semantic relevant extension, and semantic class belonging confirmation. Moreover, the knowledge structure reasoning results are visualized using the ITM Toolkit. A prototype system of visual knowledge structure reasoning has been implemented and applied to the organization, management, and service of massive knowledge for education.
Novel Confidence Feature Extraction Algorithm Based on Latent Topic Similarity
Wei CHEN Gang LIU Jun GUO Shinichiro OMACHI Masako OMACHI Yujing GUO

PAPER-Speech and Hearing

Vol: E93-D, No: 8, Page(s): 2243-2251
In speech recognition, confidence annotation uses a single confidence feature or a combination of different features for classification. These confidence features are always extracted from decoding information. However, it has been shown that about 30% of the knowledge in human speech understanding is derived mainly from high-level information. Thus, how to extract a high-level confidence feature that is statistically independent of decoding information is worth researching in speech recognition. In this paper, a novel confidence feature extraction algorithm based on latent topic similarity is proposed. For each word in a recognition result, the word topic distribution and the context topic distribution are first obtained using the latent Dirichlet allocation (LDA) topic model; the proposed word confidence feature is then extracted by computing the similarity between these two topic distributions. Experiments show that the proposed feature adds a complementary information source to the existing confidence features and can effectively improve confidence annotation performance when combined with confidence features from decoding information.
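The similarity step in this abstract can be sketched as follows. The sketch assumes that each word's topic distribution is already available from a trained LDA model, that the context distribution is the average over the surrounding words, and that cosine similarity is the measure; the abstract specifies none of these choices, so they, the toy numbers, and the function names are assumptions.

```python
import math


def context_distribution(word_dists):
    """Average the topic distributions of the context words (assumed aggregation)."""
    n = len(word_dists)
    return [sum(col) / n for col in zip(*word_dists)]


def topic_similarity(p, q):
    """Cosine similarity between two topic distributions; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


# Hypothetical 3-topic distributions for a hypothesized word and its context words.
word = [0.7, 0.2, 0.1]
context = context_distribution([[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]])
print(round(topic_similarity(word, context), 3))
```

A word whose topics match its context scores near 1, while an out-of-topic (likely misrecognized) word scores lower; the resulting value would be one input feature to the confidence classifier.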
A Topic-Independent Method for Scoring Student Essay Content
Ryo NAGATA Jun-ichi KAKEGAWA Yukiko YABUTA

PAPER-Educational Technology

Vol: E93-D, No: 2, Page(s): 335-340
This paper proposes a topic-independent method for automatically scoring essay content. Unlike conventional topic-dependent methods, it predicts the human-assigned score of a given essay without training essays written on the same topic as the target essay. To achieve this, this paper introduces a new measure called MIDF that measures how important and relevant a word is in a given essay. The proposed method predicts the score based on the distribution of MIDF. Surprisingly, experiments show that the proposed method achieves an accuracy of 0.848 and performs as well as or even better than conventional topic-dependent methods.
Novel Topic Maps to RDF/RDF Schema Translation Method
Shinae SHIN Dongwon JEONG Doo-Kwon BAIK

PAPER-Knowledge Representation

Vol: E91-D, No: 11, Page(s): 2626-2637
We propose an enhanced method for translating Topic Maps to RDF/RDF Schema, to help realize the Semantic Web. A critical issue for the Semantic Web is to describe Web information resources, i.e., Web metadata, efficiently and precisely. Two representative standards, Topic Maps and RDF, have been used for Web metadata, and RDF-based standardization and implementation of the Semantic Web have been actively pursued. Since the Semantic Web must accept and understand all Web information resources, including those represented with other methods, Topic Maps-to-RDF translation has become an issue. Although many Topic Maps-to-RDF translation methods have been devised, they still have several problems (e.g., semantic loss and complex expression). Our translation method provides an improved solution to these problems. It exhibits lower semantic loss than previous methods because it extracts both explicit and implicit semantics. Compared with previous methods, it also reduces the encoding complexity of the resulting RDF. In addition, in terms of reversibility, the proposed method regenerates all Topic Maps constructs of the original source when the result is reverse-translated.
Entity Network Prediction Using Multitype Topic Models
Hitohiro SHIOZAKI Koji EGUCHI Takenao OHKAWA

PAPER-Knowledge Discovery and Data Mining

Vol: E91-D, No: 11, Page(s): 2589-2598
Errata uploaded on December 1, 2008.
Conveying information about who, what, when, and where is a primary purpose of some genres of documents, typically news articles. Statistical models that capture dependencies between named entities and topics can play an important role in handling such information. Although some relationships between who and where should be mentioned in such a document, no statistical topic models explicitly address the textual interactions between a who-entity and a where-entity. This paper presents a statistical model that directly captures the dependencies between an arbitrary number of word types, such as who-entities, where-entities, and topics, mentioned in each document. Through experiments on predicting who-entities and the links between them, we show that this multitype topic model makes better predictions on entity networks, in which each vertex represents an entity and each edge weight represents how closely the pair of entities at its incident vertices is related. We also demonstrate the scale-free property of the weighted networks of entities extracted from written mentions.
3D Virtual Environment Navigation Aid Techniques for Novice Users Using Topic Map
Hak-Keun KIM Teuk-Seob SONG Yoon-Chul CHOY Soon-Bum LIM

PAPER-Fundamentals of Software and Theory of Programs

Vol: E89-D, No: 8, Page(s): 2411-2419
A 3D virtual environment provides a limited amount of information, mainly focused on visual information. This is the main reason users lose their sense of direction in the environment. Much research has been carried out on developing navigation tools that address this problem. In this study, a navigation tool is designed by applying the topic map, one of the technologies for Semantic Web construction, to a 3D virtual environment. A topic map constructs a semantic link map by defining the connection relations between topics. According to an experiment conducted to evaluate the proposed navigation tool, the tool was more helpful in finding detailed objects than highly represented objects. It was also found that providing knowledge about the surroundings is effective for users' object selection when the search target is not defined.
Using Topic Keyword Clusters for Automatic Document Clustering
Hsi-Cheng CHANG Chiun-Chieh HSU

PAPER-Document Clustering

Vol: E88-D, No: 8, Page(s): 1852-1860
Data clustering is a technique for grouping similar data items together for convenient understanding. Conventional data clustering methods, including agglomerative hierarchical clustering and partitional clustering algorithms, frequently perform unsatisfactorily on large text collections, since their computational complexity increases very quickly with the number of data items. Poor clustering results degrade intelligent applications such as event tracking and information extraction. This paper presents an unsupervised document clustering method that identifies topic keyword clusters in the text corpus. The proposed method adopts a multi-stage process. First, an aggressive data cleaning approach is employed to reduce the noise in the free text and to identify the topic keywords in the documents. All extracted keywords are then grouped into topic keyword clusters using the k-nearest neighbor approach and the keyword clustering technique. Finally, all documents in the corpus are clustered based on the topic keyword clusters. The proposed method is assessed against conventional data clustering methods on a web news corpus. The experimental results show that the proposed method is an efficient and effective clustering approach.
Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching
Ian R. LANE Tatsuya KAWAHARA Tomoko MATSUI Satoshi NAKAMURA

PAPER-Spoken Language Systems

Vol: E88-D, No: 3, Page(s): 446-454
An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the inferred topic is automatically detected from the user's utterance, and speech recognition is then performed by applying an appropriate topic-dependent language model. This approach enables users to freely switch between domains while maintaining high recognition accuracy. As topic detection is performed on a single utterance, detection errors may occur and propagate through the system. To improve robustness, a hierarchical back-off mechanism is introduced where detailed topic models are applied when topic detection is confident and wider models that cover multiple topics are applied in cases of uncertainty. The performance of the proposed architecture is evaluated when combined with two topic detection methods: unigram likelihood and SVMs (Support Vector Machines). On the ATR Basic Travel Expression Corpus, both methods provide a significant reduction in WER (9.7% and 10.3%, respectively) compared to a single language model system. Furthermore, recognition accuracy is comparable to performing decoding with all topic-dependent models in parallel, while the required computational cost is much reduced.
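The hierarchical back-off described in this abstract can be sketched as a model-selection rule. The sketch assumes that topic detection yields posterior probabilities over leaf topics, that confidence at each level of the hierarchy is the posterior mass of the node's leaves, and that a fixed threshold decides when to back off; the abstract's detectors (unigram likelihood, SVMs) and its actual confidence criterion may differ, and the topic names, hierarchy, and threshold are hypothetical.

```python
def ancestors(topic, parent_of):
    """Yield the topic itself, then its parent, grandparent, ... up to the root."""
    while topic is not None:
        yield topic
        topic = parent_of.get(topic)


def select_model(topic_posteriors, parent_of, threshold=0.7):
    """Return the most specific topic node whose aggregated posterior mass reaches
    the threshold; falls back to the root, i.e. the model covering all topics."""
    mass = {}
    for topic, p in topic_posteriors.items():
        for node in ancestors(topic, parent_of):
            mass[node] = mass.get(node, 0.0) + p
    best = max(topic_posteriors, key=topic_posteriors.get)
    chosen = best
    for chosen in ancestors(best, parent_of):
        if mass[chosen] >= threshold:
            break
    return chosen


# Hypothetical two-level hierarchy of travel-domain topics.
parent_of = {
    "hotel_booking": "hotel",
    "hotel_checkin": "hotel",
    "restaurant": "dining",
    "hotel": "general",
    "dining": "general",
}
print(select_model({"hotel_booking": 0.5, "hotel_checkin": 0.3, "restaurant": 0.2}, parent_of))  # hotel
```

When detection is uncertain between sibling topics, the rule selects their shared parent model rather than risking a wrong detailed model, which mirrors the robustness argument in the abstract.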
Language Model Adaptation Based on PLSA of Topics and Speakers for Automatic Transcription of Panel Discussions
Yuya AKITA Tatsuya KAWAHARA

PAPER-Spoken Language Systems

Vol: E88-D, No: 3, Page(s): 439-445
Appropriate language modeling is one of the major issues for automatic transcription of spontaneous speech. We propose an adaptation method for statistical language models based on both topic and speaker characteristics. This approach is applied to the automatic transcription of meetings and panel discussions, in which multiple participants speak on a given topic in their own speaking style. The baseline language model is a mixture of two models, trained on different corpora covering various topics and various speakers, respectively. Then, probabilistic latent semantic analysis (PLSA) is performed on the same respective corpora and on the initial ASR result to provide two sets of unigram probabilities conditioned on the input speech, with regard to topics and speaker characteristics, respectively. Finally, the baseline model is adapted by scaling the N-gram probabilities with these unigram probabilities. For speaker adaptation, we make use of a portion of the Corpus of Spontaneous Japanese (CSJ), in which a large number of speakers gave talks on given topics. Experimental evaluation with real discussions showed that both topic and speaker adaptation reduced test-set perplexity, with an average total reduction of 8.5%. Furthermore, the proposed adaptation method also improved word accuracy.
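The final adaptation step, scaling N-gram probabilities with adapted unigram probabilities, is commonly done by unigram rescaling: P'(w|h) ∝ P(w|h) · (P_adapt(w) / P_base(w))^β, renormalized over the vocabulary for each history h. The sketch below assumes that form; the exponent `beta`, the renormalization, and the toy distributions are assumptions rather than details from the abstract, and the topic and speaker factors could be applied as two such ratios.

```python
def rescale(ngram_probs, adapted_unigram, background_unigram, beta=0.5):
    """Unigram rescaling of P(w | h) for one fixed history h.

    ngram_probs: dict word -> P(w | h) from the baseline N-gram model.
    adapted_unigram / background_unigram: dict word -> unigram probability
    (e.g. PLSA-adapted vs. baseline), assumed nonzero for every word.
    """
    scaled = {
        w: p * (adapted_unigram[w] / background_unigram[w]) ** beta
        for w, p in ngram_probs.items()
    }
    z = sum(scaled.values())  # renormalize so the distribution sums to 1
    return {w: s / z for w, s in scaled.items()}


ngram = {"a": 0.5, "b": 0.5}
adapted = {"a": 0.8, "b": 0.2}
background = {"a": 0.5, "b": 0.5}
print(rescale(ngram, adapted, background, beta=1.0))
```

Words that the adapted unigram favors over the background gain probability in every context, while the N-gram model still supplies the history dependence.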
Topic Keyword Identification for Text Summarization Using Lexical Clustering
Youngjoong KO Kono KIM Jungyun SEO

PAPER

Vol: E86-D, No: 9, Page(s): 1695-1701
Automatic text summarization aims to reduce the size of a document while preserving its content. Generally, an extractive summary is produced by including only the sentences that are most related to the topic. DOCUSUM is our summarization system based on a new topic keyword identification method. DOCUSUM works as follows. First, it converts the content words of a document into elements of a context vector space. It then constructs lexical clusters from the context vector space and identifies the core clusters. Next, it selects topic keywords from the core clusters. Finally, it generates a summary of the document using the topic keywords. In experiments at various compression ratios (30% compression, 10% compression, and extraction of a fixed number of sentences, 4 or 8), DOCUSUM performed better than other methods.
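The last DOCUSUM step, generating an extractive summary from topic keywords, can be sketched as follows. The sketch assumes the topic keywords have already been identified (the lexical-clustering stages are not shown), scores each sentence by the number of keyword occurrences, and keeps the top-n sentences in their original order; this scoring rule and the function name are assumptions, not DOCUSUM's published formula.

```python
import re


def summarize(sentences, topic_keywords, n):
    """Return the n sentences with the most topic-keyword hits, in document order."""
    keywords = {k.lower() for k in topic_keywords}

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(1 for tok in tokens if tok in keywords)

    # Rank by descending score; ties broken by position in the document.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))[:n]
    return [sentences[i] for i in sorted(ranked)]


doc = [
    "Cats sleep a lot.",
    "The stock market fell sharply today.",
    "Market analysts blamed rising rates.",
    "It rained.",
]
print(summarize(doc, {"market", "stock", "rates"}, 2))
# ['The stock market fell sharply today.', 'Market analysts blamed rising rates.']
```

Fixing `n` corresponds to the "fixed number of sentences" setting; a compression-ratio setting would instead derive `n` from the document length.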
