
Keyword Search Results

[Keyword] summarization (25 hits)

1-20 hits (of 25)

  • TIG: A Multitask Temporal Interval Guided Framework for Key Frame Detection [Open Access]

    Shijie WANG  Xuejiao HU  Sheng LIU  Ming LI  Yang LI  Sidan DU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2024/05/17  Vol: E107-D No:9  Page(s): 1253-1263

    Detecting key frames in videos has garnered substantial attention in recent years; it is a point-level task with deep research value and broad application prospects in daily life. For instance, video surveillance, video cover generation, and highlight flashback all demand key frame detection. However, the task is beset by challenges such as the sparsity of key frame instances, the imbalance between target frames and background frames, and the absence of suitable post-processing methods. In response to these problems, we introduce a novel and effective Temporal Interval Guided (TIG) framework to precisely localize specific frames. The framework incorporates a proposed Point-Level-Soft non-maximum suppression (PLS-NMS) post-processing algorithm, suited to point-level tasks and facilitated by a well-designed confidence score decay function. Furthermore, we propose a TIG-loss, sensitive to the temporal interval from the target frame, to optimize the two-stage framework. The proposed method can be broadly applied to key frame detection in video understanding, including action start detection and static video summarization. Extensive experiments validate the efficacy of our approach on the action start detection benchmarks THUMOS'14 and ActivityNet v1.3, where we reach state-of-the-art performance. Competitive results are also demonstrated on the SumMe and TVSum datasets for deep-learning-based static video summarization.
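    To make the post-processing idea concrete, here is a minimal sketch of point-level soft NMS with a Gaussian confidence decay over temporal distance. The decay function, `sigma`, and the stopping threshold are illustrative assumptions, not the paper's exact PLS-NMS.

    ```python
    import numpy as np

    def pls_nms(frames, scores, sigma=10.0, score_thresh=0.2):
        """Greedily pick the highest-confidence frame, then decay its neighbors."""
        frames = np.asarray(frames, dtype=float)
        scores = np.asarray(scores, dtype=float).copy()
        keep = []
        while scores.max() > score_thresh:
            i = int(scores.argmax())
            keep.append(int(frames[i]))
            dist = np.abs(frames - frames[i])                  # temporal interval to the pick
            scores *= 1.0 - np.exp(-(dist ** 2) / sigma ** 2)  # strong decay near the pick
            scores[i] = 0.0                                    # never re-pick the same frame
        return keep

    # frames 10/12 and 50/52 are near-duplicate detections; one of each pair survives
    print(pls_nms([10, 12, 50, 52, 90], [0.90, 0.80, 0.70, 0.95, 0.40]))  # [52, 10, 90]
    ```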

  • Hierarchical Latent Alignment for Non-Autoregressive Generation under High Compression Ratio

    Wang XU  Yongliang MA  Kehai CHEN  Ming ZHOU  Muyun YANG  Tiejun ZHAO  

     
    PAPER-Natural Language Processing

    Publicized: 2023/12/01  Vol: E107-D No:3  Page(s): 411-419

    Non-autoregressive generation has attracted increasing attention due to its fast decoding speed. Latent alignment objectives such as CTC are designed to capture the monotonic alignments between the predicted and output tokens and have been used for machine translation and sentence summarization. However, our preliminary experiments revealed that CTC performs poorly on abstractive document summarization, where a high compression ratio between the input and output is involved. To address this issue, we conduct a theoretical analysis and propose Hierarchical Latent Alignment (HLA). The basic idea is a two-step alignment process: we first align the sentences in the input and output, and then derive token-level alignments using CTC within the aligned sentences. We evaluate the effectiveness of our approach on two widely used datasets, XSUM and CNNDM. The results indicate that our method exhibits remarkable scalability even when dealing with high compression ratios.
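    For readers unfamiliar with the base objective, here is a minimal PyTorch sketch of the second step under assumed shapes: once a sentence pair is aligned, a token-level CTC loss is applied within it. The vocabulary size and lengths are invented; the sentence-level alignment itself is not reproduced here.

    ```python
    import torch
    import torch.nn as nn

    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    vocab = 100  # assumed vocabulary size; index 0 is reserved for the CTC blank

    # One aligned pair: 12 encoder positions predicting a 5-token output span.
    # CTC requires input length >= target length, which sentence-level alignment
    # keeps manageable even at high document compression ratios.
    log_probs = torch.randn(12, 1, vocab).log_softmax(-1)  # (time, batch, vocab)
    target = torch.randint(1, vocab, (1, 5))               # token ids, no blanks
    loss = ctc(log_probs, target,
               input_lengths=torch.tensor([12]),
               target_lengths=torch.tensor([5]))
    print(loss.item())
    ```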

  • A Knowledge Representation Based User-Driven Ontology Summarization Method

    Yuehang DING  Hongtao YU  Jianpeng ZHANG  Huanruo LI  Yunjie GU  

     
    LETTER-Data Engineering, Web Information Systems

    Publicized: 2019/05/30  Vol: E102-D No:9  Page(s): 1870-1873

    As the superstructure of knowledge graphs, ontology has been widely applied in knowledge engineering. However, ontologies are becoming increasingly difficult to use and comprehend due to their growing data size and schema complexity. Ontology summarization has therefore emerged to enhance the comprehension and application of ontologies. Existing summarization methods mainly focus on an ontology's topology without taking semantic information into consideration, whereas humans understand information through semantics. We therefore propose a novel algorithm that integrates semantic and topological information, making ontologies more understandable. In our work, semantic and topological information is represented by concept vectors, a set of high-dimensional vectors. Distances between concept vectors represent concept similarity, and we select important concepts following two criteria: 1) the distances from important concepts to normal concepts should be as short as possible, so that important concepts summarize the normal concepts well; 2) the distances from an important concept to the other important concepts should be as long as possible, ensuring that important concepts are not similar to each other. K-means++ is adopted to select the important concepts. Lastly, we performed extensive evaluations comparing our algorithm with existing ones; the evaluations show that our approach performs better than the others in most cases.
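    A minimal sketch of the selection step, assuming concepts are already embedded as vectors: cluster with k-means++ and keep the concept nearest each centroid, which serves both criteria (close to the concepts it summarizes, far from the other picks). The names and data are invented.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def select_important_concepts(names, vectors, k=3, seed=0):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=seed).fit(vectors)
        picks = []
        for center in km.cluster_centers_:
            # the nearest actual concept stands in for the cluster it summarizes
            picks.append(names[int(np.linalg.norm(vectors - center, axis=1).argmin())])
        return picks

    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(30, 8))  # 30 concept vectors of dimension 8
    print(select_important_concepts([f"concept_{i}" for i in range(30)], vectors, k=3))
    ```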

  • Improving Thai Word and Sentence Segmentation Using Linguistic Knowledge

    Rungsiman NARARATWONG  Natthawut KERTKEIDKACHORN  Nagul COOHAROJANANONE  Hitoshi OKADA  

     
    PAPER-Natural Language Processing

    Publicized: 2018/09/07  Vol: E101-D No:12  Page(s): 3218-3225

    Word boundary ambiguity has long been a fundamental challenge in Thai word segmentation. The Conditional Random Fields (CRF) model is among the best-known methods to have achieved remarkably accurate segmentation. Nevertheless, current advancements appear to have left the problem of compound words unaccounted for: compound words lose their meaning or context once segmented. Hence, we introduce a dictionary-based word-merging algorithm that merges all kinds of compound words. Our evaluation shows that the algorithm achieves highly accurate word segmentation while preserving compound words; moreover, it can restore some incorrectly segmented words. Another problem, involving a different word-chunking approach, is sentence boundary ambiguity. In tackling this problem, utilizing the part of speech (POS) of a segmented word has previously been found to boost the accuracy of CRF-based sentence segmentation. However, not all segmented words can be tagged. Thus, we propose a POS-based word-splitting algorithm, which splits words in order to increase the number of POS tags. We found that with more identifiable POS tags, the CRF model segments sentences better. To demonstrate the contributions of both methods, we experimented with three of their applications. With the word-merging algorithm, intact compound words in the output of topic extraction preserve their intended meanings, offering more precise information for human interpretation. Together with the POS-based word-splitting algorithm, it can also be used to amend word-level Thai-English translations. In addition, the word-splitting algorithm improves sentence segmentation, thus enhancing text summarization.
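    As a rough illustration of the word-merging step, the sketch below greedily re-joins adjacent tokens whenever their concatenation appears in a compound dictionary; the longest-match policy and the tiny dictionary are assumptions for illustration, not the paper's exact algorithm.

    ```python
    def merge_compounds(tokens, compound_dict, max_parts=3):
        out, i = [], 0
        while i < len(tokens):
            merged = None
            for n in range(min(max_parts, len(tokens) - i), 1, -1):  # longest span first
                candidate = "".join(tokens[i:i + n])
                if candidate in compound_dict:
                    merged, i = candidate, i + n
                    break
            if merged is None:                                       # no compound starts here
                merged, i = tokens[i], i + 1
            out.append(merged)
        return out

    # 'น้ำ' + 'แข็ง' re-merges into the compound 'น้ำแข็ง' (ice)
    print(merge_compounds(["น้ำ", "แข็ง", "ใส"], {"น้ำแข็ง"}))
    ```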

  • Identifying Core Objects for Trace Summarization by Analyzing Reference Relations and Dynamic Properties

    Kunihiro NODA  Takashi KOBAYASHI  Noritoshi ATSUMI  

     
    PAPER

    Publicized: 2018/04/20  Vol: E101-D No:7  Page(s): 1751-1765

    Behaviors of an object-oriented system can be visualized as reverse-engineered sequence diagrams derived from execution traces, a valuable aid for program comprehension tasks. However, owing to the massive amount of information contained in an execution trace, a reverse-engineered sequence diagram often suffers from a scalability issue. To address this, many trace summarization techniques have been proposed, most of which focus on reducing the vertical size of the diagram. Decreasing the horizontal size of the diagram is also very important for coping with the scalability issue, yet few studies have addressed this point; further development of horizontal summarization techniques is therefore much needed. In this paper, we present a method for identifying core objects for trace summarization by analyzing reference relations and dynamic properties. By visualizing only the interactions related to core objects, we obtain a horizontally compacted reverse-engineered sequence diagram that retains the system's key behaviors. To identify core objects, we first detect and eliminate temporary objects that are trivial to the system by analyzing the reference relations and lifetimes of objects. Then, estimating the importance of each non-trivial object from its dynamic properties, we identify the highly important ones (i.e., core objects). We implemented our technique in a tool and evaluated it using traces from various open-source software systems. The results show that our technique is much more effective at horizontally reducing a reverse-engineered sequence diagram than the state-of-the-art trace summarization technique: the horizontal compression ratio of our technique was 134.6 on average, versus 11.5 for the state of the art. The runtime overhead imposed by our technique was 167.6% on average, which is relatively small compared with recent scalable dynamic analysis techniques and shows the practicality of our technique. Overall, our technique achieves a significant reduction in the horizontal size of a reverse-engineered sequence diagram with a small overhead, and is expected to be a valuable aid for program comprehension.
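    A much-simplified sketch of the two-stage idea, with invented data: short-lived objects are dropped as temporary, and the rest are ranked by a dynamic-property score (here, incoming references weighted by lifetime, an assumed proxy for the paper's importance estimation).

    ```python
    def core_objects(objects, lifetime_thresh=100, top_k=5):
        """objects: dicts with 'id', 'lifetime' (ticks alive), 'in_refs' (incoming refs)."""
        non_trivial = [o for o in objects if o["lifetime"] >= lifetime_thresh]
        ranked = sorted(non_trivial, key=lambda o: o["in_refs"] * o["lifetime"],
                        reverse=True)
        return [o["id"] for o in ranked[:top_k]]

    trace = [{"id": "Parser",  "lifetime": 900,  "in_refs": 40},
             {"id": "TempBuf", "lifetime": 3,    "in_refs": 1},   # temporary, filtered out
             {"id": "Engine",  "lifetime": 1200, "in_refs": 15}]
    print(core_objects(trace, top_k=2))  # ['Parser', 'Engine']
    ```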

  • Towards an Improvement of Bug Report Summarization Using Two-Layer Semantic Information

    Cheng-Zen YANG  Cheng-Min AO  Yu-Han CHUNG  

     
    PAPER

    Publicized: 2018/04/20  Vol: E101-D No:7  Page(s): 1743-1750

    Bug report summarization has been explored in past research to help developers comprehend the information that matters for the bug resolution process. As text mining technology advances, many summarization approaches have been proposed to provide substantial summaries of bug reports. In this paper, we propose an enhanced summarization approach called TSM, which first extends the semantic model used in AUSUM with the anthropogenic and procedural information in bug reports, and then integrates the extended semantic model with the shallow textual information used in BRC. We conducted experiments on a dataset of realistic software projects. Compared with the baseline approaches BRC and AUSUM, TSM achieves relative improvements of 34.3% and 7.4% in the F1 measure, respectively. The experimental results show that TSM effectively improves summarization performance.

  • Entity Summarization Based on Entity Grouping in Multilingual Projected Entity Space

    Eun-kyung KIM  Key-Sun CHOI  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2017/06/02  Vol: E100-D No:9  Page(s): 2138-2146

    Entity descriptions have been growing exponentially in community-generated knowledge bases such as DBpedia. However, many of those descriptions are not useful for identifying the underlying characteristics of their corresponding entities, because the descriptions include semantically redundant facts (triples) that represent connections between entities without any semantic properties. Entity summarization filters out such non-informative and semantically redundant triples and ranks the remaining informative facts within a given summary size. This study proposes an entity summarization approach based on pre-grouping the entities that share a set of attributes which can be used to characterize the entities we want to summarize. Entities are first grouped according to projected multilingual categories that bring the multi-angled semantics of each entity into a single entity space. Key facts about each entity are then determined through in-group rankings. As a result, our proposed approach produces summaries of significantly better quality (p-value = 1.52×10⁻³ and 2.01×10⁻³ for the top-10 and top-5 summaries, respectively) than the state-of-the-art method, which requires additional external resources.
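    A minimal sketch of in-group ranking with invented triples: each fact of an entity is scored by how characteristic its predicate is within the entity's pre-computed group, and the top-k facts form the summary. The real method ranks within groups derived from projected multilingual categories.

    ```python
    from collections import Counter

    def summarize_entity(entity_triples, group_triples, k=2):
        """Rank one entity's (s, p, o) facts by predicate frequency in its group."""
        predicate_freq = Counter(p for _, p, _ in group_triples)
        ranked = sorted(entity_triples, key=lambda t: predicate_freq[t[1]], reverse=True)
        return ranked[:k]

    group = [("e1", "birthPlace", "x"), ("e2", "birthPlace", "y"),
             ("e1", "team", "t1"), ("e2", "team", "t2"), ("e1", "wikiLink", "z")]
    entity = [("e1", "birthPlace", "x"), ("e1", "team", "t1"), ("e1", "wikiLink", "z")]
    print(summarize_entity(entity, group))  # the rare 'wikiLink' triple is dropped
    ```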

  • Cognition-Aware Summarization of Photos Representing Events

    Bei LIU  Makoto P. KATO  Katsumi TANAKA  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2016/09/01  Vol: E99-D No:12  Page(s): 3140-3153

    Photo summarization is usually oriented toward the users who own the photo collection. However, people's interest in sharing photos with others highlights the importance of cognition-aware summarization, by which viewers can easily recognize the exact event a photo set represents. In this research, we address the problem of cognition-aware summarization of photos representing events, and we propose to improve the perceptual quality of a photo set by proactively preventing the misrecognition it might cause. We discuss three types of neighboring events that can cause misrecognition, namely sub-events, super-events, and sibling events. We analyze the reasons for these misrecognitions and propose three criteria to prevent them. A combination of the criteria is used to generate a summary of several photos that represents an event. Our approach was empirically evaluated on photos from Flickr, using their visual features and related tags. The results indicate the effectiveness of our proposed methods in comparison with a baseline method.

  • Key Frame Extraction Based on Chaos Theory and Color Information for Video Summarization

    Jaeyong JU  Taeyup SONG  Bonhwa KU  Hanseok KO  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2016/02/23  Vol: E99-D No:6  Page(s): 1698-1701

    Key-frame-based video summarization has emerged as an important task for efficient video data management. This paper proposes a novel technique for key frame extraction based on chaos theory and color information. Under chaos theory, a large content change between frames appears more chaotic and yields a more complex fractal trajectory in phase space. By exploiting the fractality measured in the phase space between frames, inter-frame content changes can be evaluated invariantly to fades and illumination changes. In addition to this measure, a color histogram-based measure is used to complement the chaos-based measure, which is sensitive to camera/object motion. By comparing the last key frame with the current frame using the proposed frame difference measure, which combines these two complementary measures, key frames are robustly selected even in the presence of video fades, illumination changes, and camera/object motion. The experimental results demonstrate its effectiveness, with a significant improvement over the conventional method.
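    To illustrate how the two complementary measures can be fused, here is a toy sketch: a normalized color-histogram distance combined with a second measure by a weighted sum. The fractality term is replaced by a naive stand-in, since the paper's phase-space construction is not reproduced, and the mixing weight is an assumption.

    ```python
    import numpy as np

    def hist_distance(f1, f2, bins=16):
        """L1 distance between normalized gray-level histograms of two frames."""
        h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
        h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
        return 0.5 * np.abs(h1 / max(h1.sum(), 1) - h2 / max(h2.sum(), 1)).sum()

    def frame_difference(f1, f2, chaos_measure, alpha=0.5):
        """Fuse the chaos-based and histogram-based measures (alpha is assumed)."""
        return alpha * chaos_measure(f1, f2) + (1.0 - alpha) * hist_distance(f1, f2)

    naive_chaos = lambda a, b: float(np.mean(np.abs(a - b))) / 255.0  # stand-in only

    rng = np.random.default_rng(0)
    a = rng.integers(0, 256, (64, 64)).astype(float)
    b = rng.integers(0, 256, (64, 64)).astype(float)
    print(frame_difference(a, b, naive_chaos))
    ```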

  • A Hybrid Topic Model for Multi-Document Summarization

    JinAn XU  JiangMing LIU  Kenji ARAKI  

     
    PAPER-Natural Language Processing

    Publicized: 2015/02/09  Vol: E98-D No:5  Page(s): 1089-1094

    Topic features are useful for improving text summarization. However, the independence assumed among topics is a strong restriction in most topic models, and relaxing this restriction allows text structure to be captured more deeply. This paper proposes a hybrid topic model that generates multi-document summaries using a combination of the Hidden Topic Markov Model (HTMM), a surface texture model, and a topic transition model. Based on the topic transition model, regular topic transition probabilities are used during summary generation. This approach eliminates the topic independence assumption of the Latent Dirichlet Allocation (LDA) model. Experimental results show the advantage of combining the three kinds of models. In short, this paper improves summarization by relaxing topic independence and by integrating surface texture and shallow semantics in documents, in an attempt to realize an advanced summarization system.

  • Automatic Topic Identification for Idea Summarization in Idea Visualization Programs

    Kobkrit VIRIYAYUDHAKORN  Susumu KUNIFUJI  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E96-D No:1  Page(s): 64-72

    Recent idea visualization programs still lack automatic idea summarization capabilities. This paper presents a knowledge-based method for automatically attaching a short piece of English text describing the topic of each idea group in an idea chart. The automatic topic identification makes use of Yet Another General Ontology (YAGO) and WordNet as its knowledge bases. We propose a novel topic selection method and compare its performance with three existing methods on two experimental datasets constructed with two idea visualization programs, i.e., a KJ method (Kawakita Jiro method) program and a mind-mapping program. Our proposed topic identification method outperformed the baseline methods in terms of both performance and consistency.
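    A minimal sketch of knowledge-based topic labeling using only WordNet via NLTK (the paper also uses YAGO and a more elaborate selection method): label an idea group by the lowest common hypernym of its member nouns.

    ```python
    # requires: pip install nltk, then nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    def topic_label(words):
        synsets = [wn.synsets(w, pos=wn.NOUN)[0]
                   for w in words if wn.synsets(w, pos=wn.NOUN)]
        common = synsets[0]
        for s in synsets[1:]:
            hypernyms = common.lowest_common_hypernyms(s)
            if hypernyms:                     # fold in each word's nearest shared ancestor
                common = hypernyms[0]
        return common.lemma_names()[0]

    print(topic_label(["dog", "cat", "rabbit"]))  # a shared hypernym, e.g. 'placental'
    ```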

  • Learning to Generate a Table-of-Contents with Supportive Knowledge

    Viet Cuong NGUYEN  Le Minh NGUYEN  Akira SHIMAZU  

     
    PAPER

    Vol: E94-D No:3  Page(s): 423-431

    In the text summarization field, a table-of-contents is a type of indicative summary that is especially suited to locating information in a long document or a set of documents; it is also useful for a reader seeking a quick overview of the entire contents. Current models for generating a table-of-contents produce relatively low-quality output with many meaningless titles, or titles whose meaning does not overlap with the corresponding contents. This problem may be due to the lack of semantic and topic information in those models. In this research, we propose integrating supportive knowledge into the learning models to improve the quality of the titles in a generated table-of-contents. The supportive knowledge is derived from a hierarchical clustering of words, built from a large collection of raw text, and from a topic model estimated directly on the training data. The experimental results showed that the semantic and topic information supplied by supportive knowledge benefits title generation and thereby improves the quality of the generated table-of-contents.

  • User and Device Adaptation in Summarizing Sports Videos

    Naoko NITTA  Noboru BABAGUCHI  

     
    PAPER-Image Processing and Video Processing

    Vol: E92-D No:6  Page(s): 1280-1288

    Video summarization is defined as creating a video summary that includes only the important scenes of the original video streams. Automatic video summarization requires determining the significance of each scene. For broadcast sports videos in particular, a play scene, corresponding to a single play, can be considered the scene unit, and the significance of every play scene can generally be determined by the importance of the play in the game. Two further issues should be considered: 1) what is important depends on each user's preferences, and 2) summaries should be tailored to the media device each user has. Considering these issues, this paper proposes a unified framework for user and device adaptation in summarizing broadcast sports videos. The proposed framework summarizes sports videos by selecting play scenes based not only on the importance of each play itself but also on the user's preferences, using metadata that describes the semantic content of videos with keywords and user profiles that describe the user's preference degree for each keyword. The selected scenes are then presented appropriately using various media types, such as video, image, or text, according to device profiles that describe the device type. We experimentally verified the effectiveness of user adaptation by examining how the generated summaries change with different preference degrees and by comparing results with and without user profiles. The validity of device adaptation was also evaluated through questionnaires using PCs and mobile phones as the media devices.
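    A simplified sketch of the user-adaptation step, with invented data structures: each play scene carries an intrinsic importance and metadata keywords, a user profile maps keywords to preference degrees, and the top-scoring scenes form the summary. The fusion of the two factors is an assumption.

    ```python
    def summarize(scenes, profile, k=2):
        """scenes: (scene_id, importance, keywords); profile: keyword -> preference."""
        def score(scene):
            _, importance, keywords = scene
            preference = sum(profile.get(word, 0.0) for word in keywords)
            return importance * (1.0 + preference)  # assumed combination of the factors
        return [s[0] for s in sorted(scenes, key=score, reverse=True)[:k]]

    scenes = [("s1", 0.9, ["goal"]), ("s2", 0.5, ["foul"]),
              ("s3", 0.6, ["goal", "penalty"])]
    profile = {"penalty": 2.0}        # this user strongly prefers penalty scenes
    print(summarize(scenes, profile))  # ['s3', 's1']
    ```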

  • A Model of Discourse Segmentation and Segment Title Assignment for Lecture Speech Indexing

    Kazuhiro TAKEUCHI  Yukie NAKAO  Hitoshi ISAHARA  

     
    PAPER

    Vol: E90-D No:10  Page(s): 1601-1610

    Dividing a lecture speech into segments and providing those segments as learning objects is a general and convenient way to construct e-learning resources. However, it is difficult to assign each object an appropriate title that reflects its content. Since discourse segments can be analyzed from various aspects, researchers inevitably face this diversity when describing the "meanings" of discourse segments. In this paper, we propose assigning discourse segment titles based on a representation of their "meanings". In the assignment procedure, we focus on the speaker's evaluation of the event or the object of the speech. To verify the effectiveness of our idea, we examined identification of segment boundaries from the titles described by our procedure, and we confirmed that this identification was more accurate than intuitive identification.

  • Utilizing "Wisdom of Crowds" for Handling Multimedia Contents

    Koichiro ISHIKAWA  Yoshihisa SHINOZAWA  Akito SAKURAI  

     
    PAPER

    Vol: E90-D No:10  Page(s): 1657-1662

    We propose in this paper a SOM-like algorithm that accepts as online inputs the start and end times at which many users view a multimedia content; a one-dimensional map is then self-organized, providing an approximation of the density distribution of how many users watch each part of the content. In this way, information on the "viewing behavior of crowds" is accumulated as experience, summarized into one SOM-like network as extracted knowledge, and presented to new users as transmitted knowledge. The accumulation of multimedia contents on the Internet increases both the need for time-efficient viewing and the possibility of compiling information on many users' viewing experiences. In these circumstances, a system has been proposed that presents, in the Internet environment, a summary of the viewing records of many viewers of a multimedia content. The summary is expected to show that some parts are seen by many users while other parts are rarely seen. The function is similar to websites utilizing the "wisdom of crowds" and is facilitated by our proposed algorithm.
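    A toy sketch of the self-organizing idea: a one-dimensional map over the content timeline is nudged toward each reported viewing position, so nodes crowd into heavily watched regions. The learning rate, neighborhood width, and uniform sampling of each (start, end) span are assumptions.

    ```python
    import numpy as np

    def update_map(nodes, start, end, lr=0.1, width=1.0, samples=20):
        for t in np.random.uniform(start, end, samples):  # positions inside this view
            bmu = int(np.abs(nodes - t).argmin())         # best-matching node
            influence = np.exp(-((np.arange(len(nodes)) - bmu) ** 2) / (2 * width ** 2))
            nodes += lr * influence * (t - nodes)         # pull the neighborhood toward t
        return nodes

    nodes = np.linspace(0, 600, 30)                       # 30 nodes over 10 minutes
    for start, end in [(100, 160), (110, 150), (400, 420)]:  # three users' viewing spans
        nodes = update_map(nodes, start, end)
    print(np.round(nodes, 1))                             # nodes pile up near 100-160
    ```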

  • Summarization of 3D Video by Rate-Distortion Trade-off

    Jianfeng XU  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

    Vol: E90-D No:9  Page(s): 1430-1438

    3D video, which consists of a sequence of mesh models, can reproduce dynamic scenes containing 3D information. To summarize 3D video, we develop a key frame extraction method based on a rate-distortion (R-D) trade-off. For this purpose, an effective feature vector is extracted for each frame. Shot detection is performed on the feature vectors as preprocessing, followed by key frame extraction. Simple but reasonable definitions of rate and distortion are presented. Under a linearity assumption, an R-D curve is generated for each shot, on which the locations of the key frames are optimized. Finally, the R-D trade-off is achieved by optimizing a cost function with a Lagrange multiplier, which determines the number of key frames in each shot. Our system thus automatically determines the best locations and number of key frames in the R-D sense. Our experimental results show that the extracted key frames are compact and faithful to the original 3D video.
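    The Lagrangian step reduces to a one-line search once per-count distortions are known. A minimal sketch, assuming the rate is the number of key frames and D(k), the distortion of the best k-frame summary for a shot, has been precomputed:

    ```python
    def pick_num_keyframes(distortion_by_k, lam):
        """Minimize the Lagrangian J(k) = D(k) + lam * k over candidate counts k."""
        return min(distortion_by_k, key=lambda k: distortion_by_k[k] + lam * k)

    # distortion typically falls with diminishing returns as more key frames are kept
    D = {1: 10.0, 2: 5.0, 3: 3.0, 4: 2.4, 5: 2.1}
    print(pick_num_keyframes(D, lam=1.0))  # 3: a fourth frame is not worth its rate
    ```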

  • Recent Progress in Corpus-Based Spontaneous Speech Recognition

    Sadaoki FURUI  

     
    INVITED PAPER

    Vol: E88-D No:3  Page(s): 366-375

    This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology. Although speech is spontaneous in almost any situation, recognition of spontaneous speech is an area that has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance for spontaneous speech, which requires building large spontaneous speech corpora for constructing acoustic and language models. This paper focuses on the achievements of a recently completed Japanese five-year national project, "Spontaneous Speech: Corpus and Processing Technology". Because of various phenomena specific to spontaneous speech, such as filled pauses, repairs, hesitations, repetitions, and disfluencies, its recognition requires various new techniques, including flexible acoustic modeling, sentence boundary detection, pronunciation modeling, acoustic and language model adaptation, and automatic summarization. In particular, automatic summarization including indexing, a process that extracts important and reliable parts of the automatic transcription, is expected to play an important role in building speech archives, speech-based information retrieval systems, and human-computer dialogue systems.

  • A Probabilistic Sentence Reduction Using Maximum Entropy Model

    Minh LE NGUYEN  Masaru FUKUSHI  Susumu HORIGUCHI  

     
    PAPER-Natural Language Processing

    Vol: E88-D No:2  Page(s): 278-288

    This paper describes a new probabilistic sentence reduction method based on a maximum entropy model. In contrast to previous methods, the proposed method can produce multiple best results for a given sentence, which is useful in text summarization applications. Experimental results show that the proposed method improves on earlier methods in both accuracy and computation time.

  • DCAA: A Dynamic Constrained Adaptive Aggregation Method for Effective Network Traffic Information Summarization

    Kazuhide KOIDE  Glenn Mansfield KEENI  Gen KITAGATA  Norio SHIRATORI  

     
    PAPER-Implementation and Operation

    Vol: E87-B No:3  Page(s): 413-420

    Online, real-time traffic summarization is a challenge because, except in routine cases, the aggregation parameters, i.e., the flows that need to be observed, are not known a priori. Dynamic adaptive aggregation algorithms adapt to the network traffic to detect the important flows, but present-day algorithms are inadequate, often producing inaccurate or meaningless aggregates. In this work we propose a Dynamic Constrained Adaptive Aggregation algorithm that avoids meaningless aggregates by using information about the network's configuration. We compare its performance with the earlier Dynamic (Unconstrained) Adaptive Aggregation algorithm and show its efficacy. Furthermore, we use a network map context that shows the network flows in an intuitive manner. Several applications of the algorithm and of the network-map-based visualization are discussed.
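    A toy sketch of the "constrained" idea, with invented addresses: per-host counters are merged upward only into prefixes that actually appear in the network's configuration, so no meaningless aggregates are formed.

    ```python
    import ipaddress

    def constrained_aggregate(flows, configured_prefixes):
        """flows: ip -> byte count; configured_prefixes: CIDR strings from the config."""
        nets = [ipaddress.ip_network(p) for p in configured_prefixes]
        agg = {}
        for ip, nbytes in flows.items():
            addr = ipaddress.ip_address(ip)
            matches = [n for n in nets if addr in n]
            # use the most specific configured prefix; fall back to the host itself
            key = str(max(matches, key=lambda n: n.prefixlen)) if matches else ip
            agg[key] = agg.get(key, 0) + nbytes
        return agg

    flows = {"10.0.1.5": 300, "10.0.1.9": 200, "10.0.2.7": 50}
    print(constrained_aggregate(flows, ["10.0.1.0/24", "10.0.0.0/16"]))
    # {'10.0.1.0/24': 500, '10.0.0.0/16': 50}
    ```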

  • Speech Summarization: An Approach through Word Extraction and a Method for Evaluation

    Chiori HORI  Sadaoki FURUI  

     
    PAPER

    Vol: E87-D No:1  Page(s): 15-25

    In this paper, we propose a new method of automatic speech summarization for each utterance, in which a set of words maximizing a summarization score is extracted from the automatic speech transcription. The summarization score indicates the appropriateness of the summarized sentence. The extraction is achieved with a dynamic programming technique according to a target summarization ratio, i.e., the number of characters/words in the summarized sentence divided by the number of characters/words in the original sentence. The extracted words are then concatenated to build a summarized sentence. The summarization score consists of a word significance measure, a linguistic likelihood, and a confidence measure. This paper also proposes a new method of measuring summarization accuracy based on a word network expressing manual summarization results: the accuracy of each automatic summarization is calculated by comparing it with the most similar word string in the network. Japanese broadcast-news speech, transcribed using a large-vocabulary continuous-speech recognition (LVCSR) system, is summarized and evaluated using the proposed method at summarization ratios of 20, 40, 60, 70, and 80%. Experimental results reveal that the proposed method effectively extracts relatively important information by removing redundant or irrelevant information.
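    A compact sketch of the dynamic-programming extraction with toy scores: pick exactly M of the N transcribed words, in order, maximizing per-word significance plus a bigram score between consecutive kept words. The scoring functions here stand in for the paper's combination of word significance, linguistic likelihood, and confidence.

    ```python
    def summarize(words, sig, bigram, M):
        N, NEG = len(words), float("-inf")
        dp = [[NEG] * (M + 1) for _ in range(N)]   # dp[i][m]: best score, word i kept m-th
        back = [[-1] * (M + 1) for _ in range(N)]
        for i in range(N):
            dp[i][1] = sig(words[i])
            for m in range(2, M + 1):
                for j in range(i):
                    if dp[j][m - 1] == NEG:
                        continue
                    s = dp[j][m - 1] + bigram(words[j], words[i]) + sig(words[i])
                    if s > dp[i][m]:
                        dp[i][m], back[i][m] = s, j
        i, m, out = max(range(N), key=lambda i: dp[i][M]), M, []
        while i != -1:                             # walk the back-pointers
            out.append(words[i])
            i, m = back[i][m], m - 1
        return list(reversed(out))

    words = ["the", "cabinet", "approved", "the", "new", "budget", "today"]
    keep = {"cabinet": 2.0, "approved": 1.5, "budget": 2.0, "today": 1.0}
    print(summarize(words, sig=lambda w: keep.get(w, 0.0),
                    bigram=lambda a, b: 0.1, M=3))  # ['cabinet', 'approved', 'budget']
    ```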

1-20 hits (of 25)