IEICE global.ieice.org Site

Keyword Search Result

[Keyword] labeling(30hit)

1-20hit(30hit)

Joint Domain Adaption and Pseudo-Labeling for Cross-Project Defect Prediction
Fei WU Xinhao ZHENG Ying SUN Yang GAO Xiao-Yuan JING

LETTER-Software Engineering

Publicized: 2021/11/04
Vol: E105-D No:2
Page(s): 432-435
Cross-project defect prediction (CPDP) has been a hot research topic in recent years. The inconsistent data distribution between source and target projects and the lack of labels for most target instances pose a challenge for defect prediction. Researchers have developed several CPDP methods; however, their prediction performance still needs to be improved. In this paper, we propose a novel approach called Joint Domain Adaption and Pseudo-Labeling (JDAPL). The network architecture consists of a feature mapping sub-network that maps source and target instances into a common subspace, followed by a classification sub-network and an auxiliary classification sub-network. The classification sub-network makes use of the label information of labeled instances to generate pseudo-labels. The auxiliary classification sub-network learns to reduce the distribution difference and improve the accuracy of pseudo-labels for unlabeled instances through loss maximization. Network training is guided by the adversarial scheme. Extensive experiments are conducted on 10 projects of the AEEEM and NASA datasets, and the results indicate that our approach achieves better performance than the baselines.
Automatic Drawing of Complex Metro Maps
Masahiro ONDA Masaki MORIGUCHI Keiko IMAI

PAPER-Graphs and Networks

Publicized: 2021/03/08
Vol: E104-A No:9
Page(s): 1150-1155
The Tokyo subway is one of the most complex subway networks in the world and it is difficult to compute a visually readable metro map using existing layout methods. In this paper, we present a new method that can generate complex metro maps such as the Tokyo subway network. Our method consists of two phases. The first phase generates rough metro maps. It decomposes the metro networks into smaller subgraphs and partially generates rough metro maps. In the second phase, we use a local search technique to improve the aesthetic quality of the rough metro maps. The experimental results including the Tokyo metro map are shown.
Extracting Knowledge Entities from Sci-Tech Intelligence Resources Based on BiLSTM and Conditional Random Field
Weizhi LIAO Mingtong HUANG Pan MA Yu WANG

PAPER

Publicized: 2021/04/22
Vol: E104-D No:8
Page(s): 1214-1221
There are many knowledge entities in sci-tech intelligence resources. Extracting these knowledge entities is of great importance for building knowledge networks, exploring the relationship between knowledge, and optimizing search engines. Many existing methods, which are mainly based on rules and traditional machine learning, require significant human involvement, but still suffer from unsatisfactory extraction accuracy. This paper proposes a novel approach for knowledge entity extraction based on BiLSTM and conditional random field (CRF). A BiLSTM neural network is used to obtain the context information of sentences, and CRF is then employed to integrate global label information to achieve optimal labels. This approach does not require the manual construction of features, and outperforms conventional methods. In the experiments presented in this paper, the titles and abstracts of 20,000 items in the existing sci-tech literature are processed, of which 50,243 items are used to build benchmark datasets. Based on these datasets, comparative experiments are conducted to evaluate the effectiveness of the proposed approach. Knowledge entities are extracted and corresponding knowledge networks are established with a further elaboration on the correlation of two different types of knowledge entities. The proposed research has the potential to improve the quality of sci-tech information services.
Partition-then-Overlap Method for Labeling Cyber Threat Intelligence Reports by Topics over Time
Ryusei NAGASAWA Keisuke FURUMOTO Makoto TAKITA Yoshiaki SHIRAISHI Takeshi TAKAHASHI Masami MOHRI Yasuhiro TAKANO Masakatu MORII

LETTER

Publicized: 2021/02/24
Vol: E104-D No:5
Page(s): 556-561
The Topics over Time (TOT) model allows users to be aware of changes in certain topics over time. The proposed method divides a dataset of security blog posts into fixed periods, with an overlap period between them, and inputs the divided dataset to the TOT model. The results suggest that the extracted topics include malware and attack campaign names that are appropriate for the multi-labeling of cyber threat intelligence reports.
Selective Pseudo-Labeling Based Subspace Learning for Cross-Project Defect Prediction
Ying SUN Xiao-Yuan JING Fei WU Yanfei SUN

LETTER-Software Engineering

Publicized: 2020/06/10
Vol: E103-D No:9
Page(s): 2003-2006
Cross-project defect prediction (CPDP) has recently become a hot research topic. It utilizes the data from an existing source project to construct a prediction model and predicts the defect-proneness of software instances from a target project. However, bridging the distribution difference between different projects is challenging. To minimize the data distribution differences between different projects and predict unlabeled target instances, we present a novel approach called selective pseudo-labeling based subspace learning (SPSL). SPSL learns a common subspace by using both labeled source instances and pseudo-labeled target instances. The accuracy of pseudo-labeling is improved by an iterative selective pseudo-labeling strategy. The pseudo-labeled instances from the target project are iteratively updated by selecting the instances with high confidence from two pseudo-labeling techniques. Experiments are conducted on the AEEEM dataset and the results show that SPSL is effective for CPDP.
Exploration into Gray Area: Toward Efficient Labeling for Detecting Malicious Domain Names
Naoki FUKUSHI Daiki CHIBA Mitsuaki AKIYAMA Masato UCHIDA

PAPER

Publicized: 2019/10/08
Vol: E103-B No:4
Page(s): 375-388
In this paper, we propose a method to reduce the labeling cost while acquiring training data for a malicious domain name detection system using supervised machine learning. In the conventional systems, to train a classifier with high classification accuracy, large quantities of benign and malicious domain names need to be prepared as training data. In general, malicious domain names are observed less frequently than benign domain names. Therefore, it is difficult to acquire a large number of malicious domain names without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of classification, i.e., in the gray area, and we show that the classification accuracy can be improved by using approximately 1% of the training data used by the conventional systems. Another disadvantage of the conventional system is that if the classifier is trained with a small amount of training data, its generalization ability cannot be guaranteed. We propose a method based on ensemble learning that integrates multiple classifiers, and we show that the classification accuracy can be stabilized and improved. The combination of the two methods proposed here allows us to develop a new system for malicious domain name detection with high classification accuracy and generalization ability by labeling a small amount of training data.
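The idea of labeling only the data around the decision boundary can be illustrated with a minimal uncertainty-sampling sketch. The abstract does not give the actual selection criterion, so everything here is an assumption for illustration: the function name `select_gray_area`, the 0.5 boundary, and the example scores are hypothetical, and the scores stand in for the malicious-class probabilities of some already trained classifier.

```python
def select_gray_area(scores, budget, boundary=0.5):
    """Return the `budget` items whose score lies closest to the boundary.

    scores: dict mapping an item (here a domain name) to the classifier's
            estimated probability that the item is malicious (assumed input).
    """
    ranked = sorted(scores, key=lambda name: abs(scores[name] - boundary))
    return ranked[:budget]


# Hypothetical classifier outputs for five unlabeled domain names.
SCORES = {
    "a.example": 0.97,  # confidently malicious, labeling adds little
    "b.example": 0.52,  # gray area
    "c.example": 0.03,  # confidently benign
    "d.example": 0.41,  # gray area
    "e.example": 0.60,
}

to_label = select_gray_area(SCORES, budget=2)
print(to_label)
```

Only the selected items would be sent to a human labeler; the rest stay unlabeled, which is where the reduction in labeling cost would come from under these assumptions.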
Rule-Based Automatic Question Generation Using Semantic Role Labeling (Open Access)
Onur KEKLIK Tugkan TUGLULAR Selma TEKIR

PAPER-Natural Language Processing

Publicized: 2019/04/01
Vol: E102-D No:7
Page(s): 1362-1373
This paper proposes a new rule-based approach to automatic question generation. The proposed approach focuses on analysis of both the syntactic and semantic structure of a sentence. Although the primary objective of the designed system is question generation from sentences, automatic evaluation results show that it also achieves great performance on reading comprehension datasets, which focus on question generation from paragraphs. In particular, with respect to the METEOR metric, the designed system significantly outperforms all other systems in automatic evaluation. As for human evaluation, the designed system exhibits similar performance by generating the most natural (human-like) questions.
Fast Lane Detection Based on Deep Convolutional Neural Network and Automatic Training Data Labeling
Xun PAN Harutoshi OGAI

PAPER-Image

Vol: E102-A No:3
Page(s): 566-575
Lane detection or road detection is one of the key features of autonomous driving. In the computer vision area, it is still a very challenging target since there are various types of road scenarios which require very high robustness of the algorithm. Moreover, considering the rather high speed of the vehicles, high efficiency is also a very important requirement for practicable application of autonomous driving. In this paper, we propose a deep convolutional neural network based lane detection method, which considers the lane detection task as a pixel-level segmentation of the lane markings. We also propose an automatic training data generation method, which can significantly reduce the effort of the training phase. Experiments show that our method can achieve high accuracy for various road scenes in real time.
An Efficient Concept Drift Detection Method for Streaming Data under Limited Labeling
Youngin KIM Cheong Hee PARK

PAPER-Artificial Intelligence, Data Mining

Publicized: 2017/06/26
Vol: E100-D No:10
Page(s): 2537-2546
- Errata[Uploaded on November 1,2017]
In data stream analysis, detecting concept drift accurately is important to maintain the classification performance. Most drift detection methods assume that the class labels become available immediately after a data sample arrives. However, it is unrealistic to attempt to acquire all of the labels when processing data streams, as labeling costs are high and much time is needed. In this paper, we propose a concept drift detection method under the assumption that there is limited access or no access to class labels. The proposed method detects concept drift on unlabeled data streams based on the class label information which is predicted by a classifier or a virtual classifier. Experimental results on synthetic and real streaming data show that the proposed method can detect concept drift on unlabeled data streams.
A New Connected-Component Labeling Algorithm
Xiao ZHAO Lifeng HE Bin YAO Yuyan CHAO

LETTER-Pattern Recognition

Publicized: 2015/08/05
Vol: E98-D No:11
Page(s): 2013-2016
This paper presents a new connected component labeling algorithm. The proposed algorithm scans image lines every three lines and processes pixels three by three. When processing the current three pixels, we also utilize the information obtained before to reduce the repeated work for checking pixels in the mask. Experimental results demonstrated that our method is more efficient than the fastest conventional labeling algorithm.
An Efficient Two-Scan Labeling Algorithm for Binary Hexagonal Images
Lifeng HE Xiao ZHAO Bin YAO Yun YANG Yuyan CHAO

LETTER-Image Recognition, Computer Vision

Publicized: 2014/08/27
Vol: E97-D No:12
Page(s): 3244-3247
This paper proposes an efficient two-scan labeling algorithm for binary hexagonal images. Unlike conventional labeling algorithms, which process pixels one by one in the first scan, our algorithm processes pixels two by two. We show that using our algorithm, we can check a smaller number of pixels. Experimental results demonstrated that our method is more efficient than the algorithm extended directly from the corresponding labeling algorithm for rectangular binary images.
Partial Volume Correction on ASL-MRI and Its Application on Alzheimer's Disease Diagnosis
Wenji YANG Wei HUANG Shanxue CHEN

PAPER-Image Processing and Video Processing

Vol: E97-D No:11
Page(s): 2912-2918
- Errata[Uploaded on December 1,2014]
Arterial spin labeling (ASL) is a non-invasive magnetic resonance imaging (MRI) method that can provide direct and quantitative measurements of cerebral blood flow (CBF) of scanned patients. ASL can be utilized as an imaging modality to detect Alzheimer's disease (AD), as brain atrophy of AD patients can be revealed by low CBF values in certain brain regions. However, partial volume effects (PVE), which are mainly caused by signal cross-contamination due to voxel heterogeneity and limited spatial resolution of ASL images, often prevent CBF in ASL from being precisely measured. In this study, a novel PVE correction method is proposed based on pixel-wise voxels in ASL images; it handles well the existing problems of blurring and loss of brain details in conventional PVE correction methods. Dozens of comparison experiments and statistical analysis also suggest that the proposed method is superior to other PVE correction methods in AD diagnosis based on real patient data.
Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs
Chen-Yu YANG Zhen-Hua LING Li-Rong DAI

PAPER-Speech Synthesis and Related Topics

Vol: E97-D No:6
Page(s): 1449-1460
In this paper, an automatic and unsupervised method using context-dependent hidden Markov models (CD-HMMs) is proposed for the prosodic labeling of speech synthesis databases. This method consists of three main steps, i.e., initialization, model training and prosodic labeling. The initial prosodic labels are obtained by unsupervised clustering using the acoustic features designed according to the characteristics of the prosodic descriptor to be labeled. Then, CD-HMMs of the spectral parameters, F0s and phone durations are estimated by a means similar to the HMM-based parametric speech synthesis using the initial prosodic labels. These labels are further updated by Viterbi decoding under the maximum likelihood criterion given the acoustic feature sequences and the trained CD-HMMs. The model training and prosodic labeling procedures are conducted iteratively until convergence. The performance of the proposed method is evaluated on Mandarin speech synthesis databases and two prosodic descriptors are investigated, i.e., the prosodic phrase boundary and the emphasis expression. In our implementation, the prosodic phrase boundary labels are initialized by clustering the durations of the pauses between every two consecutive prosodic words, and the emphasis expression labels are initialized by examining the differences between the original and the synthetic F0 trajectories. Experimental results show that the proposed method is able to label the prosodic phrase boundary positions much more accurately than the text-analysis-based method without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the prosodic phrase boundary labels generated by our proposed method achieves similar performance to that using the manual labels. Furthermore, the unit selection speech synthesis system constructed using the emphasis expression labels generated by our proposed method can convey the emphasis information effectively while maintaining the naturalness of synthetic speech.
Online Learned Player Recognition Model Based Soccer Player Tracking and Labeling for Long-Shot Scenes
Weicun XU Qingjie ZHAO Yuxia WANG Xuanya LI

PAPER-Pattern Recognition

Vol: E97-D No:1
Page(s): 119-129
Soccer player tracking and labeling suffer from the similar appearance of the players in the same team, especially in long-shot scenes where the faces and the numbers of the players are too blurry to identify. In this paper, we propose an efficient multi-player tracking system. The tracking system takes the detection responses of a human detector as inputs. To realize real-time player detection, we generate a spatial proposal to minimize the scanning scope of the detector. The tracking system utilizes the discriminative appearance models trained using the online Boosting method to reduce data-association ambiguity caused by the appearance similarity of the players. We also propose to build an online learned player recognition model which can be embedded in the tracking system to achieve online player recognition and labeling in tracking applications for long-shot scenes in two stages. At the first stage, to build the model, we utilize the fast k-means clustering method instead of classic k-means clustering to build and update a visual word vocabulary in an efficient online manner, using the informative descriptors extracted from the training samples drawn at each time step of multi-player tracking. The first stage finishes when the vocabulary is ready. At the second stage, given the obtained visual word vocabulary, an incremental vector quantization strategy is used to recognize and label each tracked player. We also perform importance recognition validation to avoid mistakenly recognizing an outlier, namely, a person we do not need to recognize, as a player. Both quantitative and qualitative experimental results on the long-shot video clips of a real soccer game video demonstrate that the proposed player recognition model performs much better than some state-of-the-art online learned models, and our tracking system also performs quite effectively even under very complicated situations.
A New First-Scan Method for Two-Scan Labeling Algorithms
Lifeng HE Yuyan CHAO Kenji SUZUKI

LETTER-Pattern Recognition

Vol: E95-D No:8
Page(s): 2142-2145
This paper proposes a new first-scan method for two-scan labeling algorithms. In the first scan, our proposed method first scans every fourth image line, and processes the scan line and its two neighbor lines. Then, it processes the remaining lines from top to bottom one by one. Our method decreases the average number of checks needed to process a foreground pixel; thus, the efficiency of labeling can be improved.
A Fast Multi-Object Extraction Algorithm Based on Cell-Based Connected Components Labeling
Qingyi GU Takeshi TAKAKI Idaku ISHII

PAPER-Image Recognition, Computer Vision

Vol: E95-D No:2
Page(s): 636-645
We describe a cell-based connected component labeling algorithm to calculate the 0th and 1st moment features as the attributes for labeled regions. These can be used to indicate their sizes and positions for multi-object extraction. Based on the additivity in moment features, the cell-based labeling algorithm can label divided cells of a certain size in an image by scanning the image only once to obtain the moment features of the labeled regions with remarkably reduced computational complexity and memory consumption for labeling. Our algorithm is a simple-one-time-scan cell-based labeling algorithm, which is suitable for hardware and parallel implementation. We also compared it with conventional labeling algorithms. The experimental results showed that our algorithm is faster than conventional raster-scan labeling algorithms.
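The additivity of moment features that the abstract relies on can be checked with a small sketch. It computes the 0th moment (pixel count) and the 1st moments (sums of x and y coordinates) per cell and adds them up. As a simplifying assumption, all foreground pixels are treated as a single region, so the labeling step itself is not shown; the function names, the cell size, and the test image are hypothetical.

```python
def cell_moments(image, y0, x0, size):
    """Return (m00, m10, m01) of the foreground pixels inside one cell."""
    m00 = m10 = m01 = 0
    for y in range(y0, min(y0 + size, len(image))):
        for x in range(x0, min(x0 + size, len(image[0]))):
            if image[y][x]:
                m00 += 1  # 0th moment: area
                m10 += x  # 1st moment in x
                m01 += y  # 1st moment in y
    return m00, m10, m01


def region_moments(image, size):
    """Sum the per-cell moments over all cells of the given size."""
    total = (0, 0, 0)
    for y0 in range(0, len(image), size):
        for x0 in range(0, len(image[0]), size):
            cell = cell_moments(image, y0, x0, size)
            total = tuple(a + b for a, b in zip(total, cell))
    return total


IMAGE = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
m00, m10, m01 = region_moments(IMAGE, size=2)
print("size:", m00, "centroid:", (m10 / m00, m01 / m00))
```

Because the moments are plain sums over pixels, summing over 2x2 cells gives the same result as one pass over the whole image, and the region's size and centroid follow from `m00`, `m10 / m00`, and `m01 / m00`.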
Cayley Graph Representation and Graph Product Representation of Hypercubes
Miya MOROTA Ryoichi HATAYAMA Yukio SHIBATA

PAPER-Graphs and Networks

Vol: E94-A No:3
Page(s): 946-954
Hypercube $Q_n$ is a well-known graph structure with three equivalent definitions: 1. binary $n$-bit sequences with the adjacency condition, 2. $Q_1 = K_2$, $Q_n = Q_{n-1} \times K_2$, where $\times$ means the Cartesian product, 3. the Cayley graph on $\mathbb{Z}_2^n$ with the generator set $\{10\cdots0, 010\cdots0, \ldots, 0\cdots01\}$. We give a necessary and sufficient condition for a set of binary sequences to be a generator set for the hypercube. Then, we give relations between some generator sets and relational products. These results show the wide variety of representability of hypercubes, which could be used in many applications.
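The equivalence of the three definitions can be checked for small n with a short sketch. Vertices are encoded as integers whose binary digits are the n-bit sequence, and the group operation on Z_2^n is bitwise XOR. The function names are hypothetical, and only the standard unit-vector generator set from definition 3 is used; the paper's condition for general generator sets is not reproduced here.

```python
def hypercube_by_adjacency(n):
    """Definition 1: n-bit sequences, adjacent iff they differ in one bit."""
    return {
        frozenset((u, v))
        for u in range(2 ** n)
        for v in range(u + 1, 2 ** n)
        if bin(u ^ v).count("1") == 1
    }


def hypercube_by_product(n):
    """Definition 2: Q_1 = K_2 and Q_n = Cartesian product of Q_{n-1} and K_2."""
    edges = {frozenset((0, 1))}  # Q_1 = K_2
    for k in range(1, n):
        high = 1 << k
        # Second copy of Q_k, with bit k set on every vertex.
        copy = {frozenset(v | high for v in edge) for edge in edges}
        # Edges of K_2 join each vertex to its twin in the second copy.
        rungs = {frozenset((v, v | high)) for v in range(high)}
        edges = edges | copy | rungs
    return edges


def cayley_graph(n, generators):
    """Definition 3: Cayley graph on Z_2^n, where adding a generator is XOR."""
    return {frozenset((v, v ^ g)) for v in range(2 ** n) for g in generators}


for n in range(1, 5):
    unit_vectors = [1 << i for i in range(n)]
    q1 = hypercube_by_adjacency(n)
    assert q1 == hypercube_by_product(n) == cayley_graph(n, unit_vectors)
    print(n, len(q1))  # Q_n has n * 2**(n-1) edges
```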
Decomposition Optimization for Minimizing Label Overflow in Prime Number Graph Labeling
Jaehoon KIM Seog PARK

PAPER-Dependable Computing

Vol: E93-D No:7
Page(s): 1889-1899
Recently, a graph labeling technique based on prime numbers has been suggested for reducing the costly transitive closure computations in RDF query languages. The suggested prime number graph labeling provides the benefit of fast query processing by a simple divisibility test of labels. However, it has an inherent problem that originates with the nature of prime numbers. Since each prime number must be used exclusively, labels can become significantly large. Therefore, in this paper, we introduce a novel optimization technique to effectively reduce the problem of label overflow. The suggested idea is based on graph decomposition. When label overflow occurs, the full graph is divided into several sub-graphs, and nodes in each sub-graph are separately labeled. Through experiments, we also analyze the effectiveness of the graph decomposition optimization, which is evaluated by the number of divisions.
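The divisibility test and the label growth mentioned above can be illustrated with a minimal sketch. The abstract does not spell out the labeling rule, so this assumes one common formulation: each node receives its own unused prime, and its label is that prime times the least common multiple of its parents' labels, so that one node reaches another exactly when its label divides the other's. The function names and the example graph are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from itertools import count
from math import lcm


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small sketch)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def label_dag(parents):
    """Label a DAG given as {node: [parents]}, listed in topological order."""
    gen = primes()
    labels = {}
    for node, node_parents in parents.items():
        inherited = lcm(*(labels[p] for p in node_parents))  # 1 if no parents
        labels[node] = next(gen) * inherited  # each prime is used only once
    return labels


def reaches(labels, ancestor, descendant):
    """Reachability check by a single divisibility test on the labels."""
    return labels[descendant] % labels[ancestor] == 0


# Hypothetical graph: a -> b, a -> c, b -> d, c -> d, and an isolated node e.
GRAPH = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": []}
LABELS = label_dag(GRAPH)
print(LABELS)
print(reaches(LABELS, "a", "d"), reaches(LABELS, "b", "c"))
```

Even in this five-node graph the label of `d` is already the product of four primes, which hints at why labels overflow on large graphs and why labeling sub-graphs separately keeps them smaller.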
Improved Sequential Dependency Analysis Integrating Labeling-Based Sentence Boundary Detection
Takanobu OBA Takaaki HORI Atsushi NAKAMURA

PAPER-Natural Language Processing

Vol: E93-D No:5
Page(s): 1272-1281
A dependency structure interprets modification relationships between words or phrases and is recognized as an important element in semantic information analysis. With the conventional approaches for extracting this dependency structure, it is assumed that the complete sentence is known before the analysis starts. For spontaneous speech data, however, this assumption is not necessarily correct since sentence boundaries are not marked in the data. Although sentence boundaries can be detected before dependency analysis, this cascaded implementation is not suitable for online processing since it delays the responses of the application. To solve these problems, we proposed a sequential dependency analysis (SDA) method for online spontaneous speech processing, which enabled us to analyze incomplete sentences sequentially and detect sentence boundaries simultaneously. In this paper, we propose an improved SDA integrating a labeling-based sentence boundary detection (SntBD) technique based on Conditional Random Fields (CRFs). In the new method, we use CRF for soft decision of sentence boundaries and combine it with SDA to retain its online framework. Since CRF-based SntBD yields better estimates of sentence boundaries, SDA can provide better results in which the dependency structure and sentence boundaries are consistent. Experimental results using spontaneous lecture speech from the Corpus of Spontaneous Japanese show that our improved SDA outperforms the original SDA with SntBD accuracy providing better dependency analysis results.
Incorporating Frame Information to Semantic Role Labeling
Joo-Young LEE Young-In SONG Hae-Chang RIM Kyoung-Soo HAN

LETTER-Natural Language Processing

Vol: E93-D No:1
Page(s): 201-204
In this paper, we suggest a new probabilistic model of semantic role labeling, which uses the frameset of the predicate as explicit linguistic knowledge for providing global information on the predicate-argument structure that a local classifier is unable to capture. The proposed model consists of three sub-models: a role sequence generation model, a frameset generation model, and a matching model. The role sequence generation model generates the semantic role sequence candidates of a given predicate by using the local classification approach, which is a widely used approach in previous research. The frameset generation model estimates the probability of each frameset that the predicate can take. The matching model is designed to measure the degree of matching between the generated role sequence and the frameset by using several features. These features are developed to represent the predicate-argument structure information described in the frameset. In the experiments, our model shows that the use of knowledge about the predicate-argument structure is effective for selecting a more appropriate semantic role sequence.
