Keyword Search Result

[Keyword] labeling(30hit)

1-20hit(30hit)

  • Joint Domain Adaption and Pseudo-Labeling for Cross-Project Defect Prediction

    Fei WU  Xinhao ZHENG  Ying SUN  Yang GAO  Xiao-Yuan JING  

     
    LETTER-Software Engineering

      Publicized:
    2021/11/04
      Vol:
    E105-D No:2
      Page(s):
    432-435

    Cross-project defect prediction (CPDP) has been a hot research topic in recent years. The inconsistent data distribution between source and target projects and the lack of labels for most target instances make defect prediction challenging. Researchers have developed several CPDP methods. However, the prediction performance still needs to be improved. In this paper, we propose a novel approach called Joint Domain Adaption and Pseudo-Labeling (JDAPL). The network architecture consists of a feature mapping sub-network that maps source and target instances into a common subspace, followed by a classification sub-network and an auxiliary classification sub-network. The classification sub-network makes use of the label information of labeled instances to generate pseudo-labels. The auxiliary classification sub-network learns to reduce the distribution difference and improve the accuracy of pseudo-labels for unlabeled instances through loss maximization. Network training is guided by the adversarial scheme. Extensive experiments are conducted on 10 projects of the AEEEM and NASA datasets, and the results indicate that our approach achieves better performance than the baselines.

  • Automatic Drawing of Complex Metro Maps

    Masahiro ONDA  Masaki MORIGUCHI  Keiko IMAI  

     
    PAPER-Graphs and Networks

      Publicized:
    2021/03/08
      Vol:
    E104-A No:9
      Page(s):
    1150-1155

    The Tokyo subway is one of the most complex subway networks in the world and it is difficult to compute a visually readable metro map using existing layout methods. In this paper, we present a new method that can generate complex metro maps such as the Tokyo subway network. Our method consists of two phases. The first phase generates rough metro maps. It decomposes the metro networks into smaller subgraphs and partially generates rough metro maps. In the second phase, we use a local search technique to improve the aesthetic quality of the rough metro maps. The experimental results including the Tokyo metro map are shown.

  • Extracting Knowledge Entities from Sci-Tech Intelligence Resources Based on BiLSTM and Conditional Random Field

    Weizhi LIAO  Mingtong HUANG  Pan MA  Yu WANG  

     
    PAPER

      Publicized:
    2021/04/22
      Vol:
    E104-D No:8
      Page(s):
    1214-1221

    There are many knowledge entities in sci-tech intelligence resources. Extracting these knowledge entities is of great importance for building knowledge networks, exploring the relationship between knowledge, and optimizing search engines. Many existing methods, which are mainly based on rules and traditional machine learning, require significant human involvement, but still suffer from unsatisfactory extraction accuracy. This paper proposes a novel approach for knowledge entity extraction based on BiLSTM and conditional random field (CRF). A BiLSTM neural network is used to obtain the context information of sentences, and a CRF is then employed to integrate global label information to achieve optimal labels. This approach does not require the manual construction of features, and outperforms conventional methods. In the experiments presented in this paper, the titles and abstracts of 20,000 items in the existing sci-tech literature are processed, of which 50,243 items are used to build benchmark datasets. Based on these datasets, comparative experiments are conducted to evaluate the effectiveness of the proposed approach. Knowledge entities are extracted and corresponding knowledge networks are established with a further elaboration on the correlation of two different types of knowledge entities. The proposed research has the potential to improve the quality of sci-tech information services.

  • Partition-then-Overlap Method for Labeling Cyber Threat Intelligence Reports by Topics over Time

    Ryusei NAGASAWA  Keisuke FURUMOTO  Makoto TAKITA  Yoshiaki SHIRAISHI  Takeshi TAKAHASHI  Masami MOHRI  Yasuhiro TAKANO  Masakatu MORII  

     
    LETTER

      Publicized:
    2021/02/24
      Vol:
    E104-D No:5
      Page(s):
    556-561

    The Topics over Time (TOT) model allows users to be aware of changes in certain topics over time. The proposed method divides a dataset of security blog posts into fixed periods that share an overlap period and inputs the divided dataset to the TOT model. The results suggest that the extracted topics include malware and attack campaign names that are appropriate for the multi-labeling of cyber threat intelligence reports.

  • Selective Pseudo-Labeling Based Subspace Learning for Cross-Project Defect Prediction

    Ying SUN  Xiao-Yuan JING  Fei WU  Yanfei SUN  

     
    LETTER-Software Engineering

      Publicized:
    2020/06/10
      Vol:
    E103-D No:9
      Page(s):
    2003-2006

    Cross-project defect prediction (CPDP), which uses data from an existing source project to construct a prediction model and predicts the defect-proneness of software instances from a target project, has recently become a hot research topic. However, bridging the distribution difference between different projects is challenging. To minimize the data distribution differences between different projects and predict unlabeled target instances, we present a novel approach called selective pseudo-labeling based subspace learning (SPSL). SPSL learns a common subspace by using both labeled source instances and pseudo-labeled target instances. The accuracy of pseudo-labeling is improved by an iterative selective pseudo-labeling strategy. The pseudo-labeled instances from the target project are iteratively updated by selecting the instances with high confidence from two pseudo-labeling techniques. Experiments are conducted on the AEEEM dataset and the results show that SPSL is effective for CPDP.

  • Exploration into Gray Area: Toward Efficient Labeling for Detecting Malicious Domain Names

    Naoki FUKUSHI  Daiki CHIBA  Mitsuaki AKIYAMA  Masato UCHIDA  

     
    PAPER

      Publicized:
    2019/10/08
      Vol:
    E103-B No:4
      Page(s):
    375-388

    In this paper, we propose a method to reduce the labeling cost while acquiring training data for a malicious domain name detection system using supervised machine learning. In the conventional systems, to train a classifier with high classification accuracy, large quantities of benign and malicious domain names need to be prepared as training data. In general, malicious domain names are observed less frequently than benign domain names. Therefore, it is difficult to acquire a large number of malicious domain names without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of classification, i.e., in the gray area, and we show that the classification accuracy can be improved by using approximately 1% of the training data used by the conventional systems. Another disadvantage of the conventional system is that if the classifier is trained with a small amount of training data, its generalization ability cannot be guaranteed. We propose a method based on ensemble learning that integrates multiple classifiers, and we show that the classification accuracy can be stabilized and improved. The combination of the two methods proposed here allows us to develop a new system for malicious domain name detection with high classification accuracy and generalization ability by labeling a small amount of training data.
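
    The gray-area idea in this abstract, labeling only the samples nearest the decision boundary, can be illustrated with plain uncertainty sampling. This is a minimal sketch under stated assumptions, not the authors' method: the function name `select_gray_area`, the use of a predicted probability of being malicious, and the 0.5 boundary are all hypothetical choices made for illustration.

```python
def select_gray_area(probabilities, budget):
    """Pick the samples a classifier is least sure about.

    probabilities: predicted probability of "malicious" for each unlabeled
                   domain name (hypothetical classifier output).
    budget:        how many samples we can afford to label by hand.
    Returns the indices of the `budget` samples closest to 0.5.
    """
    # Rank sample indices by distance from the assumed decision boundary (0.5).
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]


if __name__ == "__main__":
    scores = [0.02, 0.48, 0.97, 0.55, 0.70]
    print(select_gray_area(scores, budget=2))
```

    Only the selected samples would be sent for manual labeling; confidently classified samples far from the boundary are left alone.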

  • Rule-Based Automatic Question Generation Using Semantic Role Labeling (Open Access)

    Onur KEKLIK  Tugkan TUGLULAR  Selma TEKIR  

     
    PAPER-Natural Language Processing

      Publicized:
    2019/04/01
      Vol:
    E102-D No:7
      Page(s):
    1362-1373

    This paper proposes a new rule-based approach to automatic question generation. The proposed approach focuses on analysis of both the syntactic and semantic structure of a sentence. Although the primary objective of the designed system is question generation from sentences, automatic evaluation results show that it also achieves great performance on reading comprehension datasets, which focus on question generation from paragraphs. In particular, with respect to the METEOR metric, the designed system significantly outperforms all other systems in automatic evaluation. As for human evaluation, the designed system exhibits similar performance by generating the most natural (human-like) questions.

  • Fast Lane Detection Based on Deep Convolutional Neural Network and Automatic Training Data Labeling

    Xun PAN  Harutoshi OGAI  

     
    PAPER-Image

      Vol:
    E102-A No:3
      Page(s):
    566-575

    Lane detection or road detection is one of the key features of autonomous driving. In the computer vision area, it is still a very challenging target since there are various types of road scenarios, which require very high robustness of the algorithm. Considering the rather high speed of vehicles, high efficiency is also a very important requirement for practical application of autonomous driving. In this paper, we propose a deep convolutional neural network based lane detection method, which treats the lane detection task as a pixel-level segmentation of the lane markings. We also propose an automatic training data generation method, which can significantly reduce the effort of the training phase. Experiments show that our method can achieve high accuracy for various road scenes in real time.

  • An Efficient Concept Drift Detection Method for Streaming Data under Limited Labeling

    Youngin KIM  Cheong Hee PARK  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2017/06/26
      Vol:
    E100-D No:10
      Page(s):
    2537-2546

    In data stream analysis, detecting concept drift accurately is important to maintain the classification performance. Most drift detection methods assume that the class labels become available immediately after a data sample arrives. However, it is unrealistic to attempt to acquire all of the labels when processing data streams, as labeling costs are high and much time is needed. In this paper, we propose a concept drift detection method under the assumption that there is limited access or no access to class labels. The proposed method detects concept drift on unlabeled data streams based on the class label information which is predicted by a classifier or a virtual classifier. Experimental results on synthetic and real streaming data show that the proposed method is able to detect concept drift on unlabeled data streams.

  • A New Connected-Component Labeling Algorithm

    Xiao ZHAO  Lifeng HE  Bin YAO  Yuyan CHAO  

     
    LETTER-Pattern Recognition

      Publicized:
    2015/08/05
      Vol:
    E98-D No:11
      Page(s):
    2013-2016

    This paper presents a new connected component labeling algorithm. The proposed algorithm scans image lines every three lines and processes pixels three by three. When processing the current three pixels, we also utilize the information obtained before to reduce the repeated work for checking pixels in the mask. Experimental results demonstrated that our method is more efficient than the fastest conventional labeling algorithm.

  • An Efficient Two-Scan Labeling Algorithm for Binary Hexagonal Images

    Lifeng HE  Xiao ZHAO  Bin YAO  Yun YANG  Yuyan CHAO  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2014/08/27
      Vol:
    E97-D No:12
      Page(s):
    3244-3247

    This paper proposes an efficient two-scan labeling algorithm for binary hexagonal images. Unlike conventional labeling algorithms, which process pixels one by one in the first scan, our algorithm processes pixels two by two. We show that using our algorithm, we can check a smaller number of pixels. Experimental results demonstrated that our method is more efficient than the algorithm extended directly from the corresponding labeling algorithm for rectangular binary images.

  • Partial Volume Correction on ASL-MRI and Its Application on Alzheimer's Disease Diagnosis

    Wenji YANG  Wei HUANG  Shanxue CHEN  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E97-D No:11
      Page(s):
    2912-2918

    Arterial spin labeling (ASL) is a non-invasive magnetic resonance imaging (MRI) method that can provide direct and quantitative measurements of cerebral blood flow (CBF) of scanned patients. ASL can be utilized as an imaging modality to detect Alzheimer's disease (AD), as brain atrophy of AD patients can be revealed by low CBF values in certain brain regions. However, partial volume effects (PVE), which are mainly caused by signal cross-contamination due to voxel heterogeneity and the limited spatial resolution of ASL images, often prevent CBF in ASL from being precisely measured. In this study, a novel PVE correction method is proposed based on pixel-wise voxels in ASL images; it handles well the existing problems of blurring and loss of brain details in conventional PVE correction methods. Dozens of comparison experiments and statistical analyses also suggest that the proposed method is superior to other PVE correction methods in AD diagnosis based on real patient data.

  • Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs

    Chen-Yu YANG  Zhen-Hua LING  Li-Rong DAI  

     
    PAPER-Speech Synthesis and Related Topics

      Vol:
    E97-D No:6
      Page(s):
    1449-1460

    In this paper, an automatic and unsupervised method using context-dependent hidden Markov models (CD-HMMs) is proposed for the prosodic labeling of speech synthesis databases. This method consists of three main steps, i.e., initialization, model training and prosodic labeling. The initial prosodic labels are obtained by unsupervised clustering using the acoustic features designed according to the characteristics of the prosodic descriptor to be labeled. Then, CD-HMMs of the spectral parameters, F0s and phone durations are estimated by a means similar to the HMM-based parametric speech synthesis using the initial prosodic labels. These labels are further updated by Viterbi decoding under the maximum likelihood criterion given the acoustic feature sequences and the trained CD-HMMs. The model training and prosodic labeling procedures are conducted iteratively until convergence. The performance of the proposed method is evaluated on Mandarin speech synthesis databases and two prosodic descriptors are investigated, i.e., the prosodic phrase boundary and the emphasis expression. In our implementation, the prosodic phrase boundary labels are initialized by clustering the durations of the pauses between every two consecutive prosodic words, and the emphasis expression labels are initialized by examining the differences between the original and the synthetic F0 trajectories. Experimental results show that the proposed method is able to label the prosodic phrase boundary positions much more accurately than the text-analysis-based method without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the prosodic phrase boundary labels generated by our proposed method achieves similar performance to that using the manual labels. Furthermore, the unit selection speech synthesis system constructed using the emphasis expression labels generated by our proposed method can convey the emphasis information effectively while maintaining the naturalness of synthetic speech.

  • Online Learned Player Recognition Model Based Soccer Player Tracking and Labeling for Long-Shot Scenes

    Weicun XU  Qingjie ZHAO  Yuxia WANG  Xuanya LI  

     
    PAPER-Pattern Recognition

      Vol:
    E97-D No:1
      Page(s):
    119-129

    Soccer player tracking and labeling suffer from the similar appearance of the players in the same team, especially in long-shot scenes where the faces and the numbers of the players are too blurry to identify. In this paper, we propose an efficient multi-player tracking system. The tracking system takes the detection responses of a human detector as inputs. To realize real-time player detection, we generate a spatial proposal to minimize the scanning scope of the detector. The tracking system utilizes the discriminative appearance models trained using the online Boosting method to reduce data-association ambiguity caused by the appearance similarity of the players. We also propose to build an online learned player recognition model which can be embedded in the tracking system to approach online player recognition and labeling in tracking applications for long-shot scenes in two stages. At the first stage, to build the model, we utilize the fast k-means clustering method instead of classic k-means clustering to build and update a visual word vocabulary in an efficient online manner, using the informative descriptors extracted from the training samples drawn at each time step of multi-player tracking. The first stage finishes when the vocabulary is ready. At the second stage, given the obtained visual word vocabulary, an incremental vector quantization strategy is used to recognize and label each tracked player. We also perform importance recognition validation to avoid mistakenly recognizing an outlier, namely, a person we do not need to recognize, as a player. Both quantitative and qualitative experimental results on the long-shot video clips of a real soccer game video demonstrate that the proposed player recognition model performs much better than some state-of-the-art online learned models, and our tracking system also performs quite effectively even under very complicated situations.

  • A New First-Scan Method for Two-Scan Labeling Algorithms

    Lifeng HE  Yuyan CHAO  Kenji SUZUKI  

     
    LETTER-Pattern Recognition

      Vol:
    E95-D No:8
      Page(s):
    2142-2145

    This paper proposes a new first-scan method for two-scan labeling algorithms. In the first scan, our proposed method first scans every fourth image line, and processes the scan line and its two neighbor lines. Then, it processes the remaining lines from top to bottom one by one. Our method decreases the average number of checks needed to process a foreground pixel; thus, the efficiency of labeling can be improved.
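
    For background, a conventional two-scan labeling algorithm of the kind this letter refines can be sketched as follows. This is a generic textbook-style sketch, not the proposed first-scan method: it visits every line in raster order, and the function name `label_components`, the list-of-rows image format, the choice of 8-connectivity, and the union-find equivalence table are assumptions made for illustration.

```python
def label_components(image):
    """Two-scan connected-component labeling of a binary image.

    image: list of rows, each a list of 0 (background) / 1 (foreground).
    Returns a same-shaped grid of labels (0 = background), 8-connectivity.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # union-find table over provisional labels; index 0 unused

    def find(x):
        # Follow parents to the representative label, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First scan: assign provisional labels and record label equivalences.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            neighbors = []
            # Already-visited neighbors: left, upper-left, upper, upper-right.
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and labels[ny][nx]:
                    neighbors.append(labels[ny][nx])
            if not neighbors:
                parent.append(len(parent))       # new provisional label
                labels[y][x] = len(parent) - 1
            else:
                smallest = min(neighbors)
                labels[y][x] = smallest
                for n in neighbors:
                    union(smallest, n)

    # Second scan: replace provisional labels by consecutive final labels.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels
```

    In this baseline, every foreground pixel checks up to four neighbors in the first scan; reducing that per-pixel checking work is the kind of saving the abstract describes.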

  • A Fast Multi-Object Extraction Algorithm Based on Cell-Based Connected Components Labeling

    Qingyi GU  Takeshi TAKAKI  Idaku ISHII  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E95-D No:2
      Page(s):
    636-645

    We describe a cell-based connected component labeling algorithm to calculate the 0th and 1st moment features as the attributes for labeled regions. These can be used to indicate their sizes and positions for multi-object extraction. Based on the additivity in moment features, the cell-based labeling algorithm can label divided cells of a certain size in an image by scanning the image only once to obtain the moment features of the labeled regions with remarkably reduced computational complexity and memory consumption for labeling. Our algorithm is a simple-one-time-scan cell-based labeling algorithm, which is suitable for hardware and parallel implementation. We also compared it with conventional labeling algorithms. The experimental results showed that our algorithm is faster than conventional raster-scan labeling algorithms.
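
    The additivity of moment features that this abstract relies on can be shown with a small sketch: the 0th and 1st moments computed per cell simply add up to the moments of the whole region. This is an illustration under stated assumptions, not the authors' algorithm: it skips the labeling step entirely by assuming the image holds a single object, and the names `Moments` and `cell_moments` and the square-cell layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    m00: int = 0  # 0th moment: number of foreground pixels (size)
    m10: int = 0  # 1st moment in x: sum of x coordinates
    m01: int = 0  # 1st moment in y: sum of y coordinates

    def __add__(self, other):
        # Additivity: moments of a union of disjoint pixel sets are sums.
        return Moments(self.m00 + other.m00,
                       self.m10 + other.m10,
                       self.m01 + other.m01)

    def centroid(self):
        return (self.m10 / self.m00, self.m01 / self.m00)


def cell_moments(image, x0, y0, size):
    """Moments of the foreground pixels inside one size-by-size cell."""
    m = Moments()
    for y in range(y0, min(y0 + size, len(image))):
        for x in range(x0, min(x0 + size, len(image[y]))):
            if image[y][x]:
                m = m + Moments(1, x, y)
    return m


if __name__ == "__main__":
    image = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
    total = Moments()
    for y0 in range(0, 4, 2):
        for x0 in range(0, 4, 2):
            total = total + cell_moments(image, x0, y0, 2)
    print(total, total.centroid())
```

    Because the per-cell results combine by addition, cells can be processed independently, which is consistent with the abstract's remark about hardware and parallel implementation.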

  • Cayley Graph Representation and Graph Product Representation of Hypercubes

    Miya MOROTA  Ryoichi HATAYAMA  Yukio SHIBATA  

     
    PAPER-Graphs and Networks

      Vol:
    E94-A No:3
      Page(s):
    946-954

    Hypercube $Q_n$ is a well-known graph structure having three different kinds of equivalent definitions, which are: 1. binary $n$-bit sequences with the adjacency condition, 2. $Q_1 = K_2$, $Q_n = Q_{n-1} \times K_2$, where $\times$ means the Cartesian product, 3. the Cayley graph on $\mathbb{Z}_2^n$ with the generator set $\{10\cdots0, 010\cdots0, \ldots, 0\cdots01\}$. We give a necessary and sufficient condition for a set of binary sequences to be a generator set for the hypercube. Then, we give relations between some generator sets and relational products. These results show the wide variety of representability of hypercubes which would be used for many applications.
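
    The equivalence of the three definitions listed in this abstract can be checked by brute force for small n. The sketch below is an illustration only, assuming that the adjacency condition means differing in exactly one bit and that the generator set consists of the n unit vectors; the function names are hypothetical.

```python
from itertools import product


def hypercube_by_hamming(n):
    """Definition 1: n-bit sequences, adjacent when they differ in one bit."""
    vertices = list(product((0, 1), repeat=n))
    return {frozenset((u, v)) for u in vertices for v in vertices
            if sum(a != b for a, b in zip(u, v)) == 1}


def hypercube_by_product(n):
    """Definition 2: Q_1 = K_2, Q_n = Cartesian product of Q_{n-1} and K_2."""
    vertices = [(0,), (1,)]
    edges = {frozenset(((0,), (1,)))}
    for _ in range(n - 1):
        new_edges = set()
        # Each edge of Q_{n-1} is copied into both layers of the product.
        for e in edges:
            u, v = tuple(e)
            for b in (0, 1):
                new_edges.add(frozenset((u + (b,), v + (b,))))
        # Each vertex of Q_{n-1} is joined to its copy in the other layer.
        for v in vertices:
            new_edges.add(frozenset((v + (0,), v + (1,))))
        vertices = [v + (b,) for v in vertices for b in (0, 1)]
        edges = new_edges
    return edges


def hypercube_by_cayley(n):
    """Definition 3: Cayley graph on Z_2^n generated by the unit vectors."""
    generators = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    edges = set()
    for v in product((0, 1), repeat=n):
        for g in generators:
            w = tuple((a + b) % 2 for a, b in zip(v, g))  # v + g in Z_2^n
            edges.add(frozenset((v, w)))
    return edges


if __name__ == "__main__":
    for n in range(1, 5):
        same = (hypercube_by_hamming(n) == hypercube_by_product(n)
                == hypercube_by_cayley(n))
        print(n, same, len(hypercube_by_cayley(n)))
```

    Swapping a different generator set into the third function is one way to experiment with the question the paper studies, namely which sets of binary sequences generate the hypercube.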

  • Decomposition Optimization for Minimizing Label Overflow in Prime Number Graph Labeling

    Jaehoon KIM  Seog PARK  

     
    PAPER-Dependable Computing

      Vol:
    E93-D No:7
      Page(s):
    1889-1899

    Recently, a graph labeling technique based on prime numbers has been suggested for reducing the costly transitive closure computations in RDF query languages. The suggested prime number graph labeling provides the benefit of fast query processing by a simple divisibility test of labels. However, it has an inherent problem that originates with the nature of prime numbers. Since each prime number must be used exclusively, labels can become significantly large. Therefore, in this paper, we introduce a novel optimization technique to effectively reduce the problem of label overflow. The suggested idea is based on graph decomposition. When label overflow occurs, the full graph is divided into several sub-graphs, and nodes in each sub-graph are separately labeled. Through experiments, we also analyze the effectiveness of the graph decomposition optimization, which is evaluated by the number of divisions.
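
    The divisibility test mentioned in this abstract can be illustrated with a simplified prime-labeling scheme. This is a sketch under stated assumptions rather than the labeling scheme or the decomposition optimization from the paper: each node receives its own prime, a node's label is that prime times the least common multiple of its parents' labels, and the input dictionary is assumed to list parents before children. All function names are hypothetical.

```python
from functools import reduce
from math import gcd


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def lcm_all(values):
    """Least common multiple of an iterable of positive ints (1 if empty)."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)


def prime_labels(parents):
    """Label the nodes of a DAG.

    parents: dict mapping node -> list of parent nodes, with every parent
             appearing earlier in the dict than its children (assumed).
    """
    labels = {}
    for prime, (node, node_parents) in zip(first_primes(len(parents)),
                                           parents.items()):
        # The label collects the primes of the node and all its ancestors.
        labels[node] = prime * lcm_all(labels[p] for p in node_parents)
    return labels


def is_ancestor_or_self(labels, u, v):
    """Reachability from u to v reduces to one divisibility test."""
    return labels[v] % labels[u] == 0


if __name__ == "__main__":
    graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": []}
    labels = prime_labels(graph)
    print(labels)
    print(is_ancestor_or_self(labels, "a", "d"))
```

    Even in this five-node example the deepest node's label is already much larger than any single prime, which hints at the label overflow problem the abstract describes for large graphs.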

  • Improved Sequential Dependency Analysis Integrating Labeling-Based Sentence Boundary Detection

    Takanobu OBA  Takaaki HORI  Atsushi NAKAMURA  

     
    PAPER-Natural Language Processing

      Vol:
    E93-D No:5
      Page(s):
    1272-1281

    A dependency structure interprets modification relationships between words or phrases and is recognized as an important element in semantic information analysis. With the conventional approaches for extracting this dependency structure, it is assumed that the complete sentence is known before the analysis starts. For spontaneous speech data, however, this assumption is not necessarily correct since sentence boundaries are not marked in the data. Although sentence boundaries can be detected before dependency analysis, this cascaded implementation is not suitable for online processing since it delays the responses of the application. To solve these problems, we proposed a sequential dependency analysis (SDA) method for online spontaneous speech processing, which enabled us to analyze incomplete sentences sequentially and detect sentence boundaries simultaneously. In this paper, we propose an improved SDA integrating a labeling-based sentence boundary detection (SntBD) technique based on Conditional Random Fields (CRFs). In the new method, we use CRF for soft decision of sentence boundaries and combine it with SDA to retain its online framework. Since CRF-based SntBD yields better estimates of sentence boundaries, SDA can provide better results in which the dependency structure and sentence boundaries are consistent. Experimental results using spontaneous lecture speech from the Corpus of Spontaneous Japanese show that our improved SDA outperforms the original SDA with SntBD accuracy providing better dependency analysis results.

  • Incorporating Frame Information to Semantic Role Labeling

    Joo-Young LEE  Young-In SONG  Hae-Chang RIM  Kyoung-Soo HAN  

     
    LETTER-Natural Language Processing

      Vol:
    E93-D No:1
      Page(s):
    201-204

    In this paper, we suggest a new probabilistic model of semantic role labeling, which uses the frameset of the predicate as explicit linguistic knowledge for providing global information on the predicate-argument structure that a local classifier is unable to capture. The proposed model consists of three sub-models: a role sequence generation model, a frameset generation model, and a matching model. The role sequence generation model generates the semantic role sequence candidates of a given predicate by using the local classification approach, which is a widely used approach in previous research. The frameset generation model estimates the probability of each frameset that the predicate can take. The matching model is designed to measure the degree of matching between the generated role sequence and the frameset by using several features. These features are developed to represent the predicate-argument structure information described in the frameset. In the experiments, our model shows that the use of knowledge about the predicate-argument structure is effective for selecting a more appropriate semantic role sequence.