The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] OWL(221hit)

1-20hit(221hit)

  • Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution Open Access

    Yuka KO  Katsuhito SUDOH  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Pubricized:
    2024/05/24
      Vol:
    E107-D No:10
      Page(s):
    1322-1331

    End-to-end speech translation (ST) directly renders source language speech to the target language without intermediate automatic speech recognition (ASR) output as in a cascade approach. End-to-end ST avoids error propagation from intermediate ASR results. Although recent attempts have applied multi-task learning using an auxiliary task of ASR to improve ST performance, they use cross-entropy loss to one-hot references in the ASR task, and the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end STs leveraged by ASR-based loss against posterior distributions obtained using a pre-trained ASR model called ASR posterior-based loss (ASR-PBL). The ASR-PBL method, which enables a ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, can be applied to one of the strong multi-task ST baseline models with Hybrid CTC/Attention ASR task loss. In our experiments on the Fisher Spanish-to-English corpus, the proposed method demonstrated better BLEU results than the baseline that used standard CE loss.

  • Type-Enhanced Ensemble Triple Representation via Triple-Aware Attention for Cross-Lingual Entity Alignment Open Access

    Zhishuo ZHANG  Chengxiang TAN  Xueyan ZHAO  Min YANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2024/05/22
      Vol:
    E107-D No:9
      Page(s):
    1182-1191

    Entity alignment (EA) is a crucial task for integrating cross-lingual and cross-domain knowledge graphs (KGs), which aims to discover entities referring to the same real-world object from different KGs. Most existing embedding-based methods generate aligning entity representation by mining the relevance of triple elements, paying little attention to triple indivisibility and entity role diversity. In this paper, a novel framework named TTEA - Type-enhanced Ensemble Triple Representation via Triple-aware Attention for Cross-lingual Entity Alignment is proposed to overcome the above shortcomings from the perspective of ensemble triple representation considering triple specificity and diversity features of entity role. Specifically, the ensemble triple representation is derived by regarding relation as information carrier between semantic and type spaces, and hence the noise influence during spatial transformation and information propagation can be smoothly controlled via specificity-aware triple attention. Moreover, the role diversity of triple elements is modeled via triple-aware entity enhancement in TTEA for EA-oriented entity representation. Extensive experiments on three real-world cross-lingual datasets demonstrate that our framework makes comparative results.

  • Chinese Spelling Correction Based on Knowledge Enhancement and Contrastive Learning Open Access

    Hao WANG  Yao MA  Jianyong DUAN  Li HE  Xin LI  

     
    PAPER-Natural Language Processing

      Pubricized:
    2024/05/17
      Vol:
    E107-D No:9
      Page(s):
    1264-1273

    Chinese Spelling Correction (CSC) is an important natural language processing task. Existing methods for CSC mostly utilize BERT models, which select a character from a candidate list to correct errors in the sentence. World knowledge refers to structured information and relationships spanning a wide range of domains and subjects, while definition knowledge pertains to textual explanations or descriptions of specific words or concepts. Both forms of knowledge have the potential to enhance a model’s ability to comprehend contextual nuances. As BERT lacks sufficient guidance from world knowledge for error correction and existing models overlook the rich definition knowledge in Chinese dictionaries, the performance of spelling correction models is somewhat compromised. To address these issues, within the world knowledge network, this study injects world knowledge from knowledge graphs into the model to assist in correcting spelling errors caused by a lack of world knowledge. Additionally, the definition knowledge network in this model improves the error correction capability by utilizing the definitions from the Chinese dictionary through a comparative learning approach. Experimental results on the SIGHAN benchmark dataset validate the effectiveness of our approach.

  • Improving Sliced Wasserstein Distance with Geometric Median for Knowledge Distillation Open Access

    Hongyun LU  Mengmeng ZHANG  Hongyuan JING  Zhi LIU  

     
    LETTER-Fundamentals of Information Systems

      Pubricized:
    2024/03/08
      Vol:
    E107-D No:7
      Page(s):
    890-893

    Currently, the most advanced knowledge distillation models use a metric learning approach based on probability distributions. However, the correlation between supervised probability distributions is typically geometric and implicit, causing inefficiency and an inability to capture structural feature representations among different tasks. To overcome this problem, we propose a knowledge distillation loss using the robust sliced Wasserstein distance with geometric median (GMSW) to estimate the differences between the teacher and student representations. Due to the intuitive geometric properties of GMSW, the student model can effectively learn to align its produced hidden states from the teacher model, thereby establishing a robust correlation among implicit features. In experiment, our method outperforms state-of-the-art models in both high-resource and low-resource settings.

  • Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing Open Access

    Kensuke SUMOTO  Kenta KANAKOGI  Hironori WASHIZAKI  Naohiko TSUDA  Nobukazu YOSHIOKA  Yoshiaki FUKAZAWA  Hideyuki KANUKA  

     
    PAPER

      Pubricized:
    2024/02/09
      Vol:
    E107-D No:5
      Page(s):
    674-682

    Security-related issues have become more significant due to the proliferation of IT. Collating security-related information in a database improves security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities about software or source code. Although the descriptions include various entities, there is not a uniform entity structure, making security analysis difficult using individual entities. Developing a consistent entity structure will enhance the security field. Herein we propose a method to automatically label select entities from CVE descriptions by applying the Named Entity Recognition (NER) technique. We manually labeled 3287 CVE descriptions and conducted experiments using a machine learning model called BERT to compare the proposed method to labeling with regular expressions. Machine learning using the proposed method significantly improves the labeling accuracy. It has an f1 score of about 0.93, precision of about 0.91, and recall of about 0.95, demonstrating that our method has potential to automatically label select entities from CVE descriptions.

  • Conceptual Knowledge Enhanced Model for Multi-Intent Detection and Slot Filling Open Access

    Li HE  Jingxuan ZHAO  Jianyong DUAN  Hao WANG  Xin LI  

     
    PAPER

      Pubricized:
    2023/10/25
      Vol:
    E107-D No:4
      Page(s):
    468-476

    In Natural Language Understanding, intent detection and slot filling have been widely used to understand user queries. However, current methods tend to rely on single words and sentences to understand complex semantic concepts, and can only consider local information within the sentence. Therefore, they usually cannot capture long-distance dependencies well and are prone to problems where complex intentions in sentences are difficult to recognize. In order to solve the problem of long-distance dependency of the model, this paper uses ConceptNet as an external knowledge source and introduces its extensive semantic information into the multi-intent detection and slot filling model. Specifically, for a certain sentence, based on confidence scores and semantic relationships, the most relevant conceptual knowledge is selected to equip the sentence, and a concept context map with rich information is constructed. Then, the multi-head graph attention mechanism is used to strengthen context correlation and improve the semantic understanding ability of the model. The experimental results indicate that the model has significantly improved performance compared to other models on the MixATIS and MixSNIPS multi-intent datasets.

  • Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology

    Wenkai LIU  Lin ZHANG  Menglong WU  Xichang CAI  Hongxia DONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    83-92

    The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.

  • Local-to-Global Structure-Aware Transformer for Question Answering over Structured Knowledge

    Yingyao WANG  Han WANG  Chaoqun DUAN  Tiejun ZHAO  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2023/06/27
      Vol:
    E106-D No:10
      Page(s):
    1705-1714

    Question-answering tasks over structured knowledge (i.e., tables and graphs) require the ability to encode structural information. Traditional pre-trained language models trained on linear-chain natural language cannot be directly applied to encode tables and graphs. The existing methods adopt the pre-trained models in such tasks by flattening structured knowledge into sequences. However, the serialization operation will lead to the loss of the structural information of knowledge. To better employ pre-trained transformers for structured knowledge representation, we propose a novel structure-aware transformer (SATrans) that injects the local-to-global structural information of the knowledge into the mask of the different self-attention layers. Specifically, in the lower self-attention layers, SATrans focus on the local structural information of each knowledge token to learn a more robust representation of it. In the upper self-attention layers, SATrans further injects the global information of the structured knowledge to integrate the information among knowledge tokens. In this way, the SATrans can effectively learn the semantic representation and structural information from the knowledge sequence and the attention mask, respectively. We evaluate SATrans on the table fact verification task and the knowledge base question-answering task. Furthermore, we explore two methods to combine symbolic and linguistic reasoning for these tasks to solve the problem that the pre-trained models lack symbolic reasoning ability. The experiment results reveal that the methods consistently outperform strong baselines on the two benchmarks.

  • Distilling Distribution Knowledge in Normalizing Flow

    Jungwoo KWON  Gyeonghwan KIM  

     
    LETTER-Artificial Intelligence, Data Mining

      Pubricized:
    2023/04/26
      Vol:
    E106-D No:8
      Page(s):
    1287-1291

    In this letter, we propose a feature-based knowledge distillation scheme which transfers knowledge between intermediate blocks of teacher and student with flow-based architecture, specifically Normalizing flow in our implementation. In addition to the knowledge transfer scheme, we examine how configuration of the distillation positions impacts on the knowledge transfer performance. To evaluate the proposed ideas, we choose two knowledge distillation baseline models which are based on Normalizing flow on different domains: CS-Flow for anomaly detection and SRFlow-DA for super-resolution. A set of performance comparison to the baseline models with popular benchmark datasets shows promising results along with improved inference speed. The comparison includes performance analysis based on various configurations of the distillation positions in the proposed scheme.

  • Persymmetric Structured Covariance Matrix Estimation Based on Whitening for Airborne STAP

    Quanxin MA  Xiaolin DU  Jianbo LI  Yang JING  Yuqing CHANG  

     
    LETTER-Digital Signal Processing

      Pubricized:
    2022/12/27
      Vol:
    E106-A No:7
      Page(s):
    1002-1006

    The estimation problem of structured clutter covariance matrix (CCM) in space-time adaptive processing (STAP) for airborne radar systems is studied in this letter. By employing the prior knowledge and the persymmetric covariance structure, a new estimation algorithm is proposed based on the whitening ability of the covariance matrix. The proposed algorithm is robust to prior knowledge of different accuracy, and can whiten the observed interference data to obtain the optimal solution. In addition, the extended factored approach (EFA) is used in the optimization for dimensionality reduction, which reduces the computational burden. Simulation results show that the proposed algorithm can effectively improve STAP performance even under the condition of some errors in prior knowledge.

  • ZGridBC: Zero-Knowledge Proof Based Scalable and Privacy-Enhanced Blockchain Platform for Electricity Tracking

    Takeshi MIYAMAE  Fumihiko KOZAKURA  Makoto NAKAMURA  Masanobu MORINAGA  

     
    PAPER-Information Network

      Pubricized:
    2023/04/14
      Vol:
    E106-D No:7
      Page(s):
    1219-1229

    The total number of solar power-producing facilities whose Feed-in Tariff (FIT) Program-based ten-year contracts will expire by 2023 is expected to reach approximately 1.65 million in Japan. If the facilities that produce or consume renewable energy would increase to reach a large number, e.g., two million, blockchain would not be capable of processing all the transactions. In this work, we propose a blockchain-based electricity-tracking platform for renewable energy, called ‘ZGridBC,’ which consists of mutually cooperative two novel decentralized schemes to solve scalability, storage cost, and privacy issues at the same time. One is the electricity production resource management, which is an efficient data management scheme that manages electricity production resources (EPRs) on the blockchain by using UTXO tokens extended to two-dimension (period and electricity amount) to prevent double-spending. The other is the electricity-tracking proof, which is a massive data aggregation scheme that significantly reduces the amount of data managed on the blockchain by using zero-knowledge proof (ZKP). Thereafter, we illustrate the architecture of ZGridBC, consider its scalability, security, and privacy, and illustrate the implementation of ZGridBC. Finally, we evaluate the scalability of ZGridBC, which handles two million electricity facilities with far less cost per environmental value compared with the price of the environmental value proposed by METI (=0.3 yen/kWh).

  • Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

    Tianbin WANG  Ruiyang HUANG  Nan HU  Huansha WANG  Guanghan CHU  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1010-1017

    Chinese Named Entity Recognition is the fundamental technology in the field of the Chinese Natural Language Process. It is extensively adopted into information extraction, intelligent question answering, and knowledge graph. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture the character granularity semantics, which affects the performance of the Chinese NER. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. We novelly integrate the semantic information of character granularity into the vector space of characters and acquire the vector representation containing semantic information by the attention mechanism. In addition, we verify the appropriate number of semantic layers through the comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume and MSRA show that the model outperforms character-based LSTM baselines.

  • Secure Revocation Features in eKYC - Privacy Protection in Central Bank Digital Currency

    Kazuo TAKARAGI  Takashi KUBOTA  Sven WOHLGEMUTH  Katsuyuki UMEZAWA  Hiroki KOYANAGI  

     
    PAPER

      Pubricized:
    2022/10/07
      Vol:
    E106-A No:3
      Page(s):
    325-332

    Central bank digital currencies require the implementation of eKYC to verify whether a trading customer is eligible online. When an organization issues an ID proof of a customer for eKYC, that proof is usually achieved in practice by a hierarchy of issuers. However, the customer wants to disclose only part of the issuer's chain and documents to the trading partner due to privacy concerns. In this research, delegatable anonymous credential (DAC) and zero-knowledge range proof (ZKRP) allow customers to arbitrarily change parts of the delegation chain and message body to range proofs expressed in inequalities. That way, customers can protect the privacy they need with their own control. Zero-knowledge proof is applied to prove the inequality between two time stamps by the time stamp server (signature presentation, public key revocation, or non-revocation) without disclosing the signature content and stamped time. It makes it possible to prove that the registration information of the national ID card is valid or invalid while keeping the user's personal information anonymous. This research aims to contribute to the realization of a sustainable financial system based on self-sovereign identity management with privacy-enhanced PKI.

  • A CFAR Detection Algorithm Based on Clutter Knowledge for Cognitive Radar

    Kaixuan LIU  Yue LI  Peng WANG  Xiaoyan PENG  Hongshu LIAO  Wanchun LI  

     
    PAPER-Digital Signal Processing

      Pubricized:
    2022/09/13
      Vol:
    E106-A No:3
      Page(s):
    590-599

    Under the background of non-homogenous and dynamic time-varying clutter, the processing ability of the traditional constant false alarm rate (CFAR) detection algorithm is significantly reduced, as well as the detection performance. This paper proposes a CFAR detection algorithm based on clutter knowledge (CK-CFAR), as a new CFAR, to improve the detection performance adaptability of the radar in complex clutter background. With the acquired clutter prior knowledge, the algorithm can dynamically select parameters according to the change of background clutter and calculate the threshold. Compared with the detection algorithms such as CA-CFAR, GO-CFAR, SO-CFAR, and OS-CFAR, the simulation results show that CK-CFAR has excellent detection performance in the background of homogenous clutter and edge clutter. This algorithm can help radar adapt to the clutter with different distribution characteristics, effectively enhance radar detection in a complex environment. It is more in line with the development direction of the cognitive radar.

  • Chinese Lexical Sememe Prediction Using CilinE Knowledge

    Hao WANG  Sirui LIU  Jianyong DUAN  Li HE  Xin LI  

     
    PAPER-Language, Thought, Knowledge and Intelligence

      Pubricized:
    2022/08/18
      Vol:
    E106-A No:2
      Page(s):
    146-153

    Sememes are the smallest semantic units of human languages, the composition of which can represent the meaning of words. Sememes have been successfully applied to many downstream applications in natural language processing (NLP) field. Annotation of a word's sememes depends on language experts, which is both time-consuming and labor-consuming, limiting the large-scale application of sememe. Researchers have proposed some sememe prediction methods to automatically predict sememes for words. However, existing sememe prediction methods focus on information of the word itself, ignoring the expert-annotated knowledge bases which indicate the relations between words and should value in sememe predication. Therefore, we aim at incorporating the expert-annotated knowledge bases into sememe prediction process. To achieve that, we propose a CilinE-guided sememe prediction model which employs an existing word knowledge base CilinE to remodel the sememe prediction from relational perspective. Experiments on HowNet, a widely used Chinese sememe knowledge base, have shown that CilinE has an obvious positive effect on sememe prediction. Furthermore, our proposed method can be integrated into existing methods and significantly improves the prediction performance. We will release the data and code to the public.

  • A Hierarchical Memory Model for Task-Oriented Dialogue System

    Ya ZENG  Li WAN  Qiuhong LUO  Mao CHEN  

     
    PAPER-Natural Language Processing

      Pubricized:
    2022/05/16
      Vol:
    E105-D No:8
      Page(s):
    1481-1489

    Traditional pipeline methods for task-oriented dialogue systems are designed individually and expensively. Existing memory augmented end-to-end methods directly map the inputs to outputs and achieve promising results. However, the most existing end-to-end solutions store the dialogue history and knowledge base (KB) information in the same memory and represent KB information in the form of KB triples, making the memory reader's reasoning on the memory more difficult, which makes the system difficult to retrieve the correct information from the memory to generate a response. Some methods introduce many manual annotations to strengthen reasoning. To reduce the use of manual annotations, while strengthening reasoning, we propose a hierarchical memory model (HM2Seq) for task-oriented systems. HM2Seq uses a hierarchical memory to separate the dialogue history and KB information into two memories and stores KB in KB rows, then we use memory rows pointer combined with an entity decoder to perform hierarchical reasoning over memory. The experimental results on two publicly available task-oriented dialogue datasets confirm our hypothesis and show the outstanding performance of our HM2Seq by outperforming the baselines.

  • Predicting A Growing Stage of Rice Plants Based on The Cropping Records over 25 Years — A Trial of Feature Engineering Incorporating Hidden Regional Characteristics —

    Hiroshi UEHARA  Yasuhiro IUCHI  Yusuke FUKAZAWA  Yoshihiro KANETA  

     
    PAPER

      Pubricized:
    2021/12/29
      Vol:
    E105-D No:5
      Page(s):
    955-963

    This study tries to predict date of ear emergence of rice plants, based on cropping records over 25 years. Predicting ear emergence of rice plants is known to be crucial for practicing good harvesting quality, and has long been dependent upon old farmers who acquire skills of intuitive prediction based on their long term experiences. Facing with aging farmers, data driven approach for the prediction have been pursued. Nevertheless, they are not necessarily sufficient in terms of practical use. One of the issue is to adopt weather forecast as the feature so that the predictive performance is varied by the accuracy of the forecast. The other issue is that the performance is varied by region and the regional characteristics have not been used as the features for the prediction. With this background, we propose a feature engineering to quantify hidden regional characteristics as the feature for the prediction. Further the feature is engineered based only on observational data without any forecast. Applying our proposal to the data on the cropping records resulted in sufficient predictive performance, ±2.69days of RMSE.

  • MKGN: A Multi-Dimensional Knowledge Enhanced Graph Network for Multi-Hop Question and Answering

    Ying ZHANG  Fandong MENG  Jinchao ZHANG  Yufeng CHEN  Jinan XU  Jie ZHOU  

     
    PAPER-Natural Language Processing

      Pubricized:
    2021/12/29
      Vol:
    E105-D No:4
      Page(s):
    807-819

    Machine reading comprehension with multi-hop reasoning always suffers from reasoning path breaking due to the lack of world knowledge, which always results in wrong answer detection. In this paper, we analyze what knowledge the previous work lacks, e.g., dependency relations and commonsense. Based on our analysis, we propose a Multi-dimensional Knowledge enhanced Graph Network, named MKGN, which exploits specific knowledge to repair the knowledge gap in reasoning process. Specifically, our approach incorporates not only entities and dependency relations through various graph neural networks, but also commonsense knowledge by a bidirectional attention mechanism, which aims to enhance representations of both question and contexts. Besides, to make the most of multi-dimensional knowledge, we investigate two kinds of fusion architectures, i.e., in the sequential and parallel manner. Experimental results on HotpotQA dataset demonstrate the effectiveness of our approach and verify that using multi-dimensional knowledge, especially dependency relations and commonsense, can indeed improve the reasoning process and contribute to correct answer detection.

  • Efficient Zero-Knowledge Proofs of Graph Signature for Connectivity and Isolation Using Bilinear-Map Accumulator

    Toru NAKANISHI  Hiromi YOSHINO  Tomoki MURAKAMI  Guru-Vamsi POLICHARLA  

     
    PAPER-Cryptography and Information Security

      Pubricized:
    2021/09/08
      Vol:
    E105-A No:3
      Page(s):
    389-403

    To prove the graph relations such as the connectivity and isolation for a certified graph, a system of a graph signature and proofs has been proposed. In this system, an issuer generates a signature certifying the topology of an undirected graph, and issues the signature to a prover. The prover can prove the knowledge of the signature and the graph in the zero-knowledge, i.e., the signature and the signed graph are hidden. In addition, the prover can prove relations on the certified graph such as the connectivity and isolation between two vertexes. In the previous system, using integer commitments on RSA modulus, the graph relations are proved. However, the RSA modulus needs a longer size for each element. Furthermore, the proof size and verification cost depend on the total numbers of vertexes and edges. In this paper, we propose a graph signature and proof system, where these are computed on bilinear groups without the RSA modulus. Moreover, using a bilinear map accumulator, the prover can prove the connectivity and isolation on a graph, where the proof size and verification cost become independent from the total numbers of vertexes and edges.

  • Competent Triple Identification for Knowledge Graph Completion under the Open-World Assumption

    Esrat FARJANA  Natthawut KERTKEIDKACHORN  Ryutaro ICHISE  

     
    PAPER-Data Engineering, Web Information Systems

      Pubricized:
    2021/12/02
      Vol:
    E105-D No:3
      Page(s):
    646-655

    The usefulness and usability of existing knowledge graphs (KGs) are mostly limited because of the incompleteness of knowledge compared to the growing number of facts about the real world. Most existing ontology-based KG completion methods are based on the closed-world assumption, where KGs are fixed. In these methods, entities and relations are defined, and new entity information cannot be easily added. In contrast, in open-world assumptions, entities and relations are not previously defined. Thus there is a vast scope to find new entity information. Despite this, knowledge acquisition under the open-world assumption is challenging because most available knowledge is in a noisy unstructured text format. Nevertheless, Open Information Extraction (OpenIE) systems can extract triples, namely (head text; relation text; tail text), from raw text without any prespecified vocabulary. Such triples contain noisy information that is not essential for KGs. Therefore, to use such triples for the KG completion task, it is necessary to identify competent triples for KGs from the extracted triple set. Here, competent triples are the triples that can contribute to add new information to the existing KGs. In this paper, we propose the Competent Triple Identification (CTID) model for KGs. We also propose two types of feature, namely syntax- and semantic-based features, to identify competent triples from a triple set extracted by a state-of-the-art OpenIE system. We investigate both types of feature and test their effectiveness. It is found that the performance of the proposed features is about 20% better compared to that of the ReVerb system in identifying competent triples.

1-20hit(221hit)