
Keyword Search Result

[Keyword] semantic similarity (9 hits)

  • Measuring Semantic Similarity between Words Based on Multiple Relational Information

    Jianyong DUAN  Yuwei WU  Mingli WU  Hao WANG  

     
    PAPER-Natural Language Processing

      Publicized:
    2019/09/27
      Vol:
    E103-D No:1
      Page(s):
    163-169

    Word similarity extracted from rich text relation networks is the main basis for calculating semantic similarity. The complex relational information and text content in Wikipedia, community question answering sites, and social networks provide an abundant corpus for semantic similarity calculation. However, most existing research has focused on only a single relationship. In this paper, we propose a semantic similarity calculation model that integrates multiple kinds of relational information and maps the multiple relationships into the same semantic space by learning a representing matrix and a semantic matrix, thereby improving the accuracy of semantic similarity calculation. In experiments, we confirm that a semantic similarity method that integrates many kinds of relationships is more accurate than other semantic similarity methods.

  • Autonomous, Decentralized and Privacy-Enabled Data Preparation for Evidence-Based Medicine with Brain Aneurysm as a Phenotype

    Khalid Mahmood MALIK  Hisham KANAAN  Vian SABEEH  Ghaus MALIK  

     
    PAPER

      Publicized:
    2018/02/22
      Vol:
    E101-B No:8
      Page(s):
    1787-1797

    Evidence-based medicine is the key element in enabling the vision of precision medicine. Understanding the natural history of complex diseases such as brain aneurysm, and particularly investigating the evidence for its rupture risk factors, relies on semantic-enabled data preparation technology for conducting clinical trials, survival analysis and outcome prediction. For personalized medicine in the field of neurological diseases, it is very important that multiple health organizations coordinate and cooperate to conduct evidence-based observational studies. Without a means of automating privacy- and semantic-enabled data preparation, conducting observational studies at the intra-organizational level would require months of manual data preparation. Therefore, this paper proposes a semantic- and privacy-enabled, multi-party data preparation architecture and a four-tiered semantic similarity algorithm. Evaluation shows that the proposed algorithm achieves a precision of 79%, a recall of 83% and an F-measure of 81%.
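As a quick consistency check on the figures reported in the abstract above, the F-measure can be recomputed from the stated precision and recall. This is a minimal sketch that assumes the balanced F1 definition (the harmonic mean of precision and recall); the abstract does not say which weighting was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (assumed to be F1 inputs).
f1 = f_measure(0.79, 0.83)
print(f"F1 = {f1:.4f}")  # roughly 0.81, in line with the reported F-measure
```

Under that assumption the three reported numbers agree with one another after rounding.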

  • Sentence Similarity Computational Model Based on Information Content

    Hao WU  Heyan HUANG  

     
    PAPER-Natural Language Processing

      Publicized:
    2016/03/14
      Vol:
    E99-D No:6
      Page(s):
    1645-1652

    Sentence similarity computation is an increasingly important task in natural language processing applications such as information retrieval, machine translation and text summarization. From the viewpoint of information theory, the essential attribute of natural language is that it is a carrier of information, and the amount of information it carries can be measured by information content, which has already been used successfully, in simple ways, for word similarity computation. Existing sentence similarity methods do not emphasize the information contained in the sentence, and the complicated models they employ often require empirical or trained parameters. This paper presents a fully unsupervised computational model of sentence semantic similarity. It is also a simple and straightforward model that neither needs any empirical parameter nor relies on other NLP tools. The method obtains state-of-the-art experimental results, which show that sentence similarity evaluated by the model is closer to human judgment than that of multiple competing baselines. The paper also tests the proposed model with respect to the influence of the external corpus, performance with semantic nets of various sizes, and the relationship between efficiency and accuracy.

  • Enriching Semantic Knowledge for WSD

    Junpeng CHEN  Wei YU  

     
    LETTER-Natural Language Processing

      Vol:
    E97-D No:8
      Page(s):
    2212-2216

    In our previous work, we proposed combining ConceptNet and WordNet for Word Sense Disambiguation (WSD). ConceptNet was automatically disambiguated through Normalized Google Distance (NGD) similarity. In this letter, we present several techniques to enhance the performance of ConceptNet disambiguation and use the enriched semantic knowledge in the WSD task. We propose to enrich both the WordNet semantic knowledge and NGD to disambiguate the concepts in ConceptNet. Furthermore, we apply the enriched semantic knowledge to improve the performance of WSD. In a number of experiments, the proposed method obtained improved results.
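For readers unfamiliar with the Normalized Google Distance mentioned in the abstract above, the following is a minimal sketch of the standard NGD formula computed from page counts. The counts in the usage example are made up for illustration, and nothing here reflects how the letter's authors obtained their counts or applied NGD to ConceptNet.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from page counts.

    fx, fy : number of pages containing term x / term y
    fxy    : number of pages containing both terms
    n      : total number of indexed pages (a normalizing constant)
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur: maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical counts: two terms that always co-occur versus rarely co-occur.
print(ngd(1000, 1000, 1000, 10**6))  # identical usage gives distance 0.0
print(ngd(1000, 1000, 10, 10**6))    # rare co-occurrence gives a larger distance
```

Smaller values indicate more closely related terms, so a similarity score can be derived by inverting or negating the distance.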

  • Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System

    Lasguido NIO  Sakriani SAKTI  Graham NEUBIG  Tomoki TODA  Satoshi NAKAMURA  

     
    PAPER-Dialog System

      Vol:
    E97-D No:6
      Page(s):
    1497-1505

    This paper describes the design and evaluation of a method for developing a chat-oriented dialog system by utilizing real human-to-human conversation examples from movie scripts and Twitter conversations. The aim of the proposed method is to build a conversational agent that can interact with users in as natural a fashion as possible, while reducing the time required for database design and collection. A number of the challenging design issues we faced are described, including (1) constructing appropriate dialog corpora from raw movie scripts and Twitter data, and (2) developing a multi-domain chat-oriented dialog management system that can retrieve a proper system response based on the current user query. To build a dialog corpus, we propose a unit of conversation called a tri-turn (a trigram conversation turn), as well as extraction and semantic similarity analysis techniques to help ensure that the content extracted from raw movie/drama script files forms appropriate dialog-pair (query-response) examples. The constructed dialog corpora are then utilized in a data-driven dialog management system. Here, various approaches are investigated, including example-based dialog management (EBDM) and response generation using phrase-based statistical machine translation (SMT). In particular, we use two EBDM approaches: syntactic-semantic similarity retrieval and TF-IDF based cosine similarity retrieval. Experiments are conducted to compare and contrast the EBDM and SMT approaches in building a chat-oriented dialog system, and we investigate a combined method that addresses the advantages and disadvantages of both approaches. System performance was evaluated based on objective metrics (semantic similarity and cosine similarity) and human subjective evaluation from a small user study. Experimental results show that the proposed filtering approach effectively improves the performance. Furthermore, the results also show that by combining both EBDM and SMT approaches, we could overcome the shortcomings of each.
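The TF-IDF based cosine similarity retrieval named in the abstract above can be illustrated with a small sketch. It assumes whitespace-tokenized text, a plain `tf * log(N / df)` weighting, and IDF statistics computed over the stored queries plus the incoming query; the function names and the toy examples are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per token list."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, examples):
    """Pick the response whose stored query is most similar to `query`.

    examples: list of (query_tokens, response) pairs.
    Simplification: the incoming query is included when computing IDF.
    """
    vecs = tfidf_vectors([q for q, _ in examples] + [query])
    qvec = vecs[-1]
    scores = [cosine(qvec, v) for v in vecs[:-1]]
    best = max(range(len(examples)), key=scores.__getitem__)
    return examples[best][1], scores[best]


# Toy query-response examples (made up for illustration).
examples = [
    ("how are you today".split(), "I am fine, thanks."),
    ("what is your name".split(), "People call me Bot."),
]
response, score = retrieve("how are you".split(), examples)
print(response, round(score, 3))
```

An example-based system of this kind returns a stored response verbatim, which is why the quality of the extracted query-response pairs matters so much.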

  • Automatic Vocabulary Adaptation Based on Semantic and Acoustic Similarities

    Shoko YAMAHATA  Yoshikazu YAMAGUCHI  Atsunori OGAWA  Hirokazu MASATAKI  Osamu YOSHIOKA  Satoshi TAKAHASHI  

     
    PAPER-Speech Recognition

      Vol:
    E97-D No:6
      Page(s):
    1488-1496

    Recognition errors caused by out-of-vocabulary (OOV) words lead to critical problems when developing spoken language understanding systems based on automatic speech recognition technology. Automatic vocabulary adaptation is an essential technique for solving these problems. In this paper, we propose a novel and effective automatic vocabulary adaptation method. Our method selects OOV words from relevant documents using combined scores of semantic and acoustic similarities. Using this combined score, which reflects both semantic and acoustic aspects, only the necessary OOV words are selected, without registering redundant words. In addition, our method estimates the probabilities of OOV words using semantic similarity and a class-based N-gram language model. These probabilities should be appropriate, since they are estimated by considering both the frequencies of OOV words in the target speech data and the stable class N-gram probabilities. Experimental results show that our method improves OOV selection accuracy and the recognition accuracy of newly registered words in comparison with conventional methods.

  • Toward Simulating the Human Way of Comparing Concepts

    Raul Ernesto MENENDEZ-MORA  Ryutaro ICHISE  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E94-D No:7
      Page(s):
    1419-1429

    The ability to assess similarity lies close to the core of cognition. Understanding it supports the comprehension of human success in tasks such as problem solving, categorization, memory retrieval and inductive reasoning, which is the main reason it is a common research topic. In this paper, we introduce the idea of semantic differences and commonalities between words into the similarity computation process. Five new semantic similarity metrics are obtained by applying this scheme to traditional WordNet-based measures. We also combine the node-based similarity measures with a corpus-independent way of computing information content. In an experimental evaluation of our approach on two standard word-pair datasets, four of the measures outperformed their classical versions, while the other performed as well as its unmodified counterpart.

  • A New Question Answering System for Chinese Restricted Domain

    Haiqing HU  Peilin JIANG  Fuji REN  Shingo KUROIWA  

     
    PAPER-Language

      Vol:
    E89-D No:6
      Page(s):
    1848-1859

    In this paper, we propose the construction of a web-based Question Answering (QA) system for a restricted domain, which combines three information resources for the retrieval mechanism: a Question&Answer database, a special domain documents database, and web resources retrieved by the Google search engine. We describe a new retrieval technique that integrates a probabilistic technique based on OkapiBM25 with a semantic analysis based on the ontology of the HowNet knowledge base and a special domain HowNet created for the restricted domain. Furthermore, we provide a method of question expansion by computing word semantic similarity. The system was first developed for a middle-size domain of sightseeing information. The experiments demonstrated the efficiency of our method for the restricted domain and showed that it can feasibly be transferred to other domains using the proposed method.

  • Personal Name Resolution Crossover Documents by a Semantics-Based Approach

    Xuan-Hieu PHAN  Le-Minh NGUYEN  Susumu HORIGUCHI  

     
    PAPER-Natural Language Processing

      Vol:
    E89-D No:2
      Page(s):
    825-836

    Cross-document personal name resolution is the process of identifying whether or not a common personal name mentioned in different documents refers to the same individual. Most previous approaches rely on lexical matching, such as the occurrence of common words surrounding the entity name, to measure the similarity between documents, and then cluster the documents according to their referents. In spite of certain successes, measuring similarity based on lexical comparison sometimes ignores important linguistic phenomena at the semantic level, such as synonymy or paraphrase. This paper presents a semantics-based approach to personal name resolution across documents that can make the most of both lexical evidence and semantic clues. In our method, the similarity values between documents are determined by estimating the semantic relatedness between words. Further, the semantic labels attached to sentences allow us to highlight the common personal facts that are potentially available among documents. An evaluation on three web datasets demonstrates that our method achieves better performance than previous work.