The search functionality is under construction.

Keyword Search Result

[Keyword] similarity measure(12hit)

1-12hit
  • An Application of Intuitionistic Fuzzy Sets to Improve Information Extraction from Thai Unstructured Text

    Peerasak INTARAPAIBOON  Thanaruk THEERAMUNKONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/05/23
      Vol:
    E101-D No:9
      Page(s):
    2334-2345

    Multi-slot information extraction, also known as frame extraction, is a task that identifies several related entities simultaneously. Most research on this task is concerned with applying IE patterns (rules) to extract related entities from unstructured documents. An important obstacle to success in this task is not knowing where the text portions containing the information of interest are. This problem is more complicated for languages with sentence boundary ambiguity, e.g. the Thai language. Applying IE rules to all reasonable text portions can reduce the effect of this obstacle, but it raises another problem: incorrect (unwanted) extractions. This paper presents a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets (IFSs), and a similarity measure for IFSs is used to calculate the distance between the IFS of an unclassified extraction and that of each already-classified extraction. The concept of k nearest neighbors is adopted to decide whether the unclassified extraction is correct or not. In experiments on various domains, the proposed technique improves extraction precision while satisfactorily preserving recall.
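
The classification step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which IFS similarity measure or which features the authors use, so the sketch assumes a similarity derived from the normalized Hamming distance between IFSs (including the hesitation term) and a plain majority vote over the k most similar labelled extractions. The names `ifs_similarity` and `knn_label` and the sample data are hypothetical.

```python
from collections import Counter


def ifs_similarity(a, b):
    """Similarity of two IFSs given as equal-length lists of
    (membership, non_membership) pairs.

    Assumption: 1 minus the normalized Hamming distance, where the
    hesitation degree of each element is 1 - membership - non_membership.
    """
    if len(a) != len(b) or not a:
        raise ValueError("IFSs must be non-empty and of equal length")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        pi_a = 1.0 - mu_a - nu_a
        pi_b = 1.0 - mu_b - nu_b
        total += abs(mu_a - mu_b) + abs(nu_a - nu_b) + abs(pi_a - pi_b)
    return 1.0 - total / (2 * len(a))


def knn_label(query, labelled, k=3):
    """Majority label among the k labelled IFSs most similar to query."""
    ranked = sorted(
        labelled,
        key=lambda item: ifs_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical already-classified extractions (two features each).
labelled = [
    ([(0.9, 0.1), (0.8, 0.1)], "correct"),
    ([(0.8, 0.1), (0.9, 0.0)], "correct"),
    ([(0.1, 0.8), (0.2, 0.7)], "incorrect"),
    ([(0.2, 0.7), (0.1, 0.9)], "incorrect"),
]
query = [(0.85, 0.1), (0.8, 0.15)]
decision = knn_label(query, labelled, k=3)
```

An extraction whose `decision` is "incorrect" would be discarded, which is how a filter of this kind could raise precision without touching the extraction rules themselves.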

  • Feature-Chain Based Malware Detection Using Multiple Sequence Alignment of API Call

    Hyun-Joo KIM  Jong-Hyun KIM  Jung-Tai KIM  Ik-Kyun KIM  Tai-Myung CHUNG  

     
    PAPER

      Publicized:
    2016/01/28
      Vol:
    E99-D No:4
      Page(s):
    1071-1080

    Recent cyber-attacks use various malware as a means of attack for the attacker's malicious purposes. They aim to steal confidential information or seize control of major facilities after infiltrating the network of a target organization. Attackers generally create new malware, or many different types of malware, by using an automatic malware creation tool that enables easy remote control over a target system and hinders trace-back of these attacks. This paper proposes a method for generating malware behavior patterns, together with detection techniques, in order to detect known and even unknown malware efficiently. The behavior patterns of malware are generated with Multiple Sequence Alignment (MSA) of API call sequences of malware. We define these behavior patterns as a “feature-chain” of malware for analytical purposes. The initial generation of the feature-chain consists of extracting API call sequences with an API hooking library, classifying malware samples by similar behavior, and making representative sequences from the MSA results. Detection is performed by measuring the similarity between the API call sequence of a target process (suspicious executable) and the feature-chain of malware. By comparison with other existing methods, we demonstrate the effectiveness of our proposed method based on the Longest Common Subsequence (LCS) algorithm. We also show that our method outperforms other antivirus systems by a factor of 2.55 in detection rate and 1.33 in accuracy rate for malware detection.
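
As a rough illustration of measuring similarity between an observed API call sequence and a stored behavior pattern with LCS, the sketch below uses the textbook dynamic-programming LCS length and normalizes it by the pattern length. The normalization, the function names, and the sample call sequences are assumptions for illustration; the abstract does not give the paper's exact scoring.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)  # DP row for the previous element of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(calls, pattern):
    """Fraction of the pattern that occurs, in order, within calls.

    Assumption: normalizing by the pattern length; other choices
    (for example the longer of the two sequences) are equally plausible.
    """
    if not pattern:
        return 0.0
    return lcs_length(calls, pattern) / len(pattern)


# Hypothetical observed call trace and behavior pattern.
observed = ["CreateFileW", "ReadFile", "RegOpenKeyExW", "WriteFile", "CloseHandle"]
pattern = ["CreateFileW", "WriteFile", "CloseHandle"]
score = lcs_similarity(observed, pattern)
```

Because a subsequence need not be contiguous, unrelated calls interleaved into the trace do not lower the score, which is one reason LCS suits noisy call logs.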

  • Enriching Semantic Knowledge for WSD

    Junpeng CHEN  Wei YU  

     
    LETTER-Natural Language Processing

      Vol:
    E97-D No:8
      Page(s):
    2212-2216

    In our previous work, we proposed combining ConceptNet and WordNet for Word Sense Disambiguation (WSD). ConceptNet was automatically disambiguated through Normalized Google Distance (NGD) similarity. In this letter, we present several techniques to enhance the performance of the ConceptNet disambiguation and use this enriched semantic knowledge in the WSD task. We propose to enrich both the WordNet semantic knowledge and NGD to disambiguate the concepts in ConceptNet. Furthermore, we apply the enriched semantic knowledge to improve the performance of WSD. In a number of experiments, the proposed method obtained enhanced results.
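
For readers unfamiliar with NGD, the sketch below computes the standard Normalized Google Distance from raw hit counts. It shows only the generic formula, not the enriched variant proposed in the letter; the function name `ngd` and the sample counts are hypothetical, and the total index size `n` is assumed to exceed both single-term counts.

```python
import math


def ngd(fx, fy, fxy, n):
    """Standard Normalized Google Distance.

    fx, fy: number of pages containing each term alone
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    Returns math.inf when any count is zero, since the logarithm is undefined.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical counts: two terms that co-occur on 10 pages out of 10**6.
distance = ngd(1000, 100, 10, 10**6)
```

Smaller values indicate terms that tend to appear together; terms that always co-occur have distance 0.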

  • Normalized Joint Mutual Information Measure for Ground Truth Based Segmentation Evaluation

    Xue BAI  Yibiao ZHAO  Siwei LUO  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E95-D No:10
      Page(s):
    2581-2584

    The ground truth based image segmentation evaluation paradigm plays an important role in the objective evaluation of segmentation algorithms. So far, many evaluation methods based on comparing clusterings have been developed in the machine learning field. However, most traditional pairwise similarity measures, which only compare a machine-generated clustering to a “true” clustering, have limitations in some cases, e.g. when multiple ground truths are available for the same image. In this letter, we propose utilizing an information theoretic measure, named NJMI (Normalized Joint Mutual Information), to handle situations that the pairwise measures cannot deal with. We illustrate the effectiveness of NJMI for both unsupervised and supervised segmentation evaluation.

  • Toward Simulating the Human Way of Comparing Concepts

    Raul Ernesto MENENDEZ-MORA  Ryutaro ICHISE  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E94-D No:7
      Page(s):
    1419-1429

    The ability to assess similarity lies close to the core of cognition. Understanding it supports the comprehension of human success in tasks like problem solving, categorization, memory retrieval, inductive reasoning, etc., and this is the main reason that it is a common research topic. In this paper, we introduce the idea of semantic differences and commonalities between words to the similarity computation process. Five new semantic similarity metrics are obtained after applying this scheme to traditional WordNet-based measures. We also combine the node-based similarity measures with a corpus-independent way of computing the information content. In an experimental evaluation of our approach on two standard word-pair datasets, four of the measures outperformed their classical versions, while the other performed as well as its unmodified counterpart.

  • Phonetically Balanced Text Corpus Design Using a Similarity Measure for a Stereo Super-Wideband Speech Database

    Yoo Rhee OH  Yong Guk KIM  Mina KIM  Hong Kook KIM  Mi Suk LEE  Hyun Joo BAE  

     
    PAPER-Speech and Hearing

      Vol:
    E94-D No:7
      Page(s):
    1459-1466

    In this paper, we propose a text corpus design method for a Korean stereo super-wideband speech database. Since a small-sized text corpus is generally required for speech coding, the corpus should be designed to comply with the pronunciation behavior of natural conversation in order to ensure efficient speech quality tests. To this end, the proposed design method utilizes a similarity measure between the phoneme distribution occurring in natural conversation and that of the designed text corpus. We first collect and refine text data from textbooks and websites. Next, a corpus is designed from the refined text data based on the similarity measure to compare phoneme distributions. We then construct a Korean stereo super-wideband speech (K-SW) database using the designed text corpus, where the recording environment is set to meet the conditions defined by ITU-T. Finally, the subjective quality of the K-SW database is evaluated using an ITU-T super-wideband codec in order to demonstrate that the K-SW database is useful for developing and evaluating super-wideband codecs.

  • Study on Entropy and Similarity Measure for Fuzzy Set

    Sang-Hyuk LEE  Keun Ho RYU  Gyoyong SOHN  

     
    LETTER-Computation and Computational Models

      Vol:
    E92-D No:9
      Page(s):
    1783-1786

    In this study, we investigated the relationship between similarity measures and entropy for fuzzy sets. First, we developed fuzzy entropy by using the distance measure for fuzzy sets. We pointed out that the distance between the fuzzy set and the corresponding crisp set equals fuzzy entropy. We also found that the sum of the similarity measure and the entropy between the fuzzy set and the corresponding crisp set constitutes the total information in the fuzzy set. Finally, we derived a similarity measure from entropy and showed by a simple example that the maximum similarity measure can be obtained using a minimum entropy formulation.

  • A New Similar Trajectory Search Algorithm Based on Spatio-Temporal Similarity Measure for Moving Objects in Road Networks

    Young-Chang KIM  Jae-Woo CHANG  

     
    LETTER-Database

      Vol:
    E92-D No:2
      Page(s):
    327-331

    The deployment of historical trajectories of moving objects has greatly increased for various applications in road networks. For instance, similar patterns of moving-object trajectories are very useful for designing the transportation network of a new city. In this paper, we define a spatio-temporal similarity measure based on a road network distance, rather than a Euclidean distance. We also propose a new similar trajectory search algorithm based on the spatio-temporal measure by using an efficient pruning mechanism. Finally, we show the efficiency of our algorithm, both in terms of retrieval accuracy and retrieval efficiency.

  • Fuzzy Ranking Model Based on User Preference

    Bo-Yeong KANG  Dae-Won KIM  Qing LI  

     
    LETTER-Natural Language Processing

      Vol:
    E89-D No:6
      Page(s):
    1971-1974

    A great deal of research has been done to model the vagueness and uncertainty in information retrieval. One such line of research is fuzzy ranking models, which have shown superior performance in handling the uncertainty involved in the retrieval process. However, these conventional fuzzy ranking models have a limited ability to incorporate the user preference when calculating the rank of documents. To address this issue, in this study we develop a new fuzzy ranking model based on the user preference. Through experiments on the TREC-2 collection of Wall Street Journal documents, we show that the proposed method outperforms the conventional fuzzy ranking models.

  • A New Similarity Measure to Understand Visitor Behavior in a Web Site

    Juan D. VELASQUEZ  Hiroshi YASUDA  Terumasa AOKI  Richard WEBER  

     
    PAPER

      Vol:
    E87-D No:2
      Page(s):
    389-396

    The behavior of visitors browsing a web site offers a lot of information about their requirements and the way they use the respective site. Analyzing such behavior can provide the information necessary to improve the web site's structure. The literature already contains several suggestions on how to characterize web site usage and to identify the respective visitor requirements based on clustering of visitor sessions. Here we propose to combine visitor behavior with the content of the respective web pages and the similarity between different page sequences in order to define a similarity measure between different visits. This similarity serves as input for clustering of visitor sessions. The application of our approach to a bank's web site and its visitor sessions shows its potential for internet-based businesses.

  • Nonlinear System Control Using Compensatory Neuro-Fuzzy Networks

    Cheng-Jian LIN  Cheng-Hung CHEN  

     
    PAPER-Optimization and Control

      Vol:
    E86-A No:9
      Page(s):
    2309-2316

    In this paper, a Compensatory Neuro-Fuzzy Network (CNFN) for nonlinear system control is proposed. The compensatory fuzzy reasoning method uses adaptive fuzzy operations of a neural fuzzy network, which can make the fuzzy logic system more adaptive and effective. An on-line learning algorithm is proposed to automatically construct the CNFN. They are created and adapted as on-line learning proceeds via simultaneous structure and parameter learning. The structure learning is based on the fuzzy similarity measure and the parameter learning is based on the backpropagation algorithm. The advantages of the proposed learning algorithm are that it converges quickly and the obtained fuzzy rules are more precise. The performance of the CNFN compares very favorably with various other existing models.

  • Recognition of Degraded Machine-Printed Characters Using a Complementary Similarity Measure and Error-Correction Learning

    Minako SAWAKI  Norihiro HAGITA  

     
    PAPER-Classification Methods

      Vol:
    E79-D No:5
      Page(s):
    491-497

    Most conventional methods used in character recognition extract geometrical features, such as stroke direction and connectivity, and compare them with reference patterns in a stored dictionary. Unfortunately, geometrical features are easily degraded by blurs and stains, and by graphical designs such as those used in Japanese newspaper headlines. This noise must be removed before recognition commences, but no preprocessing method is perfectly accurate. This paper proposes a method for recognizing degraded characters as well as characters printed on graphical designs. This method extracts features from binary images, and a new similarity measure, the complementary similarity measure, is used as a discriminant function; it compares the similarity and dissimilarity of binary patterns with reference dictionary patterns. Experiments are conducted using the standard character database ETL-2, which consists of machine-printed Kanji, Hiragana, Katakana, alphanumeric, and special characters. The results show that our method is much more robust against noise than the conventional geometrical-feature method. It also achieves high recognition rates of over 97% for characters with textured foregrounds, over 99% for characters with textured backgrounds, over 98% for outline fonts and over 99% for reverse contrast characters. The experiments on recognizing both font styles and character categories show that it also achieves high recognition rates against noise.