Keyword Search Result

[Keyword] similarity(161hit)

121-140hit(161hit)

  • Fuzzy Ranking Model Based on User Preference

    Bo-Yeong KANG  Dae-Won KIM  Qing LI  

     
    LETTER-Natural Language Processing

    Vol: E89-D No:6  Page(s): 1971-1974

    A great deal of research has been conducted to model the vagueness and uncertainty in information retrieval. One such line of research is fuzzy ranking models, which have shown superior performance in handling the uncertainty involved in the retrieval process. However, these conventional fuzzy ranking models have a limited ability to incorporate user preference when calculating the rank of documents. To address this issue, in this study we develop a new fuzzy ranking model based on user preference. Through experiments on the TREC-2 collection of Wall Street Journal documents, we show that the proposed method outperforms the conventional fuzzy ranking models.

  • A New Question Answering System for Chinese Restricted Domain

    Haiqing HU  Peilin JIANG  Fuji REN  Shingo KUROIWA  

     
    PAPER-Language

    Vol: E89-D No:6  Page(s): 1848-1859

    In this paper, we propose the construction of a web-based Question Answering (QA) system for a restricted domain, which combines three information resources for the retrieval mechanism: a Question&Answer database, a special domain documents database, and web resources retrieved by the Google search engine. We describe a new retrieval technique that integrates a probabilistic technique based on Okapi BM25 with a semantic analysis based on the ontology of the HowNet knowledge base and a special domain HowNet created for the restricted domain. Furthermore, we provide a method of question expansion by computing word semantic similarity. The system is first developed for a middle-size domain of sightseeing information. The experiments showed the efficiency of our method for the restricted domain and that it can readily be transferred to other domains using the proposed method.

  • Personal Name Resolution Crossover Documents by a Semantics-Based Approach

    Xuan-Hieu PHAN  Le-Minh NGUYEN  Susumu HORIGUCHI  

     
    PAPER-Natural Language Processing

    Vol: E89-D No:2  Page(s): 825-836

    Cross-document personal name resolution is the process of identifying whether or not a common personal name mentioned in different documents refers to the same individual. Most previous approaches rely on lexical matching, such as the occurrence of common words surrounding the entity name, to measure the similarity between documents, and then cluster the documents according to their referents. In spite of certain successes, measuring similarity based on lexical comparison sometimes ignores important linguistic phenomena at the semantic level, such as synonymy or paraphrase. This paper presents a semantics-based approach to the resolution of personal names across documents that can make the most of both lexical evidence and semantic clues. In our method, the similarity values between documents are determined by estimating the semantic relatedness between words. Further, the semantic labels attached to sentences allow us to highlight the common personal facts that are potentially available among documents. An evaluation on three web datasets demonstrates that our method achieves better performance than previous work.

  • On the Aggregation of Self-Similar Processes

    Gianluca MAZZINI  Riccardo ROVATTI  Gianluca SETTI  

     
    PAPER

    Vol: E88-A No:10  Page(s): 2656-2663

    The problem of aggregating different stochastic processes into a unique one that must be characterized based on the statistical knowledge of its components is a key point in the modeling of many complex phenomena, such as the merging of traffic flows at network nodes. Depending on the physical intuition about the interaction between the processes, many different aggregation policies can be devised, from averaging to taking the maximum in each time slot. Here we address flow averaging and maximum, since they are very common modeling options. Then we give a set of axioms defining a general aggregation operator and, based on some advanced results of functional analysis, we investigate how the decay of correlation of the original processes affects the decay of correlation (and thus the self-similar features) of the aggregated process.

  • Image Segmentation with Fast Wavelet-Based Color Segmenting and Directional Region Growing

    Din-Yuen CHAN  Chih-Hsueh LIN  Wen-Shyong HSIEH  

     
    PAPER

    Vol: E88-D No:10  Page(s): 2249-2259

    This investigation proposes a fast wavelet-based color segmentation (FWCS) technique and a modified directional region-growing (DRG) technique for semantic image segmentation. The FWCS is a sequential combination of progressive color truncation and histogram-based color extraction processes for segmenting color regions in images. By exploring specialized centroids of segmented fragments as initial growing seeds, the proposed DRG performs a directional 1-D region growing on pairs of color-segmented regions based on those centroids. When the two examined regions are positively confirmed by DRG, the proposed framework subsequently computes the texture features extracted from these two regions to further check their relation using texture similarity testing (TST). If any pair of regions passes double checking with both DRG and TST, they are identified as associated regions. If two associated regions/areas are connected, they are unified into a union area enclosed by a single contour. Otherwise, the proposed framework merely acknowledges a linking relation between those associated regions/areas, highlighted with a linking mark. In particular, through the systematic integration of all proposed processes, the critical issue of deciding the ending level of wavelet decomposition in various images can be efficiently solved in FWCS by a newly proposed quasi-linear high-frequency analysis model. The simulations conducted here demonstrate that the proposed segmentation framework can achieve a quasi-semantic segmentation without a priori high-level knowledge.

  • Resonance Analysis of Multilayered Filters with Triadic Cantor-Type One-Dimensional Quasi-Fractal Structures

    Ushio SANGAWA  

     
    PAPER-Electromagnetic Theory

    Vol: E88-C No:10  Page(s): 1981-1991

    Multilayered filters with a dielectric distribution along their thickness forming a one-dimensional quasi-fractal structure are theoretically analyzed, focusing on exposing their resonant properties in order to understand a dielectric Menger's sponge resonator [4],[5]. "Quasi-fractal" refers to the triadic Cantor set with finite generation. First, a novel calculation method that can deal with filters with fine fractal structures is derived. This method takes advantage of Clifford algebra based on the theory of thin-film optics. The method is then applied to classify resonant modes and, especially, to investigate their quality factors in terms of the following design parameters: a dielectric constant, a loss tangent, and a stage number. The latter determines the fractal structure. Finally, the behavior of the filters with perfect fractal structure is considered. A crucial finding is that the high quality factor of the modes is not due to the complete self-similarity, but rather to the breaking of such a fractal symmetry.
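
    As an illustration of the "triadic Cantor set with finite generation" mentioned above, the following minimal sketch lists the intervals that survive a given number of middle-third removals. The function name `cantor_intervals` and the normalization of the total thickness to the unit interval are assumptions made for this example; how the kept and removed intervals map onto dielectric layers in the paper is not stated in the abstract.

```python
from fractions import Fraction


def cantor_intervals(stage):
    """Return the closed intervals kept after `stage` middle-third removals.

    Hypothetical helper: the thickness is normalized to [0, 1] and exact
    fractions are used so that the interval boundaries carry no rounding error.
    """
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(stage):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            # Keep the outer thirds and drop the open middle third.
            next_intervals.append((left, left + third))
            next_intervals.append((right - third, right))
        intervals = next_intervals
    return intervals


if __name__ == "__main__":
    for left, right in cantor_intervals(2):
        print(f"[{left}, {right}]")
```

    Each stage doubles the number of kept intervals and shrinks each one to a third of its parent, so stage n yields 2^n intervals of length 3^-n.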

  • Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity

    Takao DOI  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E88-D No:6  Page(s): 1256-1264

    In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input utterance appears promising. In previous research, many methods used word-sequence characteristics like N-gram clues among splitting positions. In this paper, to supplement splitting methods based on word-sequence characteristics, we introduce another clue using similarity based on edit-distance. In our splitting method, we generate candidates for utterance splitting based on N-grams, and select the best one by measuring the utterance similarity against a corpus. This selection is founded on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one of which uses a phrase as a translation unit and the other of which uses an utterance, and an SMT system. The translation results under various conditions were evaluated by objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for the three systems. Using utterance similarity can improve the translation quality.
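
    The selection step described above, choosing among candidate splits by edit-distance similarity against a corpus, can be sketched roughly as follows. This is not the authors' implementation: the normalization (one minus distance over the longer length), the averaging over the parts of a split, and the names `similarity` and `pick_split` are all assumptions for illustration, and the N-gram-based candidate generation is taken as given.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of words)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            # Deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(utterance, corpus):
    """Highest normalized word-level similarity of `utterance` to any corpus sentence.

    Assumed normalization: 1 - distance / length of the longer sentence.
    """
    words = utterance.split()
    best = 0.0
    for sentence in corpus:
        ref = sentence.split()
        longest = max(len(words), len(ref))
        if longest == 0:
            continue
        best = max(best, 1.0 - edit_distance(words, ref) / longest)
    return best


def pick_split(candidates, corpus):
    """Pick the candidate split whose parts are, on average, most similar to the corpus."""
    return max(
        candidates,
        key=lambda parts: sum(similarity(p, corpus) for p in parts) / len(parts),
    )


if __name__ == "__main__":
    corpus = ["thank you", "where is the station"]
    candidates = [
        ["thank you where", "is the station"],
        ["thank you", "where is the station"],
    ]
    print(pick_split(candidates, corpus))
```

    In this toy corpus the second candidate wins because both of its parts match corpus sentences exactly, which mirrors the assumption that an utterance close to the training data is more likely to be translated correctly.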

  • A New Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation

    Xuebin HU  Hidefumi KOBATAKE  

     
    PAPER-Engineering Acoustics

    Vol: E88-A No:6  Page(s): 1543-1548

    Frequency-domain blind source separation has the great advantage that the complicated convolution in the time domain becomes multiple efficient multiplications in the frequency domain. However, the inherent permutation ambiguity of ICA becomes an important problem, in that the separated signals at different frequencies may be permuted in order. Mapping the separated signal at each frequency to a target source remains a difficult problem. In this paper, we first discuss the inter-frequency correlation based method, and propose a new method using the continuity in power between adjacent frequency components of the same source. The proposed method also implicitly utilizes the information of inter-frequency correlation, and as such performs better than the previous method.

  • Generalized Variance-Based Markovian Fitting for Self-Similar Traffic Modelling

    Shou-Kuo SHAO  Malla REDDY PERATI  Meng-Guang TSAI  Hen-Wai TSAO  Jingshown WU  

     
    PAPER

    Vol: E88-B No:4  Page(s): 1493-1502

    Most of the proposed self-similar traffic models are asymptotic in nature. Hence, they are less effective in queueing-based performance evaluation when the buffer sizes are small. In this paper, we propose short range dependent (SRD) process modelling by a generalized variance-based Markovian fitting to provide effective queueing-based performance measures when buffer sizes are small. The proposed method matches the variance of exact second-order self-similar processes. The fitting procedure determines the related parameters in an exact and straightforward way. The resultant traffic model essentially consists of a superposition of several two-state Markov-modulated Poisson processes (MMPPs) with distinct modulating parameters. We show how well the resultant MMPP can emulate the variance of the original self-similar traffic in the range of the specified time scale, and can provide more accurate bounds for the queueing-based performance measures, namely tail probability, mean waiting time and loss probability. Numerical results show that both the second-order statistics and the queueing-based performance measures when buffer capacity is small are more accurate than those of the variance-based fitting where the modulating parameters of each superposed two-state MMPP are equal. We then investigate the relationship between time scale and the number of superposed two-state MMPPs. We found that when the performance measures pertaining to larger time scales are not better than those of smaller ones, we need to increase the number of superposed two-state MMPPs to maintain accurate and reliable queueing-based performance measures. We then conclude from the extensive numerical examples that exact second-order self-similar traffic can be well represented by the proposed model.

  • Hard-Limited Karhunen-Loeve Transform for Face Recognition

    Chih-Chien Thomas CHEN  Chin-Ta CHEN  Ming-Hong JIANG  

     
    LETTER-Image

    Vol: E87-A No:7  Page(s): 1836-1838

    A face recognition system based on the hard-limited eigenfunctions derived from the Karhunen-Loeve transform is proposed. The key to this approach is to change the inner product of the face image and the selected eigenvectors from floating point arithmetic to integer arithmetic. A database with 1000 facial images corresponding to 100 subjects is collected for system evaluation. It is demonstrated that a 92% correct classification rate and a 6-fold saving in computation time can be achieved by the use of the first 150 hard-limited features.
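
    To make the floating-point-to-integer idea concrete, here is a minimal sketch under one loudly stated assumption: that "hard-limiting" means replacing each eigenvector component by its sign (+1 or -1). The abstract does not spell out the rule, so `hard_limit` and `integer_projection` are hypothetical names, not the authors' method.

```python
def hard_limit(eigenvector):
    """Quantize each component to +1 or -1 by its sign (assumed rule; zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in eigenvector]


def integer_projection(pixels, limited_vector):
    """Inner product of integer pixel values with a +/-1 vector.

    Because every weight is +1 or -1, the product reduces to integer
    additions and subtractions, with no floating-point multiplication.
    """
    total = 0
    for pixel, sign in zip(pixels, limited_vector):
        total += pixel if sign > 0 else -pixel
    return total


if __name__ == "__main__":
    eigenvector = [0.3, -0.1, 0.0, -0.7]  # toy values, not from the paper
    pixels = [10, 20, 30, 40]             # toy 8-bit intensities
    feature = integer_projection(pixels, hard_limit(eigenvector))
    print(feature)
```

    A feature vector would then be formed by repeating this projection for each selected eigenvector, for example the first 150 mentioned in the abstract.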

  • Strategy for XML Integration Using Similarity in Structure and Content

    Youn Hee KIM  Byung Gon KIM  Jaeho LEE  Hae Chull LIM  

     
    PAPER

    Vol: E87-A No:6  Page(s): 1479-1486

    Most of the existing studies on storing and searching XML documents effectively manipulate each XML document independently. Therefore, techniques for storing together XML documents that have similar meaning or structure are required for efficiency. Also, as a unified access method for various XML storage systems that have different storage forms, studies to integrate the DTD or XML schema of each storage system into one are required. However, many XML documents do not have a particular DTD or XML schema, and XML documents can be written in various ways. Therefore, studies on integration techniques for XML instances are needed. The XML integration technique can be used effectively when constructing a data warehouse for heterogeneous XML storage systems. The proposed integration techniques remove the space duplicated for the same elements in XML documents. The proposed techniques also significantly reduce the search time for general queries on the XML documents because they store the related parts of XML documents close together.

  • Self-Organizing Map-Based Analysis of IP-Network Traffic in Terms of Time Variation of Self-Similarity: A Detrended Fluctuation Analysis Approach

    Masao MASUGI  

     
    PAPER-Nonlinear Problems

    Vol: E87-A No:6  Page(s): 1546-1554

    This paper describes an analysis of IP-network traffic in terms of the time variation of self-similarity. To get a comprehensive view in analyzing the degree of long-range dependence (LRD) of IP-network traffic, this paper used a self-organizing map, which provides a way to map high-dimensional data onto a low-dimensional domain. Also, in the LRD-based analysis, this paper employed detrended fluctuation analysis (DFA), which is applicable to the analysis of long-range power-law correlations or LRD in non-stationary time-series signals. In applying this method to traffic analysis, this paper performed two kinds of traffic measurement: one based on IP-network traffic flowing into NTT Musashino R&D center (Tokyo, Japan) from the Internet and the other based on IP-network traffic flowing through an interface point between an access provider (Tokyo, Japan) and the Internet. Based on sequential measurements of IP-network traffic, this paper derived corresponding values for the LRD-related parameter α of measured traffic. As a result, we found that the characteristic of self-similarity seen in the measured traffic fluctuated over time, with different time variation patterns for the two measurement locations. In training the self-organizing map, this paper used three parameters: two α values for different plot ranges, and Shannon-based entropy, which reflects the degree of concentration of measured time-series data. We visually confirmed that the traffic data could be projected onto the map in accordance with the traffic properties, resulting in a combined depiction of the effects of the degree of LRD and network utilization rates. The proposed method can deal with multi-dimensional parameters, projecting its results onto a two-dimensional space in which the projected data positions give us an effective depiction of network conditions at different times.

  • Distance between Rooted and Unordered Trees Based on Vertex and Edge Mappings

    Shaoming LIU  

     
    PAPER

    Vol: E87-A No:5  Page(s): 1034-1041

    The problem of measuring the similarity or dissimilarity (distance) between structures has been studied. In particular, several distances between trees and efficient algorithms for computing them have been proposed. However, all of these tree distances are defined based on mappings between vertices only, and they cannot compare tree structures whose vertices and edges both hold information. In this paper, we propose a mapping between rooted and unordered trees based on vertex translation and edge translation, define a distance based on the proposed mapping, and develop an efficient algorithm for computing the proposed distance. The proposed distance can be used to measure the similarity or distance between two natural language sentences.

  • A Significant Property of Mapping Parameters for Signal Interpolation Using Fractal Interpolation Functions

    Satoshi UEMURA  Miki HASEYAMA  Hideo KITAJIMA  

     
    LETTER-Digital Signal Processing

    Vol: E87-A No:3  Page(s): 748-752

    This letter presents a significant property of the mapping parameters that play a central role in representing a given signal with Fractal Interpolation Functions (FIF). Our theoretical analysis shows that the mapping parameters required to represent a given signal are also applicable to representing an upsampled version of that signal. Furthermore, the upsampled signal obtained by using the property represents the self-affine property more distinctly than the given signal. Experiments show the validity and usefulness of this property.

  • A New Similarity Measure to Understand Visitor Behavior in a Web Site

    Juan D. VELASQUEZ  Hiroshi YASUDA  Terumasa AOKI  Richard WEBER  

     
    PAPER

    Vol: E87-D No:2  Page(s): 389-396

    The behavior of visitors browsing a web site offers a lot of information about their requirements and the way they use the respective site. Analyzing such behavior can provide the information necessary to improve the web site's structure. The literature already contains several suggestions on how to characterize web site usage and to identify the respective visitor requirements based on clustering of visitor sessions. Here we propose to combine visitor behavior with the content of the respective web pages and the similarity between different page sequences in order to define a similarity measure between different visits. This similarity serves as input for clustering of visitor sessions. The application of our approach to a bank's web site and its visitor sessions shows its potential for internet-based businesses.

  • Sentence Extraction by Spreading Activation through Sentence Similarity

    Naoaki OKAZAKI  Yutaka MATSUO  Naohiro MATSUMURA  Mitsuru ISHIZUKA  

     
    PAPER

    Vol: E86-D No:9  Page(s): 1686-1694

    Although there has been a great deal of research on automatic summarization, most methods rely on statistical methods, disregarding relationships between extracted textual segments. We propose a novel method to extract a set of comprehensible sentences which centers on several key points to ensure sentence connectivity. It features a similarity network built from documents with a lexical dictionary, and spreading activation to rank sentences. We show evaluation results of a multi-document summarization system based on the method, which participated in a summarization competition, the TSC (Text Summarization Challenge) task organized by the third NTCIR project.

  • Nonlinear System Control Using Compensatory Neuro-Fuzzy Networks

    Cheng-Jian LIN  Cheng-Hung CHEN  

     
    PAPER-Optimization and Control

    Vol: E86-A No:9  Page(s): 2309-2316

    In this paper, a Compensatory Neuro-Fuzzy Network (CNFN) for nonlinear system control is proposed. The compensatory fuzzy reasoning method uses adaptive fuzzy operations of a neural fuzzy network that can make the fuzzy logic system more adaptive and effective. An on-line learning algorithm is proposed to automatically construct the CNFN. They are created and adapted as on-line learning proceeds via simultaneous structure and parameter learning. The structure learning is based on the fuzzy similarity measure and the parameter learning is based on the backpropagation algorithm. The advantages of the proposed learning algorithm are that it converges quickly and the obtained fuzzy rules are more precise. The performance of the CNFN compares favorably with various other existing models.

  • Using Similarity Parameters for Supervised Polarimetric SAR Image Classification

    Junyi XU  Jian YANG  Yingning PENG  Chao WANG  Yuei-An LIOU  

     
    PAPER-Sensing

    Vol: E85-B No:12  Page(s): 2934-2942

    In this paper, a new method is proposed for supervised classification of ground cover types by using polarimetric synthetic aperture radar (SAR) data. The concept of a similarity parameter between two scattering matrices is introduced for characterizing the target scattering mechanism. Four similarity parameters of each pixel in the image are used for classification. They are the similarity parameters between a pixel and a plane, a dihedral, a helix and a wire. The total received power of each pixel is also used, since the similarity parameter is independent of the spans of target scattering matrices. The supervised classification is carried out based on principal component analysis. This analysis is applied to each data set in the image in the feature space to obtain the corresponding feature transform vector. The inner product of two vectors is used as a distance measure in classification. The classification result of the new scheme is shown and compared to the results of principal component analysis with other decomposition coefficients, to demonstrate the effectiveness of the similarity parameters.

  • Modeling of Aggregated TCP/IP Traffic on a Bottleneck Link Based on Scaling Behavior

    Hiroki FURUYA  Masaki FUKUSHIMA  Hajime NAKAMURA  Shinichi NOMOTO  

     
    PAPER-Internet

    Vol: E85-B No:9  Page(s): 1756-1765

    This paper proposes an idea for modeling aggregated TCP/IP traffic arriving at a bottleneck link by focusing on its scaling behavior. Here, the aggregated TCP/IP traffic means the IP packet traffic from many TCP connections sharing the bottleneck link. The model is constructed based on the outcomes of our previous works investigating how the TCP/IP networking mechanism affects the self-similar scaling behavior of the aggregated TCP/IP traffic in a LAN/WAN environment. The proposed traffic model has been examined from the perspective of application to network performance estimation. The examinations have shown that it models the scaling behavior and queueing behavior of actual traffic, though it neglects the interaction among TCP connections that compete with each other for the single bottleneck link bandwidth.

  • Self-Similarity in Cell Dwell Time Caused by Terminal Motion and Its Effects on Teletraffic of Cellular Communication Networks

    Hirotoshi HIDAKA  Kazuyoshi SAITOH  Noriteru SHINAGAWA  Takehiko KOBAYASHI  

     
    PAPER

    Vol: E85-A No:7  Page(s): 1445-1453

    This paper discusses self-similarity in cell dwell time of a mobile terminal, the discovery of which was described in our previous paper, and its effects on teletraffic of mobile communication networks. We have evaluated various teletraffic statistics, such as cell dwell time and channel occupancy time, of a mobile terminal based on measurements of motion for various types of vehicles. Those results show that cell dwell time follows a long-tailed log-normal distribution rather than the exponential distribution that has been used for modeling. Here, we first elaborate on self-similarity in cell dwell time of various vehicles. We then evaluate self-similarity in channel occupancy time. For future mobile multimedia communication systems employing a micro-cell configuration, it is anticipated that data communication will be the main form of communication and that call holding time will be long. For such cases, we have shown that channel occupancy time will be greatly affected by the cell dwell time of the mobile terminal, and that self-similarity, a characteristic that is not seen in conventional systems, will consequently appear. We have also found that hand-off frequently fails as self-similarity in cell dwell time of a mobile terminal becomes stronger.
