Keyword Search Result

[Keyword] similarity(161hit)

1-20hit(161hit)

  • Continuous Similarity Search for Dynamic Text Streams

    Yuma TSUCHIDA  Kohei KUBO  Hisashi KOGA  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2023/09/21
      Vol:
    E106-D No:12
      Page(s):
    2026-2035

    Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas they consider standard sets composed of items, this paper uniquely studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task of the CTS problem is to find all the text streams in the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches for similar users on social networking services. The CTS problem is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS problem. Moreover, we discuss how to speed it up with an inverted index.

  • Learning Local Similarity with Spatial Interrelations on Content-Based Image Retrieval

    Longjiao ZHAO  Yu WANG  Jien KATO  Yoshiharu ISHIKAWA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2023/02/14
      Vol:
    E106-D No:5
      Page(s):
    1069-1080

    Convolutional Neural Networks (CNNs) have recently demonstrated outstanding performance in image retrieval tasks. Local convolutional features extracted by CNNs, in particular, show exceptional capability in discrimination. Recent research in this field has concentrated on pooling methods that incorporate local features into global features and assess the global similarity of two images. However, the pooling methods sacrifice the image's local region information and spatial relationships, which are precisely known as the keys to the robustness against occlusion and viewpoint changes. In this paper, instead of pooling methods, we propose an alternative method based on local similarity, determined by directly using local convolutional features. Specifically, we first define three forms of local similarity tensors (LSTs), which take into account information about local regions as well as spatial relationships between them. We then construct a similarity CNN model (SCNN) based on LSTs to assess the similarity between the query and gallery images. The ideal configuration of our method is sought through thorough experiments from three perspectives: local region size, local region content, and spatial relationships between local regions. The experimental results on a modified open dataset (where query images are limited to occluded ones) confirm that the proposed method outperforms the pooling methods because of robustness enhancement. Furthermore, testing on three public retrieval datasets shows that combining LSTs with conventional pooling methods achieves the best results.

  • Joint Design of Transmitting Waveform and Receiving Filter for Colocated MIMO Radar

    Ningkang CHEN  Ping WEI  Lin GAO  Huaguo ZHANG  Hongshu LIAO  

     
    PAPER-Communication Theory and Signals

      Publicized:
    2022/03/14
      Vol:
    E105-A No:9
      Page(s):
    1330-1339

    This paper aims to design multiple-input multiple-output (MIMO) radar receiving weights and transmitting waveforms, in order to obtain better spatial filtering performance and enhance the robustness in the case of signal-dependent interference and inaccurately estimated angles of both the target and the interference. Generally, an alternating iterative optimization algorithm is proposed for the joint design problem. Specifically, the receiving weights are designed by the generalized eigenvalue decomposition of the matrix which contains the estimated information of the target and interference. As the cost function of the transmitting waveform design is fractional, the fractional optimization problem is first converted into a secondary optimization problem. Based on the proposed algorithm, a closed-form solution of the waveform is given using the alternating projection. At the analysis stage, in the presence of estimation errors under the environment of signal-dependent interference, a robust signal-to-interference and noise ratio (SINR) performance is obtained using a small amount of calculation with an iterative procedure. Numerical examples verify the effectiveness of the designed waveform in terms of the SINR, beampattern and pulse compression.

  • Supervised Audio Source Separation Based on Nonnegative Matrix Factorization with Cosine Similarity Penalty Open Access

    Yuta IWASE  Daichi KITAMURA  

     
    PAPER-Engineering Acoustics

      Publicized:
    2021/12/08
      Vol:
    E105-A No:6
      Page(s):
    906-913

    In this study, we aim to improve the performance of audio source separation for monaural mixture signals. For monaural audio source separation, semisupervised nonnegative matrix factorization (SNMF) can achieve higher separation performance by employing small supervised signals. In particular, penalized SNMF (PSNMF) with orthogonality penalty is an effective method. PSNMF forces two basis matrices for target and nontarget sources to be orthogonal to each other and improves the separation accuracy. However, the conventional orthogonality penalty is based on an inner product and does not affect the estimation of the basis matrix properly because of the scale indeterminacy between the basis and activation matrices in NMF. To cope with this problem, a new PSNMF with cosine similarity between the basis matrices is proposed. The experimental comparison shows the efficacy of the proposed cosine similarity penalty in supervised audio source separation.
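
    The scale issue described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact penalty formulas, so the two functions below are assumptions: one sums raw inner products between basis vectors, the other sums their cosine similarities. The function names and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def inner_product_penalty(target_bases, other_bases):
    # Sum of raw inner products over all pairs of basis vectors.
    return sum(dot(u, v) for u in target_bases for v in other_bases)

def cosine_penalty(target_bases, other_bases):
    # Same sum, but each inner product is divided by the two vector norms,
    # so rescaling a basis vector leaves the penalty unchanged.
    total = 0.0
    for u in target_bases:
        for v in other_bases:
            total += dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return total

# Toy nonnegative basis vectors (hypothetical values).
target = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
other = [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
# Same directions at one tenth of the scale; in NMF the activation matrix
# could absorb this factor without changing the reconstruction.
scaled = [[0.1 * x for x in u] for u in target]

print(inner_product_penalty(target, other), inner_product_penalty(scaled, other))
print(cosine_penalty(target, other), cosine_penalty(scaled, other))
```

    In this sketch the inner-product penalty shrinks with the scale of the basis vectors even though their directions are unchanged, whereas the cosine-based penalty stays the same.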

  • Exact Algorithm to Solve Continuous Similarity Search for Evolving Queries and Its Variant

    Tomohiro YAMAZAKI  Hisashi KOGA  

     
    PAPER

      Publicized:
    2022/02/07
      Vol:
    E105-D No:5
      Page(s):
    898-908

    We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.
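
    The idea of updating a similarity value from its previous value, rather than recomputing it, can be sketched as follows. This is not the paper's FIL-based algorithm: it assumes Jaccard similarity between one fixed database set and the distinct items in the window, and the class name `SlidingJaccard` is hypothetical.

```python
from collections import Counter, deque

class SlidingJaccard:
    """Jaccard similarity between a fixed set and the distinct items
    of a sliding window, maintained incrementally."""

    def __init__(self, fixed_set, window_size):
        self.fixed = set(fixed_set)
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()  # multiplicity of each item in the window
        self.distinct = 0        # number of distinct items in the window
        self.inter = 0           # size of (window items) intersected with fixed

    def _add(self, item):
        self.counts[item] += 1
        if self.counts[item] == 1:          # item is new to the window
            self.distinct += 1
            if item in self.fixed:
                self.inter += 1

    def _remove(self, item):
        self.counts[item] -= 1
        if self.counts[item] == 0:          # last copy left the window
            del self.counts[item]
            self.distinct -= 1
            if item in self.fixed:
                self.inter -= 1

    def push(self, item):
        """Advance the stream by one item and return the updated similarity."""
        self.window.append(item)
        self._add(item)
        if len(self.window) > self.window_size:
            self._remove(self.window.popleft())
        union = self.distinct + len(self.fixed) - self.inter
        return self.inter / union if union else 0.0

tracker = SlidingJaccard({"a", "b", "x"}, window_size=3)
for item in "abcabd":
    print(item, tracker.push(item))
```

    Each call to `push` touches only the arriving item and the expiring item, so the cost per time step for one set does not depend on the window size or the set size.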

  • Pairwise Similarity Normalization Based on a Hubness Score for Improving Cover Song Retrieval Accuracy

    Jin S. SEO  

     
    LETTER-Music Information Processing

      Publicized:
    2022/02/21
      Vol:
    E105-D No:5
      Page(s):
    1130-1134

    A hubness-score-based normalization of the pairwise similarity is proposed for sequence-alignment-based cover song retrieval. Hubness, which is the tendency of some data points in high-dimensional data sets to link more frequently to other points than the rest of the points in the set, is widely known to deteriorate information retrieval accuracy. This paper tries to relieve the performance degradation due to hubness by normalizing the pairwise similarity with a hubness score. Experiments on two cover song datasets confirm that the proposed similarity normalization improves the cover song retrieval accuracy.

  • SIBYL: A Method for Detecting Similar Binary Functions Using Machine Learning

    Yuma MASUBUCHI  Masaki HASHIMOTO  Akira OTSUKA  

     
    PAPER-Dependable Computing

      Publicized:
    2021/12/28
      Vol:
    E105-D No:4
      Page(s):
    755-765

    Binary code similarity comparison methods are mainly used to find bugs in software, to detect software plagiarism, and to reduce the workload during malware analysis. In this paper, we propose a method to compare the binary code similarity of each function by using a combination of Control Flow Graphs (CFGs) and disassembled instruction sequences contained in each function, and to detect a function with high similarity to a specified function. One of the challenges in performing similarity comparisons is that different compile-time optimizations and different architectures produce different binary code. The main units for comparing code are instructions, basic blocks and functions. The challenge with functions is that they have a graph structure in which basic blocks are combined, making it relatively difficult to derive similarity. However, analysis tools such as IDA display the disassembled instruction sequence in function units. Detecting similarity on a function basis therefore has the advantage of making the results easier for analysts to understand. To solve the aforementioned challenges, we use machine learning methods from the field of natural language processing. In this field, the Transformer model, introduced in 2017, set new records on various language processing tasks, and as of 2021 it is the basis for BERT, which has also set new records on language processing tasks. There is also a method called node2vec, which uses machine learning techniques to capture the features of each node from the graph structure. In this paper, we propose SIBYL, a combination of Transformer and node2vec. In SIBYL, a method called Triplet-Loss is used during learning so that similar items are brought closer and dissimilar items are moved away. To evaluate SIBYL, we created a new dataset using open-source software widely used in the real world, and conducted training and evaluation experiments using the dataset. In the evaluation experiments, we evaluated the similarity of binary codes across different architectures using evaluation indices such as Rank1 and MRR. The experimental results showed that SIBYL outperforms existing research. We believe that this is because machine learning was able to capture the features of the graph structure and the order of instructions on a function-by-function basis. The results of these experiments are presented in detail, followed by a discussion and conclusion.

  • Similarity Search in InterPlanetary File System with the Aid of Locality Sensitive Hash

    Satoshi FUJITA  

     
    PAPER-Information Network

      Publicized:
    2021/07/08
      Vol:
    E104-D No:10
      Page(s):
    1616-1623

    To realize information-centric networking, IPFS (InterPlanetary File System) generates a unique ContentID for each content by applying a cryptographic hash to the content itself. Although this could improve security against attacks such as falsification, it makes it difficult to realize a similarity search in the framework of IPFS, since the similarity of contents is not reflected in the proximity of ContentIDs. To overcome this issue, we propose a method to apply a locality sensitive hash (LSH) to feature vectors extracted from contents as the key of indexes stored in IPFS. By conducting experiments with 10,000 random points corresponding to stored contents, we found that more than half of randomly given queries return a non-empty result for the similarity search, and yield an accurate result which is outside the σ confidence interval of an ordinary flooding-based method. Note that such a collection of random points corresponds to the worst-case scenario for the proposed scheme, since the performance of similarity search could improve when points and queries follow an uneven distribution.
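
    The contrast with a cryptographic hash can be sketched with a small locality sensitive hash. The abstract does not say which LSH family is used, so random-hyperplane (sign) hashing is an assumption here, and `make_hyperplanes` and `lsh_key` are hypothetical names; no IPFS API is involved.

```python
import random

def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def lsh_key(vector, hyperplanes):
    # One bit per hyperplane: which side of the hyperplane the vector lies on.
    # Vectors separated by a small angle tend to agree on most bits.
    bits = []
    for normal in hyperplanes:
        projection = sum(n * x for n, x in zip(normal, vector))
        bits.append("1" if projection >= 0 else "0")
    return "".join(bits)

planes = make_hyperplanes(dim=4, bits=8, seed=42)
print(lsh_key([0.9, 0.1, 0.3, 0.5], planes))
print(lsh_key([0.8, 0.2, 0.3, 0.5], planes))    # nearby vector
print(lsh_key([-0.9, -0.1, -0.3, -0.5], planes))  # opposite direction
```

    A key of this kind could serve as an index key under which similar contents are likely to collide, which is the property a cryptographic ContentID deliberately lacks.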

  • Matrix Factorization Based Recommendation Algorithm for Sharing Patent Resource

    Xueqing ZHANG  Xiaoxia LIU  Jun GUO  Wenlei BAI  Daguang GAN  

     
    PAPER

      Publicized:
    2021/04/26
      Vol:
    E104-D No:8
      Page(s):
    1250-1257

    As scientific and technological resources are experiencing information overload, it is quite expensive to find exactly the resources that users are interested in. The personalized recommendation system is a good candidate to solve this problem, but data sparseness and the cold start problem still hinder the application of recommendation systems. Sparse data affects the quality of the similarity measurement and consequently the quality of the recommender system. In this paper, we propose a matrix factorization recommendation algorithm based on similarity calculation (SCMF), which introduces potential similarity relationships to solve the problem of data sparseness. Furthermore, a penalty factor is adopted in the latent item similarity matrix calculation to capture more real relationships. We compared our approach with six other recommendation algorithms and conducted experiments on five public data sets. According to the experimental results, the recommendation precision improves by 2% to 9% over the best traditional algorithm. For sparse data sets, the prediction accuracy also improves by 0.17% to 18%. In addition, our approach was applied to patent resource exploitation provided by the wanfang patents retrieval system. Experimental results show that our method performs better than commonly used algorithms, especially under the cold start condition.

  • Collaborative Filtering Auto-Encoders for Technical Patent Recommending

    Wenlei BAI  Jun GUO  Xueqing ZHANG  Baoying LIU  Daguang GAN  

     
    PAPER

      Publicized:
    2021/04/26
      Vol:
    E104-D No:8
      Page(s):
    1258-1265

    Finding the exact items that users need from massive patent resources is a matter of great urgency. Although recommender systems have addressed this problem to a certain extent, there are still some challenging problems, such as tracking user interests and improving the recommendation quality when the rating matrix is extremely sparse. In this paper, we propose a novel method called Collaborative Filtering Auto-Encoder for top-N recommendation. This method employs Auto-Encoders to extract the item's features, converts a high-dimensional sparse vector into a low-dimensional dense vector, and then uses the dense vector for similarity calculation. At the same time, to make the recommendation list closer to the user's recent interests, we divide the recommendation weight into time-based and recent similarity-based weights. In fact, the proposed method is an improved, item-based collaborative filtering model with more flexible components. Experimental results show that the method consistently outperforms state-of-the-art top-N recommendation methods by a significant margin on standard evaluation metrics.

  • Deep Metric Learning for Multi-Label and Multi-Object Image Retrieval

    Jonathan MOJOO  Takio KURITA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2021/03/08
      Vol:
    E104-D No:6
      Page(s):
    873-880

    Content-based image retrieval has been a hot topic among computer vision researchers for a long time. There have been many advances over the years, one of the recent ones being deep metric learning, inspired by the success of deep neural networks in many machine learning tasks. The goal of metric learning is to extract good high-level features from image pixel data using neural networks. These features provide useful abstractions, which can enable algorithms to perform visual comparison between images with human-like accuracy. To learn these features, supervised information of image similarity or relative similarity is often used. One important issue in deep metric learning is how to define similarity for multi-label or multi-object scenes in images. Traditionally, pairwise similarity is defined based on the presence of a single common label between two images. However, this definition is very coarse and not suitable for multi-label or multi-object data. Another common mistake is to completely ignore the multiplicity of objects in images, hence ignoring the multi-object facet of certain types of datasets. In our work, we propose an approach for learning deep image representations based on the relative similarity of both multi-label and multi-object image data. We introduce an intuitive and effective similarity metric based on the Jaccard similarity coefficient, which is equivalent to the intersection over union of two label sets. Hence we treat similarity as a continuous, as opposed to discrete, quantity. We incorporate this similarity metric into a triplet loss with an adaptive margin, and achieve good mean average precision on image retrieval tasks. We further show, using a recently proposed quantization method, that the resulting deep feature can be quantized whilst preserving similarity. We also show that our proposed similarity metric performs better for multi-object images than a previously proposed cosine similarity-based metric. Our proposed method outperforms several state-of-the-art methods on two benchmark datasets.
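
    The label-set similarity described in this abstract is the intersection over union of two label sets, which can be sketched directly. The abstract does not specify how the adaptive margin is computed, so the margin rule in `triplet_loss` (proportional to the gap in label similarity) is purely an assumption for illustration, and all names and values are hypothetical.

```python
def jaccard(labels_a, labels_b):
    """Intersection over union of two label sets, a value in [0, 1]."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    # Convention chosen here: two empty label sets count as identical.
    return len(a & b) / len(union) if union else 1.0

def triplet_loss(d_pos, d_neg, sim_pos, sim_neg, base_margin=1.0):
    """Hinge-style triplet loss on embedding distances.

    d_pos / d_neg: anchor-to-positive and anchor-to-negative distances.
    sim_pos / sim_neg: label similarity of the anchor to each of them.
    ASSUMPTION: the margin grows with the gap in label similarity.
    """
    margin = base_margin * (sim_pos - sim_neg)
    return max(0.0, d_pos - d_neg + margin)

anchor = {"cat", "dog"}
positive = {"cat", "dog"}
negative = {"dog", "car"}
s_pos = jaccard(anchor, positive)   # identical label sets
s_neg = jaccard(anchor, negative)   # one shared label out of three
print(s_pos, s_neg, triplet_loss(0.5, 0.6, s_pos, s_neg))
```

    Because the similarity is continuous, a pair sharing one label out of three is treated differently from a pair sharing all labels, which a single-common-label definition cannot express.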

  • Deterministic Supervisors for Bisimilarity Control of Partially Observed Nondeterministic Discrete Event Systems with Deterministic Specifications

    Kohei SHIMATANI  Shigemasa TAKAI  

     
    PAPER

      Vol:
    E104-A No:2
      Page(s):
    438-446

    We consider the bisimilarity control problem for partially observed nondeterministic discrete event systems with deterministic specifications. This problem requires us to synthesize a supervisor that achieves bisimulation equivalence of the supervised system and the deterministic specification under partial observation. We present necessary and sufficient conditions for the existence of such a deterministic supervisor and show that these conditions can be verified polynomially.

  • Sentence-Embedding and Similarity via Hybrid Bidirectional-LSTM and CNN Utilizing Weighted-Pooling Attention

    Degen HUANG  Anil AHMED  Syed Yasser ARAFAT  Khawaja Iftekhar RASHID  Qasim ABBAS  Fuji REN  

     
    PAPER-Natural Language Processing

      Publicized:
    2020/08/27
      Vol:
    E103-D No:10
      Page(s):
    2216-2227

    Neural networks have received considerable attention in sentence similarity measuring systems due to their efficiency in dealing with semantic composition. However, existing neural network methods are not sufficiently effective in capturing the most significant semantic information buried in an input. To address this problem, a novel weighted-pooling attention layer is proposed to retain the most remarkable attention vector. It has already been established that long short-term memory and a convolutional neural network have a strong ability to accumulate enriched patterns of whole-sentence semantic representation. First, a sentence representation is generated by employing a siamese structure based on bidirectional long short-term memory and a convolutional neural network. Subsequently, a weighted-pooling attention layer is applied to obtain an attention vector. Finally, the attention vector pair information is leveraged to calculate the score of sentence similarity. Combining bidirectional long short-term memory with a convolutional neural network results in a model with enhanced information extraction and learning capacity. Investigations show that the proposed method outperforms the state-of-the-art approaches on datasets for two tasks, namely semantic relatedness and Microsoft research paraphrase identification. The new model improves the learning capability and also boosts the similarity accuracy.

  • A New Similarity Model Based on Collaborative Filtering for New User Cold Start Recommendation

    Ruilin PAN  Chuanming GE  Li ZHANG  Wei ZHAO  Xun SHAO  

     
    PAPER-Office Information Systems, e-Business Modeling

      Publicized:
    2020/03/03
      Vol:
    E103-D No:6
      Page(s):
    1388-1394

    Collaborative filtering (CF) is one of the most popular approaches to building recommender systems (RS) and has been extensively implemented in many online applications. However, it still suffers from the new user cold start problem, in which users have only a small number of item interaction or purchase records in the system, resulting in poor recommendation performance. Thus, we design a new similarity model which can fully utilize the limited rating information of cold users. We first construct a new metric, Popularity-Mean Squared Difference, considering the influence of popular items, the average difference between two users' common ratings, and non-numerical information of ratings. Moreover, the second new metric, Singularity-Difference, presents the degree of deviation in item preference between two users. It uses the distribution of the similarity degree of co-ratings between two users as a weight to adjust the deviation degree. Finally, we take account of each user's personal rating preferences by introducing the mean and variance of user ratings. Experimental results based on three real-life datasets, MovieLens, Epinions and Netflix, demonstrate that the proposed model outperforms seven popular similarity methods in terms of MAE, precision, recall and F1-Measure under the new user cold start condition.

  • Adversarial Metric Learning with Naive Similarity Discriminator

    Yi-ze LE  Yong FENG  Da-jiang LIU  Bao-hua QIANG  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2020/03/10
      Vol:
    E103-D No:6
      Page(s):
    1406-1413

    Metric learning aims to generate similarity-preserving low-dimensional feature vectors from input images. Most existing supervised deep metric learning methods define a carefully designed loss function to constrain the relative positions of samples in the projected lower-dimensional space. In this paper, we propose a novel architecture called Naive Similarity Discriminator (NSD) to learn the distribution of easy samples and predict their probability of being similar. Our purpose lies in encouraging the generator network to generate vectors in fitting positions whose similarity can be distinguished by our discriminator. Adequate comparison experiments were performed to demonstrate the ability of our proposed model on retrieval and clustering tasks, with precision within a specific radius, normalized mutual information and F1 score as evaluation metrics.

  • Superpixel Segmentation Based on Global Similarity and Contour Region Transform

    Bing LUO  Junkai XIONG  Li XU  Zheng PEI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2019/12/03
      Vol:
    E103-D No:3
      Page(s):
    716-719

    This letter proposes a new superpixel segmentation algorithm based on global similarity and contour region transformation. The basic idea is that pixels surrounded by the same contour are more likely to belong to the same object region, and can therefore be easily clustered into the same superpixel. To this end, we use contour scanning to estimate the global similarity between pixels and their corresponding centers. In addition, we introduce the pixel's gradient information from the contour transform map to enhance the pixel's global similarity and compensate for missing contours in blurred regions. Benefiting from our global similarity, the proposed method can adhere to blurred and low-contrast boundaries. A large number of experiments on the BSDS500 and VOC2012 datasets show that the proposed algorithm performs better than traditional SLIC.

  • Measuring Semantic Similarity between Words Based on Multiple Relational Information

    Jianyong DUAN  Yuwei WU  Mingli WU  Hao WANG  

     
    PAPER-Natural Language Processing

      Publicized:
    2019/09/27
      Vol:
    E103-D No:1
      Page(s):
    163-169

    The similarity of words extracted from a rich text relation network is the main way to calculate semantic similarity. Complex relational information and text content in Wikipedia, Community Question Answering sites and social networks provide abundant corpora for semantic similarity calculation. However, most typical research has focused on only a single relationship. In this paper, we propose a semantic similarity calculation model which integrates multiple kinds of relational information and maps multiple relationships to the same semantic space by learning a representing matrix and a semantic matrix, in order to improve the accuracy of semantic similarity calculation. In experiments, we confirm that the semantic calculation method which integrates many kinds of relationships can improve the accuracy of semantic calculation, compared with other semantic calculation methods.

  • Spectra Restoration of Bone-Conducted Speech via Attention-Based Contextual Information and Spectro-Temporal Structure Constraint Open Access

    Changyan ZHENG  Tieyong CAO  Jibin YANG  Xiongwei ZHANG  Meng SUN  

     
    LETTER-Digital Signal Processing

      Vol:
    E102-A No:12
      Page(s):
    2001-2007

    Compared with acoustic microphone (AM) speech, bone-conducted microphone (BCM) speech is much more immune to background noise, but suffers from severe loss of information due to the characteristics of the human-body transmission channel. In this letter, a new method for speaker-dependent BCM speech enhancement is proposed, in which we focus our attention on the spectra restoration of the distorted speech. In order to better infer the missing components, an attention-based bidirectional Long Short-Term Memory (AB-BLSTM) is designed to optimize the use of contextual information to model the relationship between the spectra of BCM speech and its corresponding clean AM speech. Meanwhile, a structural error metric originating from image processing, the Structural SIMilarity (SSIM) metric, is proposed as the loss function, which constrains the spectro-temporal structures when recovering the spectra. Experiments demonstrate that, compared with approaches based on conventional DNN and mean square error (MSE), the proposed method can better recover the missing phonemes and obtain spectra with spectro-temporal structure more similar to the target one, which leads to great improvement on objective metrics.

  • Hue Signature Auto Update System for Visual Similarity-Based Phishing Detection with Tolerance to Zero-Day Attack

    Shuichiro HARUTA  Hiromu ASAHINA  Fumitaka YAMAZAKI  Iwao SASASE  

     
    PAPER-Dependable Computing

      Publicized:
    2019/09/04
      Vol:
    E102-D No:12
      Page(s):
    2461-2471

    Detecting phishing websites is imperative. Among several detection schemes, the promising ones are the visual similarity-based approaches. In those, the targeted legitimate website's visual features, referred to as signatures, are stored in an SDB (Signature Database) by the system administrator. They can only detect phishing websites whose signatures are highly similar to those in the SDB. Thus, the system administrator has to register multiple signatures to detect various phishing websites, and that cost is very high. This leaves the system vulnerable to zero-day phishing attacks. In order to address this issue, an auto signature update mechanism is needed. The naive way of auto updating the SDB is to expand the scope of detection by adding a detected phishing website's signature to the SDB. However, the previous approaches are not suitable for auto updating since their similarity can differ greatly between the targeted legitimate website and subspecies of phishing websites targeting that legitimate website. Furthermore, the previous signatures can be easily manipulated by attackers. In order to overcome the problems mentioned above, in this paper, we propose a hue signature auto update system for visual similarity-based phishing detection with tolerance to zero-day attack. Phishing websites targeting a certain legitimate website tend to use the targeted website's theme color to deceive users. In other words, users can easily distinguish a phishing website if it has highly different hue information from the targeted legitimate one (e.g., a red-colored Facebook is suspicious). Thus, the hue signature is a common feature among the targeted legitimate website and subspecies of phishing websites, and it is difficult for attackers to change it. Based on this notion, we argue that the hue signature fulfills the requirements of auto updating the SDB and robustness against attackers' manipulation. This commonness can effectively expand the scope of detection when auto updating is applied to the hue signature. Through computer simulation with a real dataset, we demonstrate that our system achieves high detection performance compared with the previous scheme.

  • Low-Cost Method for Recognizing Table Tennis Activity

    Se-Min LIM  Jooyoung PARK  Hyeong-Cheol OH  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2019/06/18
      Vol:
    E102-D No:10
      Page(s):
    2051-2054

    This study designs a low-cost portable device that functions as a coaching assistant system which can support table tennis practice. Although deep learning technology is a promising solution to realizing human activity recognition, we propose using cosine similarity in making inferences. Our experiments show that the cosine similarity based inference can be a good alternative to the deep learning based inference for the assistant system when resources are limited.
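
    Inference by cosine similarity, as mentioned in this abstract, can be sketched as nearest-template classification. The abstract does not describe the features or activity classes used, so the labels, the three-dimensional feature vectors, and the function names below are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def classify(sample, templates):
    # templates maps an activity label to one reference feature vector;
    # the predicted label is the template most similar to the sample.
    return max(templates, key=lambda label: cosine_similarity(sample, templates[label]))

# Hypothetical reference vectors, e.g. averaged sensor features per stroke type.
templates = {
    "forehand": [1.0, 0.2, 0.0],
    "backhand": [-0.8, 0.3, 0.1],
}
print(classify([0.9, 0.1, 0.05], templates))
print(classify([-1.0, 0.5, 0.0], templates))
```

    Such a scheme needs only a handful of multiplications per class at inference time, which is the kind of cost profile that suits a device with limited resources.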
