Keyword Search Result

[Keyword] metric learning(14hit)

1-14hit
  • MemFRCN: Few Shot Object Detection with Memorable Faster-RCNN

    TongWei LU  ShiHai JIA  Hao ZHANG  

     
    LETTER-Vision

      Publicized:
    2022/05/24
      Vol:
    E105-A No:12
      Page(s):
    1626-1630

    Research on few-shot image classification (FSC) has made good progress, but few-shot object detection (FSOD) still faces many difficulties. Almost all current FSOD methods suffer from catastrophic forgetting: the accuracy of base-class recognition drops seriously when the model acquires the ability to recognize novel classes. For many methods, accuracy also degrades as the number of classes increases. To address this problem we propose a new memory-based method called Memorable Faster R-CNN (MemFRCN), which makes the model remember the categories it has already seen. Specifically, we propose a new two-stage object detector consisting of a memory-based classifier (MemCla), a fully connected neural network classifier (FCC) and an adaptive fusion block (AdFus). MemCla stores the embedding vector of each category as memory, which gives the model the memory capability needed to avoid catastrophic forgetting. AdFus fuses the outputs of FCC and MemCla and automatically adjusts the fusion as the number of samples increases, so that the model performs better under various conditions. Our method performs well on unseen classes while maintaining the detection accuracy of seen classes. Experimental results demonstrate that our method outperforms other current methods on multiple benchmarks.
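
    The idea of a classifier that keeps one embedding vector per category can be sketched as follows. This is only an illustration, not the MemFRCN implementation: the abstract does not say how the memory is built or queried, so the running-mean prototype, the cosine-similarity scoring, and the names `PrototypeMemory`, `remember` and `scores` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class PrototypeMemory:
    """Hypothetical memory: stores the mean embedding of every class seen so far."""

    def __init__(self):
        self._sums = {}    # label -> element-wise sum of embeddings
        self._counts = {}  # label -> number of embeddings stored

    def remember(self, label, embedding):
        """Add one embedding to the memory slot of `label`."""
        if label not in self._sums:
            self._sums[label] = list(embedding)
            self._counts[label] = 1
        else:
            self._sums[label] = [s + e for s, e in zip(self._sums[label], embedding)]
            self._counts[label] += 1

    def prototype(self, label):
        """Mean embedding stored for `label`."""
        n = self._counts[label]
        return [s / n for s in self._sums[label]]

    def scores(self, embedding):
        """Similarity of `embedding` to every remembered class (assumed: cosine)."""
        return {label: cosine(embedding, self.prototype(label)) for label in self._sums}


memory = PrototypeMemory()
memory.remember("base", [1.0, 0.0])
memory.remember("base", [1.0, 0.2])
memory.remember("novel", [0.0, 1.0])  # adding a novel class leaves "base" untouched
class_scores = memory.scores([0.9, 0.1])
predicted = max(class_scores, key=class_scores.get)
```

    Because each class has its own slot, adding a novel class does not overwrite what is stored for base classes, which is the intuition behind using memory against forgetting.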

  • Searching and Learning Discriminative Regions for Fine-Grained Image Retrieval and Classification

    Kangbo SUN  Jie ZHU  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/10/18
      Vol:
    E105-D No:1
      Page(s):
    141-149

    Local discriminative regions play important roles in fine-grained image analysis tasks. How to locate local discriminative regions with only category labels, and how to learn discriminative representations from these regions, have been active research topics. In our work, we propose the Searching Discriminative Regions (SDR) and Learning Discriminative Regions (LDR) methods to search for and learn local discriminative regions in images. The SDR method adopts an attention mechanism to iteratively search for high-response regions in images, and uses them as clues to locate local discriminative regions. Moreover, the LDR method is proposed to learn, from the raw image and local images, representations that are compact within each category and sparse between categories. Experimental results show that our proposed approach achieves excellent performance in both fine-grained image retrieval and classification tasks, which demonstrates its effectiveness.

  • Deep Metric Learning for Multi-Label and Multi-Object Image Retrieval

    Jonathan MOJOO  Takio KURITA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2021/03/08
      Vol:
    E104-D No:6
      Page(s):
    873-880

    Content-based image retrieval has been a hot topic among computer vision researchers for a long time. There have been many advances over the years, one of the recent ones being deep metric learning, inspired by the success of deep neural networks in many machine learning tasks. The goal of metric learning is to extract good high-level features from image pixel data using neural networks. These features provide useful abstractions, which can enable algorithms to perform visual comparison between images with human-like accuracy. To learn these features, supervised information of image similarity or relative similarity is often used. One important issue in deep metric learning is how to define similarity for multi-label or multi-object scenes in images. Traditionally, pairwise similarity is defined based on the presence of a single common label between two images. However, this definition is very coarse and not suitable for multi-label or multi-object data. Another common mistake is to completely ignore the multiplicity of objects in images, hence ignoring the multi-object facet of certain types of datasets. In our work, we propose an approach for learning deep image representations based on the relative similarity of both multi-label and multi-object image data. We introduce an intuitive and effective similarity metric based on the Jaccard similarity coefficient, which is equivalent to the intersection over union of two label sets. Hence we treat similarity as a continuous, as opposed to discrete, quantity. We incorporate this similarity metric into a triplet loss with an adaptive margin, and achieve good mean average precision on image retrieval tasks. We further show, using a recently proposed quantization method, that the resulting deep feature can be quantized whilst preserving similarity. We also show that our proposed similarity metric performs better for multi-object images than a previously proposed cosine similarity-based metric. Our proposed method outperforms several state-of-the-art methods on two benchmark datasets.
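
    The Jaccard similarity of two label sets mentioned above is their intersection over union, and it can feed a triplet loss whose margin depends on how different the positive and negative really are. The sketch below is illustrative only: the abstract does not give the exact form of the adaptive margin, so scaling a hypothetical `base_margin` by the similarity gap is an assumption, as are the function names.

```python
def jaccard_similarity(labels_a, labels_b):
    """Intersection over union of two label sets; 0.0 when both are empty."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def adaptive_margin_triplet_loss(d_pos, d_neg, sim_pos, sim_neg, base_margin=1.0):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    d_pos / d_neg: embedding distances from the anchor to the positive / negative.
    sim_pos / sim_neg: label-set similarities of the anchor to each of them.
    Assumption: the margin grows with the similarity gap, so triplets whose
    positive and negative are almost equally similar are pushed apart less.
    """
    margin = base_margin * (sim_pos - sim_neg)
    return max(0.0, d_pos - d_neg + margin)


anchor = {"cat", "dog"}
positive = {"cat", "dog"}   # identical label set
negative = {"dog", "car"}   # shares one label out of three
sim_pos = jaccard_similarity(anchor, positive)
sim_neg = jaccard_similarity(anchor, negative)
loss = adaptive_margin_triplet_loss(d_pos=0.5, d_neg=0.6, sim_pos=sim_pos, sim_neg=sim_neg)
```

    With a single-common-label rule, `negative` would count as fully similar to `anchor` because both contain "dog"; the continuous score distinguishes partial from full overlap.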

  • Adversarial Metric Learning with Naive Similarity Discriminator

    Yi-ze LE  Yong FENG  Da-jiang LIU  Bao-hua QIANG  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2020/03/10
      Vol:
    E103-D No:6
      Page(s):
    1406-1413

    Metric learning aims to generate similarity-preserving low-dimensional feature vectors from input images. Most existing supervised deep metric learning methods define a carefully designed loss function to constrain the relative positions of samples in the projected lower-dimensional space. In this paper, we propose a novel architecture called Naive Similarity Discriminator (NSD) to learn the distribution of easy samples and predict their probability of being similar. Our aim is to encourage the generator network to place vectors in positions whose similarity can be distinguished by our discriminator. Comparison experiments were performed to demonstrate the ability of our proposed model on retrieval and clustering tasks, with precision within a specific radius, normalized mutual information and F1 score as evaluation metrics.

  • Partial Label Metric Learning Based on Statistical Inference

    Tian XIE  Hongchang CHEN  Tuosiyu MING  Jianpeng ZHANG  Chao GAO  Shaomei LI  Yuehang DING  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2020/03/05
      Vol:
    E103-D No:6
      Page(s):
    1355-1361

    In partial label data, the ground-truth label of a training example is concealed in a set of candidate labels associated with the instance. As the ground-truth label is inaccessible, it is difficult to train the classifier via the label information. Consequently, manifold structure information is adopted, under the assumption that neighboring or similar instances in the feature space have similar labels in the label space. However, real-world data may not fully satisfy this assumption. In this paper, a partial label metric learning method based on the likelihood-ratio test is proposed to make partial label data satisfy the manifold assumption. Moreover, the proposed method needs no objective function and treats the data pairs asymmetrically. The experimental results on several real-world PLL datasets indicate that the proposed method outperforms the existing partial label metric learning methods in terms of classification accuracy and disambiguation accuracy while taking less time.

  • Threshold Auto-Tuning Metric Learning

    Rachelle RIVERO  Yuya ONUMA  Tsuyoshi KATO  

     
    PAPER-Pattern Recognition

      Publicized:
    2019/03/04
      Vol:
    E102-D No:6
      Page(s):
    1163-1170

    It has been reported repeatedly that discriminative learning of a distance metric boosts pattern recognition performance. Although ITML (Information Theoretic Metric Learning)-based methods enjoy the advantage that the Bregman projection framework can be applied to optimize the distance metric, their weak point is that the distance threshold for similarity/dissimilarity constraints must be determined manually, and the generalization performance is sensitive to this threshold. In this paper, we present a new formulation of the metric learning algorithm in which the distance threshold is optimized together with the metric. Since the optimization is still in the Bregman projection framework, the Dykstra algorithm can be applied for optimization. A nonlinear equation has to be solved to project the solution onto a half-space in each iteration. We have developed an efficient technique for projection onto a half-space. We empirically show that although the distance threshold is automatically tuned for the proposed metric learning algorithm, the accuracy of pattern recognition for the proposed algorithm is comparable to, if not better than, that of the existing metric learning methods.

  • Network Embedding with Deep Metric Learning

    Xiaotao CHENG  Lixin JI  Ruiyang HUANG  Ruifei CUI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2018/12/26
      Vol:
    E102-D No:3
      Page(s):
    568-578

    Network embedding has attracted an increasing amount of attention in recent years due to its wide-ranging applications in graph mining tasks such as vertex classification, community detection, and network visualization. Network embedding is an important method to learn low-dimensional representations of vertices in networks, aiming to capture and preserve the network structure. Almost all the existing network embedding methods adopt the so-called Skip-gram model in Word2vec. However, as a bag-of-words model, the Skip-gram model mainly utilizes local structure information. The lack of information metrics for vertices in the global network leads to the mixing of vertices with different labels in the new embedding space. To solve this problem, in this paper we propose a Network Representation Learning method with Deep Metric Learning, namely DML-NRL. By setting the initialized anchor vertices and adding the similarity measure in the training process, the distance information between different labels of vertices in the network is integrated into the vertex representation, which effectively improves the accuracy of the network embedding algorithm. We compare our method with baselines by applying them to the tasks of multi-label classification and data visualization of vertices. The experimental results show that our method outperforms the baselines on all three datasets, and the method has proved to be effective and robust.

  • Stochastic Dykstra Algorithms for Distance Metric Learning with Covariance Descriptors

    Tomoki MATSUZAWA  Eisuke ITO  Raissa RELATOR  Jun SESE  Tsuyoshi KATO  

     
    PAPER-Pattern Recognition

      Publicized:
    2017/01/13
      Vol:
    E100-D No:4
      Page(s):
    849-856

    In recent years, covariance descriptors have received considerable attention as a strong representation of a set of points. In this research, we propose a new metric learning algorithm for covariance descriptors based on the Dykstra algorithm, in which the current solution is projected onto a half-space at each iteration, and which runs in O(n^3) time. We empirically demonstrate that randomizing the order of half-spaces in the proposed Dykstra-based algorithm significantly accelerates convergence to the optimal solution. Furthermore, we show that the proposed approach yields promising experimental results for pattern recognition tasks.

  • Face Hallucination by Learning Local Distance Metric

    Yuanpeng ZOU  Fei ZHOU  Qingmin LIAO  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2016/11/07
      Vol:
    E100-D No:2
      Page(s):
    384-387

    In this letter, we propose a novel method for face hallucination by learning a new distance metric in the low-resolution (LR) patch space (source space). Local patch-based face hallucination methods usually assume that the two manifolds formed by LR and high-resolution (HR) image patches have similar local geometry. However, this assumption does not hold well in practice. Motivated by metric learning in machine learning, we propose to learn a new distance metric in the source space, under the supervision of the true local geometry in the target space (HR patch space). The learned metric gives more freedom to the representation of local geometry in the source space, and thus the local geometries of the source and target spaces become more consistent. Experiments conducted on two datasets demonstrate that the proposed method is superior to the state-of-the-art face hallucination and image super-resolution (SR) methods.

  • Utilizing Shape-Based Feature and Discriminative Learning for Building Detection

    Shangqi ZHANG  Haihong SHEN  Chunlei HUO  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2016/11/18
      Vol:
    E100-D No:2
      Page(s):
    392-395

    Building detection from high-resolution remote sensing images is challenging due to the high intraclass variability and the difficulty in describing buildings. To address these difficulties, a novel approach is proposed based on the combination of shape-specific feature extraction and discriminative feature classification. Shape-specific features can capture complex shapes and structures of buildings. Discriminative feature classification is effective in reflecting similarities among buildings and differences between buildings and backgrounds. Experiments demonstrate the effectiveness of the proposed approach.

  • Manifold Kernel Metric Learning for Larger-Scale Image Annotation

    Lihua GUO  

     
    LETTER-Pattern Recognition

      Publicized:
    2015/04/03
      Vol:
    E98-D No:7
      Page(s):
    1396-1400

    An appropriate similarity measure between images is one of the key techniques in search-based image annotation models. In order to capture the nonlinear relationships between visual features and image semantics, many kernel distance metric learning (KML) algorithms have been developed. However, when challenged with large-scale image annotation, their metrics cannot explicitly represent the similarity between image semantics, and their algorithms suffer from high computation cost, so they lose their efficiency. In this paper, we propose a manifold kernel metric learning (M_KML) algorithm. Our M_KML algorithm simultaneously learns the manifold structure and the image annotation metrics. The main merit of our M_KML algorithm is that the distance metrics are built on the interior manifold structure of the image features, and the dimensionality reduction on the manifold structure can handle the high-dimensionality challenge faced by KML. Final experiments verify our method's efficiency and effectiveness by comparing it with state-of-the-art image annotation approaches.

  • Similar Speaker Selection Technique Based on Distance Metric Learning Using Highly Correlated Acoustic Features with Perceptual Voice Quality Similarity

    Yusuke IJIMA  Hideyuki MIZUNO  

     
    PAPER-Speech and Hearing

      Publicized:
    2014/10/15
      Vol:
    E98-D No:1
      Page(s):
    157-165

    This paper analyzes the correlation between various acoustic features and perceptual voice quality similarity, and proposes a perceptually similar speaker selection technique based on distance metric learning. To analyze the relationship between acoustic features and voice quality similarity, we first conduct a large-scale subjective experiment using the voices of 62 female speakers and acquire perceptual voice quality similarity scores between all pairs of speakers. Next, multiple linear regression analysis is carried out; it shows that four acoustic features are highly correlated with voice quality similarity. The proposed speaker selection technique first trains a transform matrix based on distance metric learning using the perceptual voice quality similarity acquired in the subjective experiment. Given an input speech, acoustic features of the input speech are transformed using the trained transform matrix, after which speaker selection is performed based on the Euclidean distance in the transformed acoustic feature space. We perform speaker selection experiments and evaluate the performance of the proposed technique by comparing it to speaker selection without feature space transformation. The results indicate that transformation based on distance metric learning reduces the error rate by 53.9%.
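
    The selection step described above (transform the acoustic features with the trained matrix, then pick the speaker with the smallest Euclidean distance) can be sketched as below. The matrix values, feature vectors and function names are made up for illustration; how the matrix is trained from the perceptual similarity scores is not covered here.

```python
import math


def transform(matrix, features):
    """Apply a transform matrix (list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def select_similar_speaker(matrix, input_features, speaker_features):
    """Return (speaker, distance) with the smallest Euclidean distance
    to the input in the transformed feature space."""
    query = transform(matrix, input_features)
    best_name, best_dist = None, math.inf
    for name, feats in speaker_features.items():
        dist = math.dist(query, transform(matrix, feats))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


# Hypothetical transform that keeps the first feature and discards the second,
# standing in for a matrix learned from perceptual similarity scores.
learned = [[1.0, 0.0],
           [0.0, 0.0]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]  # baseline: no feature space transformation
speakers = {"A": [1.1, -3.0], "B": [3.0, 5.0]}
query = [1.0, 5.0]

with_transform = select_similar_speaker(learned, query, speakers)
without_transform = select_similar_speaker(identity, query, speakers)
```

    In this toy setup the two variants pick different speakers, because the transform removes a feature assumed to be irrelevant to perceived similarity.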

  • Adaptive Metric Learning for People Re-Identification

    Guanwen ZHANG  Jien KATO  Yu WANG  Kenji MASE  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E97-D No:11
      Page(s):
    2888-2902

    There exist two intrinsic issues in multiple-shot person re-identification: (1) large differences in camera view, illumination, and non-rigid deformation of posture make the intra-class variance even larger than the inter-class variance; (2) only a small amount of training data is available for learning tasks in a realistic re-identification scenario. In our previous work, we proposed a local distance comparison framework to deal with the first issue. In this paper, to deal with the second issue (i.e., to derive a reliable distance metric from limited training data), we propose an adaptive learning method to learn an adaptive distance metric, which integrates prior knowledge learned from a large existing auxiliary dataset and task-specific information extracted from a much smaller training dataset. Experimental results on several public benchmark datasets show that, combined with the local distance comparison framework, our adaptive learning method is superior to conventional approaches.

  • Nonlinear Metric Learning with Deep Independent Subspace Analysis Network for Face Verification

    Xinyuan CAI  Chunheng WANG  Baihua XIAO  Yunxue SHAO  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E96-D No:12
      Page(s):
    2830-2838

    Face verification is the task of determining whether two given face images represent the same person or not. It is a very challenging task, as face images captured in uncontrolled environments may have large variations in illumination, expression, pose, background, etc. The crucial problem is how to compute the similarity of two face images. Metric learning has provided a viable solution to this problem. Until now, many metric learning algorithms have been proposed, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method, which learns an explicit mapping from the original space to an optimal subspace using a deep Independent Subspace Analysis (ISA) network. Compared to linear or kernel-based metric learning methods, the proposed deep ISA network is a deep and local learning architecture, and therefore is better able to learn the nature of highly variable datasets. We evaluate our method on the Labeled Faces in the Wild dataset, and results show superior performance over some state-of-the-art methods.