Protein Fold Classification Using Large Margin Combination of Distance Metrics

Chendra Hadi SURYANTO; Kazuhiro FUKUI; Hideitsu HINO

doi:10.1587/transinf.2015EDP7294

IEICE TRANSACTIONS on Information

Summary:

Many methods have been proposed for measuring the structural similarity between two protein folds. However, it is difficult to select a single best method for the classification task, as each method has its own strengths and weaknesses. Intuitively, combining multiple methods is one way to obtain better classification results. In this paper, we generalize the concept of large margin nearest neighbor (LMNN) and propose a method for combining multiple distance metrics, obtained from different types of protein structure comparison methods, for the protein fold classification task. While LMNN is limited to learning a Mahalanobis-based distance metric from the feature vectors of the training data, the proposed method learns an optimal combination of metrics from a set of distance metrics by minimizing the distances between data of the same class and enlarging the distances between data of different classes. The main advantage of the proposed method is its ability to find optimal weight coefficients for combining many metrics, possibly including poor ones, which avoids the difficulty of deciding which metrics to include in the combination. The effectiveness of the proposed method is demonstrated in classification experiments on two public protein datasets, the Ding Dubchak dataset and the ENZYMES dataset.
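
The idea of learning one weight per distance metric under a large-margin criterion can be illustrated with a short sketch. This is not the authors' formulation: it assumes an LMNN-style hinge loss over precomputed distance matrices, nonnegative weights, target neighbours fixed once from the unweighted mean of the metrics, and plain projected subgradient descent. The function name `learn_metric_weights` and the parameters `k`, `c`, `margin`, `lr`, and `n_iter` are hypothetical, and the toy data are made up for illustration.

```python
import numpy as np


def learn_metric_weights(dists, labels, k=3, c=1.0, margin=1.0, lr=0.01, n_iter=200):
    """Learn nonnegative weights w for the combined distance sum_m w[m] * dists[m].

    dists  : array of shape (M, n, n), one precomputed distance matrix per metric
    labels : array of shape (n,), class label of each sample
    Illustrative sketch only; assumes every class has at least two samples.
    """
    dists = np.asarray(dists, dtype=float)
    labels = np.asarray(labels)
    n_metrics, n = dists.shape[0], dists.shape[1]

    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # a sample is not its own neighbour
    diff = labels[:, None] != labels[None, :]

    # Fix k same-class target neighbours per sample using the unweighted mean metric.
    mean_d = dists.mean(axis=0)
    targets = []
    for i in range(n):
        cand = np.flatnonzero(same[i])
        nearest = cand[np.argsort(mean_d[i, cand])[:k]]
        targets.extend((i, j) for j in nearest)

    w = np.full(n_metrics, 1.0 / n_metrics)
    for _ in range(n_iter):
        combined = np.tensordot(w, dists, axes=1)  # (n, n) weighted sum of metrics
        grad = np.zeros(n_metrics)
        for i, j in targets:
            # Pull term: shrink the distance to each target neighbour.
            grad += dists[:, i, j]
            # Push term: other-class samples that violate the margin (impostors).
            imp = np.flatnonzero(diff[i] & (combined[i] < combined[i, j] + margin))
            if imp.size:
                grad += c * (imp.size * dists[:, i, j] - dists[:, i, imp].sum(axis=1))
        # Subgradient step, then projection onto w >= 0.
        w = np.maximum(w - lr * grad / max(len(targets), 1), 0.0)
    return w


# Toy data: one metric that separates the two classes, one unrelated to the labels.
labels = np.array([0] * 6 + [1] * 6)
x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0])
z = np.array([(7 * i) % 12 for i in range(12)], dtype=float)
good = np.abs(x[:, None] - x[None, :])
poor = np.abs(z[:, None] - z[None, :])

w = learn_metric_weights(np.stack([good, poor]), labels)
print(w)
```

Because the combined distance is linear in the weights, the hinge loss is convex in `w` and each subgradient is a simple sum of entries from the individual distance matrices. In this toy setting the uninformative metric should end up with the smaller weight, which mirrors the abstract's point that poor metrics can be included without selecting them out by hand.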

Publication: IEICE TRANSACTIONS on Information, Vol. E99-D, No. 3, pp. 714-723

Publication Date: 2016/03/01

Publicized: 2015/12/14

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015EDP7294

Type of Manuscript: PAPER

Category: Pattern Recognition

Authors

Chendra Hadi SURYANTO
  University of Tsukuba
Kazuhiro FUKUI
  University of Tsukuba
Hideitsu HINO
  University of Tsukuba

Keywords

protein fold classification, distance metrics combination, large margin nearest neighbor, kernel learning, optimization

Cite this

Chendra Hadi SURYANTO, Kazuhiro FUKUI, Hideitsu HINO, "Protein Fold Classification Using Large Margin Combination of Distance Metrics" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 3, pp. 714-723, March 2016, doi: 10.1587/transinf.2015EDP7294.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7294/_p

@ARTICLE{e99-d_3_714,
author={Chendra Hadi SURYANTO and Kazuhiro FUKUI and Hideitsu HINO},
journal={IEICE TRANSACTIONS on Information},
title={Protein Fold Classification Using Large Margin Combination of Distance Metrics},
year={2016},
volume={E99-D},
number={3},
pages={714-723},
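keywords={protein fold classification, distance metrics combination, large margin nearest neighbor, kernel learning, optimization},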
doi={10.1587/transinf.2015EDP7294},
ISSN={1745-1361},
month={March},}

TY - JOUR
TI - Protein Fold Classification Using Large Margin Combination of Distance Metrics
T2 - IEICE TRANSACTIONS on Information
SP - 714
EP - 723
AU - Chendra Hadi SURYANTO
AU - Kazuhiro FUKUI
AU - Hideitsu HINO
PY - 2016
DO - 10.1587/transinf.2015EDP7294
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2016
ER -
