The search functionality is under construction.

Keyword Search Result

[Keyword] protein structure(2hit)

1-2hit
  • Measuring the Similarity of Protein Structures Using Image Compression Algorithms

    Morihiro HAYASHIDA  Tatsuya AKUTSU  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E94-D No:12
      Page(s):
    2468-2478

    For measuring the similarity of biological sequences and structures such as DNA sequences, protein sequences, and tertiary structures, several compression-based methods have been developed. However, they are based on compression algorithms only for sequential data. For instance, protein structures can be represented by two-dimensional distance matrices. Therefore, it is expected that image compression is useful for measuring the similarity of protein structures because image compression algorithms compress data horizontally and vertically. This paper proposes series of methods for measuring the similarity of protein structures. In the methods, an original protein structure is transformed into a distance matrix, which is regarded as a two-dimensional image. Then, the similarity of two protein structures is measured by a kind of compression ratio of the concatenated image. We employed several image compression algorithms, JPEG, GIF, PNG, IFS, and SPC. Since SPC often gave better results among the other image compression methods, and it is simple and easy to be modified, we modified SPC and obtained MSPC. We applied the proposed methods to clustering of protein structures, and performed Receiver Operating Characteristic (ROC) analysis. The results of computational experiments suggest that MSPC has the best performance among existing compression-based methods. We also present some theoretical results on the time complexity and Kolmogorov complexity of image compression-based protein structure comparison.

  • Protein Structure Alignment Using Dynamic Programing and Iterative Improvement

    Tatsuya AKUTSU  

     
    PAPER-Algorithm and Computational Complexity

      Vol:
    E79-D No:12
      Page(s):
    1629-1636

    In this paper, we consider the protein structure alignment problem, which is a very important problem in molecular biology. Since an outline of protein structure is represented by a sequence of points in three-dimensional space, this problem is defined as the following geometric pattern matching problem: given two point sequences P and Q in three-dimensions and a real number δ > 0, find a maximum-cardinality set of point pairs such that the distance between each pair is at most δ under the condition that any translation and rotation can be applied to P. Since it is very difficult to solve this problem exactly, we consider algorithms that solve it approximately. We propose three algorithms: BASICALIGN, RANDALIGN and FRAGALIGN whose worst case time complexities are O(n8), O((n7/k3) polylog(n)) and O(n4) respectively, where n denotes the size of larger input structure and k denotes the minimum size of the alignment to be obtained. All of these have the following common framework: a series of initial superpositions are computed; for each of such superpositions, a rough alignment is first computed using a dynamic programming technique, and then it is refined through an iterative improvement procedure which also uses dynamic programming; the best alignment among them is selected as an output. The difference among three algorithms lies in the methods of finding initial superpositions. BASICALIGN, RANDALIGN and FRAGALIGN use exhaustive search, random sampling technique and fragment-based search, respectively. We prove guaranteed approximation ratios (in the sense of distances between point pairs) for theoretical versions of BASICALIGN and RANDALIGN. Practical versions of RANDALIGN and FRAGALIGN were implemented and compared with a previous algorithm using real protein structure data. The experimental results show that FRAGALIGN is best among them and it outputs good alignments quickly.