Physical Database Design for Efficient Time-Series Similarity Search

Sang-Wook KIM; Jinho KIM; Sanghyun PARK

doi:10.1093/ietcom/e91-b.4.1251

IEICE TRANSACTIONS on Communications

Summary:

Similarity search in time-series databases finds data sequences whose patterns of change are similar to those of a query sequence. For efficient processing, it normally employs a multi-dimensional index. To alleviate the well-known dimensionality curse, previous similarity search methods apply the Discrete Fourier Transform (DFT) to the data sequences and take only the first two or three DFT coefficients as organizing attributes. Beyond this ad hoc approach, no research has sought to devise a systematic guideline for choosing the best organizing attributes. This paper first points out the problems that occur in the previous methods and then proposes a novel solution for constructing optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database and identifies the organizing attributes with the best discrimination power. It also uses a cost model to determine the optimal number of organizing attributes for efficient similarity search. Through a series of experiments, we show that the proposed method significantly outperforms the previous ones.
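The summary does not say how discrimination power is measured or what the cost model looks like, so the following is only a minimal sketch of the general idea under stated assumptions. It ranks DFT coefficient positions by the variance of each coefficient across the database (an assumed stand-in for discrimination power, not the paper's definition) and computes a distance over the chosen positions. The function names `dft`, `rank_coefficients`, `reduced_distance`, and `euclidean` are hypothetical.

```python
import cmath
import math


def dft(seq):
    """Orthonormal DFT (scaled by 1/sqrt(n)), which preserves Euclidean distance."""
    n = len(seq)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(seq))
        for k in range(n)
    ]


def rank_coefficients(database):
    """Rank coefficient positions by variance across the database, highest first.

    ASSUMPTION: variance is used here as a proxy for "discrimination power";
    the paper's actual criterion is not given in the summary.
    Only positions 0..n//2 are ranked, because for real-valued sequences the
    remaining coefficients are complex conjugates of these and add nothing new.
    """
    spectra = [dft(s) for s in database]
    n = len(spectra[0])
    m = len(spectra)
    scores = []
    for k in range(n // 2 + 1):
        mean = sum(sp[k] for sp in spectra) / m
        var = sum(abs(sp[k] - mean) ** 2 for sp in spectra) / m
        scores.append((var, k))
    # Sort by descending variance; ties are broken by lower position.
    return [k for _, k in sorted(scores, key=lambda p: (-p[0], p[1]))]


def reduced_distance(a, b, positions):
    """Euclidean distance restricted to the chosen DFT coefficient positions."""
    fa, fb = dft(a), dft(b)
    return math.sqrt(sum(abs(fa[k] - fb[k]) ** 2 for k in positions))


def euclidean(a, b):
    """Plain Euclidean distance in the time domain."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Because the transform is orthonormal, the distance over any subset of coefficients never exceeds the true Euclidean distance, so an index built on the selected positions can filter candidates without false dismissals. Picking positions with `rank_coefficients(database)[:d]` instead of always taking the first `d` coefficients illustrates data-dependent selection; choosing `d` itself would need a cost model, which this sketch does not attempt.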

Publication: IEICE TRANSACTIONS on Communications Vol.E91-B No.4 pp.1251-1254

Publication Date: 2008/04/01

Online ISSN: 1745-1345

DOI: 10.1093/ietcom/e91-b.4.1251

Type of Manuscript: LETTER

Category: Multimedia Systems for Communications

Sang-Wook KIM, Jinho KIM, Sanghyun PARK, "Physical Database Design for Efficient Time-Series Similarity Search" in IEICE TRANSACTIONS on Communications, vol. E91-B, no. 4, pp. 1251-1254, April 2008, doi: 10.1093/ietcom/e91-b.4.1251.
URL: https://global.ieice.org/en_transactions/communications/10.1093/ietcom/e91-b.4.1251/_p

@ARTICLE{e91-b_4_1251,
author={Sang-Wook KIM and Jinho KIM and Sanghyun PARK},
journal={IEICE TRANSACTIONS on Communications},
title={Physical Database Design for Efficient Time-Series Similarity Search},
year={2008},
volume={E91-B},
number={4},
pages={1251--1254},
keywords={},
doi={10.1093/ietcom/e91-b.4.1251},
ISSN={1745-1345},
month={April},}

TY - JOUR
TI - Physical Database Design for Efficient Time-Series Similarity Search
T2 - IEICE TRANSACTIONS on Communications
SP - 1251
EP - 1254
AU - Sang-Wook KIM
AU - Jinho KIM
AU - Sanghyun PARK
PY - 2008
DO - 10.1093/ietcom/e91-b.4.1251
JO - IEICE TRANSACTIONS on Communications
SN - 1745-1345
VL - E91-B
IS - 4
JA - IEICE TRANSACTIONS on Communications
Y1 - April 2008
ER -
