A framework of context-sensitive grammar transform for speeding-up compressed pattern matching (CPM) is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching algorithm. The compression ratio is a match for gzip and Re-Pair, and the search speed of our CPM algorithm is almost twice faster than the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al., and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al., which is regarded as one of the best combinations that allows a practically fast search.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Shirou MARUYAMA, Youhei TANAKA, Hiroshi SAKAMOTO, Masayuki TAKEDA, "Context-Sensitive Grammar Transform: Compression and Pattern Matching" in IEICE TRANSACTIONS on Information,
vol. E93-D, no. 2, pp. 219-226, February 2010, doi: 10.1587/transinf.E93.D.219.
Abstract: A framework of context-sensitive grammar transform for speeding-up compressed pattern matching (CPM) is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching algorithm. The compression ratio is a match for gzip and Re-Pair, and the search speed of our CPM algorithm is almost twice faster than the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al., and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al., which is regarded as one of the best combinations that allows a practically fast search.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.219/_p
Copy
@ARTICLE{e93-d_2_219,
author={Shirou MARUYAMA, Youhei TANAKA, Hiroshi SAKAMOTO, Masayuki TAKEDA, },
journal={IEICE TRANSACTIONS on Information},
title={Context-Sensitive Grammar Transform: Compression and Pattern Matching},
year={2010},
volume={E93-D},
number={2},
pages={219-226},
abstract={A framework of context-sensitive grammar transform for speeding-up compressed pattern matching (CPM) is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching algorithm. The compression ratio is a match for gzip and Re-Pair, and the search speed of our CPM algorithm is almost twice faster than the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al., and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al., which is regarded as one of the best combinations that allows a practically fast search.},
keywords={},
doi={10.1587/transinf.E93.D.219},
ISSN={1745-1361},
month={February},}
Copy
TY - JOUR
TI - Context-Sensitive Grammar Transform: Compression and Pattern Matching
T2 - IEICE TRANSACTIONS on Information
SP - 219
EP - 226
AU - Shirou MARUYAMA
AU - Youhei TANAKA
AU - Hiroshi SAKAMOTO
AU - Masayuki TAKEDA
PY - 2010
DO - 10.1587/transinf.E93.D.219
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2010
AB - A framework of context-sensitive grammar transform for speeding-up compressed pattern matching (CPM) is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching algorithm. The compression ratio is a match for gzip and Re-Pair, and the search speed of our CPM algorithm is almost twice faster than the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al., and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al., which is regarded as one of the best combinations that allows a practically fast search.
ER -