IEICE global.ieice.org Site

Keyword Search Result

[Keyword] q-gram (2 hits)

1-2hit

Regular Expression Filtering on Multiple q-Grams
Seon-Ho SHIN, HyunBong KIM, MyungKeun YOON

LETTER-Information Network

Publicized: 2017/10/11
Vol: E101-D No: 1
Page(s): 253-256
Regular expression matching is essential in network and big-data applications; however, it remains a serious performance bottleneck. State-of-the-art schemes place a multi-pattern exact string-matching algorithm as a filtering module in front of a heavy regular expression engine. We design a new approximate string-matching filter using multiple q-grams; this filter not only achieves better space compactness but also has higher throughput than existing filters.
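
As a rough illustration of the filter-then-verify arrangement described above, the Python sketch below pairs each regular expression with one literal substring it requires. The rule set, the class name `QGramPrefilter`, and the choice q = 3 are hypothetical and not taken from the letter. A rule is handed to the full `re` engine only if every q-gram of its literal occurs in the input; because any text containing the literal must contain all of the literal's q-grams, the check may admit false positives but never drops a true match, provided the literal really is required by its regex. This is a minimal set-based sketch under those assumptions, not the authors' space-compact filter.

```python
import re


def qgrams(s, q):
    """Return the set of all length-q substrings of s (empty if len(s) < q)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


class QGramPrefilter:
    """Hypothetical q-gram prefilter placed in front of a regex engine."""

    def __init__(self, rules, q=3):
        # rules: list of (regex_pattern, required_literal) pairs.
        # A literal shorter than q yields no q-grams, so its rule is always
        # a candidate: safe, but it gets no filtering benefit.
        self.q = q
        self.rules = [(re.compile(p), qgrams(lit, q)) for p, lit in rules]

    def candidates(self, text):
        """Indices of rules whose literal q-grams all appear in text."""
        text_grams = qgrams(text, self.q)
        return [i for i, (_, grams) in enumerate(self.rules) if grams <= text_grams]

    def match(self, text):
        """Run the (expensive) regex only on rules that pass the filter."""
        return [i for i in self.candidates(text) if self.rules[i][0].search(text)]


rules = [(r"GET /admin\S*", "admin"), (r"passw(or)?d=\w+", "passw")]
f = QGramPrefilter(rules, q=3)
print(f.match("GET /admin/login HTTP/1.1"))  # [0]
print(f.match("user=bob&password=123"))      # [1]
print(f.candidates("wordpass sswx"))         # [1]  (filter false positive)
print(f.match("wordpass sswx"))              # []   (rejected by the regex)
```

The last two lines show why this is only an approximate filter: `"wordpass sswx"` contains every 3-gram of `"passw"` without containing the literal itself, so the regex engine still has the final say.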

Substring Count Estimation in Extremely Long Strings
Jinuk BAE, Sukho LEE

PAPER-Database

Vol: E89-D No: 3
Page(s): 1148-1156
To estimate the number of substring matches against string data, count suffix trees (CS-trees) have been used as a kind of alphanumeric histogram. Although these trees are useful for substring count estimation in short data strings (e.g., names or titles), they show several drawbacks when the target is changed to extremely long strings. First, building CS-trees becomes difficult, or at least slow, because their origin, the suffix tree, suffers from a memory bottleneck on long strings. Second, some CS-tree node counts are incorrect due to frequent pruning of nodes. We therefore propose the count q-gram tree (CQ-tree) as an alphanumeric histogram for long strings. By adopting q-grams (length-q substrings), CQ-trees can be built quickly and correctly within a small amount of available memory. Furthermore, we mathematically derive lower and upper bounds on the count estimate. To the best of our knowledge, our work is the first to present such bounds in research on alphanumeric selectivity estimation. Our experimental study shows that the CQ-tree outperforms the CS-tree in terms of building time and accuracy.
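
To make the q-gram idea above concrete, the Python sketch below counts every substring of length 1 to q in a single pass over the text and then estimates the count of a longer pattern P with the classic maximal-overlap (Markov) rule, chaining overlapping q-gram counts: C(P) ≈ C(P[0:q]) · Π C(P[i:i+q]) / C(P[i:i+q-1]). The function names `qgram_counts` and `estimate_count` are hypothetical, and the flat `Counter` merely stands in for the CQ-tree; this is a minimal sketch, not the paper's estimator, pruning scheme, or the source of its lower and upper bounds.

```python
from collections import Counter


def qgram_counts(text, q):
    """Count every substring of length 1..q (the statistics a q-gram histogram keeps).

    Key length is bounded by q, so memory stays bounded regardless of how
    long the text is, unlike a full suffix tree.
    """
    counts = Counter()
    for k in range(1, q + 1):
        for i in range(len(text) - k + 1):
            counts[text[i:i + k]] += 1
    return counts


def estimate_count(pattern, counts, q):
    """Maximal-overlap estimate of how many times pattern occurs in the text."""
    assert q >= 2, "the overlap windows need length q - 1 >= 1"
    if len(pattern) <= q:
        # Short patterns are stored exactly.
        return float(counts.get(pattern, 0))
    est = float(counts.get(pattern[:q], 0))
    for i in range(1, len(pattern) - q + 1):
        num = counts.get(pattern[i:i + q], 0)       # next q-gram
        den = counts.get(pattern[i:i + q - 1], 0)   # its overlap with the previous q-gram
        if den == 0:
            return 0.0
        est *= num / den
    return est


text = "abababab"
counts = qgram_counts(text, 2)
print(estimate_count("abab", counts, 2))  # 3.0
```

For `"abababab"` with q = 2 the estimate for `"abab"` (4 · 3/4 · 4/4) happens to equal the true overlapping count of 3; in general the Markov assumption makes the result an approximation, which is why bounds on the estimate are useful.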