IEICE global.ieice.org Site

Keyword Search Result

[Keyword] regular expression(15hit)

1-15hit

Graph Linear Notations with Regular Expressions
Ren MIMURA Kengo MIYAMOTO Akio FUJIYOSHI

PAPER

Pubricized:
2023/10/11
Vol:
E107-D No:3
Page(s):
312-319
This paper proposes graph linear notations and an extension of them with regular expressions. Graph linear notations are a set of strings to represent labeled general graphs. They are extended with regular expressions to represent sets of graphs by specifying chosen parts for selections and repetitions of certain induced subgraphs. Methods for the conversion between graph linear notations and labeled general graphs are shown. The NP-completeness of the membership problem for graph regular expressions is proved.
Parallelization on a Minimal Substring Search Algorithm for Regular Expressions
Yosuke OBE Hiroaki YAMAMOTO Hiroshi FUJIWARA

PAPER-Fundamentals of Information Systems

Pubricized:
2023/02/08
Vol:
E106-D No:5
Page(s):
952-958
Let us consider a regular expression r of length m and a text string T of length n over an alphabet Σ. Then, the RE minimal substring search problem is to find all minimal substrings of T matching r. Yamamoto proposed O(mn) time and O(m) space algorithm using a Thompson automaton. In this paper, we improve Yamamoto's algorithm by introducing parallelism. The proposed algorithm runs in O(mn) time in the worst case and in O(mn/p) time in the best case, where p denotes the number of processors. Besides, we show a parameter related to the parallel time of the proposed algorithm. We evaluate the algorithm experimentally.
On Lookaheads in Regular Expressions with Backreferences
Nariyoshi CHIDA Tachio TERAUCHI

PAPER-Fundamentals of Information Systems

Pubricized:
2023/02/06
Vol:
E106-D No:5
Page(s):
959-975
Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewblp and the latter as rewbln). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid[2] (which corresponds to regular expressions with backreferenes but without lookaheads) corresponds to rewblp or rewbln. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewblp. The key idea of PLMFA is to extend MFA with a new kind of memories, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewblp.
A New Finite Automata Construction Using a Prefix and a Suffix of Regular Expressions
Hiroaki YAMAMOTO Hiroshi FUJIWARA

PAPER

Pubricized:
2020/11/09
Vol:
E104-D No:3
Page(s):
381-388
This paper presents a new method to translate a regular expression into a nondeterministic finite automaton (an NFA for short). Let r be a regular expression and let M be a Thompson automaton for r. We first introduce a labeled Thompson automaton defined by assigning two types of expressions which denote prefixes and suffixes of words in L(r) to each state of M. Then we give new ϵ-free NFAs constructed from a labeled Thompson automaton. These NFAs are called a prefix equation automaton and a suffix equation automaton. We show that a suffix equation automaton is isomorphic to an equation automaton defined by Antimirov. Furthermore we give an NFA called a unified equation automaton by joining two NFAs. Thus the number of states of a unified equation automaton can be smaller than that of an equation automaton.
Multiple Regular Expression Pattern Monitoring over Probabilistic Event Streams
Kento SUGIURA Yoshiharu ISHIKAWA

PAPER

Pubricized:
2020/02/03
Vol:
E103-D No:5
Page(s):
982-991
As smartphones and IoT devices become widespread, probabilistic event streams, which are continuous analysis results of sensing data, have received a lot of attention. One of the applications of probabilistic event streams is monitoring of time series events based on regular expressions. That is, we describe a monitoring query such as “Has the tracked object moved from RoomA to RoomB in the past 30 minutes?” by using a regular expression, and then check whether corresponding events occur in a probabilistic event stream with a sliding window. Although we proposed the fundamental monitoring method of time series events in our previous work, three problems remain: 1) it is based on an unusual assumption about slide size of a sliding window, 2) the grammar of pattern queries did not include “negation”, and 3) it was not optimized for multiple monitoring queries. In this paper, we propose several techniques to solve the above problems. First, we remove the assumption about slide size, and propose adaptive slicing of sliding windows for efficient probability calculation. Second, we calculate the occurrence probability of a negation pattern by using an inverted DFA. Finally, we propose the merge of multiple DFAs based on disjunction to process multiple queries efficiently. Experimental results using real and synthetic datasets demonstrate effectiveness of our approach.
A Heuristic for Constructing Smaller Automata Based on Suffix Sorting and Its Application in Network Security
Inbok LEE Victor C. VALGENTI Min S. KIM Sung-il OH

LETTER

Pubricized:
2017/12/19
Vol:
E101-D No:3
Page(s):
613-615
In this paper we show a simple heuristic for constructing smaller automata for a set of regular expressions, based on suffix sorting: finding common prefixes and suffixes in regular expressions and merging them. It is an important problem in network security. We applied our approach to random and real-world regular expressions. Experimental results showed that our approach yields up to 12 times enhancement in throughput.
Regular Expression Filtering on Multiple q-Grams
Seon-Ho SHIN HyunBong KIM MyungKeun YOON

LETTER-Information Network

Pubricized:
2017/10/11
Vol:
E101-D No:1
Page(s):
253-256
Regular expression matching is essential in network and big-data applications; however, it still has a serious performance bottleneck. The state-of-the-art schemes use a multi-pattern exact string-matching algorithm as a filtering module placed before a heavy regular expression engine. We design a new approximate string-matching filter using multiple q-grams; this filter not only achieves better space compactness, but it also has higher throughput than the existing filters.
A Dynamically Reconfigurable FPGA-Based Pattern Matching Hardware for Subclasses of Regular Expressions
Yusaku KANETA Shingo YOSHIZAWA Shin-ichi MINATO Hiroki ARIMURA Yoshikazu MIYANAGA

PAPER-Computer System

Vol:
E95-D No:7
Page(s):
1847-1857
In this paper, we propose a novel architecture for large-scale regular expression matching, called dynamically reconfigurable bit-parallel NFA architecture (Dynamic BP-NFA), which allows dynamic loading of regular expressions on-the-fly as well as efficient pattern matching for fast data streams. This is the first dynamically reconfigurable hardware with guaranteed performance for the class of extended patterns, which is a subclass of regular expressions consisting of union of characters and its repeat. This class allows operators such as character classes, gaps, optional characters, and bounded and unbounded repeats of character classes. The key to our architecture is the use of bit-parallel pattern matching approach, in which the information of an input non-deterministic finite automaton (NFA) is first compactly encoded in bit-masks stored in a collection of registers and block RAMs. Then, the NFA is efficiently simulated by a fixed circuitry using bitwise Boolean and arithmetic operations consuming one input character per clock regardless of the actual contents of an input text. Experimental results showed that our hardwares for both string and extended patterns were comparable to previous dynamically reconfigurable hardwares in their performances.
A Design Method of a Regular Expression Matching Circuit Based on Decomposed Automaton
Hiroki NAKAHARA Tsutomu SASAO Munehiro MATSUURA

PAPER-Design Methodology

Vol:
E95-D No:2
Page(s):
364-373
This paper shows a design method for a regular expression matching circuit based on a decomposed automaton. To implement a regular expression matching circuit, first, we convert a regular expression into a non-deterministic finite automaton (NFA). Then, to reduce the number of states, we convert the NFA into a merged-states non-deterministic finite automaton with unbounded string transition (MNFAU) using a greedy algorithm. Next, to realize it by a feasible amount of hardware, we decompose the MNFAU into a deterministic finite automaton (DFA) and an NFA. The DFA part is implemented by an off-chip memory and a simple sequencer, while the NFA part is implemented by a cascade of logic cells. Also, in this paper, we show that the MNFAU based implementation has lower area complexity than the DFA and the NFA based ones. Experiments using regular expressions form SNORT shows that, as for the embedded memory size per a character, the MNFAU is 17.17-148.70 times smaller than DFA methods. Also, as for the number of LCs (Logic Cells) per a character, the MNFAU is 1.56-5.12 times smaller than NFA methods. This paper describes detail of the MEMOCODE2010 HW/SW co-design contest for which we won the first place award.
Unicode Canonical Decomposition for Hangeul Syllables in Regular Expression
Hee Yuan TAN Hyotaek LIM

PAPER-Natural Language Processing

Vol:
E94-D No:1
Page(s):
146-154
Owing to the high expressiveness of regular expression, it is frequently used in searching and manipulation of text based data. Regular expression is highly applicable in processing Latin alphabet based text, but the same cannot be said for Hangeul*, the writing system for Korean language. Although Hangeul possesses alphabetic features within the script, expressiveness of regular expression pattern using Hangeul is hindered by the absence of syllable decomposition. Without decomposition support in regular expression, searching through Hangeul text is limited to string literal matching. Literal matching has made enumeration of syllable candidates in regular expression pattern definition indispensable, albeit impractical, especially for a large set of syllable candidates. Although the existing implementation of canonical decomposition in Unicode standard does reduce a pre-composed Hangeul syllable into smaller unit of consonant-vowel or consonant-vowel-consonant letters, it still leaves quite a number of the individual letters in compounded form. We have observed that there is a necessity to further reduce the compounded letters into unit of basic letters to properly represent the Korean script in regular expression. We look at how the new canonical decomposition technique proposed by Kim can help in handling Hangeul in regular expression. In this paper, we examine several of the performance indicators of full decomposition of Hangeul syllable to better understand the overhead that might incur, if a full decomposition were to be implemented in a regular expression engine. For efficiency considerations, we propose a semi decomposition technique alongside with a notation for defining Hangeul syllables. The semi decomposition functions as an enhancement to the existing regular expression syntax by taking in some of the special constructs and features of the Korean language. This proposed technique intends to allow an end user to have a greater freedom to define regular expression syntax for Hangeul.
Parallel DFA Architecture for Ultra High Throughput DFA-Based Pattern Matching
Yi TANG Junchen JIANG Xiaofei WANG Chengchen HU Bin LIU Zhijia CHEN

PAPER

Vol:
E93-D No:12
Page(s):
3232-3242
Multi-pattern matching is a key technique for implementing network security applications such as Network Intrusion Detection/Protection Systems (NIDS/NIPSes) where every packet is inspected against tens of thousands of predefined attack signatures written in regular expressions (regexes). To this end, Deterministic Finite Automaton (DFA) is widely used for multi-regex matching, but existing DFA-based researches have claimed high throughput at an expense of extremely high memory cost, so fail to be employed in devices such as high-speed routers and embedded systems where the available memory is quite limited. In this paper, we propose a parallel architecture of DFA called Parallel DFA (PDFA) taking advantage of the large amount of concurrent flows to increase the throughput with nearly no extra memory cost. The basic idea is to selectively store the underlying DFA in memory modules that can be accessed in parallel. To explore its potential parallelism we intensively study DFA-split schemes from both state and transition points in this paper. The performance of our approach in both the average cases and the worst cases is analyzed, optimized and evaluated by numerical results. The evaluation shows that we obtain an average speedup of 100 times compared with traditional DFA-based matching approach.
Fast and Memory-Efficient Regular Expression Matching Using Transition Sharing
Shuzhuang ZHANG Hao LUO Binxing FANG Xiaochun YUN

PAPER-DRM and Security

Vol:
E92-D No:10
Page(s):
1953-1960
Scanning packet payload at a high speed has become a crucial task in modern network management due to its wide variety applications on network security and application-specific services. Traditionally, Deterministic finite automatons (DFAs) are used to perform this operation in linear time. However, the memory requirements of DFAs are prohibitively high for patterns used in practical packet scanning, especially when many patterns are compiled into a single DFA. Existing solutions for memory blow-up are making a trade-off between memory requirement and memory access of processing per input character. In this paper we proposed a novel method to drastically reduce the memory requirements of DFAs while still maintain the high matching speed and provide worst-case guarantees. We removed the duplicate transitions between states by dividing all the DFA states into a number of groups and making each group of states share a merged transition table. We also proposed an efficient algorithm for transition sharing between states. The high efficiency in time and space made our approach adapted to frequently updated DFAs. We performed several experiments on real world rule sets. Overall, for all rule sets and approach evaluated, our approach offers the best memory versus run-time trade-offs.
Bit-Parallel Algorithms for Translating Regular Expressions into NFAs
Hiroaki YAMAMOTO Takashi MIYAZAKI Masayuki OKAMOTO

PAPER-Automata

Vol:
E90-D No:2
Page(s):
418-427
The aim of the paper is to design efficient bit-parallel algorithms for translating regular expressions into nondeterministic finite automata (NFAs). Let r be a regular expression over an alphabet Σ, and let n be the length of r and let m be the number of symbols of Σ occurring in r. Then we present bit-parallel algorithms for translating regular expressions into Glushkov automata (position automata) and follow automata using Thompson automata. For Glushkov automata, we will give an algorithm which runs in O(n+mm/W) time and space. For follow automata, we will give a randomized algorithm which runs in O(n+mm/W) expected time and O(n+mm/W) space. We rely on a W-bit uniform RAM for estimating the complexities of algorithms. Since the best known algorithms for these automata runs in O(n+m2) time and space, our algorithms achieve an almost W-fold speed-up.
Efficient Forward Model Checking Algorithm for ω-Regular Properties
Hiroaki IWASHITA Tsuneo NAKATA

PAPER

Vol:
E82-A No:11
Page(s):
2448-2454
We present a symbolic language emptiness check algorithm based on forward state traversal. A verification property is given by a set of error traces written in ω-regular expression and is manipulated explicitly as a non-deterministic state transition graph. State space of the design model is implicitly traversed along the explicit graph. This method has a large amount of flexibility for controlling state traversal on the property space. It should become a good framework of incremental or approximate verification of ω-regular properties.
Verification of Register Transfer Level (RTL) Designs
Alberto Palacios PAWLOVSKY Sachio NAITO

PAPER

Vol:
E75-D No:6
Page(s):
785-791
This paper describes a new method for verifying designs at the RTL with respect to their specifications at the functional level. The base of the verification method shown here is the translation of the specification and design representations to graph models, where the descriptions common to both representations have a symbolic representation. These symbol labeled graphs are then simplified and, by solving the all node-pair path expression problem for them, a pair of regular expressions is obtained for every two nodes in the graphs. The first regular expression in each pair represents the flow of control and the second one the flow of data between the corresponding nodes. The process of verification is carried out by checking whether or not every pair of regular expressions of the specification has a corresponding pair in the design.