Logical Combinations of Regular Expressions in Hyperscan

ID 标签 751699
已更新 9/24/2018
版本 Latest
公共

author-image

作者

Introduction

Compared to previous versions, Hyperscan 5.0 supports more than just the matching of regular expressions. In the usage scenarios of previous versions, the subsequent action is sometimes decided based on the logical combination of matches from a group of patterns. For example, when it is required that a set of regular expressions be completely, partially, or not matched, users must record the matching results and do the logic calculation themselves. In contrast, Hyperscan 5.0 provides a new feature: for the logical combination among regular expressions, users may define its expression first, after which Hyperscan can complete the logic calculation of the matching results from regular expressions and then report the results directly.

Feature Description

For situations when a user requires behavior dependent on the presence or absence of matches from groups of patterns, Hyperscan provides support for the logical combination of patterns in a given pattern set, with three operators: AND, OR and NOT.

The logical value of such a combination is based on each expression's matching status at a given offset. The matching status of any expression has a Boolean value: false if the expression has not yet matched or true if the expression has already matched. In particular, the value of a "NOT" operation at a given offset is true if the expression it refers to is false at this offset. For example, "NOT 101" means that expression 101 has not yet matched at this offset.

A logical combination is passed to Hyperscan as an expression at compile time. This combination expression will raise matches at every offset in which one of its subexpressions matches and the logical value of the whole expression is true.

To illustrate, here is an example of a combination expression:

((301 OR 302) AND 303) AND (304 OR NOT 305)
  1. If expression 301 matches at offset 10, the logical value of 301 is true while the other patterns' values are false. Hence, the whole combination's value is false.
  2. Then expression 303 matches at offset 20. Now the values of 301 and 303 are true while the other patterns' values are still false. In this case, the combination's value is true, so the combination expression raises a match at offset 20.
  3. Finally, expression 305 has matches at offset 30. Now the values of 301, 303 and 305 are true while the other patterns' values are still false. In this case, the combination's value is false and no match is raised.

Example - Parsing a logical combination expression

Usage

Create an expression

In logical combination syntax, an expression is written using infix notation, which consists of operands, operators, and parentheses. The operands are expression IDs, and operators are "! (NOT)", "& (AND)" or "| (OR)". For example, the combination described in the previous section would be written this way:

((301 | 302) & 303) & (304 | !305)

The logical combination expression:

  1. The priority of operators are ! > & > |. For example:
    A&B|C is treated as (A&B)|C.
    A|B&C is treated as A|(B&C).
    A&!B is treated as A&(!B).
  2. Extra parentheses are allowed. Example: (A)&!(B) is the same as A&!B.
    (A&B)|C is the same as A&B|C.
  3. Whitespace is ignored.

Compile

A logical combination expression must be passed to one of the Hyperscan compiler API functions ( (hs_compile_multi(), hs_compile_ext_multi() ), along with the HS_FLAG_COMBINATION flag, which identifies the pattern as a logical combination expression. The patterns referred to in the logical combination expression must be compiled in the same pattern set as the combination expression.

When an expression has the HS_FLAG_COMBINATION flag set, it ignores all other flags except the HS_FLAG_SINGLEMATCH flag and the HS_FLAG_QUIET flag.

At compile time, Hyperscan will reject logical combination expressions that evaluate to true when no patterns have matched. For example:

!101
!101|102
!101&!102
!(101&102)

Patterns that are referred to as operands within a logical combination (for example, 301 through 305 in the examples above) may also use the HS_FLAG_QUIET flag to silence the reporting of individual matches for those patterns. In the absence of this flag, all matches (for both individual patterns and their logical combinations) will be reported.

When an expression has both the HS_FLAG_COMBINATION flag and the HS_FLAG_QUIET flag set, no matches for this logical combination will be reported.

Examples

Below are examples illustrating the usage of logical combinations.

Define a single logical combination

Some of the subrules may use the HS_FLAG_QUIET flag to keep silence for their own matches, which does not influence the result of logical combinations. This is shown in the following code example:

hs_database_t *db = nullptr;
hs_compile_error_t *compile_err = nullptr;
const char *expr[] = {"abc",
                      "def",
                      "foobar.*gh",
                      "teakettle{4,10}",
                      "ijkl[mMn]",
                      "(101 & 102 & 103) | (104 & !105)"};
unsigned flags[] = {HS_FLAG_QUIET,
                    HS_FLAG_QUIET,
                    HS_FLAG_QUIET,
                    HS_FLAG_QUIET,
                    0,
                    HS_FLAG_COMBINATION};
unsigned ids[] = {101, 102, 103, 104, 105, 1001};
hs_error_t err = hs_compile_multi(expr, flags, ids, 6, HS_MODE_NOSTREAM,
                                  nullptr, &db, &compile_err);

Define multiple logical combinations

Based on different needs, expressions of logical combination can also use the HS_FLAG_SINGLEMATCH flag, as shown in this code example:

hs_database_t *db = nullptr;
hs_compile_error_t *compile_err = nullptr;
const char *expr[] = {"abc",
                      "def",
                      "foobar.*gh",
                      "teakettle{4,10}",
                      "ijkl[mMn]",
                      "(101 & 102 & 103) | (104 & !105)",
                      "!101 & 102",
                      "!(!101 | 102)",
                      "101 & !102"};
unsigned flags[] = {HS_FLAG_QUIET,
                    HS_FLAG_QUIET,
                    HS_FLAG_QUIET,
                    HS_FLAG_QUIET,
                    0,
                    HS_FLAG_COMBINATION | HS_FLAG_SINGLEMATCH,
                    HS_FLAG_COMBINATION,
                    HS_FLAG_COMBINATION | HS_FLAG_SINGLEMATCH,
                    HS_FLAG_COMBINATION | HS_FLAG_SINGLEMATCH};
unsigned ids[] = {101, 102, 103, 104, 105, 1001, 1002, 1003, 1004};
hs_error_t err = hs_compile_multi(expr, flags, ids, 9, HS_MODE_NOSTREAM,
                                  nullptr, &db, &compile_err); 

Summary

Logical combinations is a new feature in Hyperscan 5.0 that extends regular expression matching functionality. It can handle matching behaviors among a vast array of patterns, while using the existing API to avoid impact on previous use cases.

You can get Hyperscan 5.0 from GitHub*, and you can find more logical combination usage examples in the Hyperscan logical combinations unit test.

Give it a try and have fun!

References

"