Regular Expressions


With Regular Expressions you can define complex search and filter expressions. All regular expressions are case insensitive by default.

Regex functions

Regular Expressions must be placed into one of the following functions:

 

regex( ... )

Filters every match of the given regular expression

For example: regex(\d+ downloads)

 

FirstRegex( ... )

Filters only the first occurrence of the defined regular expression

For example: FirstRegex(\d+ downloads)

 

StartToRegex( ... )

Filters everything from the page beginning to the first occurrence of the given Regular Expression

For example: StartToRegex(\d+ visitors)

 

RegexToRegex( ... , ... )

Filters everything between two Regular Expressions

For example: RegexToRegex(Downloads\: \d+,License\:)

 

RegexToEnd( ... )

Filters everything from the last occurrence of the given Regular Expression to the end of the page

For example: RegexToEnd(\d+ users online)

 

RegexCmp( ... )

Finds a defined regular expression, extracts all digits from the result and compares them with a pre-defined number. This can be used, for example, to extract and compare prices, e.g. to find a match only when a certain price is higher than 1000.

For example: RegexCmp(\d+([,\.]\d+)* Euro;,; > 1000)

The regexcmp function can be used in the Keywords functionality, Ignore filters and Watch filters. A detailed description can be found below.
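
As a rough illustration of how the span-based functions differ, the following Python sketch mimics FirstRegex, StartToRegex, RegexToRegex and RegexToEnd with the standard `re` module. The helper names and the sample text are hypothetical. Whether the program includes the matched text itself in the filtered span is not stated here, so the sketch assumes that it does.

```python
import re

FLAGS = re.IGNORECASE  # the help text says matching is case insensitive by default


def first_regex(pattern, text):
    """Return the first match of pattern, or None (mimics FirstRegex)."""
    m = re.search(pattern, text, FLAGS)
    return m.group(0) if m else None


def start_to_regex(pattern, text):
    """Return text from the start through the first match (mimics StartToRegex)."""
    m = re.search(pattern, text, FLAGS)
    return text[:m.end()] if m else None


def regex_to_regex(start_pattern, end_pattern, text):
    """Return text from the first start match through the next end match."""
    start = re.search(start_pattern, text, FLAGS)
    if not start:
        return None
    end = re.compile(end_pattern, FLAGS).search(text, start.end())
    return text[start.start():end.end()] if end else None


def regex_to_end(pattern, text):
    """Return text from the last match to the end (mimics RegexToEnd)."""
    matches = list(re.finditer(pattern, text, FLAGS))
    return text[matches[-1].start():] if matches else None


page = "Welcome. 12 visitors today. Downloads: 340 License: free. 5 users online"

print(first_regex(r"\d+ visitors", page))
print(start_to_regex(r"\d+ visitors", page))
print(regex_to_regex(r"Downloads: \d+", r"License:", page))
print(regex_to_end(r"\d+ users online", page))
```

Each helper returns None when nothing matches, which stands in for "no filter applied" in this sketch.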

Tokens of Regular Expressions

Below you can find a list of useful Regular Expression tokens.

 

\

The backslash escapes any character and can therefore be used to force characters to be matched as literals instead of being treated as characters with special meaning. For example, '\[' matches '[' and '\\' matches '\'.

.

A dot matches any character. For example, 'go.d' matches 'gold' and 'good'.

{ }

{n} ... Match exactly n times

{n,} ... Match at least n times

{n,m} ... Match at least n but not more than m times

[ ]

A string enclosed in square brackets matches any single character in that string, but no others. For example, '[xyz]' matches only 'x', 'y', or 'z'. A range of characters may be specified by two characters separated by '-'. Note that '[a-z]' matches alphabetic characters, while '[z-a]' never matches.

[-]

A hyphen within the brackets signifies a range of characters. For example, [b-o] matches any character from b through o.

|

A vertical bar matches either expression on either side of the vertical bar. For example, bar|car will match either bar or car.

*

An asterisk matches any number of occurrences of the preceding character or group, including zero. For example, bo* matches b, bo, boo and booo.

+

A plus sign matches one or more occurrences of the preceding character or group, but not zero. For example, bo+ matches bo, boo and booo, but not b or be.

\d+

matches all numbers with one or more digits

\d*

matches zero or more digits, so it also matches where no digit is present

\w+

matches all words with one or more characters containing a-z, A-Z and 0-9. \w+ will find title, border, width etc. Please note that \w matches only letters and digits (a-z, A-Z, 0-9) with an ordinal value below 128.

\s

matches a whitespace character (space, tab, carriage return or line feed)

.*?

finds as few characters as possible.

a.*?b means: find "a", followed by as few characters as possible, followed by "b".

[a-zA-Z\xA1-\xFF]+

matches all words with one or more characters containing a-z, A-Z and characters with an ordinal value of 161 or higher (e.g. ä or Ü). If you want to find words with numbers, add 0-9 to the expression: [0-9a-zA-Z\xA1-\xFF]+

(?-i)

By default, all regular expressions are case insensitive. If you add (?-i) in front of a regular expression, it becomes case sensitive. For example: regex((?-i)\d+ Comments)
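
Most of these tokens behave the same way across regex engines, so a short Python sketch with the standard `re` module can be used to try them out. This is only an illustration: Python's engine is not necessarily the one used by the program, Python is case sensitive unless `re.IGNORECASE` is passed, and the sample strings are made up.

```python
import re

# Quantifiers: * allows zero repetitions of the preceding character, + needs at least one.
star = [s for s in ["b", "bo", "boo", "be"] if re.fullmatch(r"bo*", s)]
plus = [s for s in ["b", "bo", "boo", "be"] if re.fullmatch(r"bo+", s)]

# {n,m}: at least n but not more than m repetitions.
counted = [s for s in ["1", "12", "123", "1234"] if re.fullmatch(r"\d{2,3}", s)]

# Greedy .* takes as much as possible, lazy .*? as little as possible.
text = "a1b2b"
greedy = re.search(r"a.*b", text).group(0)
lazy = re.search(r"a.*?b", text).group(0)

# Case handling: the flag mimics the program's case-insensitive default.
insensitive = re.search(r"\d+ comments", "12 Comments", re.IGNORECASE) is not None
sensitive = re.search(r"\d+ comments", "12 Comments") is not None

print(star, plus, counted, greedy, lazy, insensitive, sensitive)
```

In this sketch the greedy search returns the whole string "a1b2b", while the lazy search stops at the first "b" and returns "a1b".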

RegexCmp(...)

The RegexCmp function finds a defined regular expression, extracts all digits from the result and compares them with a pre-defined number. If the comparison returns true, the match will be accepted.

This function requires 3 parameters, separated by the ; character. The exact syntax is:

 

    regexcmp(regular expression; decimal point character; operator number)

 

Parameters:

regular expression
This regular expression extracts defined numbers from a page. The result can contain characters and numbers, for example a regular expression that finds "Price: 49,00 Euro". The regexcmp function will then extract all digits from the found result and compare the extracted number.
decimal point character
Defines whether a dot or a comma is used as the decimal point character in the page. Valid parameter characters are "." and "," (without quotes).
operator number
valid operators:
= ... equal
< ... less than
<= ... less than or equal to
> ... greater than
>= ... greater than or equal to
<> ... not equal
The number defines the number for the comparison and can optionally contain a decimal point character, for example 49,95 or 49.95. Thousands separators are not allowed.

 

Example:

regexcmp(\d+([,\.]\d+)* Euro;,; > 49.95)

 

The first parameter searches for the regular expression "\d+([,\.]\d+)* Euro" and extracts all digits from the found result (including the decimal point character), for example 1449,95.
The second parameter defines which character is used as the decimal point character; in this example it is the character ",".
The third parameter checks whether the price is higher than 49.95.
If the extracted price is lower than or equal to 49.95, the found match is omitted. If the extracted price is higher than 49.95, the found match is accepted.
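
The evaluation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the program's implementation: the function name `regex_cmp`, the operator table and the sample page are hypothetical, and the sketch assumes that every character other than digits and the decimal point character (including thousands separators) is dropped before comparing.

```python
import operator
import re
from decimal import Decimal

# Comparison operators listed in the help text, mapped to Python functions.
OPERATORS = {
    "=": operator.eq, "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge, "<>": operator.ne,
}


def regex_cmp(pattern, decimal_char, comparison, text):
    """Return the matches of pattern whose extracted number passes the comparison.

    comparison is a string such as "> 49.95"; the number may use "." or ",".
    """
    parsed = re.fullmatch(r"\s*(<=|>=|<>|=|<|>)\s*([\d.,]+)\s*", comparison)
    op_text, number_text = parsed.groups()
    limit = Decimal(number_text.replace(",", "."))
    accepted = []
    for m in re.finditer(pattern, text, re.IGNORECASE):
        # Keep only digits and the decimal point character (assumption:
        # everything else, including thousands separators, is discarded).
        kept = "".join(c for c in m.group(0) if c.isdigit() or c == decimal_char)
        value = Decimal(kept.replace(decimal_char, "."))
        if OPERATORS[op_text](value, limit):
            accepted.append(m.group(0))
    return accepted


page = "Basic: 39,00 Euro. Pro: 1.449,95 Euro. Sale: 49,95 Euro."
print(regex_cmp(r"\d+([,\.]\d+)* Euro", ",", "> 49.95", page))
```

With the sample page, only "1.449,95 Euro" is accepted: 39,00 is below the limit and 49,95 is equal to it, so neither is strictly greater.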

Typical examples

 

regex(bo*)

will find "b", "bo", "boo", "booooo"

 

regex(bx+)

will find "bxxxxxxxx", "bxx", "bx" but not "b"

 

regex(\d+)

will find all numbers

 

regex(\d+ visitors)

will find "3 visitors" or "243234 visitors" or "2763816 visitors"

 

regex(\d+ of \d+ messages)

will find "2 of 1200 messages" or "1 of 10 messages"

 

RegexToEnd(\d+ of \d+ messages)

will filter everything from the last occurrence of a match such as "2 of 1200 messages" or "1 of 10 messages" to the end of the page

 

regex(MyText.{0,20})

will find "MyText" and up to 20 characters after "MyText"

 

regex(\d\d.\d\d.\d\d\d\d)

will find date-strings with format 99.99.9999 or 99-99-9999 (the dot in the regex matches any character)

 

regex(\d\d\.\d\d\.\d\d\d\d)

will find date-strings with format 99.99.9999

 

regex(([_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+))

will find most common e-mail addresses

 

regexcmp(\d+([,\.]\d+)* Euro;,; > 49.95)

will find all prices with format "9.999,99 Euro" and only accept results with prices higher than 49,95 Euro
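
A few of the patterns above can be checked with Python's `re` module, as a hedged sketch. The sample text is invented, and Python's engine may differ in details from the one the program uses.

```python
import re

sample = "Updated 24.12.2023 and 01-02-2024. Contact: jane.doe@example.org, 3 of 10 messages."

# Unescaped dots match any character, so both separators are found.
any_separator = re.findall(r"\d\d.\d\d.\d\d\d\d", sample)

# Escaped dots match only a literal dot.
dot_only = re.findall(r"\d\d\.\d\d\.\d\d\d\d", sample)

# The e-mail pattern from the examples; finditer avoids findall's group tuples.
email_pattern = r"[_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+"
emails = [m.group(0) for m in re.finditer(email_pattern, sample)]

# Literal text followed by up to 20 arbitrary characters.
context = re.search(r"Contact.{0,20}", sample).group(0)

print(any_separator, dot_only, emails, repr(context))
```

The two date patterns show why the escaped form is usually the safer choice: the unescaped version also accepts "01-02-2024", while the escaped version keeps only "24.12.2023".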



