Regular Expressions

Below is a list of useful regular expression tokens. Detailed descriptions and further tokens can be found in many places on the World Wide Web.

 

\

The backslash escapes any character and can therefore be used to force characters to be matched as literals instead of being treated as characters with special meaning. For example, '\[' matches '[' and '\\' matches '\'.
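
The escaping rule can be tried out with a short sketch. Python's re module is used here only as an assumption for illustration; the engine behind the regex(...) syntax on this page may differ in details.

```python
import re

# An escaped bracket and an escaped backslash are matched as literals.
bracket = re.search(r"\[", "a[1]")
backslash = re.search(r"\\", r"C:\temp")

print(bracket.group())    # the literal [
print(backslash.group())  # the literal \

# re.escape() inserts the backslashes when the text is not known in advance.
escaped = re.escape("1+1=2")
print(re.fullmatch(escaped, "1+1=2") is not None)  # True
```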

.

A dot matches any single character. For example, 'go.d' matches 'gold' and 'good'.

{ }

{n} ... Match exactly n times

{n,} ... Match at least n times

{n,m} ... Match at least n but not more than m times
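
The three counting forms can be compared side by side in a minimal sketch. It assumes Python's re module, and the helper name fits is made up for this illustration.

```python
import re

def fits(pattern, text):
    """Hypothetical helper: True only if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

exact_ok = fits(r"\d{4}", "2024")         # exactly 4 digits     -> True
exact_short = fits(r"\d{4}", "123")       # only 3 digits        -> False
at_least = fits(r"\d{2,}", "1234567")     # at least 2 digits    -> True
too_many = fits(r"\d{2,4}", "12345")      # more than 4 digits   -> False

print(exact_ok, exact_short, at_least, too_many)
```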

[ ]

A string enclosed in square brackets matches any single character in that string, but no others. For example, '[xyz]' matches only 'x', 'y', or 'z'. A range of characters may be specified by two characters separated by '-'. Note that '[a-z]' matches alphabetic characters, while '[z-a]' never matches.

[-]

A hyphen within the brackets signifies a range of characters. For example, [b-o] matches any character from b through o.
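
A short sketch of bracket expressions and ranges, assuming Python's re module. Note that Python rejects a reversed range such as [z-a] with an error instead of silently never matching; other engines may behave differently.

```python
import re

# A bracket expression matches one character out of the listed set.
picked = re.findall(r"[xyz]", "lazy fox")
print(picked)  # ['z', 'y', 'x']

# A range covers every character from the first to the last one.
ranged = re.findall(r"[b-o]", "bop")
print(ranged)  # ['b', 'o']

# In Python a reversed range is a compile-time error.
try:
    re.compile(r"[z-a]")
    reversed_ok = True
except re.error:
    reversed_ok = False
print(reversed_ok)  # False
```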

|

A vertical bar matches either expression on either side of the vertical bar. For example, bar|car will match either bar or car.
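
As a quick illustration of alternation (a sketch assuming Python's re module):

```python
import re

# Either alternative may match at any position in the text.
hits = re.findall(r"bar|car", "a bar, a car, a jar")
print(hits)  # ['bar', 'car']
```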

*

An asterisk after a character matches any number of occurrences of that character, including zero. For example, bo* matches b, bo, boo and booo.

+

A plus sign after a character matches one or more occurrences of that character, but not zero. For example, bo+ matches bo, boo and booo, but not b or be.
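
The difference between the asterisk and the plus sign is easiest to see by testing both against the same word list. This is a minimal sketch assuming Python's re module; fullmatch is used so that a word counts only if it fits the pattern completely.

```python
import re

words = ["b", "bo", "boo", "booo", "be"]

# * allows zero occurrences of "o", + requires at least one.
star = [w for w in words if re.fullmatch(r"bo*", w)]
plus = [w for w in words if re.fullmatch(r"bo+", w)]

print(star)  # ['b', 'bo', 'boo', 'booo']
print(plus)  # ['bo', 'boo', 'booo']
```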

\d+

matches one or more consecutive digits, i.e. any whole number

\d*

matches zero or more consecutive digits, so it also matches where no digit is present

\w+

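matches all words with one or more characters containing a-z, A-Z and 0-9 (in most regular expression engines \w also matches the underscore). \w+ will find title, border, width etc. Please note that \w matches only letters and digits (a-z, A-Z, 0-9) with an ordinal value lower than 128, so characters such as ä or Ü are not matched.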

\s

matches a single whitespace character (space, tab, carriage return or line feed)

.*?

finds as few characters as possible.

a.*?b means: find "a", followed by as few characters as possible, followed by "b"
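
The effect of the question mark becomes visible when the same text is searched with and without it. A minimal sketch, assuming Python's re module:

```python
import re

text = "a1b2b3b"

# Without "?" the match runs to the last "b", with "?" it stops at the first one.
greedy = re.search(r"a.*b", text).group()
lazy = re.search(r"a.*?b", text).group()

print(greedy)  # a1b2b3b
print(lazy)    # a1b
```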

[a-zA-Z\xA1-\xFF]+

matches all words with one or more characters containing a-z, A-Z and characters with an ordinal value of 161 or higher (e.g. ä or Ü). If you want to find words with numbers, then add 0-9 to the expression: [0-9a-zA-Z\xA1-\xFF]+
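
The following sketch contrasts an ASCII-only \w+ with the extended character class. It assumes Python's re module, where the re.ASCII flag is needed to restrict \w to ordinal values below 128 and where \xA1-\xFF refers to Unicode code points 161 to 255; the sample text is invented for the illustration.

```python
import re

text = "Größe 42 über"

# ASCII-only \w breaks words apart at the umlauts and at "ß".
ascii_words = re.findall(r"\w+", text, re.ASCII)
print(ascii_words)  # ['Gr', 'e', '42', 'ber']

# The extended class keeps the words whole but skips the number.
extended = re.findall(r"[a-zA-Z\xA1-\xFF]+", text)
print(extended)  # ['Größe', 'über']

# Adding 0-9 picks up the number as well.
with_digits = re.findall(r"[0-9a-zA-Z\xA1-\xFF]+", text)
print(with_digits)  # ['Größe', '42', 'über']
```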

(?-i)

By default, all regular expressions are case insensitive. If you add (?-i) in front of a regular expression, it becomes case sensitive. For example: regex((?-i)\d+ Comments)
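
The two modes can be compared in a rough sketch. It assumes Python's re module, which is case sensitive by default, so the case-insensitive default described here is emulated with the re.IGNORECASE flag and the (?-i) behaviour corresponds to simply leaving the flag out.

```python
import re

text = "12 comments, 7 Comments"

# Emulates the case-insensitive default described on this page.
insensitive = re.findall(r"\d+ Comments", text, re.IGNORECASE)

# Emulates the effect of a leading (?-i): upper and lower case must match exactly.
sensitive = re.findall(r"\d+ Comments", text)

print(insensitive)  # ['12 comments', '7 Comments']
print(sensitive)    # ['7 Comments']
```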

Typical examples

 

regex(bo*)

will find "b", "bo", "boo", "booooo"

 

regex(bx+)

will find "bxxxxxxxx", "bxx", "bx" but not "b"

 

regex(\d+)

will find all numbers

 

regex(\d+ visitors)

will find "3 visitors" or "243234 visitors" or "2763816 visitors"

 

regex(\d+ of \d+ messages)

will find "2 of 1200 messages" or "1 of 10 messages"
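
The counter pattern can be checked against sample text with a short sketch, assuming Python's re module; the page text is invented for the illustration.

```python
import re

page = "Inbox: 2 of 1200 messages shown. Archive: 1 of 10 messages shown."

# \d+ adapts to any number of digits, the words in between must match literally.
counters = re.findall(r"\d+ of \d+ messages", page)
print(counters)  # ['2 of 1200 messages', '1 of 10 messages']
```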

 

RegexToEnd(\d+ of \d+ messages)

will filter everything from the last occurrence of "2 of 1200 messages" or "1 of 10 messages" to the end of the page
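
One possible reading of this behaviour is sketched here, under the assumption that "filter" means the text from the last match onwards is left out. The helper cut_from_last_match is hypothetical and only imitates the described effect with Python's re module; it is not the actual RegexToEnd implementation.

```python
import re

def cut_from_last_match(pattern, text):
    """Hypothetical helper: keep the text before the last match of pattern."""
    last = None
    for last in re.finditer(pattern, text):
        pass  # after the loop, "last" holds the final match (or None)
    return text if last is None else text[:last.start()]

page = "News ... 1 of 10 messages ... ads ... 2 of 1200 messages ... footer"
kept = cut_from_last_match(r"\d+ of \d+ messages", page)
print(kept)  # News ... 1 of 10 messages ... ads ... 
```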

 

regex(MyText.{0,20})

will find "MyText" and up to 20 characters after "MyText"
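
A small sketch of the {0,20} limit, assuming Python's re module (where the dot does not match a line break by default); the sample texts are invented.

```python
import re

long_text = "MyText is followed by a long sentence here"
short_text = "MyText!"

# At most 20 characters are taken after "MyText", fewer if the text ends earlier.
long_hit = re.search(r"MyText.{0,20}", long_text).group()
short_hit = re.search(r"MyText.{0,20}", short_text).group()

print(len(long_hit))  # 26, i.e. 6 characters for "MyText" plus 20 more
print(short_hit)      # MyText!
```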

 

regex(\d\d.\d\d.\d\d\d\d)

will find date-strings with format 99.99.9999 or 99-99-9999 (the dot in the regex matches any character)

 

regex(\d\d\.\d\d\.\d\d\d\d)

will find date-strings with format 99.99.9999
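
The difference between the unescaped and the escaped dot in the two date patterns can be seen in a minimal sketch, assuming Python's re module and invented sample dates.

```python
import re

text = "01.02.2024 03-04-2024 05x06x2024"

# An unescaped dot accepts any separator, including letters.
loose = re.findall(r"\d\d.\d\d.\d\d\d\d", text)

# An escaped dot accepts only a literal ".".
strict = re.findall(r"\d\d\.\d\d\.\d\d\d\d", text)

print(loose)   # ['01.02.2024', '03-04-2024', '05x06x2024']
print(strict)  # ['01.02.2024']
```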

 

regex(([_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+))

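will find most e-mail addresses (addresses containing characters outside the listed sets, such as "+", are matched only partially)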
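
The e-mail pattern can be tried out with a short sketch, assuming Python's re module and invented addresses. finditer is used instead of findall because the pattern contains groups, and the second call shows a limitation: a "+" in the local part is not covered by the character set.

```python
import re

EMAIL = r"([_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+)"

text = "Contact john.doe@example.com or sales@shop.example.org."
found = [m.group(0) for m in re.finditer(EMAIL, text)]
print(found)  # ['john.doe@example.com', 'sales@shop.example.org']

# The part before "+" is lost because "+" is not in the character set.
partial = [m.group(0) for m in re.finditer(EMAIL, "first+tag@example.com")]
print(partial)  # ['tag@example.com']
```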



