Regex stands for regular expression. Regular expressions are supported, natively or through libraries, by most modern programming languages, and they make complex text searches far easier to express. A regular expression, sometimes also called a rational expression, is a sequence of characters that defines a search pattern in text.
Many string-searching algorithms use regular expressions for ‘find’ and ‘find and replace’ operations on strings. Many programming languages come with built-in regex capabilities, while others provide them through libraries.
Regular expressions are widely used, for example in Google Analytics for URL matching. Other popular use cases are lexical analysis, the search-and-replace dialogs of word processors and text editors, and text-processing utilities.
Because regex is so widely used, many people want to learn its syntax and expressions, often to prepare for interviews. For quick reference, you can use this regular expression cheat sheet PDF.
In this regex guide, you will learn how various regex symbols and expressions work, with examples. Let’s get started with the regex cheat sheet.
Regex Cheat Sheet
1. Character Escapes
The backslash character (\) in the following table indicates that the character that follows it is a special character.
Escaped character | Description | Pattern | Matches |
\a | It will match a bell character, \u0007. | \a | "\u0007" in "Error!" + '\u0007' |
\b | Will match a backspace within a character class, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b" |
\t | It will match a tab, \u0009. | (\w+)\t | "i1\t", "i2\t" in "i1\ti2\t" |
\r | It will match a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
\v | It will match a vertical tab, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v" |
\f | It will match a form feed, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f" |
\n | It will match a new line, \u000A. | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
\e | It will match an escape, \u001B. | \e | "\x001B" in "\x001B" |
\nnn | It uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | "a b", "c d" in "a bc d" |
\xnn | It uses the hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d" |
\cX \cx | It will match the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C) |
\unnnn | It will match a Unicode character using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | "a b", "c d" in "a bc d" |
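A few of the escapes above can be tried out with a short sketch. It assumes Python's built-in re module, which is not the engine the table was written for, so only escapes that Python also supports are shown.

```python
import re

# \t matches a tab; the capturing group returns the word before each tab.
words_before_tab = re.findall(r"(\w+)\t", "i1\ti2\t")

# \x20 (hexadecimal) and \u0020 (Unicode) both denote the space character.
hex_space = re.findall(r"\w\x20\w", "a bc d")
unicode_space = re.findall(r"\w\u0020\w", "a bc d")

# \r\n matches a carriage return plus newline, followed by a captured word.
after_crlf = re.findall(r"\r\n(\w+)", "\r\nThese are\ntwo lines.")

print(words_before_tab, hex_space, unicode_space, after_crlf)
```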
2. Character Classes
A character class will match any one of a set of characters. Character classes include the language elements that are listed in the following table.
Character class | Description | Pattern | Matches |
[character_group] | It will match any single character present in the character_group. By default, the match is case-sensitive. | [ae] | "a" in "bay"; "a", "e" in "stake" |
[^character_group] | Negation: it will match any single character that is not present in the character_group. By default, characters in character_group are case-sensitive. | [^aei] | "r", "g", "n" in "reign" |
[first-last] | Character range: it will match any single character present in the range from first to last. | [A-Z] | "A", "B" in "AB123" |
. | Wildcard: it will match any single character except \n. If you want to match a literal period character (. or \u002E), you have to precede it with the escape character (\.). | a.e | "ave" in "have"; "ate" in "hater" |
\p{name} | It will match any single character available in the Unicode general category or named block specified by name. | \p{Lu} \p{IsCyrillic} | "C", "L" in "City Lights"; "Д", "Ж" in "ДЖem" |
\P{name} | It will match any single character not available in the Unicode general category or named block specified by name. | \P{Lu} \P{IsCyrillic} | "i", "t", "y" in "City"; "e", "m" in "ДЖem" |
\w | It will match any word character. | \w | "I", "D", "A", "1", "3" in "ID A1.3" |
\W | It will match any non-word character. | \W | " ", "." in "ID A1.3" |
\s | It will match any white-space character. | \w\s | "D " in "ID A1.3" |
\S | It will match any non-white-space character. | \s\S | " _" in "int __ctr" |
\d | It will match any decimal digit. | \d | "4" in "4 = IV" |
\D | It will match any character other than a decimal digit. | \D | " ", "=", " ", "I", "V" in "4 = IV" |
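The character classes in the table can be checked with a minimal sketch like the following. It assumes Python's re module; the \p{name} classes are left out because that module does not support them.

```python
import re

text = "ID A1.3"

word_chars = re.findall(r"\w", text)        # letters, digits, underscore
non_word_chars = re.findall(r"\W", text)    # everything else
digits = re.findall(r"\d", "4 = IV")

vowels = re.findall(r"[ae]", "stake")       # positive character group
not_aei = re.findall(r"[^aei]", "reign")    # negated character group
uppercase = re.findall(r"[A-Z]", "AB123")   # character range
wildcard = re.findall(r"a.e", "have hater") # . matches any character but \n

print(word_chars, non_word_chars, digits, vowels, not_aei, uppercase, wildcard)
```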
3. Character Class Operations
Class Operation | Legend | Example | Sample Match |
[…-[…]] | .NET: it is a character class subtraction. One character on the left, but not in the subtracted class. | [a-z-[aeiou]] | Any lowercase consonant |
[…-[…]] | .NET: it is a character class subtraction. | [\p{IsArabic}-[\D]] | An Arabic character and not a non-digit, i.e., an Arabic digit |
[…&&[…]] | Java, Ruby 2+: it is a character class intersection. One character on the left and in the && class. | [\S&&[\D]] | A non-whitespace character that is also a non-digit. |
[…&&[…]] | Java, Ruby 2+: character class intersection. | [\S&&[\D]&&[^a-zA-Z]] | A non-whitespace character that is a non-digit and not a letter. |
[…&&[^…]] | Java, Ruby 2+: character class subtraction, obtained by intersecting a class with a negated class. | [a-z&&[^aeiou]] | An English lowercase letter that is not a vowel. |
[…&&[^…]] | Java, Ruby 2+: character class subtraction. | [\p{InArabic}&&[^\p{L}\p{N}]] | An Arabic character that is not a letter or a number. |
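Engines without subtraction or intersection syntax can often reach the same result another way. The sketch below assumes Python's re module, which has neither operator, and swaps in a negative lookahead and a combined negated class as workarounds. It is an emulation, not the .NET or Java syntax from the table.

```python
import re

# Emulates [a-z-[aeiou]] or [a-z&&[^aeiou]]: the negative lookahead rejects
# vowels before [a-z] consumes the character.
consonant = re.compile(r"(?![aeiou])[a-z]")
consonants = consonant.findall("regex 101")

# Emulates [\S&&[\D]]: a single negated class excludes whitespace and digits.
non_space_non_digit = re.findall(r"[^\s\d]", "a 1_b")

print(consonants, non_space_non_digit)
```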
4. Anchors
Anchors, also known as atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string. They do not cause the engine to advance through the string or consume characters. The metacharacters listed in the following table are anchors.
Assertion | Description | Pattern | Matches |
^ | By default, the match must start at the beginning of the string. In multiline mode, it must start at the beginning of a line. | ^\d{3} | "111" in "111-333-" |
$ | By default, the match will occur at the end of the string or just before \n at the end of the string. In the case of the multiline mode, it will occur just before the end of the line or before \n at the end of the line. | -\d{3}$ | "-444" in "-901-444" |
\A | The match occurs at the start of the string. | \A\d{3} | "222" in "222-333-" |
\Z | The match occurs at the end of the string or before \n at the end of the string. | -\d{3}\Z | "-111" in "-555-111" |
\z | The match occurs at the end of the string. | -\d{3}\z | "-111" in "-901-111" |
\G | The match occurs at the point where the previous match ended. | \G\(\d\) | "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)" |
\b | The match occurs on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | \b\w+\s\w+\b | "them theme", "them them" in "them theme them them" |
\B | The match will not occur on a \b boundary. | \Bend\w*\b | "ends", "ender" in "end sends endure lender" |
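Here is a small sketch of the most common anchors, assuming Python's re module. \G is not available there, and Python's \Z matches only at the very end of the string, so those rows are not reproduced.

```python
import re

at_start = re.findall(r"^\d{3}", "111-333-")
at_end = re.findall(r"-\d{3}$", "-901-444")

# With the MULTILINE flag, ^ also matches at the start of every line.
line_starts = re.findall(r"^\d{3}", "111-a\n222-b", flags=re.MULTILINE)

# \b requires a word boundary, \B requires that there is none.
word_pairs = re.findall(r"\b\w+\s\w+\b", "them theme them them")
inner_end = re.findall(r"\Bend\w*\b", "end sends endure lender")

print(at_start, at_end, line_starts, word_pairs, inner_end)
```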
5. Grouping Constructs
Grouping constructs delineate subexpressions of a regular expression and capture substrings of the input string. Grouping constructs use the following language elements.
Grouping construct | Description | Pattern | Matches |
( subexpression ) | It will capture the matched subexpression and assign it a one-based ordinal number. | (\w)\1 | "ll" in "hello" |
(?< name > subexpression ) or (?' name ' subexpression ) | It will capture the matched subexpression into a named group. | (?<double>\w)\k<double> | "ll" in "hello" |
(?< name1 - name2 > subexpression ) or (?' name1 - name2 ' subexpression ) | It will define a balancing group definition. | (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$ | "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))" |
(?: subexpression ) | It will define a noncapturing group. | Write(?:Line)? | "WriteLine" in "Console.WriteLine()" "Write" in "Console.Write(value)" |
(?imnsx-imnsx: subexpression ) | It will apply or disable the specified options within subexpression. | A\d{2}(?i:\w+)\b | "A12xl", "A12XL" in "A12xl A12XL a12xl" |
(?= subexpression ) | Zero-width positive lookahead assertion. | \b\w+\b(?=.+and.+) | "rats", "bats" in "rats, bats and some mice." |
(?! subexpression ) | Zero-width negative lookahead assertion. | \b\w+\b(?!.+and.+) | "and", "some", "mice" in "rats, bats and some mice." |
(?<= subexpression ) | Zero-width positive lookbehind assertion. | \b\w+\b(?<=.+and.+) ——————————— \b\w+\b(?<=.+and.*) | "some", "mice" in "rats, bats and some mice." ———————————— "and", "some", "mice" in "rats, bats and some mice." |
(?<! subexpression ) | Zero-width negative lookbehind assertion. | \b\w+\b(?<!.+and.+) ——————————— \b\w+\b(?<!.+and.*) | "rats", "bats", "and" in "rats, bats and some mice." ———————————— "rats", "bats" in "rats, bats and some mice." |
(?> subexpression ) | Atomic group. | (?>a|ab)c | "ac" in "ac"; nothing in "abc" |
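The sketch below tries a few of these grouping constructs, assuming Python's re module. Python writes named groups as (?P<name>...) and named backreferences as (?P=name) rather than the .NET forms in the table, and it has no balancing groups, so those rows are adapted or omitted.

```python
import re

# Numbered capture with a backreference to group 1.
doubled = re.search(r"(\w)\1", "hello").group(0)

# Named capture; Python spells it (?P<name>...) and (?P=name).
named = re.search(r"(?P<double>\w)(?P=double)", "hello")
doubled_letter = named.group("double")

# Non-capturing group: findall returns whole matches because nothing is captured.
writes = re.findall(r"Write(?:Line)?", "Console.WriteLine() Console.Write(value)")

# Option applied only inside the group: \w+ is matched case-insensitively.
scoped = re.findall(r"A\d{2}(?i:\w+)\b", "A12xl A12XL a12xl")

print(doubled, doubled_letter, writes, scoped)
```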
6. Lookarounds
When the regex engine reaches a lookaround expression, it takes the substring from the current position to the start (lookbehind) or the end (lookahead) of the original string, and then runs Regex.IsMatch on that substring using the lookaround pattern. Whether the lookaround succeeds then depends on whether it is a positive or a negative assertion.
Lookaround | Name | Example | Sample Match |
(?=check) | Positive Lookahead | (?=\d{10})\d{5} | 06678 in 0667856789 |
(?<=check) | Positive Lookbehind | (?<=\d)rat | rat in 1rat |
(?!check) | Negative Lookahead | (?!theatre)the\w+ | theme |
(?<!check) | Negative Lookbehind | \w{3}(?<!mon)ster | Munster |
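The four lookarounds can be exercised with a sketch like this one, assuming Python's re module. Python only accepts fixed-width lookbehind patterns, which is enough for these examples. The input strings are illustrative.

```python
import re

# Positive lookahead: ten digits must follow, but only five are consumed.
pos_ahead = re.search(r"(?=\d{10})\d{5}", "0667856789").group(0)

# Positive lookbehind: "rat" must be preceded by a digit.
pos_behind = re.search(r"(?<=\d)rat", "1rat").group(0)

# Negative lookahead: words starting with "the", except "theatre".
neg_ahead = re.findall(r"(?!theatre)the\w+", "theatre theme")

# Negative lookbehind: three word characters that are not "mon", then "ster".
neg_behind = re.findall(r"\w{3}(?<!mon)ster", "monster Munster")

print(pos_ahead, pos_behind, neg_ahead, neg_behind)
```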
7. Quantifiers
A quantifier specifies how many instances of the previous element must be present in the input string for a match to occur. Quantifiers include the following language elements.
Quantifier | Description | Pattern | Matches |
* | It will match the previous element zero or more times. | \d*\.\d | ".0", "19.9", "219.9" |
+ | It will match the previous element one or more times. | se+ | "see" in "seen", "se" in "sent" |
? | It will match the previous element zero or one time. | mai?n | "man", "main" |
{n} | It will match the previous element exactly n times. | ,\d{3} | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
{n,} | It will match the previous element at least n times. | \d{2,} | "166", "29", "1930" |
{n,m} | It will match the previous element at least n times, but no more than m times. | \d{3,5} | "166", "17668"; "19302" in "193024" |
*? | It will match the previous element zero or more times, but as few times as possible. | \d*?\.\d | ".0", "19.9", "219.9" |
+? | It will match the previous element one or more times, but as few times as possible. | se+? | "se" in "seen", "se" in "sent" |
?? | It will match the previous element zero or one time, but as few times as possible. | mai??n | "man", "main" |
{n}? | It will match the preceding element exactly n times. | ,\d{3}? | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
{n,}? | It will match the previous element at least n times, but as few times as possible. | \d{2,}? | "16" in "166"; "29"; "19", "30" in "1930" |
{n,m}? | It will match the previous element between n and m times, but as few times as possible. | \d{3,5}? | "166"; "176" in "17668"; "193", "024" in "193024" |
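The difference between greedy and lazy quantifiers is easiest to see side by side. The following is a minimal sketch that assumes Python's re module. The input strings are illustrative.

```python
import re

# Greedy + takes as many "e" as it can; lazy +? stops after the first one.
greedy_plus = re.findall(r"se+", "seen sent")
lazy_plus = re.findall(r"se+?", "seen sent")

# Greedy {3,5} takes five digits; lazy {3,5}? takes three at a time.
greedy_range = re.findall(r"\d{3,5}", "193024")
lazy_range = re.findall(r"\d{3,5}?", "193024")

# ? makes the "i" optional; {3} requires exactly three digits after a comma.
optional = re.findall(r"mai?n", "man main")
exact = re.findall(r",\d{3}", "9,876,543,210")

print(greedy_plus, lazy_plus, greedy_range, lazy_range, optional, exact)
```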
8. Backreference Constructs
A backreference lets you refer, later in the same regular expression, to a subexpression that was matched earlier. The following table lists the backreference constructs:
Backreference construct | Description | Pattern | Matches |
\number | Backreference. It will match the value of a numbered subexpression. | (\w)\1 | "ee" in "peek" |
\k<name> | Named backreference. It will match the value of a named expression. | (?<char>\w)\k<char> | "ee" in "peek" |
9. Alternation Constructs
Alternation constructs modify a regular expression to enable “either/or” matching. These constructs include the language elements listed in the following table.
Alternation construct | Description | Pattern | Matches |
| | It will match any one element that is separated by the vertical bar (|) character. | th(e|is|at) | "the", "this" in "this is the day." |
(?( expression ) yes | no ) or (?( expression ) yes ) | It will match “yes” if the regex pattern designated by expression matches; else, it will match the optional “no” part. The provided expression is interpreted as a zero-width assertion. To avoid ambiguity with a named or numbered capturing group, you must use the optional explicit assertion, such as (?( (?= expression ) ) yes | no ) | (?(A)A\d{2}\b|\b\d{3}\b) | "A10", "910" in "A10 C103 910" |
(?( name ) yes | no ) or (?( name ) yes ) | It will match “yes” if name, a named or numbered capturing group, has a match; else, it will match the optional no. | (?<quoted>")?(?(quoted).+?"|\S+\s) | "Dogs.jpg ", "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\"" |
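Plain alternation and the group-based conditional can be sketched as follows, assuming Python's re module. Python supports (?(name)yes|no) with a group name or number, but not the expression-based form, so that row is not reproduced. The named group is written in Python's (?P<name>...) spelling.

```python
import re

# Plain alternation: "th" followed by "e", "is" or "at".
alternation = [m.group(0) for m in re.finditer(r"th(?:e|is|at)", "this is the day.")]

# Conditional: if the optional opening quote matched, read up to the closing
# quote; otherwise read a run of non-space characters plus one whitespace.
file_name = re.compile(r'(?P<quoted>")?(?(quoted).+?"|\S+\s)')
files = [m.group(0) for m in file_name.finditer('Dogs.jpg "Yiska playing.jpg"')]

print(alternation, files)
```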
10. Substitutions
Substitutions are regex language elements that are used in replacement patterns. The following table lists the substitution elements.
Character | Description | Pattern | Replacement pattern | Input string | Result string |
$ number | It will substitute the substring matched by group number. | \b(\w+)(\s)(\w+)\b | $3$2$1 | "one two" | "two one" |
${ name } | It will substitute the substring matched by the named group name. | \b(?<word1>\w+)(\s)(?<word2>\w+)\b | ${word2} ${word1} | "one two" | "two one" |
$$ | It will substitute a literal "$". | \b(\d+)\s?USD | $$$1 | "44 USD" | "$44" |
$& | It will substitute a copy of the whole match. | \$?\d*\.?\d+ | **$&** | "$1.30" | "**$1.30**" |
$` | It will substitute all the text of the input string before the match. | B+ | $` | "DDBBCC" | "DDDDCC" |
$' | It will substitute all the text of the input string after the match. | B+ | $' | "AABBCC" | "AACCCC" |
$+ | It will substitute the last group that was captured. | B+(C+) | $+ | "AABBCCDD" | "AACCDD" |
$_ | It will substitute the entire input string. | B+ | $_ | "AABBCC" | "AAAABBCCCC" |
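Replacement syntax differs between engines. The sketch below assumes Python's re module, where the replacement string uses \1 and \g<name> instead of $1 and ${name}, and a literal $ needs no escaping. Python has no direct counterpart of $`, so the last example swaps in a replacement function to get the same effect.

```python
import re

# Numbered groups: swap the two words (the table's $3$2$1).
swapped = re.sub(r"\b(\w+)(\s)(\w+)\b", r"\3\2\1", "one two")

# Named groups: the table's ${word2} ${word1}.
named_swap = re.sub(
    r"\b(?P<word1>\w+)(\s)(?P<word2>\w+)\b", r"\g<word2> \g<word1>", "one two"
)

# Literal dollar sign followed by group 1 (the table's $$$1).
dollars = re.sub(r"\b(\d+)\s?USD", r"$\1", "44 USD")

# Whole match: \g<0> plays the role of $&.
emphasized = re.sub(r"\$?\d*\.?\d+", r"**\g<0>**", "$1.30")

# Text before the match, computed by a function instead of $`.
before_match = re.sub(r"B+", lambda m: m.string[: m.start()], "DDBBCC")

print(swapped, named_swap, dollars, emphasized, before_match)
```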
11. Inline Options
The following are the inline options supported by the .NET regex engine:
Option | Description | Pattern | Matches |
i | It is for case-insensitive matching. | \b(?i)a(?-i)a\w+\b | "aardvark", "aaaAuto" in "aardvark AAAuto aaaAuto Adam breakfast" |
m | It will use multiline mode: ^ and $ match the beginning and end of a line, instead of the beginning and end of a string. | ||
n | It will not capture unnamed groups. | ||
s | It will use the single-line mode, in which the dot (.) also matches \n. | ||
x | It will ignore the unescaped white space in the regular expression pattern. | \b(?x) \d+ \s \w+ | "1 aardvark", "2 cats" in "1 aardvark 2 cats IV centurions" |
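Two of these options can be tried with a short sketch, assuming Python's re module rather than .NET. Python has no n option, and recent Python versions only accept a global inline flag such as (?x) at the very start of the pattern, so the x example is rearranged accordingly.

```python
import re

# x: unescaped whitespace in the pattern is ignored.
spaced = re.findall(r"(?x) \b \d+ \s \w+", "1 aardvark 2 cats IV centurions")

# i: the same flag can be passed as an argument instead of inline.
case_insensitive = re.findall(r"\baa\w+\b", "aardvark AAAuto Adam", flags=re.IGNORECASE)

print(spaced, case_insensitive)
```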
12. POSIX Character Classes
A POSIX character class is a named set of characters, such as letters, digits, or punctuation. We can use POSIX character classes only within bracket expressions. The POSIX standard defines classes of characters for use in regular expressions, including the following.
Character | Legend | Example | Sample Match |
[:alpha:] | PCRE (C, PHP, R…): ASCII letters A-Z and a-z | [8[:alpha:]]+ | WellDone88 |
[:alpha:] | Ruby 2: Unicode letter or ideogram | [[:alpha:]\d]+ | кошка99 |
[:alnum:] | PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z | [[:alnum:]]{10} | ABC1235251 |
[:alnum:] | Ruby 2: Unicode digit, letter or ideogram | [[:alnum:]]{10} | кошка90210 |
[:punct:] | PCRE (C, PHP, R…): ASCII punctuation mark | [[:punct:]]+ | ?!.,:; |
[:punct:] | Ruby: Unicode punctuation mark | [[:punct:]]+ | ‽,:〽⁆ |
13. Inline Modifiers
The following modifiers are not supported in JavaScript. If you are using Ruby, take care with “?s” and “?m”, which behave differently there than in other engines.
Modifier | Legend | Example | Sample Match |
(?i) | Case-insensitive mode (except JavaScript) | (?i)Monday | monDAY |
(?s) | DOTALL mode (except JS and Ruby). The dot (.) will match line break characters (\r\n). It is also called "single-line mode" because the dot treats the entire input as a single line | (?s)From A.*to Z | From A to Z |
(?m) | Multiline mode (except Ruby and JS) ^ and $ match at the beginning and end of every line | (?m)1\r\n^2$\r\n^3$ | 1 2 3 |
(?m) | In Ruby: it is the same as (?s) in other engines, i.e. DOTALL mode, in which the dot matches line breaks | (?m)From A.*to Z | From A to Z |
(?x) | Free-spacing mode (except JavaScript). It is also called comment mode or whitespace mode | (?x) # this is a # comment abc # write on multiple # lines [ ]d # spaces must be # in brackets | abc d |
(?n) | .NET, PCRE 10.30+: named capture only | Turns all (parentheses) into non-capture groups. To capture, use named groups. | |
(?d) | Java: Unix linebreaks only | The dot and the ^ and $ anchors are only affected by \n | |
(?^) | PCRE 10.32+: unset modifiers | Unsets ismnx modifiers |
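The effect of the (?i), (?s) and (?m) modifiers can be checked with a sketch like this, assuming Python's re module, where (?s) and (?m) have the common (non-Ruby) meaning. The input strings are illustrative.

```python
import re

text = "From A\nto Z"

# Without (?s) the dot stops at the line break, so the match fails.
without_dotall = re.search(r"From A.*to Z", text)
with_dotall = re.search(r"(?s)From A.*to Z", text)

# Without (?m), ^ and $ only match at the ends of the whole string.
single_line = re.findall(r"^\d$", "1\n2\n3")
multi_line = re.findall(r"(?m)^\d$", "1\n2\n3")

case_insensitive = re.search(r"(?i)Monday", "monDAY").group(0)

print(without_dotall, with_dotall.group(0), single_line, multi_line, case_insensitive)
```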
Conclusion
As a beginner, you may find this regex cheat sheet a lot to take in. With so many characters and symbols, it can be difficult to remember where each expression belongs. Mastering regex takes a lot of practice.
If you get stuck, you can simply refer back to this regex cheat sheet. We have covered almost every commonly used regex expression, character, and symbol. We hope that this cheat sheet will help you work with regex more confidently.
Frequently Asked Questions
1. What are regex commands?
Regex commands filter the data based on the given regular expressions.
regex <field> [=|!= <regular expression>] | [IN | NOT IN (<regular expression> [(,<regular expression>)*])]
2. What does *$ mean in regex?
Regex has no single “*$” operator, and on its own “*$” is not a valid pattern in most engines because “*” has nothing to repeat. The “*” is a regex operator that means zero or more occurrences of the character or subexpression that precedes it, while the “$” matches the end of the string being matched, assuming that what precedes it in the regex can match the preceding characters in the string.
For example:
a*
It will match any sequence of zero or more occurrences of the character a.
a*$
It will match zero or more occurrences of a at the end of the string being matched.
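The two patterns can be compared with a minimal sketch, assuming Python's re module. The last part shows that this particular engine rejects a bare *$ pattern. Other engines may treat it differently.

```python
import re

# a* matches the longest run of "a" at the current position (possibly empty).
leading_run = re.match(r"a*", "aab").group(0)

# a*$ only succeeds where the run of "a" reaches the end of the string.
trailing_run = re.search(r"a*$", "baaa").group(0)
empty_at_end = re.search(r"a*$", "aab").group(0)

# A bare "*$" gives * nothing to repeat, so compiling it raises an error here.
try:
    re.compile(r"*$")
    bare_star_is_valid = True
except re.error:
    bare_star_is_valid = False

print(repr(leading_run), repr(trailing_run), repr(empty_at_end), bare_star_is_valid)
```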
3. Is regex the same in all languages?
Regular expression syntax varies slightly between programming languages, but most details are the same. Implementations differ mainly in how they process patterns and in what certain special character sequences mean.
4. What is a question mark in regex?
A question mark (?) in a regex specifies zero or one occurrence of the preceding element.
For example:
abc?d means match ab followed by c (optional) with a mandatory d.
ab(cde)?f means match ab followed by cde (optional) with a mandatory f.
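Both examples can be verified with a short sketch, assuming Python's re module. fullmatch is used so that the whole input has to fit the pattern.

```python
import re

optional_char = re.compile(r"abc?d")       # "c" may appear once or not at all
optional_group = re.compile(r"ab(cde)?f")  # "cde" may appear once or not at all

char_results = [bool(optional_char.fullmatch(s)) for s in ("abd", "abcd", "abccd")]
group_results = [bool(optional_group.fullmatch(s)) for s in ("abf", "abcdef", "abcf")]

print(char_results, group_results)
```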