Python RegEx | Docs With Examples

Python RegEx (regular expression) is a powerful tool for pattern matching and text manipulation.

The re module provides built-in support for regex, allowing you to search, extract, and modify text efficiently. Regex patterns can match digits, special characters, and even Unicode text. Because some characters (such as ., $, and \) have a special meaning inside a pattern, understanding how to use escape sequences is essential.

Importing the re Module

Before using regex in Python, you need to import the re module:

import re  # The regex module in Python

Why Use Regular Expressions?

Pattern Matching: Find specific patterns in strings, such as email addresses or phone numbers.
Text Validation: Ensure that strings conform to expected formats (e.g., validating user input).
Data Extraction: Extract parts of a string based on patterns.
String Replacement: Modify text efficiently using search-and-replace operations.

A regex pattern is a sequence of characters that defines a search pattern. Word boundaries (\b) restrict a match to whole words, and square brackets define character sets. Parentheses create groups within a pattern, and the text matched by each group can be accessed separately.
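
The three building blocks mentioned above can be sketched in a few lines. This is a minimal illustration; the sample sentence and variable names are made up for the example.

```python
import re

text = "The cat sat near the catalog on 2024-05-17."

# \b restricts the match to the whole word "cat", so "catalog" is skipped.
whole_words = re.findall(r'\bcat\b', text)

# Without word boundaries, the "cat" inside "catalog" matches too.
substrings = re.findall(r'cat', text)

# Square brackets define a character set: here, any single vowel.
vowels = re.findall(r'[aeiou]', "cat")

# Parentheses create groups that can be read separately from the full match.
date = re.search(r'(\d{4})-(\d{2})-(\d{2})', text)

print(whole_words)      # ['cat']
print(len(substrings))  # 2
print(vowels)           # ['a']
print(date.group(0))    # 2024-05-17
print(date.group(1))    # 2024
print(date.groups())    # ('2024', '05', '17')
```

group(0) is the full match, while group(1), group(2), and so on return the text captured by each pair of parentheses.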

How to Get a RegEx Match in Python?

The primary functions provided by the re module to get a match include:

re.search(pattern, string)  # Searches for the first occurrence of the pattern
re.match(pattern, string)   # Checks if the pattern matches at the start of the string
re.findall(pattern, string) # Returns a list of all occurrences of the pattern
re.fullmatch(pattern, string) # Succeeds only if the entire string matches the pattern
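
To see how these four functions differ, it can help to run them with the same pattern. The sketch below uses a made-up sample string; only the re functions themselves come from the list above.

```python
import re

text = "Order 66 shipped, order 67 pending"
pattern = r'\d+'

# search: first match anywhere in the string.
first = re.search(pattern, text)

# match: succeeds only if the pattern matches at index 0.
at_start = re.match(pattern, text)

# findall: every non-overlapping match, as a list of strings.
all_numbers = re.findall(pattern, text)

# fullmatch: the whole string must match the pattern.
whole = re.fullmatch(pattern, text)
only_digits = re.fullmatch(pattern, "66")

print(first.group())        # 66
print(at_start)             # None
print(all_numbers)          # ['66', '67']
print(whole)                # None
print(only_digits.group())  # 66
```

search(), match(), and fullmatch() return a match object or None, so check the result before calling group() on it. findall() always returns a list, which is empty when nothing matches.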

What is the match Function in Python RegEx?

The re.match() function checks if the beginning of a string matches a pattern. It is useful when you need to confirm that the search pattern occurs at the start of the string.

When the match succeeds, re.match() returns a match object, whose span() method gives the start and end indices of the match. If there is no match, it returns None.

Let's take a look at the basic syntax to check for a match at the start of a string:

text = "Python is powerful!"
match = re.match(r'Python', text)
if match:
    print("Match found!", "Span:", match.span())
else:
    print("No match.")

Output:

Match found! Span: (0, 6)

Understanding Span in Regex

The span in regex refers to the range of indices in the input string where a match occurs. The span() method of a match object returns a tuple (start, end), where start is the index of the first matched character and end is the index just past the last one.

For example:

text = "Find the number 42 in this text."
match = re.search(r'\d+', text)
if match:
    print("Matched number:", match.group(), "Span:", match.span())

Output:

Matched number: 42 Span: (16, 18)
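
Because the end index of a span is exclusive, a span can be used directly as a slice. The following sketch uses its own sample strings (not from the example above) and also shows re.finditer(), which yields one match object per occurrence.

```python
import re

text = "Room 101 is open"
match = re.search(r'\d+', text)

if match:
    start, end = match.span()
    # The end index is exclusive, so slicing with the span reproduces the match.
    print(start, end)       # 5 8
    print(text[start:end])  # 101
    # start() and end() return the two halves of the span individually.
    print(match.start(), match.end())  # 5 8

# finditer yields a match object for every occurrence, each with its own span.
spans = [m.span() for m in re.finditer(r'\d+', "a1 bb22 ccc333")]
print(spans)  # [(1, 2), (5, 7), (11, 14)]
```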

How to Check if a String Matches a RegEx Pattern in Python?

To check if a string matches a regex pattern, you can use re.match(), re.search(), or re.fullmatch().

Matching is case-sensitive by default. When case should not matter, pass a flag such as re.IGNORECASE. For validating an entire string, re.fullmatch() is usually the right choice:

pattern = r'^[a-z]+$'  # Only lowercase letters allowed
text = "python"

if re.fullmatch(pattern, text):
    print("Valid format")
else:
    print("Invalid format")

Output:

Valid format
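
One way to package this kind of check is a small helper. The function name is_lowercase_word and its ignore_case parameter are hypothetical, chosen only for this sketch. The last line shows why re.match() alone is not enough for validation.

```python
import re

def is_lowercase_word(text, ignore_case=False):
    """Return True if text consists only of the letters a-z (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(r'[a-z]+', text, flags) is not None

print(is_lowercase_word("python"))                    # True
print(is_lowercase_word("python3"))                   # False
print(is_lowercase_word("Python"))                    # False
print(is_lowercase_word("Python", ignore_case=True))  # True

# re.match only anchors the start, so a partial match still succeeds.
partial = re.match(r'[a-z]+', "python3") is not None
print(partial)  # True
```

With re.fullmatch(), the ^ and $ anchors are not needed, because the function already requires the pattern to cover the whole string.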

How to Match a Pattern in RegEx?

The re.search() function looks for the first occurrence of a pattern in a string and returns a match object if found.

Ordinary characters in a pattern match themselves exactly. Characters with a special meaning, such as . or $, must be escaped with a backslash to be matched literally. Writing the pattern as a raw string (r'...') stops Python from interpreting those backslashes as string escape sequences before the regex engine sees them.

text = "The price is $25.99"
match = re.search(r'\d+\.\d+', text)
if match:
    print("Found price:", match.group(), "Span:", match.span())

Output:

Found price: 25.99 Span: (14, 19)
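
Escaping can be explored a little further. The sketch below uses a made-up sample string: it escapes $ and . by hand, then uses re.escape() from the standard library to escape a whole literal string automatically.

```python
import re

text = "The price is $25.99 (was $30.00)"

# Unescaped, "." matches any character and "$" means end of string,
# so both are escaped here to match them literally.
prices = re.findall(r'\$\d+\.\d+', text)
print(prices)  # ['$25.99', '$30.00']

# re.escape() escapes every special character in a literal string.
literal = "(was $30.00)"
found = re.search(re.escape(literal), text)
print(found.group())  # (was $30.00)

# A raw string keeps the backslash; a normal string needs it doubled.
same_pattern = (r'\d+' == '\\d+')
print(same_pattern)  # True
```

re.escape() is useful when the text to search for comes from user input and should be treated as plain characters rather than as a pattern.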

Special Characters and Character Classes

Regex supports special characters like . (any character except a newline), ^ (start of the string), $ (end of the string), and predefined character classes:

\d – Matches any digit (equivalent to [0-9]).
\w – Matches any word character (letters, digits, underscore _).
\s – Matches whitespace characters.
\S – Matches any non-whitespace character.
[] – Square brackets define a set of characters to match.

Using regex to detect an alphanumeric character can be useful when validating user input.
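
The classes listed above can be tried on a single sample string. The string and the helper name is_alphanumeric are assumptions made for this sketch. The helper accepts ASCII letters and digits only, which is one possible definition of alphanumeric.

```python
import re

sample = "user_42 logged in\tat 09:30"

digits = re.findall(r'\d', sample)
words = re.findall(r'\w+', sample)
spaces = re.findall(r'\s', sample)
chunks = re.findall(r'\S+', sample)
times = re.findall(r'[0-9:]+', sample)

print(digits)  # ['4', '2', '0', '9', '3', '0']
print(words)   # ['user_42', 'logged', 'in', 'at', '09', '30']
print(spaces)  # [' ', ' ', '\t', ' ']
print(chunks)  # ['user_42', 'logged', 'in', 'at', '09:30']
print(times)   # ['42', '09:30']

def is_alphanumeric(value):
    """Hypothetical validator: ASCII letters and digits only, at least one character."""
    return re.fullmatch(r'[A-Za-z0-9]+', value) is not None

print(is_alphanumeric("abc123"))   # True
print(is_alphanumeric("abc_123"))  # False
```

Note that \w also accepts the underscore, which is why the validator uses an explicit character set instead.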

Case-Insensitive Matching and Flags

Regex matching is case-sensitive by default. Passing the re.IGNORECASE flag makes the pattern match letters regardless of their case in the input string.

pattern = r'hello'
text = "HELLO world"
match = re.search(pattern, text, flags=re.IGNORECASE)
print("Found:", match.group())

Output:

Found: HELLO
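
Flags can also be combined, written inline, or attached to a compiled pattern. The sketch below uses a made-up multi-line string; re.MULTILINE, the (?i) inline flag, and re.compile() are all part of the standard re module.

```python
import re

text = "Hello world\nhello again\nHELLO there"

# Flags are combined with the bitwise OR operator.
# re.MULTILINE makes ^ match at the start of every line.
starts = re.findall(r'^hello', text, flags=re.IGNORECASE | re.MULTILINE)
print(starts)  # ['Hello', 'hello', 'HELLO']

# The inline flag (?i) at the start of a pattern has the same effect as re.IGNORECASE.
inline = re.findall(r'(?i)hello', text)
print(inline)  # ['Hello', 'hello', 'HELLO']

# A compiled pattern stores its flags and can be reused.
greeting = re.compile(r'hello', re.IGNORECASE)
result = greeting.search("no greeting here")

# search() returns None when nothing matches, so check before calling group().
if result:
    print("Found:", result.group())
else:
    print("No match.")  # this branch runs
```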

Substitution Using re.sub()

The re.sub() function allows for substitution of matched patterns with a replacement string. It is useful for data cleaning and formatting.

text = "Replace newline character with a space\n"
new_text = re.sub(r'\n', ' ', text)
print(new_text)

Output:

Replace newline character with a space
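
re.sub() can do more than swap fixed text. As a sketch with made-up sample strings, the replacement can refer back to captured groups, be limited with the count argument, or be computed by a function. The helper name double is hypothetical.

```python
import re

# Backreferences (\1, \2, ...) reuse captured groups in the replacement.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', "2024-05-17")
print(swapped)  # 17/05/2024

# By default every match is replaced; count limits how many.
spaced = "a  b   c"
collapsed = re.sub(r'\s+', ' ', spaced)
first_only = re.sub(r'\s+', ' ', spaced, count=1)
print(collapsed)   # a b c
print(first_only)  # a b   c

# The replacement may be a function that receives each match object.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```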

Common Regex Patterns

Here are some common regex patterns and their usage that are definitely worth adding to your Python regex cheat sheet:

\d+ – Matches one or more digits.
\w+ – Matches one or more word characters.
^abc – Matches abc at the start of a string.
abc$ – Matches abc at the end of a string.
a{2,4} – Matches a repeated 2 to 4 times.
[^abc] – Matches any character except a, b, or c.
(abc|def) – Matches either abc or def.
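
Each pattern in the list above can be checked quickly in the interpreter. The input strings below are made up for the demonstration.

```python
import re

numbers = re.findall(r'\d+', "7 of 12")
word_list = re.findall(r'\w+', "snake_case, CamelCase")
starts_with_abc = bool(re.search(r'^abc', "abcdef"))
ends_with_abc = bool(re.search(r'abc$', "abcdef"))
runs_of_a = re.findall(r'a{2,4}', "a aa aaaaa")
not_abc = re.findall(r'[^abc]', "abcde")
either = re.findall(r'(abc|def)', "abc xyz def")

print(numbers)          # ['7', '12']
print(word_list)        # ['snake_case', 'CamelCase']
print(starts_with_abc)  # True
print(ends_with_abc)    # False
print(runs_of_a)        # ['aa', 'aaaa']
print(not_abc)          # ['d', 'e']
print(either)           # ['abc', 'def']
```

In the a{2,4} case, the single a is too short to match, and the run of five is matched greedily as four, leaving one a that cannot match on its own.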

Key Takeaways

Regular expressions are powerful for searching, extracting, and modifying text in Python projects.
Use re.search() to find the first occurrence of a pattern in a string.
re.match() checks if a pattern occurs at the start of a string.
re.findall() returns a list of all occurrences of a pattern.
Use re.sub() for efficient text substitution.
Flags like re.IGNORECASE help with case-insensitive matching.
The span() method returns the start and end indices of a match.
Raw strings (r'') help avoid escape sequence conflicts in regex patterns.

Practice Exercise

To reinforce your understanding of regex in Python, try solving the following problem in your Python editor:

Write a Python script that finds the email addresses in a given text and replaces them with [EMAIL REDACTED]. The script should handle various email formats and domain extensions. One possible solution, using a simplified pattern that does not cover every valid address:

import re

text = "Contact us at support@example.com or sales@my-company.org for inquiries."
pattern = r'[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}'
redacted_text = re.sub(pattern, '[EMAIL REDACTED]', text)

print("Redacted text:", redacted_text)

Expected Output:

Redacted text: Contact us at [EMAIL REDACTED] or [EMAIL REDACTED] for inquiries.
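
A variation that also returns the extracted addresses could look like the sketch below. The names EMAIL_PATTERN and redact_emails are hypothetical, and the pattern is the same simplified one used above, so it is not a full validator for every legal email address.

```python
import re

# Simplified email pattern (assumption: good enough for common addresses).
EMAIL_PATTERN = re.compile(r'[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}')

def redact_emails(text):
    """Return (list of addresses found, text with the addresses redacted)."""
    found = EMAIL_PATTERN.findall(text)
    redacted = EMAIL_PATTERN.sub('[EMAIL REDACTED]', text)
    return found, redacted

sample = "Contact us at support@example.com or sales@my-company.org for inquiries."
emails, cleaned = redact_emails(sample)

print(emails)   # ['support@example.com', 'sales@my-company.org']
print(cleaned)  # Contact us at [EMAIL REDACTED] or [EMAIL REDACTED] for inquiries.
```

Compiling the pattern once with re.compile() lets the same pattern object serve both the extraction and the substitution.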

Wrapping Up

Regular expressions are an essential tool for working with text data. Whether validating user input, searching for patterns, or extracting structured data, mastering regex will significantly enhance your Python programming skills. Understanding how to use word boundaries, escape sequences, raw strings, and character classes will help you match complex patterns effectively. Happy coding!
