How To Check If A Python String Contains a Substring

How can you check if a Python string contains a substring? Great question, and that’s exactly what I’m going to cover in this article.

Whether you’re brand new to Python and still learning about strings or an experienced pro, how well do you understand how to find if a Python string contains a substring?

Sure, there are lots of ways to do this, but are you sure you’re looking for substrings in the most Pythonic way? And where do you stand on LBYL vs EAFP?

These are all interesting topics that I’ll be tackling here, so let’s dive in to learn how to find whether a Python string contains a substring.

What Is A Python Substring?

When it comes to the Python programming language, a substring is essentially a portion or segment of a string object.

If you think of a string as a chain of characters, a substring is any continuous link within that chain.

For instance, in the string "Pythonista", "Python" is a substring.

Substrings are not just limited to the beginning of the string; "onista", "thonis", or even a single character like "t" are also considered substrings.
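
To make this concrete, here is a minimal sketch that uses slicing to pull those same substrings out of "Pythonista". The variable names are only illustrative.

```python
word = 'Pythonista'

# Slicing with [start:stop] pulls out a contiguous run of characters.
prefix = word[:6]    # 'Python'
middle = word[2:8]   # 'thonis'
suffix = word[4:]    # 'onista'

# Every slice of a string is a substring of that string.
for piece in (prefix, middle, suffix):
    print(piece, piece in word)
```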

Understanding substrings is fundamental in many Python projects, as they are pivotal in various operations, from data manipulation to pattern recognition.

In the simplest possible sense, they are the building blocks for more complex string processing tasks, and any good Python course should cover how to work with them.

How Do You Check If A Python String Contains Another String?

So, you're delving into the world of Python strings, and you want to know how to check if one Python string contains a substring. Let's unpack this!

In Python, there are various methods to tackle this, each with its own charm and syntax. Some methods are more 'Pythonic' than others, which means that some adhere to the Zen of Python philosophy to emphasize readability and simplicity.

Now, what’s the most Pythonic way to check for a substring? Testing for membership. It's like asking, "Hey, is this little string a part of you, bigger string?"

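It also sits comfortably alongside Python's EAFP principle (Easier to Ask for Forgiveness than Permission), which is about trying things out and managing exceptions if they arise rather than checking everything upfront. The in keyword itself never raises an exception for a missing substring, but when you need more than a yes-or-no answer, the EAFP-friendly .index() method picks up where it leaves off.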

Why does this matter? Because in Python, handling exceptions is not just easy but also efficient. Put simply, it's the Python way of doing things.

But there's more in the toolkit! We've also got methods like .index(), .find(), .split(), and even Regular Expressions (RegEx).

Oh, and for my fellow data enthusiasts, there are methods tailored for Pandas DataFrames too.

I’m going to explore each of these methods.

And while I’m going to champion the Pythonic approach of using the in keyword, we're all about being thorough here. So, everything gets its moment in the spotlight.

A quick heads-up, though: I’m not the biggest fan of using .find().

Sure, it's LBYL (Look Before You Leap) and doesn't demand exception handling. But I much prefer the Pythonic EAFP style, and .index() fits right in for this.

I also think that .index() more clearly indicates that it's giving you an index, while .find() can be a bit vague for my tastes. What does it find?

Sure, I know, an index or -1 if there’s no substring, but this is less clear than .index(), in my humble opinion.

So, buckle up! Whether you're a seasoned Pythonista or just starting, we're about to embark on a journey through the ins and outs of Python substring checking.

Recommended Approach: Check For Membership With ‘in’

Alright, let’s jump into the Pythonic way of checking if a string contains another string. We’re talking about the Python operator for membership and the in keyword.

Here’s a simple example to get us started:

text = 'The quick brown fox jumps over the lazy dog'
substring = 'fox'
if substring in text:
  print(f'Yep, "{substring}" is in there!')
else:
  print(f'Nope, no "{substring}" found.')

What’s happening here? We’ve got a classic sentence as a string variable. Then, there’s our substring, which in this case is the word fox.

Next, we’re using a simple conditional statement to do a little detective work. It’s asking, “Hey, is ‘fox’ hanging out somewhere in this sentence?”

The beauty of using the in keyword lies in its simplicity. It checks for membership. It’s not fussed about where ‘fox’ is in the sentence, how many times it shows up, or what it’s doing there.

All it cares about is: Is ‘fox’ a part of this string party or not?

When our conditional evaluates to the boolean value of True, we print out our success message, and likewise, if it evaluates to False, we print out the alternative.

Hopefully, you can see why using the in keyword is so handy: not only is it easy to read and write, but it also gets straight to the point. It’s super intuitive and really Pythonic, which is a win-win.

It's like asking a friend a yes-or-no question – no need for elaborate details, just a straightforward check.

Note that we could also use the not operator with the in keyword to check if a Python string does not contain a substring.

Fire up your Python IDE and check out the modified example below:

text = 'The quick brown fox jumps over the lazy dog'
substring = 'fox'
if substring not in text:
  print(f'Nope, no "{substring}" found.')
else:
  print(f'Yep, "{substring}" is in there!')
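
A few behaviors of the membership test are worth seeing in isolation. The short sketch below is only an illustration and reuses the same sentence. It shows that the result is a plain boolean, that matching is case-sensitive, and that the empty string counts as a substring of any string.

```python
text = 'The quick brown fox jumps over the lazy dog'

# The membership test evaluates to a plain bool.
has_fox = 'fox' in text
print(has_fox)            # True

# Matching is case-sensitive, so a capitalized 'Fox' is not found.
has_capital_fox = 'Fox' in text
print(has_capital_fox)    # False

# The empty string counts as a substring of every string.
has_empty = '' in text
print(has_empty)          # True
```

That last case matters if the substring comes from user input, since an empty value will always "match".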

Use The String .lower Method To Check For All Occurrences

Sometimes, our detective work needs to be a bit more thorough. What if 'Fox' decides to crash the party wearing a disguise as 'fox', 'FOX', or even 'FoX'?

That's where Python’s string methods come in, and in particular the .lower() method, which works like a savvy detective who sees through disguises.

Here’s how it works:

text = 'The quick brown fox jumps over the lazy dog'
substring = 'fox'
lowercase_text = text.lower()
lowercase_substring = substring.lower()

if lowercase_substring in lowercase_text:
  print(f'Found "{substring}" with a disguise!')
else:
  print(f'Nope, "{substring}" is nowhere to be seen.')

In this code, we're playing it smart. Before we run our conditional checks, we convert both text and substring into their lowercase versions.

It's like asking everyone at the party to take off their hats – now we can recognize everyone, no matter how they're dressed! Maybe I’m getting carried away with this analogy?!

Anyway, by using .lower(), we're making sure that our search is case-insensitive. 'Fox', 'fox', 'FOX' – they all become 'fox'.

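Now, when we check for membership, we're not fooled by uppercase letters, lowercase letters, or any mix of them, which means we can match the substring however it's capitalized. Keep in mind that the membership test still only answers True or False; it doesn't return the matches themselves.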

This approach is fantastic when you want to be thorough but still keep things simple. It's like saying, "Alright, I don’t care how you're trying to blend in; I’ll find you!"

So, with .lower() in our toolkit, we’re ready to uncover any sneaky occurrences of our substring, no matter how they’re dressed up.

It's a small change, but it makes our string-checking game even stronger!
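
One caveat: .lower() covers everyday English text, but some characters in other languages don't normalize the way you might expect. As an alternative that isn't covered above, Python also provides str.casefold(), which is designed for caseless matching. The German sample text below is just an assumption for illustration.

```python
text = 'Die Straße ist lang'
substring = 'STRASSE'

# .lower() leaves 'ß' unchanged, so the two spellings never line up.
found_with_lower = substring.lower() in text.lower()

# .casefold() maps 'ß' to 'ss', so both sides normalize to 'strasse'.
found_with_casefold = substring.casefold() in text.casefold()

print(found_with_lower)     # False
print(found_with_casefold)  # True
```

For plain ASCII text the two methods behave the same, so .lower() remains a perfectly reasonable default.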

Find More Details About Your Python Substring

Okay, let's turn up the detective dial! Sometimes, just knowing that our substring is in the text isn't enough.

We want the juicy details – where is it exactly? How many times does it pop up? That's where .index() and .count() come in!

Suppose we're curious about where our elusive 'fox' first makes its appearance. That’s when we use .index():

text = 'The quick brown fox jumps over the lazy dog'
substring = 'fox'

try:
  position = text.index(substring)
  print(f'"{substring}" found at position {position}')
except ValueError:
  print(f'"{substring}" is playing hide and seek (not found).')

This snippet is a bit like playing hide and seek. We’re using .index() to seek out the first occurrence of 'fox'. If it finds it, great! We get the exact position where 'fox' starts hiding.

But if 'fox' is a no-show, Python raises a ValueError. That’s why we wrap it in a try-except block – to handle the situation gracefully if 'fox' decides not to attend the party.

Remember when I mentioned that I prefer to take the EAFP approach? This is what I’m talking about, and that’s why I’ve added exception handling.

I think you’d agree this code is easy to understand, and no matter the outcome, we’re covered.

Now, what if our 'fox' is a social butterfly and appears multiple times? Here’s where .count() comes in handy:

occurrences = text.count(substring)
print(f'"{substring}" appears {occurrences} time(s)')

This is like taking a headcount since .count() tells us exactly how many times 'fox' graces the string with its presence.

It’s a straightforward way to quantify the substring's occurrences without playing hide and seek.

By using both .index() and .count(), we can get a fuller picture of our substring’s role in the text.

It’s a bit like being both a detective and a statistician – we know where the action is and how much action there is.

Overall, I think these methods are perfect for when you need to dive deeper into the details of your Python strings.
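
Since .index() only reports the first occurrence, you may want every position. One possible approach is to pass the optional start argument to .index() in a loop, as in this sketch. The helper name find_all_positions and the sample sentence are hypothetical, and the helper deliberately skips overlapping matches so that it agrees with .count().

```python
def find_all_positions(text, substring):
    """Hypothetical helper: start index of every non-overlapping match."""
    if not substring:
        return []  # an empty substring would otherwise match everywhere
    positions = []
    start = 0
    while True:
        try:
            # The second argument tells .index() where to start searching.
            position = text.index(substring, start)
        except ValueError:
            break  # no further matches
        positions.append(position)
        start = position + len(substring)
    return positions


text = 'The fox saw another fox near the foxhole'
positions = find_all_positions(text, 'fox')
print(positions)
print(text.count('fox'))
```

Note that both the helper and .count() treat the 'fox' inside 'foxhole' as a match, because they look at characters rather than words.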

More Ways To Find A Python Substring

Now, I did promise that I’d be thorough and cover some more methods for finding whether a Python string contains a substring.

That said, I’d still recommend that you stick to using the membership test with the in keyword.

With all that out of the way, let’s take a look at how we can use the .find() and .split() methods for finding a substring.

To my mind, .find() is like the kinder cousin of .index(). It plays a similar game but doesn’t cause a fuss if it doesn’t find anything.

What do I mean here? Well, we don’t need exception handling with .find() because rather than being EAFP, it’s part of the LBYL club.

Here’s how we can use .find():

text = 'The quick brown fox jumps over the lazy dog'
substring = 'fox'

position = text.find(substring)
if position != -1:
  print(f"Caught you, '{substring}'! You're hiding at position {position}.")
else:
  print(f'Looks like "{substring}" slipped away this time.')

In this scenario, we’re still seeking the position of 'fox'. But unlike .index(), if .find() doesn’t find 'fox', it simply returns -1. No drama, no exceptions. It’s the chill way to search for substrings.

Again, this seems handy, but I’d contest that it’s more Pythonic and intuitive to use .index() and directly handle an exception.
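
There is also a practical reason to be careful with .find(): its return value is easy to misuse as a boolean. The sketch below, which reuses the same sentence purely for illustration, shows how a match at position 0 can be misread.

```python
text = 'The quick brown fox jumps over the lazy dog'

# 'The' sits at index 0, and 0 is falsy in Python.
position = text.find('The')
print(position)  # 0

# Relying on truthiness gets both cases wrong:
# a match at index 0 reads as False, and a miss (-1) reads as True.
naive_found = bool(text.find('The'))
naive_missing = bool(text.find('cat'))

# Comparing against -1 explicitly gives the right answers.
found = text.find('The') != -1
missing = text.find('cat') != -1

print(naive_found, naive_missing)  # False True
print(found, missing)              # True False
```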

Now, let’s talk about .split(). This method is like the social coordinator of Python strings.

It breaks up the string into a Python list of strings (words) or elements based on an optional separator. Note that if we leave this blank, it splits based on whitespace, but we could use other characters or punctuation marks.

Let's check this out:

text = 'The quick brown fox jumps over the lazy dog'
substring = 'fox'

words = text.split()
if substring in words:
  print(f'"{substring}" is definitely in the mix!')
else:
  print(f'No sign of "{substring}" in this party.')

Here, we're dissecting our string into individual words. Then, we check if 'fox' is one of the guests.

It’s a bit more granular than just checking the whole string – like checking each room at a party instead of just the house.

Then again, we’re essentially converting our string, which is a sequence of characters, into a list of words, where each word is itself just a shorter string.

We then use the membership test to check for the substring. To me, this seems a little longwinded for simple strings, but there may be occasions when it makes more sense to use.

For example, if you need to check if a substring exists as a standalone word rather than part of a larger word, it makes sense to use .split().

In our current example, what if ‘fox’ appears as part of the word ‘foxy’? Do we want to find this or not? Well, that depends!

But, if we use .split(), we have the ability to examine the individual words that match our substring.
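
Here is a rough sketch of that whole-word check. One wrinkle is that .split() leaves punctuation attached to words, so this example strips it using string.punctuation from the standard library. The sample sentence is an assumption made for illustration.

```python
import string

text = 'The foxy dog chased the fox.'
substring = 'fox'

# Plain membership also matches the 'fox' inside 'foxy'.
print(substring in text)

# Split on whitespace, then strip surrounding punctuation
# so that 'fox.' becomes 'fox'.
words = [word.strip(string.punctuation) for word in text.split()]
print(words)

# Now only a standalone word counts as a match.
whole_word_found = substring in words
print(whole_word_found)
```

Without the stripping step, the last word would be 'fox.' and the whole-word check would miss it.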

Using RegEx To Find A Python Substring

Let’s continue diving into Python substrings by taking a detour into Regular Expressions, also known as RegEx.

This is where things get a bit more intricate and powerful. RegEx is like the Swiss Army knife for string manipulation – it's versatile and can handle some pretty complex patterns.

We’ll focus on two functions from Python's re module: re.findall() and re.finditer().

For me, .findall() is great when you want to grab every occurrence of a pattern. Think of it as throwing a net into the sea of your text and catching all the matching fish. Here’s how it works:

import re

text = 'The quick brown fox jumps over the lazy dog'
pattern = 'fox'

matches = re.findall(pattern, text, re.IGNORECASE)
print(f'Caught these foxes: {matches}')

With .findall(), we’re not just limited to exact words. We can also search for patterns, which makes it incredibly flexible.

In this example, we're catching all variations of 'fox', regardless of case. It’s like saying, “I don’t care how you’re written, if you’re a fox, I’ll find you.”

Note that this also means we don’t need to convert our text to lowercase like we did previously. Of course, if you’re new to coding, you might want to brush up on your skills with a RegEx cheat sheet if you need to look for more complex patterns.
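
One thing to watch for: because the first argument is treated as a pattern, characters such as '.' or '(' carry special meaning. If you just want to find a literal substring, re.escape() from the same module can help. The sample text here is made up for illustration.

```python
import re

text = 'Version 1.5 replaced version 1x5 last year'
substring = '1.5'

# As a raw pattern, '.' matches any character, so '1x5' is caught too.
loose_matches = re.findall(substring, text)

# re.escape() turns the substring into a pattern that matches it literally.
exact_matches = re.findall(re.escape(substring), text)

print(loose_matches)  # ['1.5', '1x5']
print(exact_matches)  # ['1.5']
```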

If .findall() is about casting a wide net, .finditer() is like going on a treasure hunt. It returns an iterator that gives us match objects for each occurrence.

This is handy when we want more information about each match:

import re

text = 'The quick brown fox jumps over the lazy dog'
pattern = 'fox'

for match in re.finditer(pattern, text, re.IGNORECASE):
  start, end = match.span()
  print(f'Found "{match.group()}" from position {start} to {end}')

As you can see, by using .finditer(), we’re not just finding 'foxes'; we're pinpointing where each one is hiding in our string.

This is perfect when you need the details – like the start and end positions of your matches.
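
RegEx can also handle the whole-word question without splitting the string. As a sketch, the \b word-boundary anchor restricts matches to standalone words. The sample sentence is an assumption for illustration.

```python
import re

text = 'The Fox hid in the foxhole while another fox watched'

# \b marks a word boundary, so the 'fox' inside 'foxhole' is skipped.
pattern = r'\bfox\b'

whole_words = []
spans = []
for match in re.finditer(pattern, text, re.IGNORECASE):
    whole_words.append(match.group())
    spans.append(match.span())
    print(f'Found "{match.group()}" at {match.span()}')
```

The raw string prefix (r'...') keeps Python from interpreting \b as a backspace character before the re module sees it.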

Find A Python Substring Inside A Pandas Dataframe

To round things off, let’s focus on data analysis and how to find a substring within a Pandas DataFrame.

This is really useful when you need to work with large datasets, and you need to filter or analyze text data efficiently.

If you’re not familiar with Pandas, it’s one of the most popular Python libraries for data manipulation, as it offers intuitive ways to process vast amounts of text data.

So, imagine you have a DataFrame full of data, and you’re interested in rows where a specific column contains a particular substring. Here’s how you can approach this:

import pandas as pd

# Sample DataFrame
data = {'Product': ['Apple iPhone', 'Samsung Galaxy',
                  'Google Pixel', 'OnePlus Nord', 'Sony Xperia'],
      'Description': ['Latest model with A14 chip',
                      'New release with improved camera',
                      'Pixel 5 with stock Android experience',
                      'Affordable flagship by OnePlus',
                      "Sony's new approach to smartphones"]}
df = pd.DataFrame(data)

# Searching for a substring
substring = 'new'
filtered_df = df[df['Description'].str.contains(substring, case=False, na=False)]

In this example, I’ve created a DataFrame with smartphones and their descriptions.

Our mission? Find all products with descriptions containing the word 'new'.

To achieve this, I’ve used the .str.contains() method. This is applied to a DataFrame column and checks each cell in that column for the presence of the substring.

It’s like using the in keyword, but it’s supercharged for DataFrames.

Notice that I’ve also used some additional parameters:

case=False: This makes the search case-insensitive, ensuring we catch 'New', 'NEW', or 'new'.
na=False: This handles missing values (NaN) by treating them as if they don't contain the substring.

I’ve then assigned the result to a new DataFrame that contains only the rows where 'new' appears in the 'Description' column.
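
It's worth knowing that .str.contains() treats the substring as a regular expression by default. If your search text includes characters like parentheses, passing regex=False asks for a plain literal match instead. The tiny DataFrame below is invented for illustration and assumes pandas is installed.

```python
import pandas as pd

# Hypothetical sample data, including a missing description
df = pd.DataFrame({
    'Product': ['Phone A', 'Phone B', 'Phone C'],
    'Description': ['Budget model (5G)', 'Flagship with 5G modem', None],
})

# With regex=False, the parentheses are matched literally
# rather than being read as a RegEx group.
literal = '(5G)'
mask = df['Description'].str.contains(literal, regex=False, na=False)
matches = df[mask]

print(matches['Product'].tolist())
```

Only the first row contains the literal text '(5G)', and the missing description is treated as a non-match thanks to na=False.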

Of course, there is much more you can do with this general approach, but hopefully, you can see how incredibly powerful this can be for data analysis of large data sets.

Whether you’re sifting through customer reviews, searching for keywords in a dataset, or filtering records based on text content, Pandas makes it so straightforward.

Wrapping Up

There you have it! That sums up our exploration of how to find out if a Python string contains a substring.

If you’ve made it this far, you should know that the Python membership operator, aka the in keyword, is the most Pythonic way to check if a Python string contains a substring.

I’ve also covered various other methods to find out if a Python string contains a substring, including .index(), .find(), and .split().

Plus, we learned how to use .lower() to broaden our search, along with RegEx for more advanced string searching methods.

Finally, we took a brief look at how to check for a Python substring within a Pandas DataFrame.

I hope you’ve enjoyed learning about this topic, and feel free to leave us a comment below.

Happy coding!
