Python Regular Expressions: Parsing and Searching Strings

Regular expressions, often abbreviated as regex or RegEx, are sequences of characters that define search patterns. Python’s built-in re module uses them to find, match, and manipulate strings based on specific criteria, which makes them a core tool for text processing.

This article walks through parsing and searching strings in Python with regular expressions, using short examples and explanations of each. Before we dive into the examples, let’s import the re module:

import re

Parsing Strings

1. Extracting Phone Numbers

Suppose you have a text containing phone numbers, and you want to extract them. You can use regex for this task:

text = "Call us at +1 (555) 123-4567 or +44 (20) 1234 5678 for assistance."
pattern = r"\+\d{1,3}\s?\(\d{2,4}\)\s?\d{3,4}[\s-]?\d{4}"  # [\s-]? allows a space or a hyphen before the last four digits
matches = re.findall(pattern, text)
if matches:
    print("Found:", matches)
else:
    print("No phone numbers found.")

Output:

Found: ['+1 (555) 123-4567', '+44 (20) 1234 5678']
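
When the individual parts of each number matter, named groups can label them. The following is a minimal sketch rather than part of the original example: the helper name parse_phone_numbers and the group names country, area, first, and second are assumptions chosen for illustration, and the pattern accepts either a space or a hyphen inside the local number.

```python
import re

# Group names and the helper name are illustrative assumptions.
PHONE_RE = re.compile(
    r"\+(?P<country>\d{1,3})\s?"   # country code after the plus sign
    r"\((?P<area>\d{2,4})\)\s?"    # area code inside parentheses
    r"(?P<first>\d{3,4})[\s-]?"    # first block of the local number
    r"(?P<second>\d{4})"           # last four digits
)


def parse_phone_numbers(text):
    """Return one dict of named parts per phone number found in text."""
    return [m.groupdict() for m in PHONE_RE.finditer(text)]


text = "Call us at +1 (555) 123-4567 or +44 (20) 1234 5678 for assistance."
for parts in parse_phone_numbers(text):
    print(parts)
```

Each match becomes a dictionary, so later code can read parts["country"] or parts["area"] instead of slicing the matched string.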

2. Extracting URLs

You can also extract URLs from a text using regex:

text = "Visit our website at https://www.example.com or https://subdomain.example.org for more information."
pattern = r"https?://\S+"
matches = re.findall(pattern, text)
if matches:
    print("Found:", matches)
else:
    print("No URLs found.")

Output:

Found: ['https://www.example.com', 'https://subdomain.example.org']
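
One caveat with \S+ is that it also captures punctuation that directly follows a URL, such as a comma or a sentence-ending period. A possible workaround, sketched below, is to strip common trailing punctuation after matching. The helper name extract_urls, the sample text, and the set of stripped characters are assumptions for illustration; a URL that legitimately ends in one of those characters would be shortened by this approach.

```python
import re

URL_RE = re.compile(r"https?://\S+")

# Characters assumed to be sentence punctuation rather than part of the URL.
TRAILING_PUNCTUATION = ".,;:!?)\"'"


def extract_urls(text):
    """Find URLs and strip punctuation that commonly trails them in prose."""
    return [url.rstrip(TRAILING_PUNCTUATION) for url in URL_RE.findall(text)]


text = "Docs live at https://www.example.com/docs, and the blog is at https://blog.example.org."
print(extract_urls(text))
# ['https://www.example.com/docs', 'https://blog.example.org']
```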

Searching Strings

1. Finding Keywords

Suppose you want to find specific keywords in a document:

text = "Python is a versatile language. Python is widely used for web development."
pattern = r"Python"
matches = re.finditer(pattern, text)
for match in matches:
    print("Found at position:", match.start())

Output:

Found at position: 0
Found at position: 32
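
A plain pattern such as "Python" is case-sensitive and also matches inside longer words like "Pythonic". If whole-word, case-insensitive matching is what you need, one option is to combine the \b word boundary with the re.IGNORECASE flag. The sample text below is made up for illustration.

```python
import re

text = "Python is versatile. I like python, but Pythonic code takes practice."

# \b anchors the match at word boundaries; re.IGNORECASE ignores letter case.
pattern = re.compile(r"\bpython\b", re.IGNORECASE)

# Collect the matched text with its start and end offsets.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]
print(spans)
# [('Python', 0, 6), ('python', 28, 34)]
```

"Pythonic" is skipped because the letter after "Python" is a word character, so there is no word boundary at that point.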

2. Counting Occurrences

You can count the number of occurrences of a word in a text using regex:

text = "Python is an amazing language. Python is easy to learn."
pattern = r"Python"
matches = re.findall(pattern, text)
count = len(matches)
print("Python appears", count, "times in the text.")

Output:

Python appears 2 times in the text.
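
To count several words in one pass, the words can be joined into a single alternation pattern and tallied with collections.Counter. This is a sketch under a few assumptions: the helper name count_words is hypothetical, matching is whole-word and case-insensitive, and the search words consist of ordinary word characters so that \b behaves as expected.

```python
import re
from collections import Counter


def count_words(text, words):
    """Count whole-word, case-insensitive occurrences of each word in words."""
    # re.escape guards against regex metacharacters in the search words.
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    counts = Counter(m.group().lower() for m in pattern.finditer(text))
    # A Counter returns 0 for keys it has never seen.
    return {w: counts[w.lower()] for w in words}


text = "Python is an amazing language. Python is easy to learn."
print(count_words(text, ["Python", "is", "Java"]))
# {'Python': 2, 'is': 2, 'Java': 0}
```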
Author: user