How to check if a column exists in a Pandas DataFrame

Python Pandas @ Freshers.in

One frequent operation when working with DataFrames is determining if a specific column exists. This article guides you through multiple methods to achieve this.

Creating a sample DataFrame for demonstration:

import pandas as pd
data = {
    'Name': ['Sachin', 'Ramu', 'Arun'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)

Output

     Name  Age           City
0  Sachin   25       New York
1    Ramu   30  San Francisco
2    Arun   35    Los Angeles

Checking if a Column Exists

Method 1: Using the in operator

The simplest way to check if a column exists is to test membership in df.columns with the in operator:

column_to_check = 'Age'
if column_to_check in df.columns:
    print(f"'{column_to_check}' exists in DataFrame.")
else:
    print(f"'{column_to_check}' does not exist in DataFrame.")
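
When several columns must all be present, the same membership test extends to a list of names. The following is a minimal sketch, assuming a hypothetical list called required_columns (the name and the 'Salary' entry are illustrative and not part of the original example):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Ramu', 'Arun'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
})

# Hypothetical list of columns that later code depends on
required_columns = ['Name', 'Age', 'Salary']

# Collect every required column that is absent from the DataFrame
missing = [col for col in required_columns if col not in df.columns]

if missing:
    print(f"Missing columns: {missing}")
else:
    print("All required columns exist.")
```

Collecting the missing names, rather than stopping at the first failure, makes the resulting message more useful when a dataset lacks more than one column.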

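Method 2: Using df.columns.isin()

Older tutorials use df.columns.contains(), but Index.contains() was deprecated in pandas 0.25 and removed in pandas 1.0, so calling it on current versions raises an AttributeError. A working alternative is Index.isin(), which returns a boolean array marking which column labels match the given values; any() then reports whether at least one label matched.

if df.columns.isin([column_to_check]).any():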
    print(f"'{column_to_check}' exists in DataFrame.")
else:
    print(f"'{column_to_check}' does not exist in DataFrame.")
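
For columns stored in a MultiIndex, the check depends on whether you are looking for a full key or a label at one level. The sketch below assumes a hypothetical two-level column layout (df_multi and its labels are made up for illustration) and uses only the in operator and get_level_values():

```python
import pandas as pd

# Hypothetical two-level column layout, used only for illustration
columns = pd.MultiIndex.from_tuples([
    ('location', 'City'),
    ('person', 'Age'),
    ('person', 'Name'),
])
df_multi = pd.DataFrame(
    [['New York', 25, 'Sachin'], ['San Francisco', 30, 'Ramu']],
    columns=columns,
)

# A full key is a tuple with one entry per level
full_key_exists = ('person', 'Age') in df_multi.columns

# A top-level label on its own also passes the membership test
top_level_exists = 'person' in df_multi.columns

# To test a label at an inner level, inspect that level's values
inner_label_exists = 'City' in df_multi.columns.get_level_values(1)

print(full_key_exists, top_level_exists, inner_label_exists)
```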

Method 3: Using try/except with Series.hasnans

The hasnans attribute belongs to a Series (a single column), not to the DataFrame, and reports whether that column contains NaN values. It says nothing about existence on its own: the actual check here is the KeyError that df[column_to_check] raises when the column is missing.

try:
    if df[column_to_check].hasnans:
        print(f"'{column_to_check}' exists in DataFrame and contains NaN values.")
    else:
        print(f"'{column_to_check}' exists in DataFrame and contains no NaN values.")
except KeyError:
    print(f"'{column_to_check}' does not exist in DataFrame.")

This approach is unconventional. It is mainly worth using when you need the NaN check anyway; otherwise the in check from Method 1 is clearer.

Handling a non-existent column

When you access a non-existent column directly, for example with df['Salary'], Pandas raises a KeyError. Check that the column exists before operating on it, or catch the KeyError, so that your code stays robust when dealing with dynamic or evolving datasets.
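
One way to avoid the KeyError without an explicit check is DataFrame.get(), which returns a default value (None unless you pass another) when the column is absent. A minimal sketch, where the 'Salary' column name is a hypothetical example of a missing column:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Ramu', 'Arun'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
})

# 'Salary' is a hypothetical column that is not in the DataFrame;
# get() returns None here instead of raising a KeyError
salary = df.get('Salary')
if salary is None:
    print("'Salary' does not exist in DataFrame.")

# For an existing column, get() returns the column as a Series
age = df.get('Age')
if age is not None:
    print(f"Mean age: {age.mean()}")
```

Compare the result with `is None` rather than relying on truthiness, since evaluating a Series in a boolean context raises a ValueError.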


Author: user