Enhancing data manipulation in Pandas: Techniques for returning multiple columns

Python Pandas @ Freshers.in

Working with DataFrames in Python’s Pandas library often involves selecting and manipulating several columns at once. This article explains how to return multiple columns from a DataFrame, a fundamental skill for data analysis and manipulation in Pandas.

Techniques for Returning Multiple Columns

Using column names

The simplest way to select multiple columns is to pass a list of column names inside the indexing brackets, which is why the syntax appears as double brackets. This returns a new DataFrame containing only the selected columns, in the order you list them.

Example:

Let’s create a DataFrame with names, ages, and cities for our demonstration:

import pandas as pd
# Sample data
data = {
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad']
}
df = pd.DataFrame(data)
# Selecting multiple columns
selected_columns = df[['Name', 'Age']]
print(selected_columns)

Output

     Name  Age
0  Sachin   30
1   Manju   25
2     Ram   40
3    Raju   35
4   David   28
5  Wilson   32
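
One detail worth checking when using column names is the difference between single and double brackets. The short sketch below uses a three-row subset of the sample data (the variable names are illustrative, not from the original example) to show that a single name returns a Series, while a list of names returns a DataFrame even when the list holds only one name.

```python
import pandas as pd

# Three-row subset of the sample data used above
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram'],
    'Age': [30, 25, 40],
    'City': ['Delhi', 'Mumbai', 'Chennai'],
})

one_col_series = df['Name']          # single brackets: a Series
one_col_frame = df[['Name']]         # list with one name: a one-column DataFrame
two_col_frame = df[['Name', 'Age']]  # list with two names: a two-column DataFrame

print(type(one_col_series).__name__)  # Series
print(type(one_col_frame).__name__)   # DataFrame
print(two_col_frame.shape)            # (3, 2)
```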

Using loc[] and iloc[]

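The loc[] and iloc[] indexers provide more flexibility, because they let you select rows and columns in a single step. loc[] is label-based, meaning you refer to columns by name, while iloc[] is position-based, meaning you refer to columns by their integer position, starting at 0.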

Example:

# Using loc[]
selected_columns_loc = df.loc[:, ['Name', 'City']]
# Using iloc[]
selected_columns_iloc = df.iloc[:, [0, 2]]
print(selected_columns_loc)
print(selected_columns_iloc)

Output

     Name       City
0  Sachin      Delhi
1   Manju     Mumbai
2     Ram    Chennai
3    Raju    Kolkata
4   David  Bangalore
5  Wilson  Hyderabad
     Name       City
0  Sachin      Delhi
1   Manju     Mumbai
2     Ram    Chennai
3    Raju    Kolkata
4   David  Bangalore
5  Wilson  Hyderabad
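
Both indexers also accept slices, which helps when the columns you want sit next to each other. The sketch below, on a three-row subset of the sample data, illustrates one difference that is easy to miss: a label slice with loc[] includes both endpoints, while a positional slice with iloc[] excludes the stop position.

```python
import pandas as pd

# Three-row subset of the sample data used above
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram'],
    'Age': [30, 25, 40],
    'City': ['Delhi', 'Mumbai', 'Chennai'],
})

# Label slice: both 'Name' and 'Age' are included
by_label = df.loc[:, 'Name':'Age']

# Positional slice: positions 0 and 1 are included, position 2 is not
by_position = df.iloc[:, 0:2]

print(list(by_label.columns))        # ['Name', 'Age']
print(list(by_position.columns))     # ['Name', 'Age']
print(by_label.equals(by_position))  # True
```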

Advanced techniques

For more complex scenarios, you can use boolean indexing or query expressions to filter rows by a condition and then return only the columns you need from the matching rows.

Boolean indexing

Example:

# Selecting the Name and Age of people older than 30.
# The boolean mask picks the rows; the list picks the columns.
older_than_30 = df.loc[df['Age'] > 30, ['Name', 'Age']]
print(older_than_30)

Output

     Name  Age
2     Ram   40
3    Raju   35
5  Wilson   32
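
Conditions can also be combined. The sketch below is an illustration rather than part of the original example: it builds a mask from two conditions with the & operator (each condition needs its own parentheses) and passes the mask together with the column list to loc[]. The City condition is an assumption chosen only for demonstration.

```python
import pandas as pd

# Same sample data as in the first example
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad'],
})

# Rows where Age is above 30 AND City is not Chennai
mask = (df['Age'] > 30) & (df['City'] != 'Chennai')

# Row filter and column selection in one loc[] call
result = df.loc[mask, ['Name', 'Age']]
print(result)  # rows for Raju and Wilson
```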

Using query()

Example:

# Using query to select specific names
specific_names = df.query("Name in ['Sachin', 'David']")[['Name', 'City']]
print(specific_names)

Output

     Name       City
0  Sachin      Delhi
4   David  Bangalore
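
When the values to match live in Python variables, query() can reference them with the @ prefix instead of hard-coding them in the expression string. The following is a minimal sketch under that assumption; the variable names wanted and min_age are hypothetical and not part of the original example.

```python
import pandas as pd

# Same sample data as in the first example
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad'],
})

wanted = ['Sachin', 'David']  # hypothetical list of names to keep
min_age = 28                  # hypothetical age threshold

# @name refers to a Python variable in the calling scope
result = df.query("Name in @wanted and Age >= @min_age")[['Name', 'City']]
print(result)  # rows for Sachin and David
```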

Returning multiple columns in Pandas can be done in several ways, and the right choice depends on the complexity and requirements of your task. Direct selection with a list of column names covers the simple case, the loc[] and iloc[] indexers select by label or by position, and boolean indexing and query() filter rows before you pick the columns to return.

Author: user