Enhancing data manipulation in Pandas: Techniques for returning multiple columns

Python Pandas @ Freshers.in

Working with data frames in Python’s Pandas library often involves selecting and manipulating multiple columns. This article explains how to effectively return multiple columns, a fundamental skill for data analysis and manipulation in Pandas.

Techniques for Returning Multiple Columns

Using column names

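The simplest way to select multiple columns is to pass a list of column names inside the indexing brackets, which is why the syntax looks like double brackets: df[['Name', 'Age']]. This returns a new DataFrame containing only the selected columns, in the order listed.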

Example:

Let’s create a DataFrame with names, ages, and cities for our demonstration:

import pandas as pd
# Sample data
data = {
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad']
}
df = pd.DataFrame(data)
# Selecting multiple columns
selected_columns = df[['Name', 'Age']]
print(selected_columns)

Output

     Name  Age
0  Sachin   30
1   Manju   25
2     Ram   40
3    Raju   35
4   David   28
5  Wilson   32
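
The difference between single and double brackets is easy to miss, so here is a minimal sketch of it. It rebuilds a shortened version of the sample DataFrame so that it runs on its own; the variable names as_series, as_frame, and two_cols are illustrative only.

```python
import pandas as pd

# Shortened copy of the sample data so this block runs on its own
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram'],
    'Age': [30, 25, 40],
    'City': ['Delhi', 'Mumbai', 'Chennai'],
})

# A single column name returns a Series
as_series = df['Name']

# A list of names returns a DataFrame, even when the list has one entry
as_frame = df[['Name']]

# The result follows the order of the list, not the original column order
two_cols = df[['City', 'Name']]

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(two_cols.columns))
```

Passing a one-element list is a convenient way to keep a DataFrame when later steps expect one.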

Using loc[] and iloc[]

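The loc[] and iloc[] indexers provide more flexibility because they can select rows and columns in one step. loc[] is label-based, meaning you use the column names, while iloc[] is position-based, meaning you use zero-based integer positions. In both, the first argument selects rows (a bare : keeps all rows) and the second selects columns.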

Example:

# Using loc[]
selected_columns_loc = df.loc[:, ['Name', 'City']]
# Using iloc[]
selected_columns_iloc = df.iloc[:, [0, 2]]
print(selected_columns_loc)
print(selected_columns_iloc)

Output

     Name       City
0  Sachin      Delhi
1   Manju     Mumbai
2     Ram    Chennai
3    Raju    Kolkata
4   David  Bangalore
5  Wilson  Hyderabad
     Name       City
0  Sachin      Delhi
1   Manju     Mumbai
2     Ram    Chennai
3    Raju    Kolkata
4   David  Bangalore
5  Wilson  Hyderabad
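
Both indexers also accept slices, which helps when the columns you want sit next to each other. The sketch below, using a shortened copy of the sample data, shows the one difference worth remembering: a label slice with loc[] includes its end label, while a position slice with iloc[] excludes its end position. The names by_label and by_position are illustrative only.

```python
import pandas as pd

# Shortened copy of the sample data so this block runs on its own
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram'],
    'Age': [30, 25, 40],
    'City': ['Delhi', 'Mumbai', 'Chennai'],
})

# Label slice: both endpoints are included, so this returns Name and Age
by_label = df.loc[:, 'Name':'Age']

# Position slice: the end is excluded, so 0:2 returns positions 0 and 1
by_position = df.iloc[:, 0:2]

print(list(by_label.columns))
print(list(by_position.columns))
```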

Advanced techniques

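For more complex scenarios, you can combine column selection with row filtering: boolean indexing or a query() expression picks the rows that satisfy a condition, and a list of column names then picks the columns to return.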

Boolean indexing

Example:

# Selecting people older than 30
older_than_30 = df[df['Age'] > 30][['Name', 'Age']]
print(older_than_30)

Output

     Name  Age
2     Ram   40
3    Raju   35
5  Wilson   32
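
The same filter can also be written as a single loc[] call, with the boolean mask as the row selector and the column list as the column selector. This is a sketch of an alternative rather than a correction: it returns the same rows and columns while avoiding two chained indexing steps.

```python
import pandas as pd

# Same sample data as the article, repeated so this block runs on its own
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad'],
})

# Row mask and column list in one loc[] call
older_than_30 = df.loc[df['Age'] > 30, ['Name', 'Age']]
print(older_than_30)
```

The single-call form is also the safer choice if you later assign to the selection, because chained indexing can end up modifying a temporary copy instead of the original DataFrame.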

Using query()

Example:

# Using query to select specific names
specific_names = df.query("Name in ['Sachin', 'David']")[['Name', 'City']]
print(specific_names)

Output

     Name       City
0  Sachin      Delhi
4   David  Bangalore
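
When the values to match live in a Python variable rather than in the query string, query() can reference that variable with an @ prefix. The sketch below assumes a local list named wanted, which is a hypothetical name introduced for this illustration.

```python
import pandas as pd

# Same sample data as the article, repeated so this block runs on its own
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad'],
})

# 'wanted' is a hypothetical local variable; @wanted refers to it inside the query
wanted = ['Sachin', 'David']
specific_names = df.query("Name in @wanted")[['Name', 'City']]
print(specific_names)
```

This keeps the query string fixed while the list of names changes, which is handy when the names come from user input or another DataFrame.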

Returning multiple columns in Pandas can be done in several ways, and the right one depends on the task. Use a list of column names for a plain selection, the loc[] (label-based) and iloc[] (position-based) indexers when you also need to control rows, and boolean indexing or query() when the rows must first be filtered by a condition.


Author: user