Explore the essentials of categorical data in Python Pandas.

Python Pandas @ Freshers.in

Categorical data consists of values drawn from a limited set of distinct groups or categories. Continuous data can take any value within a range. Categorical data, by contrast, represents discrete sets such as gender, colors, or ratings.

Importance of Categorical Data in Pandas

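Storing such values with the Pandas category dtype can make data processing more efficient. Each distinct value is stored only once, and every element holds a small integer code that points to it. This reduces memory usage and can speed up operations like grouping and sorting. The benefit is greatest for large datasets with many repeated values.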
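The sketch below shows the memory effect. It stores the same hypothetical column of 100,000 repeated field names in two ways: as plain Python objects and as a categorical. Exact byte counts depend on the pandas version and platform, so only the relative difference matters here.

```python
import pandas as pd

# Hypothetical data: 100,000 rows drawn from only four distinct field names
fields = ['Engineering', 'Medicine', 'Arts', 'Law'] * 25_000

as_object = pd.Series(fields, dtype=object)        # one Python string object per row
as_category = pd.Series(fields, dtype="category")  # four stored values plus integer codes

# deep=True includes the memory of the string objects themselves, not just the pointers
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object dtype:   {object_bytes:,} bytes")
print(f"category dtype: {category_bytes:,} bytes")
```

Only four categories are present, so pandas can store each code in a single byte (int8). That is where most of the savings come from.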

Creating a Categorical Series

Let’s start by creating a Pandas Series with categorical data.

Example:

import pandas as pd
# Sample data
names = ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson']
categories = ['Engineering', 'Medicine', 'Arts', 'Engineering', 'Law', 'Medicine']
# Creating a categorical series
category_series = pd.Series(categories, dtype="category", index=names)

In this example, each person’s name serves as the index. Their field, such as ‘Engineering’ or ‘Medicine’, is stored as a categorical value. The six entries share only four distinct categories.
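You can also produce a categorical series by converting an existing Series with astype("category"). The sketch below reuses the sample lists above and checks that both routes give the same categories and values.

```python
import pandas as pd

names = ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson']
categories = ['Engineering', 'Medicine', 'Arts', 'Engineering', 'Law', 'Medicine']

# Route 1: set the dtype when the Series is built
category_series = pd.Series(categories, dtype="category", index=names)

# Route 2: build a plain Series first, then convert it with astype()
converted = pd.Series(categories, index=names).astype("category")

print(category_series.dtype)  # prints: category
print(list(converted.cat.categories) == list(category_series.cat.categories))
print(list(converted) == list(category_series))
print(category_series.loc['David'])  # label-based lookup by name
```

The conversion route is handy when the data has already been loaded, for example from a CSV file, and only later needs the category dtype.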

Exploring the Categorical Series

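Once a categorical series is created, you can inspect it through the .cat accessor. The categories property lists the distinct values, and codes gives, for each element, the integer position of its value within those categories.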

# Displaying categories
print("Categories:", category_series.cat.categories)
# Displaying codes
print("Codes:", category_series.cat.codes)
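To make the link between categories and codes concrete, the sketch below rebuilds the same series and decodes the codes back into the original labels. When the categories are inferred from the data, pandas sorts them, so ‘Arts’ gets code 0.

```python
import pandas as pd

names = ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson']
categories = ['Engineering', 'Medicine', 'Arts', 'Engineering', 'Law', 'Medicine']
category_series = pd.Series(categories, dtype="category", index=names)

# Inferred categories are the sorted unique values
print(list(category_series.cat.categories))
# → ['Arts', 'Engineering', 'Law', 'Medicine']

# Each code is the position of the element's value within .cat.categories
codes = category_series.cat.codes
print(codes.tolist())
# → [1, 3, 0, 1, 2, 3]

# Indexing the categories with the codes rebuilds the original values
decoded = category_series.cat.categories[codes.to_numpy()]
print(list(decoded) == categories)
# → True
```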

Advantages of Categorical Data

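  • Memory Efficiency: Categorical data uses less memory when values repeat often, because each distinct value is stored once. If almost every value is unique, the saving largely disappears.
  • Performance Improvement: Operations like sorting and grouping can be faster with categorical data, because they work on compact integer codes rather than on full values.
  • Clearer Analysis: Categorical data makes some types of analysis and visualization more straightforward and meaningful.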
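One practical side of the clearer-analysis point is that summaries respect the declared set of categories. The sketch below adds an unused ‘Science’ category, a hypothetical label chosen for illustration. It then shows that value_counts() and groupby() can report that category with a count of 0. The observed parameter of groupby() controls whether unused categories appear.

```python
import pandas as pd

names = ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson']
categories = ['Engineering', 'Medicine', 'Arts', 'Engineering', 'Law', 'Medicine']
category_series = pd.Series(categories, dtype="category", index=names)

# Declare an extra category that no row uses yet (returns a new Series)
with_science = category_series.cat.add_categories('Science')

# value_counts() on a categorical lists every declared category, even with a count of 0
counts = with_science.value_counts()
print(counts.to_dict())

# observed=False keeps unused categories in the result; observed=True drops them
all_groups = with_science.groupby(with_science, observed=False).size()
seen_groups = with_science.groupby(with_science, observed=True).size()
print(all_groups.to_dict())
print(seen_groups.to_dict())
```

Passing observed explicitly also avoids relying on its default, which has changed across pandas versions.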

Modifying Categories

Pandas allows you to add, remove, or rename categories in a categorical series.

Example of Modifying Categories:

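# The .cat methods return a new Series (the inplace option was removed in pandas 2.0),
# so assign the result back
# Adding a new category
category_series = category_series.cat.add_categories('Science')
# Removing a category; entries that belonged to 'Law' become NaN
category_series = category_series.cat.remove_categories('Law')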
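Renaming works through rename_categories(). The sketch below builds a fresh copy of the sample data and shows two things. First, removing a category turns its entries into missing values. Second, a dict passed to rename_categories() relabels a category without changing the codes. The new label ‘Humanities’ is a hypothetical choice for illustration.

```python
import pandas as pd

names = ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson']
categories = ['Engineering', 'Medicine', 'Arts', 'Engineering', 'Law', 'Medicine']
category_series = pd.Series(categories, dtype="category", index=names)

# remove_categories() returns a new Series; David's 'Law' entry becomes NaN
without_law = category_series.cat.remove_categories('Law')
print(without_law.isna().sum())

# rename_categories() accepts a dict mapping old labels to new ones
renamed = without_law.cat.rename_categories({'Arts': 'Humanities'})
print(list(renamed.cat.categories))
# → ['Humanities', 'Engineering', 'Medicine']
print(renamed.loc['Ram'])
# → Humanities
```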

Author: user