How to map values of a Series according to an input correspondence: Series.map()

user April 11, 2024

Understanding Series.map(): The Series.map() method in the Pandas API on Spark (pyspark.pandas) maps each value of a Series according to an input correspondence, which can be a function or a dictionary. It mirrors pandas’ Series.map() method, which applies the mapping to each element of the Series.

Syntax:

Series.map(arg[, na_action])

arg: The mapping function, or a dictionary containing the mapping correspondence.
na_action (optional): Controls how missing values are handled. With the default, None, missing values are passed to the mapping like any other value. With 'ignore', missing values are propagated to the result without being passed to the mapping.
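The effect of na_action is easiest to see on a Series that contains a missing value. The following is a minimal sketch using plain pandas, as the examples in this article do; the Series contents and the helper name label are made up for illustration.

```python
import pandas as pd

# A small Series with one missing value (None becomes NaN)
s = pd.Series([1.0, None, 3.0])

# Hypothetical mapping function used only for this illustration
def label(x):
    return f"value={x}"

# Default (na_action=None): NaN is passed to label like any other value
with_default = s.map(label)

# na_action="ignore": NaN is propagated to the result and label is not called for it
with_ignore = s.map(label, na_action="ignore")

print(with_default)
print(with_ignore)
```

With the default, the missing entry is turned into a string by the function; with 'ignore', it stays missing in the result.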

Example 1: Mapping Values Using a Function

Suppose we have a Spark DataFrame df with a column numbers containing integer values. We can use Series.map() to apply a function that squares each number.

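The example below converts the Spark DataFrame to a pandas DataFrame with toPandas() and then calls map() on the numbers column. Because toPandas() collects every row to the driver, this approach suits small data only. To keep the data distributed, df.pandas_api() returns a pandas-on-Spark DataFrame whose Series provide the same map() method.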

# Import necessary libraries
from pyspark.sql import SparkSession
import pandas as pd

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Learning @ Freshers.in Pandas Series.map()") \
    .getOrCreate()
# Create a Spark DataFrame
data = [(1,), (2,), (3,), (4,), (5,)]
df = spark.createDataFrame(data, ["numbers"])
# Convert Spark DataFrame to Pandas DataFrame
pandas_df = df.toPandas()
# Define mapping function
def square(x):
    return x ** 2
# Apply mapping function using Series.map()
mapped_series = pandas_df["numbers"].map(square)
# Display the original and mapped Series
print("Original Series:")
print(pandas_df["numbers"])
print("\nMapped Series:")
print(mapped_series)

Output:

Original Series:
0    1
1    2
2    3
3    4
4    5
Name: numbers, dtype: int64

Mapped Series:
0     1
1     4
2     9
3    16
4    25
Name: numbers, dtype: int64

Example 2: Mapping Values Using a Dictionary

In this example, let’s use a dictionary to map each value to its corresponding square.

# Define mapping dictionary
mapping_dict = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Apply mapping using Series.map() with dictionary
mapped_series_dict = pandas_df["numbers"].map(mapping_dict)
# Display the mapped Series using dictionary
print("Mapped Series using Dictionary:")
print(mapped_series_dict)

Output:

Mapped Series using Dictionary:
0     1
1     4
2     9
3    16
4    25
Name: numbers, dtype: int64
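The dictionary above happens to cover every value in the Series. When a value has no matching key, pandas maps it to NaN rather than raising an error, which also changes an integer result to a floating-point dtype. The sketch below illustrates this with plain pandas; the names partial_dict and the fill value -1 are assumptions chosen for the illustration.

```python
import pandas as pd

# Same values as the numbers column used in the examples
numbers = pd.Series([1, 2, 3, 4, 5], name="numbers")

# Hypothetical dictionary that deliberately omits the keys 4 and 5
partial_dict = {1: 1, 2: 4, 3: 9}

# Values without a matching key become NaN, so the result dtype is float64
mapped = numbers.map(partial_dict)

# One possible follow-up: replace the gaps with a sentinel and restore integers
filled = mapped.fillna(-1).astype("int64")

print(mapped)
print(filled)
```

Checking the result with isna() after a dictionary mapping is a simple way to detect keys that the dictionary does not cover.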

The Series.map() method in the Pandas API on Spark provides a convenient way to map the values of a Series through a function or a dictionary. Because it follows the pandas method of the same name, users familiar with pandas can apply the same idiom to large-scale data processing tasks in Spark.
