How to map values of a Series according to an input correspondence


Understanding Series.map()

The map() method in the Pandas API on Spark (pyspark.pandas) maps the values of a Series according to an input correspondence: a function, a dictionary, or a Series. It mirrors pandas' Series.map(), which likewise replaces each element of the Series using the given correspondence.
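To illustrate the three kinds of correspondence, here is a minimal sketch in plain pandas. pandas-on-Spark Series.map() accepts the same kinds of argument, but the Series `s` and its values below are purely hypothetical.

```python
import pandas as pd

# Hypothetical Series to map
s = pd.Series(["cat", "dog", "hen"], name="animal")

# 1. A function is applied to every element
upper = s.map(str.upper)

# 2. A dict works as a lookup table: key -> new value
sounds = s.map({"cat": "meow", "dog": "woof", "hen": "cluck"})

# 3. A Series works like a dict: its index holds the keys, its values the results
legs = s.map(pd.Series([4, 4, 2], index=["cat", "dog", "hen"]))

print(upper.tolist())   # ['CAT', 'DOG', 'HEN']
print(sounds.tolist())  # ['meow', 'woof', 'cluck']
print(legs.tolist())    # [4, 4, 2]
```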

Syntax: Series.map(arg[, na_action])
  • arg: The mapping correspondence: a function, a dictionary, or a Series.
  • na_action (optional): Controls how missing values are handled. With the default, None, missing values are passed to the mapping like any other value; with 'ignore', they are propagated to the result unchanged and never passed to the mapping.
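The effect of na_action is easiest to see with a function that does not check for missing values. The sketch below uses plain pandas, whose Series.map() documents the same None/'ignore' choice; the helper `describe` is hypothetical.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])  # None becomes NaN in a float Series

def describe(x):
    # Hypothetical, deliberately NaN-unaware helper: formats whatever it receives
    return f"value={x}"

default = s.map(describe)                      # NaN is passed to describe()
ignored = s.map(describe, na_action="ignore")  # NaN is kept as-is, describe() is skipped

print(default.tolist())  # ['value=1.0', 'value=nan', 'value=3.0']
print(ignored.tolist())  # ['value=1.0', nan, 'value=3.0']
```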

Example 1: Mapping Values Using a Function

Suppose we have a Spark DataFrame df with a column numbers containing integer values. We can use map() to apply a function that squares each number.

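The example below converts the Spark DataFrame to a pandas DataFrame with toPandas() and calls map() on the resulting Series. Note that toPandas() collects every row to the driver, so from that point on the work is done by plain pandas. To stay in the Pandas API on Spark instead, df.pandas_api() (Spark 3.3 and later) returns a pandas-on-Spark DataFrame whose Series support the same map() call.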

# Import necessary libraries
from pyspark.sql import SparkSession
import pandas as pd  # required by toPandas()

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Learning @ Pandas") \
    .getOrCreate()

# Create a Spark DataFrame
data = [(1,), (2,), (3,), (4,), (5,)]
df = spark.createDataFrame(data, ["numbers"])

# Convert Spark DataFrame to pandas DataFrame (collects all rows to the driver)
pandas_df = df.toPandas()

# Define mapping function
def square(x):
    return x ** 2

# Apply mapping function using map()
mapped_series = pandas_df["numbers"].map(square)

# Display the original and mapped Series
print("Original Series:")
print(pandas_df["numbers"])
print("\nMapped Series:")
print(mapped_series)


Original Series:
0    1
1    2
2    3
3    4
4    5
Name: numbers, dtype: int64

Mapped Series:
0     1
1     4
2     9
3    16
4    25
Name: numbers, dtype: int64
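map() calls the Python function once per element. For simple arithmetic such as squaring, a vectorized expression produces the same values in a single array-level operation and is usually faster, so map() with a function is most useful when the logic cannot be written with built-in operators. A minimal comparison in plain pandas, rebuilding the numbers column by hand:

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5], name="numbers")

# Element-wise: the lambda is called once per value
via_map = numbers.map(lambda x: x ** 2)

# Vectorized: one operation over the whole column
via_vectorized = numbers ** 2

print(via_map.tolist())                               # [1, 4, 9, 16, 25]
print(via_map.tolist() == via_vectorized.tolist())    # True
```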

Example 2: Mapping Values Using a Dictionary

In this example, let's use a dictionary to map each value to its square, which gives the same result as the function above.

# Define mapping dictionary
mapping_dict = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Apply mapping using map() with a dictionary
mapped_series_dict = pandas_df["numbers"].map(mapping_dict)
# Display the mapped Series using dictionary
print("Mapped Series using Dictionary:")
print(mapped_series_dict)


Mapped Series using Dictionary:
0     1
1     4
2     9
3    16
4    25
Name: numbers, dtype: int64
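When a dictionary does not cover every value, map() returns NaN for the unmatched values, which also changes an integer column to float64. The sketch below shows this in plain pandas with a hypothetical partial dictionary, and one way to keep the original values where no key matched (fillna() aligned on the index).

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5], name="numbers")

# Hypothetical dictionary that only covers some of the values
partial = {1: 1, 2: 4, 3: 9}

mapped = numbers.map(partial)
print(mapped.tolist())  # [1.0, 4.0, 9.0, nan, nan]
print(mapped.dtype)     # float64

# Keep unmatched values by filling the gaps from the original Series
filled = mapped.fillna(numbers).astype("int64")
print(filled.tolist())  # [1, 4, 9, 4, 5]
```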

The map() method in the Pandas API on Spark provides a convenient way to map the values of a Series using a function, a dictionary, or a Series. Because it follows the pandas interface, users familiar with pandas can apply their existing knowledge to large-scale data processing in Spark. Exploring methods like map() is a good way to get the most out of the Pandas API on Spark for data manipulation.

Author: user