Understanding Series.transform(func[, axis])

In this article, we’ll explore the Series.transform(func[, axis]) function of the Pandas API on Spark, illustrating its behavior through worked examples and their outputs.

Understanding Series.transform(func[, axis]): The Series.transform(func[, axis]) function in the Pandas API on Spark calls func on the Series and returns a new Series with the transformed values. The result always has the same length (and index) as the input, so it can be assigned straight back as a column of a pandas-on-Spark DataFrame. Under the hood, the function is executed against batches of the data as pandas Series, so func must work when a pandas Series is passed to it. This makes transform the natural tool for custom, element-wise transformations on Series data in Spark.
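
To make this concrete, here is a minimal sketch on a standalone pandas-on-Spark Series (the names psser and square are illustrative). Annotating the return type as ps.Series[int] lets the engine skip the extra pass it would otherwise spend inferring the result type:

import pyspark.pandas as ps
# A three-element pandas-on-Spark Series
psser = ps.Series([1, 2, 3])
# The return-type hint avoids schema inference by sampling
def square(x) -> ps.Series[int]:
    return x ** 2
# Same length as the input, with transformed values
print(psser.transform(square))
# 0    1
# 1    4
# 2    9
# dtype: int64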

Syntax:

Series.transform(func[, axis])

Where:

  • func: The transformation function. It is executed against batches of the data, each passed as a pandas Series, so it must accept and return a pandas Series; vectorized element-wise operations such as x * 2 work naturally.
  • axis (optional): Accepted for compatibility with DataFrame.transform; for a Series, only the default of 0 is valid. A short sketch of the full signature follows this list.
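
Besides func and axis, the signature forwards any extra positional and keyword arguments to func, which is convenient for parameterized transformations. A small sketch, reusing the illustrative psser Series from above:

# Arguments after func (other than axis) are passed through to func
def scale(x, factor):
    return x * factor
print(psser.transform(scale, factor=10))
# 0    10
# 1    20
# 2    30
# dtype: int64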

Examples and Outputs: Let’s walk through practical examples that demonstrate Series.transform(func[, axis]) on Spark data.

Example 1: Applying a Simple Transformation Function.

Consider a Spark DataFrame df with a numeric column column2. We’ll convert df to a pandas-on-Spark DataFrame and double each element of the column2 Series.

# Sample data
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("Pandas API on Spark") \
    .getOrCreate()
data = [("A", 10), ("B", 20), ("C", 30)]
df = spark.createDataFrame(data, ["column1", "column2"])
# Convert to a pandas-on-Spark DataFrame so Series.transform is available
psdf = df.pandas_api()
# Define transformation function
def double_value(x):
    return x * 2
# Apply transformation function using Series.transform(func)
psdf["transformed_column"] = psdf["column2"].transform(double_value)
# Convert back to a Spark DataFrame to display
psdf.to_spark().show()

Output:

+-------+-------+------------------+
|column1|column2|transformed_column|
+-------+-------+------------------+
|      A|     10|                20|
|      B|     20|                40|
|      C|     30|                60|
+-------+-------+------------------+
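
The same transformation can be written inline with a lambda. Note that without a return-type hint, pandas-on-Spark infers the result type by executing the function on a sample of the data, which costs an extra pass. A sketch, assuming the psdf from above:

# Equivalent inline form of the doubling transformation
psdf["transformed_column"] = psdf["column2"].transform(lambda x: x * 2)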

Example 2: Applying a Custom Transformation Function.

Let’s define a custom transformation function that converts strings to uppercase and apply it to a Series containing strings.

# Sample data
data = [("A", "hello"), ("B", "world"), ("C", "spark")]
df = spark.createDataFrame(data, ["column1", "column2"])
psdf = df.pandas_api()
# Define a custom transformation function; it receives pandas Series
# batches, so the vectorized .str accessor applies
def to_uppercase(s):
    return s.str.upper()
# Apply custom transformation function using Series.transform(func)
psdf["transformed_column"] = psdf["column2"].transform(to_uppercase)
psdf.to_spark().show()

Output:

+-------+-------+------------------+
|column1|column2|transformed_column|
+-------+-------+------------------+
|      A|  hello|             HELLO|
|      B|  world|             WORLD|
|      C|  spark|             SPARK|
+-------+-------+------------------+
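
Finally, mirroring pandas, Series.transform can also accept a list of functions, in which case it returns a DataFrame with one column per function. A minimal sketch (output values rounded):

import numpy as np
import pyspark.pandas as ps
psser = ps.Series([1, 4, 9])
# One output column per function, named after the function
print(psser.transform([np.sqrt, np.exp]))
#    sqrt          exp
# 0   1.0     2.718282
# 1   2.0    54.598150
# 2   3.0  8103.083928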