Series.aggregate(func): Pandas API on Spark


In this article, we will explore the Series.aggregate(func) function, which enables users to aggregate the values of a pandas-on-Spark Series using one or more operations. Through examples and their outputs, we'll unravel the versatility and power of this function.

Understanding Series.aggregate(func): The Series.aggregate(func) function in Pandas is designed to apply one or more aggregation functions to the elements of a Series. Similarly, in the Pandas API on Spark, this function allows users to perform aggregation operations on a Series, whether it stands alone or is a column of a DataFrame. It offers flexibility by accepting a single aggregation function or a list of aggregation functions to be applied to the Series.
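For instance, the following minimal sketch (using a small, made-up pandas-on-Spark Series) shows that passing a single function name returns a scalar:

import pyspark.pandas as ps

# Small example Series used purely for illustration
psser = ps.Series([10, 20, 30])
# A single aggregation function name returns a scalar
print(psser.aggregate("sum"))  # 60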

Syntax:
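Series.aggregate(func)

Here, func is either a single aggregation function name as a string (for example "sum") or a list of function names (for example ["sum", "max"]). Passing a single name returns a scalar, while passing a list returns a Series indexed by the function names.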

Example 2: Applying Multiple Aggregation Functions

Now, let's apply multiple aggregation functions to a Series at once, finding both the sum and the maximum value.

import pyspark.pandas as ps

# Sample data; the values in column2 are chosen to match the output shown below
psdf = ps.DataFrame({"column2": [10, 20, 30]})
# Calculate sum and maximum using Series.aggregate(func)
agg_result = psdf["column2"].aggregate(["sum", "max"])
print("Sum:", agg_result["sum"])
print("Max:", agg_result["max"])
Output
Sum: 60
Max: 30
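Because a list of functions was passed, the result is itself a Series indexed by the function names, which is why the sum and the maximum can be retrieved by label.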