The integration of the Pandas API in Spark bridges the gap between these two ecosystems, allowing users familiar with Pandas to leverage their knowledge in a distributed computing environment. In this article, we will delve into the `Series.agg(func)` function, which enables us to aggregate data using one or more operations over a specified axis, demonstrating its usage with examples and outputs.

**Understanding `Series.agg(func)`:** The `Series.agg(func)` function in Pandas applies one or more aggregation functions to the elements of a Series. In the Pandas API on Spark, the same function aggregates a pandas-on-Spark Series, such as a single column of a DataFrame. It accepts either one aggregation function name or a list of names to apply to the Series.

**Syntax:**

```
Series.agg(func)
```

Where `func` can be the name of a single aggregation function, such as `"sum"`, or a list of such names.
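To make the two accepted forms of `func` concrete, here is a minimal sketch in plain Pandas, whose behavior the Pandas API on Spark is designed to mirror. The sample values and the variable names `values`, `total`, and `summary` are illustrative assumptions, not part of any library.

```python
import pandas as pd

# Illustrative data; the Series name is arbitrary.
values = pd.Series([10, 20, 30], name="column2")

# A single function name returns a scalar.
total = values.agg("sum")

# A list of function names returns a Series indexed by those names.
summary = values.agg(["sum", "max"])

print("Sum:", total)
print("Max:", summary["max"])
```

The single-name form is convenient when only one statistic is needed, while the list form collects several statistics in one result that can be looked up by function name.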

**Examples and Outputs:** Let’s dive into some examples to understand how `Series.agg(func)` works in the context of Spark DataFrames.

**Example 1:** Applying a single aggregation function

Suppose we have a Spark DataFrame `df` with a numeric column named `column2`, and we want to calculate the sum of its values using `Series.agg(func)`.

```
# Import necessary libraries
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("Pandas API on Spark") \
    .getOrCreate()

# Sample data
data = [("A", 10), ("B", 20), ("C", 30)]
df = spark.createDataFrame(data, ["column1", "column2"])

# Convert to a pandas-on-Spark DataFrame and select column2 as a Series
column2 = df.pandas_api()["column2"]

# Calculate sum using Series.agg(func)
sum_result = column2.agg("sum")
print("Sum:", sum_result)
```

Output:

```
Sum: 60
```

**Example 2:** Applying multiple aggregation functions

Now, let’s apply multiple aggregation functions to the same Series, such as finding the sum and maximum value.

```
# Calculate sum and maximum using Series.agg(func)
# Passing a list of names returns a Series indexed by those names
agg_result = df.pandas_api()["column2"].agg(["sum", "max"])
print("Sum:", agg_result["sum"])
print("Max:", agg_result["max"])
```

Output:

```
Sum: 60
Max: 30
```
