The **mean** function in PySpark computes the average value of a numeric column. It is one of PySpark's aggregate functions (an alias for `avg`), which are central to statistical analysis. This article covers the function's syntax, its advantages, and a practical example, showing how it offers a simple way to measure the central tendency of numerical data.

**The syntax for mean is:**

```
from pyspark.sql.functions import mean
# mean(col) returns a Column; col is a column name (str) or a Column expression
```

### Advantages of using mean

- **Statistical Insights**: Provides a quick overview of the central tendency of numeric data.
- **Data Reduction**: Summarizes large datasets into a single representative value.
- **Versatility**: Can be used in various contexts, from financial analysis to scientific research.

### Example: Analyzing employee salaries

Consider a dataset with the names of employees and their salaries. Our goal is to calculate the average salary.

### Dataset

Name | Salary |
---|---|
Sachin | 70000 |
Ram | 48000 |
Raju | 54000 |
David | 62000 |
Wilson | 58000 |

### Objective

Compute the average salary of the employees.
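
As a reference point for the Spark result, the expected value can be worked out in plain Python first. This is only a sanity-check sketch using the five salaries from the table; the variable names `salaries` and `expected_mean` are illustrative and not part of any PySpark API.

```python
# Salaries taken from the dataset table above
salaries = [70000, 48000, 54000, 62000, 58000]

# Arithmetic mean: sum of the values divided by their count
expected_mean = sum(salaries) / len(salaries)

print("Sum:", sum(salaries))          # 292000
print("Count:", len(salaries))        # 5
print("Expected mean:", expected_mean)  # 58400.0
```

The Spark computation over the same column should agree with this value.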

### Implementation in PySpark

Setting up the PySpark environment and creating the DataFrame:

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import mean
# Initialize Spark Session
spark = SparkSession.builder.appName("Mean Example").getOrCreate()
# Sample Data
data = [("Sachin", 70000), ("Ram", 48000), ("Raju", 54000), ("David", 62000), ("Wilson", 58000)]
# Creating DataFrame
df = spark.createDataFrame(data, ["Name", "Salary"])
df.show()
```

**Output**

```
+------+------+
|  Name|Salary|
+------+------+
|Sachin| 70000|
|   Ram| 48000|
|  Raju| 54000|
| David| 62000|
|Wilson| 58000|
+------+------+
```

**Applying the mean function:**

```
# Calculating Mean Salary
mean_salary = df.select(mean("Salary")).collect()[0][0]
print("Average Salary:", mean_salary)
```

**Output**

`Average Salary: 58400.0`
