# Computing the kurtosis of a numeric column in a PySpark DataFrame

The `kurtosis` function in PySpark computes the kurtosis of a numeric column in a DataFrame. Kurtosis gauges the “tailedness” of a data distribution. PySpark reports excess kurtosis, so a normal distribution scores 0: positive values indicate heavier tails (more extreme values) than a normal distribution, and negative values indicate lighter tails.
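As a reference for what the number means, the value can be written in terms of population (divide-by-$n$) central moments. The symbols $m_k$ and $\bar{x}$ are notation introduced here for illustration, not names from the PySpark API.

$$
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k,
\qquad
\text{kurtosis} = \frac{m_4}{m_2^{2}} - 3
$$

Subtracting 3 is what places the normal distribution at 0.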

### Example

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import kurtosis

# Initialize SparkSession
spark = SparkSession.builder \
    .appName("KurtosisFunctionDemo") \
    .getOrCreate()

# Sample data
data = [(85,),
        (90,),
        (78,),
        (92,),
        (89,),
        (76,),
        (95,),
        (87,)]

# Define DataFrame
df = spark.createDataFrame(data, ["score"])

# Compute kurtosis of the scores
kurt_value = df.select(kurtosis(df["score"])).collect()[0][0]
print(f"Kurtosis of scores: {kurt_value:.2f}")
```


Output:

```
Kurtosis of scores: -0.97
```
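The negative result can be cross-checked without Spark. The sketch below recomputes the value in plain Python, assuming the population (divide-by-n) excess kurtosis formula; the helper name `excess_kurtosis` is hypothetical and not part of PySpark.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    # Second and fourth central moments, both divided by n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


# Same scores as in the PySpark example
scores = [85, 90, 78, 92, 89, 76, 95, 87]
result = excess_kurtosis(scores)
print(f"Kurtosis of scores: {result:.2f}")  # Kurtosis of scores: -0.97
```

A value below 0 means these eight scores have lighter tails than a normal distribution, which is plausible for a small sample with no extreme values.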


### Benefits of using the kurtosis function:

1. Insightful Analysis: Offers deeper insights into data distribution, especially the extremities.
2. Performance: Swiftly computes kurtosis values across vast datasets, leveraging PySpark's distributed processing capabilities.
3. Decision-making: Aids businesses in making informed decisions by understanding data behavior, especially in risk-prone sectors.
4. Comprehensive Data Studies: Acts as an essential statistical tool in conjunction with other measures like mean, variance, and skewness, providing a holistic view of data.

### Where can we use the kurtosis function:

1. Financial Analysis: To analyze financial data where extremes (both gains and losses) hold significance.
2. Quality Control: Detecting outliers or abnormal behavior in industrial manufacturing processes.
3. Meteorological Studies: Observing unusual weather patterns by analyzing the “tailedness” of meteorological datasets.
4. Risk Management: Assessing the likelihood of rare and extreme events in various fields, from insurance to finance.


Author: user