quarter is a PySpark function that extracts the quarter of the year from a given date.

The quarter function in PySpark returns the quarter of the year, an integer from 1 to 4, for a given date, which makes it easy to group and analyze data by quarterly periods. It is particularly useful in financial analysis, trend analysis, and any scenario where data is evaluated on a quarterly basis. This article walks through the quarter function with a detailed example, suitable for both beginners and experienced data professionals.


from pyspark.sql.functions import quarter
df.withColumn("quarter_column", quarter(df["date_column"]))
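For intuition, the value that quarter produces corresponds to the calendar quarter of the date's month: months 1 to 3 map to 1, months 4 to 6 to 2, and so on. The following is a minimal plain-Python sketch of that mapping, with no Spark required. The helper name calendar_quarter is hypothetical and used only for illustration; it is not part of PySpark.

```python
from datetime import date


def calendar_quarter(d: date) -> int:
    """Return the calendar quarter (1-4) of a date.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    # Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    return (d.month - 1) // 3 + 1


sample_dates = [date(2023, 1, 15), date(2023, 6, 30), date(2023, 12, 30)]
quarters = [calendar_quarter(d) for d in sample_dates]
print(quarters)  # [1, 2, 4]
```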


Let’s consider a scenario where we have a dataset containing sales data, and we want to determine the quarter of each sale for seasonal trend analysis.

Sample data

Imagine we have the following data in a DataFrame named sales_df:

Date        Sales
2023-01-15  300
2023-04-10  450
2023-07-20  500
2023-10-05  550
2023-12-30  600

Code implementation

from pyspark.sql import SparkSession
from pyspark.sql.functions import quarter
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DateType
# Initialize Spark Session
spark = SparkSession.builder.appName("QuarterFunctionExample").getOrCreate()
# Sample data
data = [("2023-01-15", 300),
        ("2023-04-10", 450),
        ("2023-07-20", 500),
        ("2023-10-05", 550),
        ("2023-12-30", 600)]
# Define schema
schema = StructType([
    StructField("Date", StringType(), True),
    StructField("Sales", IntegerType(), True)
])
# Create DataFrame
sales_df = spark.createDataFrame(data, schema)
sales_df = sales_df.withColumn("Date", sales_df["Date"].cast(DateType()))
# Apply quarter function
sales_df_with_quarters = sales_df.withColumn("Quarter", quarter(sales_df["Date"]))
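# Show results
sales_df_with_quarters.show()

Output

+----------+-----+-------+
|      Date|Sales|Quarter|
+----------+-----+-------+
|2023-01-15|  300|      1|
|2023-04-10|  450|      2|
|2023-07-20|  500|      3|
|2023-10-05|  550|      4|
|2023-12-30|  600|      4|
+----------+-----+-------+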

