PySpark: Extracting dayofmonth, dayofweek, and dayofyear

PySpark @ Freshers.in
  1. pyspark.sql.functions.dayofmonth
  2. pyspark.sql.functions.dayofweek
  3. pyspark.sql.functions.dayofyear

One of the most common data manipulations in PySpark is working with date and time columns. PySpark provides several functions to extract day-related information from date and time columns, such as dayofmonth, dayofweek, and dayofyear. In this article, we will explore these functions in detail.

dayofmonth

The dayofmonth function returns the day of the month from a date column as an integer between 1 and 31. In the example below the column holds strings in yyyy-MM-dd format, which Spark casts to dates implicitly.

Syntax: df.select(dayofmonth(col("date_column_name"))).show()

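from pyspark.sql import SparkSession
from pyspark.sql.functions import col, dayofmonth

# Reuse the active SparkSession, or create one when running as a standalone script
spark = SparkSession.builder.getOrCreate()
# Sample DataFrame; the "date" column holds yyyy-MM-dd strings
df = spark.createDataFrame([("2023-01-19",), ("2023-02-11",), ("2023-03-12",)], ["date"])
# Extract day of the month
df.select(dayofmonth(col("date"))).show()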

Output

+----------------+
|dayofmonth(date)|
+----------------+
|              19|
|              11|
|              12|
+----------------+

dayofweek

The dayofweek function returns the day of the week from a date column as an integer between 1 (Sunday) and 7 (Saturday). Note that the week starts on Sunday, not Monday.

Syntax: df.select(dayofweek(col("date_column_name"))).show()

from pyspark.sql.functions import col, dayofweek

# Assumes an active SparkSession named spark (as in the PySpark shell)
# Sample DataFrame; the "date" column holds yyyy-MM-dd strings
df = spark.createDataFrame([("2023-01-19",), ("2023-02-11",), ("2023-03-12",)], ["date"])
# Extract day of the week (1 = Sunday, 7 = Saturday)
df.select(dayofweek(col("date"))).show()

Output

+---------------+
|dayofweek(date)|
+---------------+
|              5|
|              7|
|              1|
+---------------+
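
The Sunday-first numbering differs from the conventions in Python's own datetime module, which is an easy source of off-by-one errors when results are compared outside Spark. As a minimal sketch using only the standard library (the helper name spark_style_dayofweek is hypothetical and not part of PySpark), the values above can be cross-checked like this:

```python
from datetime import date


def spark_style_dayofweek(d: date) -> int:
    """Hypothetical helper: number a date 1 (Sunday) .. 7 (Saturday),
    matching the convention described above."""
    # isoweekday() returns 1 (Monday) .. 7 (Sunday); shift so Sunday becomes 1.
    return d.isoweekday() % 7 + 1


# The same three dates used in the sample DataFrame
sample_dates = [date(2023, 1, 19), date(2023, 2, 11), date(2023, 3, 12)]
dayofweek_values = [spark_style_dayofweek(d) for d in sample_dates]
print(dayofweek_values)  # [5, 7, 1]
```

The result agrees with the output table: 2023-01-19 is a Thursday (5), 2023-02-11 a Saturday (7), and 2023-03-12 a Sunday (1).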

dayofyear

The dayofyear function returns the day of the year from a date column as an integer between 1 and 366. The value 366 occurs only on December 31 of a leap year.

Syntax: df.select(dayofyear(col("date_column_name"))).show()

from pyspark.sql.functions import col, dayofyear

# Assumes an active SparkSession named spark (as in the PySpark shell)
# Sample DataFrame; the "date" column holds yyyy-MM-dd strings
df = spark.createDataFrame([("2023-01-19",), ("2023-02-11",), ("2023-03-12",)], ["date"])
# Extract day of the year
df.select(dayofyear(col("date"))).show()

Output

+---------------+
|dayofyear(date)|
+---------------+
|             19|
|             42|
|             71|
+---------------+
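
To see where the upper bound of 366 comes from, the same calculation can be reproduced outside Spark. The following is a minimal sketch using only Python's standard library; the helper name day_of_year is hypothetical and is not a PySpark function:

```python
from datetime import date


def day_of_year(d: date) -> int:
    """Hypothetical helper: 1-based ordinal of a date within its year."""
    return d.timetuple().tm_yday


# The same three dates used in the sample DataFrame
sample_dates = [date(2023, 1, 19), date(2023, 2, 11), date(2023, 3, 12)]
dayofyear_values = [day_of_year(d) for d in sample_dates]
print(dayofyear_values)  # [19, 42, 71]

# The last day of a common year is 365; only a leap year reaches 366.
print(day_of_year(date(2023, 12, 31)))  # 365
print(day_of_year(date(2024, 12, 31)))  # 366
```

For example, 2023-03-12 is day 71 because January has 31 days and February 2023 has 28, giving 31 + 28 + 12 = 71.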

The dayofmonth, dayofweek, and dayofyear functions in PySpark provide an easy way to extract day-related information from date columns. Each one takes a column and returns an integer column.

Author: user