PySpark: Adding a specified number of days to a date column

pyspark.sql.functions.date_add

The date_add function in PySpark is used to add a specified number of days to a date column. It is part of the built-in Spark SQL functions and can be used both through the DataFrame API and in Spark SQL expressions.

The basic syntax of the function is as follows:

date_add(date, days)

where:

date is the date column that you want to add days to.
days is the number of days to add to the date column (can be a positive or negative number).

Here’s an example of how you can use the date_add function in PySpark:

from pyspark.sql import functions as F
df = spark.createDataFrame([("2023-01-01",),("2023-01-02",)], ["date"])
df = df.withColumn("new_date", F.date_add(df["date"], 1))
df.show()

Result

+----------+----------+
|      date|  new_date|
+----------+----------+
|2023-01-01|2023-01-02|
|2023-01-02|2023-01-03|
+----------+----------+

Note that the date_add function always returns a date, not a timestamp: applied to a timestamp column, it drops the time portion. If you want to add a specified number of seconds, minutes, hours, etc. to a timestamp column, use the expr function with an INTERVAL expression instead.
