Spark: Converting an argument into a timedelta object

The to_timedelta() function proves invaluable for handling time-related data. Let’s delve into its workings and explore its utility with practical examples.

Understanding to_timedelta()

The to_timedelta() function in the pandas API on Spark converts its argument into a timedelta object, which represents a duration rather than a point in time. This is handy when dealing with time durations, because timedelta values can be added, subtracted, and compared directly.

Syntax

to_timedelta(arg[, unit, errors])
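  • arg: The argument to be converted into a timedelta, such as a string, a list-like, or a Series.
  • unit (optional): The unit of arg when arg is numeric, for example ‘D’, ‘h’, ‘s’, or ‘ms’. If not specified, defaults to ‘ns’ (nanoseconds).
  • errors (optional): Specifies how invalid input is handled. Defaults to ‘raise’, which raises an exception; ‘coerce’ turns values that cannot be parsed into NaT.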
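To make the role of unit concrete, here is a minimal standard-library sketch of how a numeric argument is scaled by its unit. The helper numeric_to_timedelta and its unit table are hypothetical names for illustration, not part of pandas or PySpark. In the pandas API on Spark the corresponding call would presumably look like ps.to_timedelta(3, unit='h'), assuming pyspark.pandas is imported as ps.

```python
from datetime import timedelta

# Hypothetical lookup table for this sketch: microseconds per unit.
# The labels are a small subset of pandas-style unit abbreviations.
_MICROSECONDS_PER_UNIT = {
    "W": 7 * 24 * 3600 * 1_000_000,
    "D": 24 * 3600 * 1_000_000,
    "h": 3600 * 1_000_000,
    "min": 60 * 1_000_000,
    "s": 1_000_000,
    "ms": 1_000,
    "us": 1,
    "ns": 0.001,
}


def numeric_to_timedelta(value, unit="ns"):
    """Scale a number by `unit` and return a datetime.timedelta.

    Illustration only: datetime.timedelta has microsecond resolution,
    so nanosecond inputs are rounded to the nearest microsecond.
    """
    if unit not in _MICROSECONDS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return timedelta(microseconds=value * _MICROSECONDS_PER_UNIT[unit])


print(numeric_to_timedelta(3, "h"))           # 3:00:00
print(numeric_to_timedelta(2, "W"))           # 14 days, 0:00:00
print(numeric_to_timedelta(1_500_000_000))    # 0:00:01.500000 (default unit is ns)
```

The same number yields very different durations depending on unit, which is why it is worth passing unit explicitly whenever the input is numeric.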

Practical Examples

Let’s dive into some examples. They work on Spark DataFrames, where the analogous conversion is a cast of a string column to Spark’s interval type.

Example 1: Basic Usage

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Initialize Spark Session
spark = SparkSession.builder \
    .appName("to_timedelta_example @ Learning @ Freshers.in") \
    .getOrCreate()

# Create a Spark DataFrame with durations written as strings
data = [("1 days",), ("3 hours",), ("2 weeks",)]
df = spark.createDataFrame(data, ["time"])

# Cast the string column 'time' to Spark's interval type,
# the DataFrame-level counterpart of a timedelta
df = df.withColumn("timedelta", col("time").cast("interval"))

# Show DataFrame
df.show()

Output:

+-------+---------+
|   time|timedelta|
+-------+---------+
| 1 days|   1 days|
|3 hours|  3 hours|
|2 weeks|  14 days|
+-------+---------+

Example 2: Specifying Units

from pyspark.sql.functions import col

# Note: cast("interval") takes no unit argument; the unit is read from each string,
# so both new columns hold the same values as 'timedelta'
df = df.withColumn("timedelta_days", col("time").cast("interval"))
df = df.withColumn("timedelta_hours", col("time").cast("interval"))

# Show DataFrame
df.show()

Output:

+-------+---------+--------------+---------------+
|   time|timedelta|timedelta_days|timedelta_hours|
+-------+---------+--------------+---------------+
| 1 days|   1 days|        1 days|         1 days|
|3 hours|  3 hours|       3 hours|        3 hours|
|2 weeks|  14 days|       14 days|        14 days|
+-------+---------+--------------+---------------+

Example 3: Error Handling

# Example with a value that cannot be parsed as a duration
data_with_error = [("1 days",), ("xyz",)]
df_with_error = spark.createDataFrame(data_with_error, ["time"])

# Cast 'time' to an interval. With ANSI mode disabled, an unparsable string
# becomes NULL; with ANSI mode enabled the cast may raise an error instead.
df_with_error = df_with_error.withColumn("timedelta", col("time").cast("interval"))

# Show DataFrame
df_with_error.show()

Output:

+------+---------+
|  time|timedelta|
+------+---------+
|1 days|   1 days|
|   xyz|     NULL|
+------+---------+
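The NULL in the output mirrors what errors=‘coerce’ does in to_timedelta(), while errors=‘raise’ would stop at the bad value. The sketch below imitates both behaviours with the standard library; parse_duration and its simple pattern are hypothetical and cover only the "number plus unit" strings used in this article, with None standing in for NaT or NULL.

```python
import re
from datetime import timedelta

# Hypothetical pattern for this sketch: "<integer> <unit>", optional plural "s".
_PATTERN = re.compile(r"^\s*(\d+)\s*(week|day|hour|minute|second)s?\s*$")


def parse_duration(text, errors="raise"):
    """Parse strings such as '1 days' or '3 hours' into a timedelta.

    errors='raise'  -> raise ValueError on unparsable input
    errors='coerce' -> return None (stand-in for NaT / NULL)
    """
    if errors not in ("raise", "coerce"):
        raise ValueError(f"unsupported errors mode: {errors!r}")
    match = _PATTERN.match(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # timedelta accepts plural keyword names: weeks, days, hours, ...
        return timedelta(**{unit + "s": amount})
    if errors == "coerce":
        return None
    raise ValueError(f"cannot parse duration: {text!r}")


print(parse_duration("1 days"))                # 1 day, 0:00:00
print(parse_duration("xyz", errors="coerce"))  # None
```

Coercing keeps a pipeline running at the cost of silent missing values, so it is usually worth counting the resulting nulls afterwards.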
Author: user