PySpark offers the exp function in its pyspark.sql.functions module, which computes e raised to the power of each value in a given column.
In this article, we will delve into the details of this function, exploring its usage through an illustrative example.
The exp function signature in PySpark is as follows:
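pyspark.sql.functions.exp(col: ColumnOrName) -> pyspark.sql.column.Column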
The function takes a single argument:
col: A column expression representing a column in a DataFrame. The column should contain numeric data for which you want to compute the exponential.
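As with most functions in pyspark.sql.functions, the argument may be either a Column object or the column name as a string; the two calls below are equivalent:

exp(df["col1"])  # pass a Column object
exp("col1")      # pass the column name as a string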
Let’s examine a practical example to better understand the exp function. Suppose we have a DataFrame named df containing a single column, col1, with five numeric values.
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession.builder.getOrCreate()

# A DataFrame with a single numeric column, col1
data = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
df = spark.createDataFrame(data, ["col1"])
df.show()
Resulting DataFrame:
+----+
|col1|
+----+
| 1.0|
| 2.0|
| 3.0|
| 4.0|
| 5.0|
+----+
Now, we wish to compute the exponential of each value in the col1 column. We can achieve this using the exp function:
from pyspark.sql.functions import exp

df_exp = df.withColumn("col1_exp", exp(df["col1"]))
df_exp.show()
In this code, the withColumn function is utilized to add a new column to the DataFrame. This new column, col1_exp, will contain the exponential of each value in the col1 column. The output will resemble the following:
+----+------------------+
|col1|          col1_exp|
+----+------------------+
| 1.0|2.7182818284590455|
| 2.0|  7.38905609893065|
| 3.0|20.085536923187668|
| 4.0|54.598150033144236|
| 5.0| 148.4131591025766|
+----+------------------+
As you can see, the col1_exp column now holds the exponential of the values in the col1 column.
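Because exp is the inverse of the natural logarithm, one quick sanity check is to round-trip a column through log and back. The sketch below reuses the df created above; col1_log and col1_back are column names chosen here purely for illustration:

from pyspark.sql.functions import col, exp, log

# log() with a single argument computes the natural logarithm
df_roundtrip = (
    df.withColumn("col1_log", log("col1"))
      .withColumn("col1_back", exp(col("col1_log")))  # exp undoes the log
)
df_roundtrip.show()

Up to floating-point rounding, col1_back matches the original col1 values.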
PySpark’s exp function is a useful tool for computing the exponential of numeric data. It belongs in the toolkit of data scientists and engineers working with large datasets, making this common transformation straightforward at scale.