How to round a value to a given number of decimal places using HALF_EVEN rounding in Spark – PySpark

PySpark @ Freshers.in

bround function

The bround function returns expr rounded using the HALF_EVEN rounding mode. It rounds the given value to scale decimal places when scale >= 0, or at the integral part when scale < 0.
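To make "rounding at the integral part" concrete, here is a minimal plain-Python sketch using the standard decimal module. It only imitates HALF_EVEN rounding at a negative scale and does not call Spark; the helper name half_even_at_scale is made up for this illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def half_even_at_scale(value: int, scale: int) -> int:
    """Illustrative helper (not a Spark API): HALF_EVEN rounding of an integer."""
    # scale = -1 gives a step of 10, scale = -2 a step of 100, and so on.
    step = Decimal(1).scaleb(-scale)
    return int(Decimal(value).quantize(step, rounding=ROUND_HALF_EVEN))

print(half_even_at_scale(25, -1))    # 20   (tie between 20 and 30, 2 is even)
print(half_even_at_scale(35, -1))    # 40   (tie between 30 and 40, 4 is even)
print(half_even_at_scale(1234, -2))  # 1200 (not a tie, nearest hundred)
print(half_even_at_scale(1250, -2))  # 1200 (tie, 12 is even)
print(half_even_at_scale(1350, -2))  # 1400 (tie, 14 is even)
```

With a scale of -1 the tens digit decides the tie, and with -2 the hundreds digit does.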

Syntax

bround(expr [,targetScale] )

expr: A numeric expression.

targetScale: An INTEGER expression giving the number of decimal places to keep. It is usually greater than or equal to 0; a negative value rounds at the integral part, as noted above. If targetScale is omitted, the default is 0.

If expr is DECIMAL, the result is a DECIMAL with a scale that is the smaller of the expr scale and targetScale.

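HALF_EVEN rounding, also known as Gaussian or banker’s rounding, rounds a value that lies exactly halfway between two candidates to the one whose last digit is even. For example, 2.5 rounds to 2 and 3.5 rounds to 4. Values that are not exactly halfway round to the nearest candidate as usual.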
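The difference from ordinary HALF_UP rounding can be seen side by side in a short plain-Python sketch. It uses the standard decimal module rather than Spark, and the helper round_with is a made-up name for this illustration. Values are passed as strings so that binary floating-point representation does not affect the ties.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_with(value: str, scale: int, mode: str) -> Decimal:
    """Illustrative helper (not a Spark API): round a decimal string to `scale` places."""
    quantum = Decimal(1).scaleb(-scale)  # scale=0 -> 1, scale=2 -> 0.01
    return Decimal(value).quantize(quantum, rounding=mode)

# Ties at scale 0: HALF_UP always goes up, HALF_EVEN goes to the even neighbour.
for text in ["0.5", "1.5", "2.5", "3.5", "2.51"]:
    up = round_with(text, 0, ROUND_HALF_UP)
    even = round_with(text, 0, ROUND_HALF_EVEN)
    print(f"{text}: HALF_UP={up} HALF_EVEN={even}")

# The same rule applies at other scales.
print(round_with("0.125", 2, ROUND_HALF_EVEN))  # 0.12
print(round_with("0.135", 2, ROUND_HALF_EVEN))  # 0.14
```

Only exact ties behave differently: 2.51 rounds to 3 under both modes because it is closer to 3 than to 2.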

Where is bround used?

bround is useful when you perform many rounding operations and want the cumulative result to stay close to the unrounded total or average, instead of drifting up or down as it would with HALF_UP or HALF_DOWN. It is common in statistical analysis and in any situation where rounding errors should cancel out on average.

A scenario where this is used

Suppose a retail account receives a large number of purchases of various sizes. If you apply standard rounding, where 0.5 always goes to 1, the rounded total carries a subtle upward bias. If you do the converse, where 0.5 always goes to 0, the bias runs the other way. With the half-even strategy, .5 is rounded up about half the time and down about half the time, depending on the digit just before it. The strategy relies on that digit being even about half the time and odd the other half.
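The bias described above can be checked with a small plain-Python sketch. The purchase amounts below are hypothetical values chosen so that every one of them sits exactly on a .5 boundary, and the decimal module stands in for Spark's rounding modes.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Hypothetical amounts: 0.5, 1.5, 2.5, ..., 9.5 (every value is an exact tie).
amounts = [Decimal(n) + Decimal("0.5") for n in range(10)]

def total_rounded(values, mode):
    """Round each value to a whole number with the given mode, then sum."""
    return sum(v.quantize(Decimal("1"), rounding=mode) for v in values)

exact_total = sum(amounts)
half_up_total = total_rounded(amounts, ROUND_HALF_UP)
half_even_total = total_rounded(amounts, ROUND_HALF_EVEN)

print(exact_total)      # 50.0
print(half_up_total)    # 55, every tie went up
print(half_even_total)  # 50, ups and downs cancel out
```

In this deliberately extreme data set, HALF_UP overstates the total by 5 while HALF_EVEN reproduces it exactly. With real data the effect is smaller, because only the values that fall exactly on a tie are affected.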

Example

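from pyspark.sql import SparkSession
from pyspark.sql.functions import bround
# Create or reuse a SparkSession (the pyspark shell already provides one as `spark`)
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2.5,4.4,2.1)], ['col1','col2','col3'])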
df.select(bround('col1', 0)).collect()
[Row(bround(col1, 0)=2.0)]
df.select(bround('col1', 0)).show()
+---------------+
|bround(col1, 0)|
+---------------+
|            2.0|
+---------------+
df.select(bround('col1', 0),bround('col2', 1)).show()
+---------------+---------------+
|bround(col1, 0)|bround(col2, 1)|
+---------------+---------------+
|            2.0|            4.4|
+---------------+---------------+
df.select(bround('col1', 0),bround('col2', 0)).show()
+---------------+---------------+
|bround(col1, 0)|bround(col2, 0)|
+---------------+---------------+
|            2.0|            4.0|
+---------------+---------------+
df.select(bround('col1', 0),bround('col2', 0),bround('col2', 1)).show()
+---------------+---------------+---------------+
|bround(col1, 0)|bround(col2, 0)|bround(col2, 1)|
+---------------+---------------+---------------+
|            2.0|            4.0|            4.4|
+---------------+---------------+---------------+
Author: user
