Category: spark
Spark User full article
How to perform SQL-like column transformations in PySpark: selectExpr
selectExpr is a method that simplifies and enhances data transformation. This article aims to demystify selectExpr, highlighting its advantages and demonstrating…
Duplicating the contents of a string column a specified number of times
The repeat function in PySpark is used to duplicate the contents of a string column a specified number of times…
Extracting specific parts of a string that match a given regular expression pattern using PySpark
The regexp_extract function in PySpark is used for extracting specific parts of a string that match a given regular expression…
PySpark: Replace parts of a string that match a regular expression pattern using regexp_replace
PySpark provides powerful string manipulation capabilities, a crucial aspect of which is regular expression replacement. This article delves into the…
PySpark Math Functions: A Deep Dive into cos() and cosh()
Among its numerous features, PySpark provides a comprehensive set of mathematical functions that are essential for data analysis. In this…
Data Transformation and Analysis with PySpark ASCII
In today’s data-driven world, efficient data processing is essential for businesses to gain valuable insights and make informed decisions. PySpark…
Returning the last value in a group during aggregation in PySpark
pyspark.sql.functions.last: PySpark’s last() function is part of the PySpark SQL module and is used to return the last value in…
PySpark: Converting the first letter of each word in a string to uppercase and the rest to lowercase
PySpark’s initcap() function is used to convert the first letter of each word in a string to uppercase and the…
Counting the number of elements in RDDs and DataFrames with PySpark
In PySpark, count() is a method on RDDs (Resilient Distributed Datasets) and DataFrames (the typed Dataset API exists only in Scala and Java) that counts the number…
Inverting the bits of an integer, changing all ‘0’ bits to ‘1’ and vice versa using PySpark: bitwiseNOT
BitwiseNOT is a fundamental bitwise operation that inverts the bits of an integer, changing all ‘0’ bits to ‘1’ and…