Tag: big_data_interview

AWS Glue @ Freshers.in

Optimizing data queries with AWS Glue and Amazon Athena

AWS Glue, a serverless data integration service, and Amazon Athena, an interactive query service, together offer a seamless solution for…

AWS Glue @ Freshers.in

Mastering data partitioning in AWS Glue

This article explores how AWS Glue handles data partitioning during processing, supplemented by a real-world example. Understanding data partitioning in…

AWS Glue @ Freshers.in

Ensuring data integrity with AWS Glue: A practical guide to data validation

In the world of big data, ensuring the accuracy and integrity of data during ingestion is paramount. AWS Glue, a…

PySpark @ Freshers.in

Replacing NaN (Not a Number) values with a specified value in a column: nanvl

The nanvl function in PySpark is used to replace NaN (Not a Number) values with a specified value in a…

PySpark @ Freshers.in

Computing the average value of a numeric column in PySpark

The mean function in PySpark is used to compute the average value of a numeric column. This function is part…

PySpark @ Freshers.in

Concatenating two or more maps into a single map: map_concat

The map_concat function in PySpark is designed to concatenate two or more maps into a single map. It merges key-value…

PySpark @ Freshers.in

Removing leading spaces (spaces on the left side) from a string in PySpark

PySpark, a leading tool in big data processing, provides several functions for string manipulation, one of which is ltrim. This…

PySpark @ Freshers.in

Adding a new column to a DataFrame with a constant value

The lit function in PySpark is a straightforward yet powerful tool for adding constant values as new columns in a…

PySpark @ Freshers.in

Finding the position of a substring within a string using PySpark

pyspark.sql.functions.locate: PySpark, a tool for handling large-scale data processing, offers a plethora of functions for string manipulation, one of which…
