Tag: big_data_interview

Hive @ Freshers.in

Hive Transactional Table vs. Non-Transactional Table

Before we explore the differences between transactional and non-transactional tables, let’s grasp the basic concepts of Hive tables. Hive Table…

HiveServer1 vs. HiveServer2: A Comprehensive Comparison

Before diving into the comparison, let’s briefly look at what Hive servers are and the role they play in the Hive ecosystem. HiveServer1…

Hive Script vs. Hive Query: Unraveling the Differences

This article aims to shed light on the differences between Hive scripts and Hive queries, offering clarity and real-world examples to illustrate the contrasts. Understanding Hive…

PySpark @ Freshers.in

Returning the last value in a group during aggregation in PySpark

PySpark’s last() function (pyspark.sql.functions.last) is part of the PySpark SQL module and is used to return the last value in…

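
To make the idea concrete, here is a minimal plain-Python sketch (not PySpark code) of what "last value in a group" means. The helper last_per_group is hypothetical; it assumes rows arrive as (key, value) pairs in a fixed order and imitates the ignorenulls option of pyspark.sql.functions.last.

```python
from typing import Any, Dict, Hashable, Iterable, Optional, Tuple


def last_per_group(
    rows: Iterable[Tuple[Hashable, Any]], ignorenulls: bool = False
) -> Dict[Hashable, Optional[Any]]:
    """Hypothetical helper: return the last value seen for each key, in input order.

    With ignorenulls=False the final row of a group wins even if its value is None;
    with ignorenulls=True, None values are skipped in favour of the latest non-null one.
    """
    result: Dict[Hashable, Optional[Any]] = {}
    for key, value in rows:
        if ignorenulls and value is None:
            # Keep any non-null value recorded earlier; register the key as None otherwise.
            result.setdefault(key, None)
            continue
        result[key] = value
    return result


rows = [("a", 1), ("a", 2), ("a", None), ("b", None), ("b", 5)]
print(last_per_group(rows))                    # prints: {'a': None, 'b': 5}
print(last_per_group(rows, ignorenulls=True))  # prints: {'a': 2, 'b': 5}
```

In PySpark itself (assuming from pyspark.sql import functions as F, plus a DataFrame df with placeholder columns key and value), the corresponding aggregation would look like df.groupBy("key").agg(F.last("value", ignorenulls=True)). Spark's documentation notes that last() is non-deterministic because row order within a group may change after a shuffle; the sketch sidesteps this by assuming a fixed input order.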

PySpark to count the number of elements in RDDs, DataFrames and DataSets

In PySpark, count() is a method applied to RDDs (Resilient Distributed Datasets) and DataFrames (the typed Dataset API exists only in Scala and Java) to count the number…

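
A rough plain-Python picture of the two kinds of counting involved: counting every element, as RDD.count() and DataFrame.count() do, versus counting only non-null values, as the aggregate pyspark.sql.functions.count(col) does. The helpers count_rows and count_non_null are hypothetical names used only for this illustration.

```python
from typing import Any, Iterable, Optional


def count_rows(items: Iterable[Any]) -> int:
    """Count every element, nulls included (like RDD.count() or DataFrame.count())."""
    return sum(1 for _ in items)


def count_non_null(values: Iterable[Optional[Any]]) -> int:
    """Count only the non-null values (like the SQL aggregate count(col))."""
    return sum(1 for v in values if v is not None)


column = [1, None, 3, None, 5]  # a toy "column" containing two nulls
print(count_rows(column), count_non_null(column))  # prints: 5 3
```

In PySpark, df.count() returns the row count directly as a Python int, whereas df.select(F.count("col")) yields a one-row DataFrame holding the non-null count for that column (df and col are placeholders, with from pyspark.sql import functions as F assumed).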

Calculating the average of a set of numerical values in PySpark – avg – Examples included

PySpark’s avg function is designed for one of the most common data analysis tasks – calculating the average of a…

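
As a sketch of the usual SQL AVG semantics that Spark's avg follows: null values are skipped rather than counted as zero, and a group with no non-null values averages to null. The function sql_avg below is a hypothetical plain-Python helper, not part of PySpark.

```python
from typing import Iterable, Optional


def sql_avg(values: Iterable[Optional[float]]) -> Optional[float]:
    """Hypothetical helper: average the non-null values; return None when there are none."""
    total = 0.0
    n = 0
    for v in values:
        if v is None:
            continue  # nulls affect neither the sum nor the count
        total += v
        n += 1
    return total / n if n else None


print(sql_avg([10, None, 20]))  # prints: 15.0 (the null is skipped, not treated as 0)
print(sql_avg([None, None]))    # prints: None
```

The PySpark equivalent is an aggregation such as df.groupBy("dept").agg(F.avg("salary")), where the DataFrame, the column names and the F alias for pyspark.sql.functions are assumed; Spark returns the average as a double.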

PySpark’s atan2 function: solving complex mathematical problems in distributed data processing

In this guide, we look at PySpark’s atan2 function (pyspark.sql.functions.atan2), a mathematical gem that…

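
atan2(y, x) returns the angle in radians between the positive x-axis and the point (x, y), using the signs of both arguments to pick the correct quadrant, which plain atan(y / x) cannot do. Python's standard math.atan2 takes its arguments in the same order (y first) as pyspark.sql.functions.atan2(col1, col2), so it serves as a quick local sketch; the POINTS list and the angles helper are illustrative only.

```python
import math

# One sample point per quadrant, as (x, y) pairs. None has x == 0,
# because the naive atan(y / x) below would raise ZeroDivisionError.
POINTS = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]


def angles(points):
    """Return (atan(y / x), atan2(y, x)) in radians for each (x, y) point."""
    return [(math.atan(y / x), math.atan2(y, x)) for x, y in points]


for (x, y), (naive, full) in zip(POINTS, angles(POINTS)):
    print(f"({x:+}, {y:+}): atan(y/x) = {naive:+.4f}, atan2(y, x) = {full:+.4f}")
```

For (-1, -1) the naive formula reports the same angle as for (1, 1), about 0.7854, while atan2 gives about -2.3562 (that is, -3π/4). atan2 also handles x = 0 without dividing by zero: math.atan2(1.0, 0.0) is π/2.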

Computing the Levenshtein distance between two strings using PySpark – Examples included

The levenshtein function in PySpark (pyspark.sql.functions.levenshtein) computes the Levenshtein distance between two strings, that is, the minimum number of…

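
For intuition about what the function returns, below is a plain-Python sketch of the classic dynamic-programming algorithm for Levenshtein distance, in which every insertion, deletion and substitution costs 1. It is an illustration only, not Spark's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions
    needed to turn a into b, computed row by row over the DP table."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric, so keep the stored row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)  # 0 if the chars match
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # prints: 3
```

In PySpark the same quantity is available as a column expression, for example df.select(F.levenshtein("name_a", "name_b")), where the DataFrame, the column names and the F alias for pyspark.sql.functions are assumed. For "kitten" and "sitting" the distance is 3: two substitutions (k→s, e→i) and one insertion (g).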