Hive is an open-source data warehouse tool built on top of Hadoop. It allows users…
Category: article
Hive : Hive Optimizers: A Comprehensive Guide
Hive is a data warehousing tool that provides a SQL-like interface for querying large datasets stored in Hadoop Distributed File…
Hive : Comparison between the ORC and Parquet file formats in Hive
ORC (Optimized Row Columnar) and Parquet are two popular file formats for storing and processing large datasets in Hadoop-based systems…
Hive : Different types of storage formats supported by Hive [16 formats supported by Hive]
Apache Hive is an open-source data warehousing tool that was developed to provide an SQL-like interface to query and analyze…
Airflow : Using Boto3 in Airflow
Boto3 is the Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use…
PySpark : Setting PySpark parameters – A complete walkthrough [3 Ways]
In PySpark, you can set various parameters to configure your Spark application. These parameters can be set in different ways…
PySpark : Using CASE WHEN in Spark SQL to conditionally evaluate expressions : DataFrame and SQL approaches explained
The WHEN clause is used in Spark SQL to conditionally execute expressions. It’s similar to a CASE statement in SQL…
Spark : Calculating executor memory in Spark – A complete guide
The executor memory is the amount of memory allocated to each executor in a Spark cluster. It determines the amount…
Hive : How to load JSON and nested JSON in Hive and how to view it [Sample code with Data]
In this article, I’ll walk you through how to read JSON data from a Hive table using an example with…
Snowflake : LIMIT and FETCH in Snowflake : How they differ, and when and where they are used
In Snowflake, the LIMIT and FETCH clauses are used to limit the number of rows returned by a query. While…
Snowflake : Filtering the results of window functions in Snowflake [QUALIFY]
One of the features that sets Snowflake apart from other data warehousing solutions is its support for advanced SQL constructs…