Category: article
Hive : How to Kill a Running Query in Apache Hive
There may be times when a running query needs to be terminated due to excessive resource usage, incorrect syntax, or…
Hive : Seeing Long Running Queries in Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis…
DBT : Using DBT (Data Build Tool) for Testing and Capturing Test Results
One of the great features of DBT is its testing framework. DBT allows us to validate the correctness and reliability…
Python : How to list all available timezones
To list all available timezones, you can use the pytz library:
    import pytz
    for tz in pytz.all_timezones:
        print(tz)
Here is the…
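The excerpt above relies on the third-party pytz package. As a hedged alternative, the sketch below uses the standard-library zoneinfo module (Python 3.9+). The helper name list_timezones is hypothetical, and the result depends on the timezone data installed on the machine, so it may differ from what pytz reports.

```python
from zoneinfo import available_timezones


def list_timezones():
    """Return the IANA timezone names known to this system, sorted.

    Hypothetical helper, not part of the original article.
    """
    return sorted(available_timezones())


if __name__ == "__main__":
    for name in list_timezones():
        print(name)
```

On systems that ship no timezone database, the list can be empty; installing the tzdata package from PyPI is the usual fix.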
PySpark : from_utc_timestamp Function: A Detailed Guide
The from_utc_timestamp function in PySpark is a highly useful function that allows users to convert UTC time to a specified…
Python : Implementing Threads in Python [Run concurrently]
Threading is a technique in programming where tasks can be run concurrently. This is particularly useful for I/O-bound tasks, where…
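A minimal sketch of the idea in this excerpt, using the standard threading module. The function names fetch and run_concurrently are hypothetical, and time.sleep stands in for a real I/O-bound call such as a network request.

```python
import threading
import time


def fetch(name, delay, results):
    """Simulate an I/O-bound task by sleeping, then record a result."""
    time.sleep(delay)
    # Each thread writes to its own key, so no lock is needed here.
    results[name] = f"{name} done"


def run_concurrently(jobs):
    """Run each (name, delay) job in its own thread and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=fetch, args=(name, delay, results))
        for name, delay in jobs
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # block until this thread has finished
    return results


if __name__ == "__main__":
    print(run_concurrently([("first", 0.2), ("second", 0.2)]))
```

Because the threads sleep at the same time, the two jobs should finish in roughly the time of one rather than the sum of both.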
PySpark : Fixing ‘TypeError: an integer is required (got type bytes)’ Error in PySpark with Spark 2.4.4
Apache Spark is an open-source distributed general-purpose cluster-computing framework. PySpark is the Python library for Spark, and it provides an…
AWS : Transferring files from Amazon S3 to an external SFTP server using AWS Transfer Family
AWS Transfer Family is a fully managed service that enables the transfer of files over SFTP, FTPS, and FTP directly…
Python : Search for a word in all files and subfolders with Python
In this article, we’re going to learn how to search for a particular word in all the files contained within…
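One way such a search could look, as a sketch rather than the article's own method: os.walk visits every subfolder, and each file is scanned line by line. The function name find_word is hypothetical; the match is a case-sensitive substring test, and files are read as UTF-8 with undecodable bytes ignored, which is an assumption about the input.

```python
import os


def find_word(root, word):
    """Yield (path, line_number, line) for each line under root containing word."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        if word in line:
                            yield path, line_number, line.rstrip("\n")
            except OSError:
                # Skip files that cannot be opened (permissions, broken links).
                continue


if __name__ == "__main__":
    for path, line_number, line in find_word(".", "TODO"):
        print(f"{path}:{line_number}: {line}")
```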
How to Copy Data from Redshift to Snowflake
Copying data from one database management system to another, specifically from Amazon Redshift to Snowflake, can be done by several…