Category: article
Setting up Minikube using the Docker driver on Ubuntu
Minikube is a tool that lets you run Kubernetes clusters locally, which can be especially useful for learning and development…
Optimizing PySpark queries with Adaptive Query Execution (AQE) – example included
Spark 3+ brought numerous enhancements and features, and one of the notable ones is Adaptive Query Execution (AQE). AQE is…
Transferring an Elastic IP between AWS accounts – step-by-step process
An AWS Elastic IP (EIP) is a static public IPv4 address that users can allocate to AWS resources like EC2…
Handling NULL values in dynamic SQL insert statements using Python
In this article, we dynamically create and execute SQL insert statements to add rows from a DataFrame to a Snowflake…
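A minimal sketch of the core idea, assuming each row arrives as a Python dict and that missing values show up as `None` or float `NaN` (the way pandas often represents them). The helpers `to_sql_literal` and `build_insert` are hypothetical names for illustration; the escaping only doubles single quotes, so parameterized queries remain the safer choice against SQL injection.

```python
import math
from typing import Any, Mapping, Sequence


def to_sql_literal(value: Any) -> str:
    """Render one Python value as a SQL literal (hypothetical helper)."""
    # None and float NaN (common pandas "missing" markers) both become NULL
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "NULL"
    # bool is checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Everything else is treated as text; single quotes are escaped by doubling
    text = str(value).replace("'", "''")
    return f"'{text}'"


def build_insert(table: str, columns: Sequence[str], row: Mapping[str, Any]) -> str:
    """Build one INSERT statement for a single row (hypothetical helper)."""
    cols = ", ".join(columns)
    # A column missing from the row is looked up as None and rendered as NULL
    vals = ", ".join(to_sql_literal(row.get(c)) for c in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals})"


row = {"id": 1, "name": "O'Brien", "score": float("nan")}
print(build_insert("customers", ["id", "name", "score"], row))
# INSERT INTO customers (id, name, score) VALUES (1, 'O''Brien', NULL)
```

Emitting the bare keyword `NULL` (rather than the string `'None'` or `'nan'`) is what lets the target column store a real SQL null.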
PySpark: Calculate the Euclidean distance, or the square root of the sum of the squares of its arguments
In PySpark, the hypot function is a mathematical function used to calculate the Euclidean distance or the square root of…
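To illustrate what `hypot` returns, here is a plain-Python sketch (no Spark required) that compares a hand-written formula with the standard library's `math.hypot`. In PySpark, the column-level counterpart is `pyspark.sql.functions.hypot(col1, col2)`; the helper name `euclidean_distance` is purely illustrative.

```python
import math


def euclidean_distance(a: float, b: float) -> float:
    """sqrt(a**2 + b**2), written out explicitly for comparison."""
    return math.sqrt(a * a + b * b)


# Distance between points (1, 2) and (4, 6): the differences are 3 and 4
dx, dy = 4 - 1, 6 - 2
print(euclidean_distance(dx, dy))  # 5.0
print(math.hypot(dx, dy))          # 5.0

# The naive formula overflows for very large inputs (1e200 * 1e200 is inf),
# while math.hypot avoids the intermediate overflow
print(euclidean_distance(1e200, 1e200))  # inf
print(math.hypot(1e200, 1e200) > 1e200)  # True
```

Spark documents its `hypot` as computing `sqrt(a^2 + b^2)` without intermediate overflow or underflow, which is the same reason to prefer the built-in over the hand-written version.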
PySpark: How to compute covariance using covar_pop and covar_samp
Covariance is a statistical measure that indicates the extent to which two variables change together. If the variables increase and…
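The difference between `covar_pop` and `covar_samp` comes down to the divisor. Both sum the products of deviations from the mean, (x_i − mean_x)(y_i − mean_y). Population covariance then divides by n, while sample covariance divides by n − 1. The plain-Python sketch below mirrors those two formulas on a small made-up dataset. The functions only borrow the names of `pyspark.sql.functions.covar_pop` / `covar_samp` for readability and are not the Spark implementations.

```python
from typing import Sequence


def _sum_of_products(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of (x_i - mean_x) * (y_i - mean_y)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))


def covar_pop(x: Sequence[float], y: Sequence[float]) -> float:
    # Population covariance: divide by n
    return _sum_of_products(x, y) / len(x)


def covar_samp(x: Sequence[float], y: Sequence[float]) -> float:
    # Sample covariance: divide by n - 1, so it needs at least two observations
    if len(x) < 2:
        raise ValueError("sample covariance needs at least two observations")
    return _sum_of_products(x, y) / (len(x) - 1)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(covar_pop(x, y))   # 2.5
print(covar_samp(x, y))  # 3.3333333333333335
```

Both values are positive because `y` rises with `x`. The sample version is slightly larger because dividing by n − 1 corrects for estimating the mean from the same data.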
Automated email responses using Gmail and Google Sheets with Google Apps Script
Automated email responses can be set up using Google Apps Script, a scripting platform developed by Google for lightweight application development…
Navigating job dependencies in AWS Glue – managing ETL workflows
AWS Glue manages dependencies between jobs using triggers. Triggers can start jobs based on the completion status of other jobs,…
Airflow scheduler does not appear to be running. Last heartbeat was received 20 minutes ago. The DAGs list may not update: Resolved
You may get an error in Airflow such as “The scheduler does not appear to be running. Last heartbeat was received…
Spark repartition() vs coalesce() – a complete guide
In PySpark, managing data across different partitions is crucial for optimizing performance, especially for large-scale data processing tasks. Two methods…