Category: article
PySpark : Calculating the Difference Between Dates with PySpark: The months_between Function
When working with time series data, it is often necessary to calculate the time difference between two dates. Apache Spark…
PySpark : Retrieving Unique Elements from two arrays in PySpark
Let’s start by creating a DataFrame named freshers_in. We’ll make it contain two array columns named ‘array1’ and ‘array2’, filled…
GIT : The runner has not connected yet [ Solved ]
In GitLab, the process of setting up a new runner and connecting it to a project involves several steps. Here…
Hive : How to preserve Hive metadata [Preserve the last DDL time for the table]
HOLD_DDLTIME: The “last DDL time” refers to the timestamp of the most recent DDL (Data Definition Language) operation that was…
Google BigQuery vs. AWS Redshift vs. Snowflake: A Detailed Comparison
Cloud-based data warehousing has revolutionized the way organizations manage and analyze large datasets. Among the most popular cloud data warehouse…
GCP : Monitoring Google BigQuery Costs for Each SQL Query
Google BigQuery is a powerful tool for analyzing large datasets, but it’s also important to keep track of costs to…
GCP : Connecting Python to Google BigQuery
Google BigQuery is a web service from Google that is used for handling and analyzing big data. It’s part of…
ML : Convolutional Neural Network (CNN) : Most frequently asked questions
What is a Convolutional Neural Network (CNN) and how does it differ from other types of neural networks? A Convolutional…
Docker : Not able to access other websites or git inside a Docker container? [ Solved ]
Consider an example where you are trying to connect to git from inside a Docker container. The error message fatal: unable to access…
DNS resolution problem : Runner is unable to resolve the domain name ‘git.com’
When you are trying to run the job through the runner, the runner is unable to resolve the domain name…