Tag: Big Data

AWS Glue @ Freshers.in

What are the Python libraries provided by AWS Glue version 2.0?

The default Python libraries available in AWS Glue version 2.0 are listed below: boto3==1.12.4, botocore==1.15.4, certifi==2019.11.28, chardet==3.0.4, cycler==0.10.0, Cython==0.29.15, docutils==0.15.2…
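As a rough illustration, pins like the ones listed above could be compared with what is actually installed in a Python environment. This is only a sketch: it uses the first few pins visible in the excerpt (the full list is truncated in the source), and the helper names `parse_pins`, `installed_versions`, and `find_mismatches` are hypothetical, not part of AWS Glue.

```python
from importlib import metadata

# Pins copied from the excerpt above; the source list is truncated, so this is partial.
GLUE_2_0_PINS = "boto3==1.12.4 botocore==1.15.4 certifi==2019.11.28 chardet==3.0.4"


def parse_pins(text):
    """Turn whitespace-separated 'name==version' tokens into a dict."""
    pins = {}
    for token in text.split():
        name, sep, version = token.partition("==")
        if sep and name and version:  # ignore tokens that are not exact pins
            pins[name] = version
    return pins


def installed_versions(names):
    """Look up the installed version of each package; None if it is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pins, installed):
    """Return {name: (expected, found)} for every package whose version differs."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pins.items()
        if installed.get(name) != expected
    }


if __name__ == "__main__":
    pins = parse_pins(GLUE_2_0_PINS)
    for name, (expected, found) in find_mismatches(pins, installed_versions(pins)).items():
        print(f"{name}: expected {expected}, found {found}")
```

Keeping the lookup separate from the comparison means the comparison can be exercised with a plain dictionary, without depending on what happens to be installed.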

PySpark @ Freshers.in

AWS Glue: Example of how to read a sample CSV file with PySpark

Here we assume that you have your CSV data in an AWS S3 bucket. The next step is to crawl the data…

PySpark @ Freshers.in

How to rename a Spark DataFrame that has a complex schema with AWS Glue – PySpark

There can be multiple reasons to rename a Spark DataFrame. Even though withColumnRenamed can be used to rename…

What is the problem in having lots of small files in HDFS? What is the remediation plan?

In the Hadoop ecosystem, we store files under folders in HDFS; most of the time the folder name we are…

Explain the distributed cache in Hadoop

Distributed cache is a facility provided by the Hadoop MapReduce framework to access small files needed by an application during its…

What is the swappiness value? What is the role of the swappiness value during cluster setup?

vm.swappiness is a kernel parameter in Linux. Its value ranges from 0 to 100 and controls the swapping…
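A minimal sketch of how the current setting might be checked on a cluster node follows. It assumes a Linux host that exposes the parameter at `/proc/sys/vm/swappiness`; the function names are hypothetical, and the 0 to 100 bound is taken from the text above, so adjust `maximum` if your kernel documents a different range.

```python
from pathlib import Path

# Standard procfs location for this parameter on Linux.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(raw, maximum=100):
    """Parse the text read from procfs and check it against the expected range.

    The default upper bound follows the 0-100 range quoted in the text above.
    """
    value = int(raw.strip())
    if not 0 <= value <= maximum:
        raise ValueError(f"swappiness {value} is outside 0-{maximum}")
    return value


def read_swappiness(path=SWAPPINESS_PATH):
    """Read the current vm.swappiness value from the given procfs path."""
    return parse_swappiness(path.read_text())


if __name__ == "__main__":
    print("vm.swappiness =", read_swappiness())
```

Splitting parsing from file access keeps the range check usable on hosts where the procfs file is not present.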

What is the Snowflake MERGE command? How do you use it?

The Snowflake MERGE command allows you to perform merge operations between two tables. The merge operation includes Insert, Delete,…

What are the Data Processing Operators in Snowflake?

Filter: Represents an operation that filters the records. Attributes: Filter condition – the condition used to perform filtering. Join…

What are the Query Operators supported by Snowflake?

Snowflake supports most of the standard operators defined in SQL:1999. Arithmetic operators: +, -, *, /,…