AWS Glue interview questions

11. Which Data Stores Can I Crawl using Glue?
Crawlers can crawl both file-based and table-based data stores.
Crawlers can crawl the following data stores through their respective native interfaces:
Amazon Simple Storage Service (Amazon S3)
Amazon DynamoDB
Crawlers can crawl the following data stores through a JDBC connection:
Amazon Redshift
Amazon Relational Database Service (Amazon RDS): Amazon Aurora, Microsoft SQL Server, MySQL, Oracle, PostgreSQL
Publicly accessible databases: Aurora, Microsoft SQL Server, MySQL, Oracle, PostgreSQL
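
As a rough illustration of how these store types map onto a crawler definition, here is a minimal sketch using boto3's create_crawler call. The crawler name, IAM role, database, bucket, table, and connection names are all hypothetical placeholders, and the JDBC target assumes a Glue connection with that name already exists.

```python
def build_crawler_targets(s3_paths=(), dynamodb_tables=(), jdbc_targets=()):
    """Build the Targets argument for glue.create_crawler.

    jdbc_targets is an iterable of (connection_name, include_path) pairs.
    """
    targets = {}
    if s3_paths:
        targets["S3Targets"] = [{"Path": p} for p in s3_paths]
    if dynamodb_tables:
        # For DynamoDB the "Path" is simply the table name.
        targets["DynamoDBTargets"] = [{"Path": t} for t in dynamodb_tables]
    if jdbc_targets:
        targets["JdbcTargets"] = [
            {"ConnectionName": conn, "Path": path} for conn, path in jdbc_targets
        ]
    return targets


def create_example_crawler():
    """Create one crawler covering all three kinds of store (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(
        Name="example-crawler",              # hypothetical name
        Role="ExampleGlueCrawlerRole",       # hypothetical IAM role
        DatabaseName="example_db",           # hypothetical Data Catalog database
        Targets=build_crawler_targets(
            s3_paths=["s3://example-bucket/raw/"],
            dynamodb_tables=["example_table"],
            jdbc_targets=[("example-jdbc-connection", "salesdb/%")],
        ),
    )
```

Native stores (S3, DynamoDB) only need a path or table name, while every JDBC store goes through a named connection, which is why the helper takes JDBC targets as pairs.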

12. What are AWS tags in AWS Glue?
A tag is a label that you assign to an AWS resource. Each tag consists of a key and an optional value, both of which you define. You can use tags in AWS Glue to organize and identify your resources. Tags can be used to create cost accounting reports and restrict access to resources.
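
To make the key/value idea concrete, the following sketch tags a Glue job with boto3's tag_resource call and reads the tags back with get_tags. The region, account ID, job name, and tag values are made up for illustration; only the ARN layout and the two API calls are taken from the Glue API.

```python
def glue_job_arn(region, account_id, job_name):
    """Return the ARN of a Glue job; Glue tagging calls identify resources by ARN."""
    return f"arn:aws:glue:{region}:{account_id}:job/{job_name}"


def tag_job(region, account_id, job_name, tags):
    """Attach tags to a Glue job and return its full tag set (needs AWS credentials)."""
    import boto3  # imported here so glue_job_arn works without boto3 installed

    glue = boto3.client("glue", region_name=region)
    arn = glue_job_arn(region, account_id, job_name)
    # tags is a plain dict, e.g. {"team": "analytics", "cost-center": "1234"}
    glue.tag_resource(ResourceArn=arn, TagsToAdd=tags)
    return glue.get_tags(ResourceArn=arn)["Tags"]
```

A call such as tag_job("us-east-1", "123456789012", "nightly-etl", {"team": "analytics"}) would label the hypothetical job so it can be grouped in cost reports or matched by tag-based IAM conditions.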

13. What are AWS Glue metrics?
When you interact with AWS Glue, it sends metrics to CloudWatch. You can view these metrics using the AWS Glue console (the preferred method), the CloudWatch console dashboard, or the AWS Command Line Interface (AWS CLI).
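
Besides the consoles and the CLI, the same metrics can be read programmatically. The sketch below builds a CloudWatch get_metric_statistics request for one Glue job metric; it assumes job metrics are enabled for the job, that the metric is published in the Glue namespace with JobName, JobRunId, and Type dimensions, and the job name passed in is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def bytes_read_query(job_name, hours=24):
    """Build get_metric_statistics arguments for bytes read by a Glue job."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "Glue",
        "MetricName": "glue.driver.aggregate.bytesRead",
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},   # aggregate across runs
            {"Name": "Type", "Value": "count"},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,            # one datapoint per five minutes
        "Statistics": ["Sum"],
    }


def fetch_bytes_read(job_name):
    """Return the raw datapoints from CloudWatch (needs AWS credentials)."""
    import boto3  # imported here so the query builder works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    return cloudwatch.get_metric_statistics(**bytes_read_query(job_name))["Datapoints"]
```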

14. Is it possible to re-partition the data using an AWS Glue crawler?
No. A crawler only catalogs data in the layout it finds, so you can't re-partition data with it. However, you can create a new table manually in Athena.
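
One possible way to create such a table in Athena is a CREATE TABLE AS SELECT (CTAS) statement that writes the data out under a new partitioning scheme. This is only a sketch of that approach, not the only option: the table, column, and bucket names are hypothetical, the helper does no SQL escaping, and Athena expects the partition columns to come last in the SELECT list.

```python
def ctas_repartition_sql(source, target, location, data_cols, partition_cols):
    """Build an Athena CTAS statement that rewrites `source` partitioned by `partition_cols`."""
    # Partition columns must be the last columns in the SELECT list.
    select_list = ", ".join(list(data_cols) + list(partition_cols))
    partitions = ", ".join(f"'{c}'" for c in partition_cols)
    return (
        f"CREATE TABLE {target} "
        f"WITH (format = 'PARQUET', external_location = '{location}', "
        f"partitioned_by = ARRAY[{partitions}]) "
        f"AS SELECT {select_list} FROM {source}"
    )


def run_in_athena(sql, database, output_location):
    """Submit the statement to Athena and return the query ID (needs AWS credentials)."""
    import boto3  # imported here so the SQL builder works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```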

15. Can we use the Apache Spark web UI to monitor and debug AWS Glue ETL jobs?
Yes, you can use the Apache Spark web UI to monitor and debug AWS Glue ETL jobs running on the AWS Glue job system, and also Spark applications running on AWS Glue development endpoints. The Spark UI enables you to check the following for each job:
The event timeline of each Spark stage
A directed acyclic graph (DAG) of the job
Physical and logical plans for SparkSQL queries
The underlying Spark environmental variables for each job
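
For the Spark UI to have anything to show, the job has to write Spark event logs. A minimal sketch of starting a job run with the two special job parameters that turn this on is given below; the job name and S3 log path are placeholders, and a Spark history server still has to be pointed at that path to browse the logs.

```python
def spark_ui_arguments(event_log_path):
    """Job arguments that make a Glue job persist Spark event logs to S3."""
    return {
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_log_path,
    }


def start_job_with_spark_ui(job_name, event_log_path):
    """Start a Glue job run with Spark UI logging enabled (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=spark_ui_arguments(event_log_path),
    )
    return response["JobRunId"]
```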

Author: user
