Amazon Athena interview questions

11. Can I run inference on models deployed on other services such as Amazon Comprehend or Amazon Forecast, or on models deployed on my own EC2 cluster?
No. Athena only supports invoking ML models deployed on Amazon SageMaker.
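
As a rough sketch, an Athena query calls a SageMaker endpoint through a `USING EXTERNAL FUNCTION ... SAGEMAKER` clause. The Python helper below only formats such a query string; the function name `predict`, the endpoint name, and the table and column names are hypothetical, and all features are assumed to be of type DOUBLE.

```python
def build_sagemaker_query(endpoint: str, table: str, feature_cols: list[str]) -> str:
    """Format an Athena query that invokes a SageMaker endpoint for each row.

    Every name passed in is a placeholder; this function only builds SQL text.
    """
    # Declare the external function signature: one DOUBLE argument per feature.
    params = ", ".join(f"{col} DOUBLE" for col in feature_cols)
    args = ", ".join(feature_cols)
    return (
        f"USING EXTERNAL FUNCTION predict({params}) RETURNS DOUBLE\n"
        f"SAGEMAKER '{endpoint}'\n"
        f"SELECT {args}, predict({args}) AS prediction\n"
        f"FROM {table}"
    )


if __name__ == "__main__":
    # Hypothetical endpoint and table, for illustration only.
    print(build_sagemaker_query("my-endpoint", "sales", ["price", "quantity"]))
```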

12. How can I improve the performance of my queries in Amazon Athena?
You can improve the performance of your query by compressing, partitioning, or converting your data into columnar formats. Amazon Athena supports open source columnar data formats such as Apache Parquet and Apache ORC. Converting your data into a compressed, columnar format lowers your cost and improves query performance by enabling Athena to scan less data from S3 when executing your query.
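
One common way to do the conversion is a CREATE TABLE AS SELECT (CTAS) statement that writes a partitioned, Snappy-compressed Parquet copy of an existing table. The helper below is a minimal sketch that only builds the statement text; the table names, S3 location, and column names are assumptions made for illustration.

```python
def build_ctas_parquet(source_table, target_table, s3_location, columns, partition_cols):
    """Format a CTAS statement that rewrites a table as partitioned Parquet."""
    # Athena expects partition columns to come last in the SELECT list.
    select_cols = [c for c in columns if c not in partition_cols] + list(partition_cols)
    partitions = ", ".join(f"'{c}'" for c in partition_cols)
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        "  write_compression = 'SNAPPY',\n"
        f"  external_location = '{s3_location}',\n"
        f"  partitioned_by = ARRAY[{partitions}]\n"
        ")\n"
        f"AS SELECT {', '.join(select_cols)}\n"
        f"FROM {source_table}"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_ctas_parquet(
        "logs_csv", "logs_parquet", "s3://example-bucket/logs_parquet/",
        ["year", "request_id", "status"], ["year"],
    ))
```

A later query that filters on the partition column (for example `WHERE year = '2023'`) and selects only a few columns then reads just the matching partition and column chunks rather than the whole dataset.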

13. Does Athena support User Defined Functions (UDFs)?
Yes. Amazon Athena supports user-defined functions (UDFs), which let you write custom scalar functions and invoke them in SQL queries. While Athena provides built-in functions, UDFs enable you to perform custom processing such as compressing and decompressing data, redacting sensitive data, or applying customized decryption.
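
Athena UDFs run in AWS Lambda and are declared in the query with a `USING EXTERNAL FUNCTION ... LAMBDA` clause. The sketch below builds such a query and, optionally, submits it with boto3. The UDF name `redact`, the Lambda function name, the table, the column, and the S3 output location are all hypothetical, and the Lambda function is assumed to be deployed already.

```python
def build_udf_query(lambda_name: str, table: str, column: str) -> str:
    """Format an Athena query that applies a Lambda-backed scalar UDF to a column."""
    return (
        "USING EXTERNAL FUNCTION redact(value VARCHAR) RETURNS VARCHAR\n"
        f"LAMBDA '{lambda_name}'\n"
        f"SELECT redact({column}) AS {column}_redacted\n"
        f"FROM {table}"
    )


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit a query to Athena and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so build_udf_query works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names; only prints the query, nothing is sent to AWS.
    print(build_udf_query("my-redactor", "customers", "email"))
```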

14. Can I run any Hive query on Athena?
No. Amazon Athena uses Hive only for DDL (Data Definition Language), that is, for creating, modifying, and deleting tables and partitions. To query your data in Amazon S3, you run ANSI-compliant SQL SELECT statements.
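
To make the split concrete, the sketch below pairs Hive-style DDL statements (creating a table and registering a partition) with an ordinary SQL SELECT over the same table. The table name, columns, and S3 paths are hypothetical, and the data is assumed to be stored as Parquet.

```python
# Hive-style DDL: defines the table over files that already sit in S3.
CREATE_TABLE = """\
CREATE EXTERNAL TABLE IF NOT EXISTS logs (
  request_id string,
  status int
)
PARTITIONED BY (year string)
STORED AS PARQUET
LOCATION 's3://example-bucket/logs/'"""


def build_add_partition(table: str, year: str, location: str) -> str:
    """Format Hive-style DDL that registers one partition of the table."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS\n"
        f"PARTITION (year = '{year}')\n"
        f"LOCATION '{location}'"
    )


# Standard SQL: the actual query over the data.
SELECT_QUERY = """\
SELECT status, COUNT(*) AS requests
FROM logs
WHERE year = '2023'
GROUP BY status"""


if __name__ == "__main__":
    for statement in (
        CREATE_TABLE,
        build_add_partition("logs", "2023", "s3://example-bucket/logs/year=2023/"),
        SELECT_QUERY,
    ):
        print(statement, end="\n\n")
```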

15. When should I use Amazon EMR vs. Amazon Athena?
Amazon EMR goes far beyond just running SQL queries. With EMR you can run a wide variety of scale-out data processing tasks for applications such as machine learning, graph analytics, data transformation, streaming data, and virtually anything you can code. You should use Amazon EMR if you use custom code to process and analyze extremely large datasets with the latest big data processing frameworks such as Spark, Hadoop, Presto, or HBase. Amazon EMR gives you full control over the configuration of your clusters and the software installed on them.
You should use Amazon Athena if you want to run interactive ad hoc SQL queries against data on Amazon S3, without having to manage any infrastructure or clusters.

Author: user