Amazon Athena interview questions

36. How can I query data coming from Kinesis Firehose using Athena?
If Kinesis Firehose delivers your data to Amazon S3, you can query it with Amazon Athena. Define a table in Athena over the S3 location that Firehose writes to, then start querying. Organizing the data into partitions is recommended, because Athena can then skip the partitions a query does not need. You can register partitions created by Kinesis Firehose using ALTER TABLE ADD PARTITION DDL statements.
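As a rough illustration of the ALTER TABLE step, the Python sketch below builds the DDL for one hourly partition. It assumes Firehose's default S3 layout of YYYY/MM/DD/HH (UTC) under the delivery prefix. The table name firehose_logs, the bucket path, and the partition columns year, month, day, and hour are hypothetical.

```python
from datetime import datetime, timezone


def firehose_partition_ddl(table: str, s3_prefix: str, hour: datetime) -> str:
    """Build an ALTER TABLE statement for one hour of Firehose output.

    Assumes the default Firehose object layout <prefix>/YYYY/MM/DD/HH/ and a
    table partitioned by string columns year, month, day, hour (hypothetical).
    Pass a timezone-aware datetime; it is converted to UTC before formatting.
    """
    hour = hour.astimezone(timezone.utc)
    y, m, d, h = (hour.strftime(f) for f in ("%Y", "%m", "%d", "%H"))
    location = f"{s3_prefix.rstrip('/')}/{y}/{m}/{d}/{h}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{y}', month='{m}', day='{d}', hour='{h}') "
        f"LOCATION '{location}'"
    )


ddl = firehose_partition_ddl(
    "firehose_logs",                      # hypothetical table name
    "s3://example-bucket/firehose/",      # hypothetical delivery prefix
    datetime(2024, 3, 5, 14, tzinfo=timezone.utc),
)
print(ddl)
```

The explicit LOCATION clause is needed here because the assumed Firehose layout does not use Hive-style key=value folder names.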

37. How do I add new data to an existing table in Amazon Athena?
If your data is partitioned, run a metadata statement (ALTER TABLE ADD PARTITION) to register each new partition with Athena once its data is available in Amazon S3. If your data is not partitioned, no extra step is needed: Athena reads whatever files sit under the table’s S3 location at query time, so files added to the existing prefix are included in subsequent queries.
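When several partitions arrive at once, they can be registered in a single statement. The sketch below is one possible way to build it, assuming a hypothetical table web_logs partitioned by a single string column dt, with data stored under dt=<date>/ folders. The set of already-registered dates is supplied by the caller.

```python
def add_partitions_ddl(table, location, new_dates, known_dates=()):
    """Return one ALTER TABLE statement covering dates not yet registered.

    Assumes a hypothetical table partitioned by one string column `dt` whose
    data lives under <location>/dt=<date>/. Returns None when every date in
    new_dates is already known.
    """
    missing = sorted(set(new_dates) - set(known_dates))
    if not missing:
        return None
    base = location.rstrip("/")
    clauses = [
        f"PARTITION (dt='{d}') LOCATION '{base}/dt={d}/'" for d in missing
    ]
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


stmt = add_partitions_ddl(
    "web_logs",                          # hypothetical table name
    "s3://example-bucket/web_logs/",     # hypothetical table location
    new_dates=["2024-03-06", "2024-03-05"],
    known_dates=["2024-03-05"],
)
print(stmt)
```

For Hive-style layouts like the one assumed here, MSCK REPAIR TABLE is another way to pick up new partitions.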

38. I already have large quantities of log data in Amazon S3. Can I use Amazon Athena to query it?
Yes. Amazon Athena runs standard SQL queries directly against your existing log data in Amazon S3, so there’s no data movement or loading required. Define your schema using DDL statements and start querying your data right away.
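As a minimal sketch of the schema-definition step, the Python helper below generates a CREATE EXTERNAL TABLE statement. It assumes the logs are newline-delimited JSON read with the OpenX JSON SerDe; the table name app_logs, its columns, and the bucket path are hypothetical, and other log formats would need a different SerDe or row format.

```python
def create_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for JSON log files.

    `columns` is a list of (name, athena_type) pairs. The JSON SerDe is an
    assumption about the log format, not something the logs dictate.
    """
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


ddl = create_table_ddl(
    "app_logs",  # hypothetical table name
    [("ts", "string"), ("level", "string"), ("message", "string")],
    "s3://example-bucket/app_logs/",  # hypothetical log location
)
print(ddl)
```

Because the table is external, running the statement only records metadata; the log files stay where they are in S3.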

39. What kinds of queries does Amazon Athena support?
Amazon Athena supports ANSI SQL queries. Its query engine is based on Presto, an open source, in-memory, distributed SQL engine (newer Athena engine versions are built on Trino, a fork of Presto), and can handle complex analysis, including large joins, window functions, and arrays.
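To make the window-function and array support concrete, here is a small sketch that builds one such query as a string. The schema is hypothetical: it assumes a table with columns user_id, event_time, and tags (an array of strings), and returns each user's most recent tagged events.

```python
def recent_tags_query(table: str, per_user: int) -> str:
    """SQL for each user's most recent tagged events (hypothetical schema).

    UNNEST expands the `tags` array into one row per tag, and row_number()
    ranks those rows per user from newest to oldest.
    """
    per_user = int(per_user)  # keep the interpolated value numeric
    return f"""
SELECT user_id, tag, event_time
FROM (
  SELECT
    e.user_id,
    t.tag,
    e.event_time,
    row_number() OVER (
      PARTITION BY e.user_id ORDER BY e.event_time DESC
    ) AS recency_rank
  FROM {table} AS e
  CROSS JOIN UNNEST(e.tags) AS t (tag)
)
WHERE recency_rank <= {per_user}
""".strip()


query = recent_tags_query("events", 3)  # "events" is a hypothetical table
print(query)
```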

40. How do Athena data source connectors work?
You can run SQL queries against new data stores by registering them with Athena. To register a data source, you use an Athena Data Source Connector specific to that source; the connector extends Athena’s querying capability to it. You can use AWS-provided open source connectors, build your own, contribute to existing connectors, or use community or marketplace-built connectors. Depending on the type of data source, a connector manages metadata, identifies the parts of a table that need to be scanned, read, or filtered, and manages parallelism.

Author: user
