Amazon Athena interview questions

46. How to Connect to AWS Athena with R ?
The Athena JDBC driver is expecting AWS Access Key Id as the user, and the Secret Key as the password
accessKeyId <- “your access key id…”
secretKey <- “your secret key…”
jdbcConnection <- dbConnect(
drv,
‘jdbc:awsathena://athena.us-east-1.amazonaws.com:443’,
s3_staging_dir=”s3://mybucket”,
user=accessKeyId,
password=secretKey
)

47. How to create AWS Athena view programmatically ?
We can create an Athena view programmatically via the AWS CLI using the start-query-execution. This require you to provide an S3 location for the results even though you won’t need to check the file (Athena will put an empty txt file in the location).
$ aws athena start-query-execution –query-string “create view my_view as select * from my_table” –result-configuration “OutputLocation=s3://my-bucket/tmp” –query-execution-context “Database=my_database”
{
“QueryExecutionId”: “54585525-r121-4311-80e11-b31234551cc654”
}
You can check whether your view creation was successful by using the get-query-execution command.
$ aws –region athena get-query-execution –query-execution-id doirjdu-qqb0-43d1-787f-8888888c6d9b
Creating views programmatically in Athena is not documented, and unsupported, but possible. What happens behind the scenes when you create a view using StartQueryExecution is that Athena lets Presto create the view and then extracts Presto’s internal representation and puts it in the Glue catalog.
These are the requirements on a Glue table to work as an Athena view:
TableType : Must be VIRTUAL_VIEW
Parameters : Must contain presto_view: true
TableInput.ViewOriginalText : Must contain an encoded Presto view (see below)
StorageDescriptor.SerdeInfo : Must be an empty map
StorageDescriptor.Columns : Must contain all the columns that the view defines, with their types

48. Is there any support for OFFSET in AWS Athena ?
OFFSET Is not supported by AWS Athena. Athena is basically managed Presto. Since Presto 311 you can use OFFSET m LIMIT n syntax or ANSI SQL equivalent: OFFSET m ROWS FETCH NEXT n ROWS ONLY.
SELECT * FROM (SELECT row_number() over() AS rn, * FROM elb_logs) WHERE rn BETWEEN 5 AND 10;

49. How to get input file name as column in AWS Athena external tables
You can get the input file name as column with the $path pseudo column.
select “$path” from table

50. How to create external table skipping first row ?
You can use the skip.header.line.count property when defining tables, to allow Athena to ignore headers. This is supported for queries that use the LazySimpleSerDe and OpenCSV SerDe, and not for Grok or Regex SerDes.
TBLPROPERTIES (‘skip.header.line.count’=’1’)

 

Author: user

Leave a Reply