Amazon Redshift interview questions

Appends rows to a target table by moving data from an existing source table. Data in the source table is moved to matching columns in the target table. ALTER TABLE APPEND moves data blocks between the source table and the target table. To improve performance, ALTER TABLE APPEND doesn’t compact storage as part of the append operation. As a result, storage usage increases temporarily. To reclaim the space, run a VACUUM operation.
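Below is a minimal sketch of scripting the append and the follow-up VACUUM from Python. It only builds the SQL text; sending it to the cluster through a client library is left out. The table names `sales` and `sales_staging` and the helper name `append_then_vacuum` are hypothetical. `IGNOREEXTRA` and `FILLTARGET` are the documented options for handling columns that exist in only one of the two tables.

```python
from typing import Optional

# Documented ALTER TABLE APPEND options:
#   IGNOREEXTRA - discard source columns that the target table lacks
#   FILLTARGET  - fill target-only columns with their DEFAULT value or NULL
APPEND_OPTIONS = {"IGNOREEXTRA", "FILLTARGET"}


def append_then_vacuum(target: str, source: str,
                       option: Optional[str] = None) -> list[str]:
    """Return SQL that moves all rows from `source` into `target`, then
    reclaims the space that ALTER TABLE APPEND leaves uncompacted.

    Hypothetical helper: it only builds strings and does not escape
    identifiers, so pass trusted table names only.
    """
    if option is not None and option not in APPEND_OPTIONS:
        raise ValueError(f"unsupported option: {option!r}")
    append = f"ALTER TABLE {target} APPEND FROM {source}"
    if option:
        append += f" {option}"
    return [append + ";", f"VACUUM {target};"]


for stmt in append_then_vacuum("sales", "sales_staging", "IGNOREEXTRA"):
    print(stmt)
# ALTER TABLE sales APPEND FROM sales_staging IGNOREEXTRA;
# VACUUM sales;
```

Keeping the VACUUM next to the append makes the temporary storage growth explicit rather than something to remember later.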

22. Where can we find comprehensive information about a table, including data distribution skew?
Query the SVV_TABLE_INFO system view. It shows detailed information about each table, including data distribution skew, key distribution skew, table size, and statistics.
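Here is a sketch of how the view might be used. The query lists the most skewed tables first. The helper reproduces, on made-up per-slice row counts, the ratio that the `skew_rows` column reports: rows in the fullest slice divided by rows in the emptiest slice. The helper name and the slice counts are illustrative, not output from a real cluster, and how empty slices are handled is a choice made for this sketch.

```python
# Run this in any SQL client connected to the cluster. "schema" and
# "table" are quoted because they collide with SQL keywords.
# NULLS LAST keeps tables with a NULL skew_rows from sorting to the top.
SKEW_QUERY = """
SELECT "schema", "table", diststyle, skew_rows, size, tbl_rows, stats_off
FROM svv_table_info
ORDER BY skew_rows DESC NULLS LAST
LIMIT 10;
"""


def skew_rows(rows_per_slice: list[int]) -> float:
    """Ratio of rows in the fullest slice to rows in the emptiest slice,
    the same ratio SVV_TABLE_INFO reports in its skew_rows column.
    A value of 1.0 means rows are spread evenly."""
    if not rows_per_slice:
        raise ValueError("need at least one slice")
    most, fewest = max(rows_per_slice), min(rows_per_slice)
    if most == 0:
        return 1.0  # sketch choice: an empty table counts as unskewed
    if fewest == 0:
        return float("inf")  # sketch choice: an empty slice means unbounded skew
    return most / fewest


# Made-up row counts for a four-slice cluster with one hot slice.
print(skew_rows([1000, 1000, 1000, 4000]))  # 4.0
```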

23. What is Redshift Spectrum?
Amazon Redshift Spectrum is a feature of Amazon Redshift, AWS’s data warehousing service, that lets you run SQL queries directly against data stored in Amazon S3 buckets without first loading it into Redshift tables. Redshift Spectrum can query exabytes of data in Amazon S3 with no loading or ETL required. Even if none of your data is stored in Amazon Redshift, you can still use Redshift Spectrum to query datasets as large as an exabyte in Amazon S3.
When you issue a query, it goes to the Amazon Redshift SQL endpoint, which generates the query plan. Amazon Redshift determines which data is local and which is in Amazon S3, generates a plan that minimizes the amount of S3 data to be read, requests Redshift Spectrum workers from a shared resource pool to read and process the data in S3, and pulls the results back into your Amazon Redshift cluster for any remaining processing.
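Below is a sketch of the two DDL statements that typically make S3 data queryable through Redshift Spectrum: an external schema mapped to a data catalog database, and an external table over files under an S3 prefix. Everything passed in (schema `spectrum`, catalog database `spectrumdb`, the IAM role ARN, the bucket path, the columns) is a hypothetical placeholder, the helper `spectrum_ddl` is made up for illustration, and the sketch assumes the files are Parquet.

```python
def spectrum_ddl(schema: str, catalog_db: str, iam_role_arn: str,
                 table: str, columns: dict[str, str],
                 s3_location: str) -> list[str]:
    """Build the DDL that exposes Parquet files in S3 to Redshift Spectrum.

    Hypothetical helper: it returns SQL strings and does not connect to
    anything. Identifiers and literals are not escaped, so only pass
    trusted values.
    """
    # External schema backed by a data catalog database; create that
    # catalog database too if it does not exist yet.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    column_list = ",\n  ".join(f"{name} {sql_type}"
                               for name, sql_type in columns.items())
    # External table: the data stays in S3; Redshift stores only metadata.
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {column_list}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )
    return [create_schema, create_table]


for stmt in spectrum_ddl(
    schema="spectrum",
    catalog_db="spectrumdb",
    iam_role_arn="arn:aws:iam::123456789012:role/MySpectrumRole",
    table="sales",
    columns={"salesid": "INTEGER", "saletime": "TIMESTAMP",
             "pricepaid": "DECIMAL(8,2)"},
    s3_location="s3://example-bucket/tickit/sales/",
):
    print(stmt, end="\n\n")
```

After both statements run, `spectrum.sales` can be queried with ordinary SQL (for example `SELECT COUNT(*) FROM spectrum.sales;`) and joined with local Redshift tables.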

24. What is the difference between RDS and Redshift?
RDS (Amazon Relational Database Service) is Amazon’s managed relational database offering. It provisions server(s) running a relational database engine such as PostgreSQL, MySQL, or SQL Server, and is meant to be used as your main or supporting data store and transactional database.
Redshift, on the other hand, is Amazon’s analytics database offering. It is based on ParAccel technology and runs a fork of PostgreSQL (version 8.0.2). Unlike RDS, it is not meant to be the primary database that your user traffic hits (unless you’re running an analytics service :)). Redshift is designed to crunch data and excels at it, i.e., running “big” or “heavy” queries against large datasets.

25. What is the pricing model of Amazon Redshift?
Compute node hours: Compute node hours are the total number of hours you run across all your compute nodes for the billing period.
Backup storage: Backup storage is the storage associated with your automated and manual snapshots for your data warehouse.
Data transfer: There is no data transfer charge for data transferred to or from Amazon Redshift and Amazon S3 within the same AWS Region. For all other data transfers into and out of Amazon Redshift, you are billed at standard AWS data transfer rates.
Data scanned: With Redshift Spectrum, you are charged for the amount of Amazon S3 data scanned to execute your query.
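Here is a back-of-the-envelope sketch for the two usage-based components above: compute node hours and Spectrum data scanned. The function name and the rates in the example call are made-up placeholders, not published AWS prices; check the Amazon Redshift pricing page for your Region and node type. Backup storage and data transfer are left out because they depend on allowances and Region-specific rates not covered here.

```python
def estimate_cost(nodes: int, hours: float, rate_per_node_hour: float,
                  tb_scanned: float = 0.0,
                  price_per_tb_scanned: float = 0.0) -> float:
    """Rough cost for one billing period: compute node hours plus
    Redshift Spectrum scans. Backup storage and data transfer are ignored."""
    if min(nodes, hours, rate_per_node_hour, tb_scanned, price_per_tb_scanned) < 0:
        raise ValueError("inputs must be non-negative")
    # Node hours are summed across all compute nodes:
    # 3 nodes running 730 hours = 2,190 node hours.
    node_hours = nodes * hours
    compute_cost = node_hours * rate_per_node_hour
    # Spectrum is billed on the amount of S3 data scanned.
    spectrum_cost = tb_scanned * price_per_tb_scanned
    return compute_cost + spectrum_cost


# Hypothetical rates: $0.25 per node hour, $5.00 per TB scanned.
print(estimate_cost(nodes=3, hours=730, rate_per_node_hour=0.25,
                    tb_scanned=2, price_per_tb_scanned=5.0))  # 557.5
```

The node-hour term usually dominates for an always-on cluster, which is why the number of nodes and how long they run matter more than occasional Spectrum scans.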

Author: user
