Hive interview questions

43. When you point a partition of a hive table to a new directory, what happens to the data?
The data stays in the old location. It has to be moved manually.
Copy the data for the partition being moved to S3. For example, you can use the
hadoop distcp command:
hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02
Alter the table to point the partition to the S3 location:
ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
SET LOCATION ‘s3n://ourbucket/logs/2011/01/02’;
Remove the HDFS copy of the partition using the hadoop fs -rmr command:
hadoop fs -rmr /data/log_messages/2011/01/02

44. Does the archiving of Hive tables give any space saving in HDFS?
No. It only reduces the number of files which becomes easier for namenode to manage.Hive has built-in support to convert files in existing partitions to a Hadoop Archive (HAR) so that a partition that may once have consisted of 100’s of files can occupy just ~3 files.However, the trade-off is that queries may be slower due to the additional overhead in reading from the HAR. Note that archiving does NOT compress the files – HAR is analogous to the Unix tar command.
hive> set hive.archive.enabled=true;
hive> set hive.archive.har.parentdir.settable=true;
hive> set har.partfile.size=1099511627776;
Usage: hadoop archive -archiveName name -p <parent> <src>* <dest>
hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo

45. How can you stop a partition from being queried?
By using the ENABLE OFFLINE clause with ALTER TABLE atatement.
The following statements prevent the partition from being dropped and queried:
ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1) ENABLE NO_DROP;
ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1) ENABLE OFFLINE;
Enabling OFFLINE prevents the data in a table or partition from being queried, but the metadata can still be accessed.

46. What does the following query do?
PARTITION (country, state)
SELECT …, se.cnty,
FROM staged_employees se;
It creates partition on table employees with partition values coming from the columns in the select clause. It is called Dynamic partition insert.

47. What is a Table generating Function on hive?
A table generating function is a function which takes a single column as argument and expands it to multiple column or rows. Example explode()

48. How can Hive avoid mapreduce?
If we set the property to true then hive will avoid mapreduce to fetch query results.

49. What is the difference between LIKE and RLIKE operators in Hive?
The LIKE operator behaves the same way as the regular SQL operators used in select queries. Example −
street_name like ‘%Chi’
But the RLIKE operator uses more advance regular expressions which are available in java
Example − street_name RLIKE ‘.*(Chi|Oho).*’ which will select any word which has either chi or oho in it.

Author: user

Leave a Reply

Your email address will not be published. Required fields are marked *