Apache Pig interview questions

16. Is it possible to join on multiple fields in Pig scripts?
Yes. A join selects records from one input and combines them with records from another input. You specify a key for each input, and when those keys are equal, the two rows are joined.
input2 = load 'daily' as (exchanges, stocks);
input3 = load 'week' as (exchanges, stocks);
grpds = join input2 by stocks, input3 by stocks;
We can also join on multiple keys by listing them in parentheses. Both inputs must specify the same number of keys.
Example:
input2 = load 'daily' as (exchanges, stocks);
input3 = load 'week' as (exchanges, stocks);
grpds = join input2 by (exchanges, stocks), input3 by (exchanges, stocks);
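After a join, both inputs contribute fields with the same names, so a field has to be qualified with its relation name using the :: operator. The following is a minimal sketch of that step, reusing the relations from the example above; the relation name result is only illustrative.

```pig
-- Same inputs and join as in the example above.
input2 = load 'daily' as (exchanges, stocks);
input3 = load 'week' as (exchanges, stocks);
grpds = join input2 by (exchanges, stocks), input3 by (exchanges, stocks);

-- "exchanges" and "stocks" exist on both sides of the join, so each
-- reference is disambiguated as relation::field.
-- "result" is a hypothetical relation name used only for this sketch.
result = foreach grpds generate input2::exchanges as exchanges, input2::stocks as stocks;
dump result;
```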

17. Is it possible to display a limited number of results?
Yes. Sometimes you want to see only a limited number of results. 'limit' allows you to do this:
input2 = load 'daily' as (exchanges, stocks);
first10 = limit input2 10;
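Note that limit on its own does not guarantee which records come back. If a specific "top 10" is wanted, one common approach is to sort first and then limit. The sketch below assumes sorting by the stocks field in descending order, which is an illustrative choice and not something the original example specifies.

```pig
input2 = load 'daily' as (exchanges, stocks);

-- Sort before limiting so the 10 returned records are well defined.
-- Ordering by "stocks" descending is an assumption made for this sketch.
sorted = order input2 by stocks desc;
top10 = limit sorted 10;
dump top10;
```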

18. What is a data flow language?
(http://www.bigdataanalyst.in/pig-interview-questions-answers/) In a control flow language, execution moves from instruction to instruction through control statements while the data stays in place. In a data flow language such as Pig Latin, a stream of data passes from one instruction to the next, and each instruction processes it.
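As a rough illustration of the data flow idea, each statement in the sketch below takes the relation produced by the previous one and hands a new relation to the next. The filter value 'NYSE', the output path 'stock_counts', and the chararray types are assumptions made for this example.

```pig
-- Step 1: data enters the flow.
daily = load 'daily' as (exchanges:chararray, stocks:chararray);

-- Step 2: the loaded records flow into a filter ('NYSE' is a hypothetical value).
nyse = filter daily by exchanges == 'NYSE';

-- Step 3: the filtered records flow into a grouping step.
grpd = group nyse by stocks;

-- Step 4: each group flows into an aggregation.
counts = foreach grpd generate group as stocks, COUNT(nyse) as n;

-- Step 5: the result leaves the flow ('stock_counts' is a hypothetical output path).
store counts into 'stock_counts';
```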

19. Local mode vs. MapReduce mode?
In local mode there is no need to install or start Hadoop. Pig scripts run on the local machine and, by default, read and write data on the local file system. In MapReduce mode you need to install and start Hadoop. Pig scripts then run on the cluster and read and store data in HDFS. In both modes, Java and Pig must be installed.
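The mode is chosen when Pig is launched, using the -x flag. The sketch below shows the same script with both invocations noted as comments; the file name first10.pig is hypothetical.

```pig
-- Hypothetical script file: first10.pig
--
-- Local mode (local file system, no Hadoop needed):
--   pig -x local first10.pig
-- MapReduce mode (HDFS, requires a running Hadoop installation):
--   pig -x mapreduce first10.pig

-- In local mode 'daily' is resolved on the local file system;
-- in MapReduce mode it is resolved in HDFS.
input2 = load 'daily' as (exchanges, stocks);
first10 = limit input2 10;
dump first10;
```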

20. What is the Pig engine?
The Pig engine runs on the client side. It is essentially an interpreter that converts your simple Pig Latin code into more complex MapReduce operations. These MapReduce jobs are then executed on the distributed Hadoop cluster.
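One way to see what the engine produces is the explain operator, which prints the plans Pig builds for a relation, including the MapReduce plan. The script below is a minimal sketch; the relation names grpd and cnt are illustrative.

```pig
input2 = load 'daily' as (exchanges, stocks);
grpd = group input2 by stocks;
cnt = foreach grpd generate group, COUNT(input2);

-- Print the logical, physical, and MapReduce plans for "cnt"
-- instead of running the jobs.
explain cnt;
```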

Author: user
