Apache Pig interview questions

41. What does Flatten do in Pig?
FLATTEN un-nests bags and tuples. For a tuple, the FLATTEN operator substitutes the fields of the tuple in place of the tuple itself. Un-nesting a bag is a little more complex because it requires creating new tuples: one output tuple is produced for each tuple in the bag.
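
As a rough illustration of both cases, here is a minimal Pig Latin sketch. The file name 'data.txt' and the field names are hypothetical, chosen only for this example.

```pig
-- Hypothetical input: each record has an id, a tuple, and a bag.
A = LOAD 'data.txt' AS (id:int, t:tuple(x:int, y:int), b:bag{r:tuple(v:chararray)});

-- Flattening a tuple: its fields replace the tuple in the same record,
-- so a record shaped like (1, (2, 3), ...) yields (1, 2, 3).
B = FOREACH A GENERATE id, FLATTEN(t);

-- Flattening a bag: one new record per tuple in the bag,
-- so a record shaped like (1, ..., {(a), (b)}) yields (1, a) and (1, b).
C = FOREACH A GENERATE id, FLATTEN(b);
```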

42. Is Pig Latin a strongly typed language? If yes, then how did you come to the conclusion?
In a strongly typed language, the user has to declare the type of all variables upfront. In Apache Pig, when you describe the schema of the data, it expects the data to arrive in the format you declared. However, when the schema is not known, the script adapts to the actual data types at runtime. So, it can be said that Pig Latin is strongly typed in most cases but in rare cases it is gently typed, i.e., it continues to work with data that does not live up to its expectations.
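
The two situations described above might look like the following sketch. The file name and field names are assumptions made for illustration.

```pig
-- Schema declared: Pig expects the second field to be an int.
A = LOAD 'people.txt' AS (name:chararray, age:int);
A2 = FOREACH A GENERATE name, age + 1;

-- No schema declared: fields are referenced by position ($0, $1, ...)
-- and are treated as bytearray until they are cast or used.
B = LOAD 'people.txt';
B2 = FOREACH B GENERATE $0, (int)$1 + 1;
```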

43. Differentiate between GROUP and COGROUP operators.
The GROUP operator is generally used to group the data in a single relation, whereas COGROUP is used to group the data in two or more relations; the split between the two names is largely a readability convention. COGROUP is more like a combination of GROUP and JOIN, i.e., it groups each relation on a column and then joins the relations on the grouped column, producing one bag per relation for each key. It is possible to cogroup up to 127 relations at a time.
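
A minimal sketch of the difference, using two hypothetical relations (the file and field names are invented for this example):

```pig
owners  = LOAD 'owners.txt'  AS (owner:chararray, pet:chararray);
friends = LOAD 'friends.txt' AS (person:chararray, friend:chararray);

-- GROUP on one relation: each output record is (group, bag of owners tuples).
by_owner = GROUP owners BY owner;

-- COGROUP on two relations: each output record is
-- (group, bag of matching owners tuples, bag of matching friends tuples).
-- A key present in only one relation gets an empty bag for the other.
both = COGROUP owners BY owner, friends BY person;
```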

44. Explain the difference between COUNT_STAR and COUNT functions in Apache Pig?
The COUNT function does not include NULL values when counting the number of elements in a bag (a tuple is skipped when its first field is NULL), whereas the COUNT_STAR() function includes NULL values while counting.
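
To see the two functions side by side, a sketch like the following could be used. The input file and field names are hypothetical.

```pig
A = LOAD 'data.txt' AS (k:chararray, v:int);
G = GROUP A BY k;

-- A.v is a bag of single-field tuples, so a NULL v is a NULL first field.
-- COUNT skips those tuples; COUNT_STAR counts every tuple in the bag.
C = FOREACH G GENERATE group,
        COUNT(A.v)      AS non_null_values,
        COUNT_STAR(A.v) AS all_values;
```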

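45. How will you merge the contents of two or more relations and divide a single relation into two or more relations?
This can be accomplished using the UNION and SPLIT operators. UNION merges the contents of two or more relations, and SPLIT divides a single relation into two or more relations based on conditions.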
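
A short sketch of both operators, assuming two hypothetical input files with the same schema:

```pig
A = LOAD 'a.txt' AS (id:int, score:int);
B = LOAD 'b.txt' AS (id:int, score:int);

-- Merge the contents of the two relations
-- (order is not preserved and duplicates are kept).
U = UNION A, B;

-- Divide the merged relation into two relations by condition.
-- A tuple matching no condition is dropped; one matching several goes to each.
SPLIT U INTO low IF score < 50, high IF score >= 50;
```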

Author: user
