Apache Spark interview questions

PySpark @ Freshers.in

120. Explain Lambda
A lambda function is an anonymous function: it has no name. A lambda can contain only a single expression. Python supports the creation of anonymous functions (functions that are not bound to a name) at runtime, using a construct called lambda. Like def, lambda creates a function to be called later, but it returns the function instead of assigning it to a name.
add = lambda num1, num2: num1 + num2
print(add(1, 3))  # prints 4
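In practice lambdas are usually passed inline as arguments rather than bound to a name, which is how they are typically handed to Spark transformations such as map and filter. A minimal sketch in plain Python (no Spark required), using only the built-in map, filter and sorted:

```python
# Lambdas passed inline as arguments, the same style used with RDD.map / RDD.filter
nums = [5, 2, 8, 1]

squares = list(map(lambda x: x * x, nums))        # [25, 4, 64, 1]
evens = list(filter(lambda x: x % 2 == 0, nums))  # [2, 8]

# A lambda as a sort key: order (name, age) pairs by age
people = [("ann", 31), ("bob", 25), ("cy", 40)]
by_age = sorted(people, key=lambda p: p[1])       # [("bob", 25), ("ann", 31), ("cy", 40)]

print(squares, evens, by_age)
```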

121. map vs flatMap
map: returns a new RDD by applying a function to each element of the RDD. The function passed to map returns exactly one item for each input element.

flatMap: similar to map, it returns a new RDD by applying a function to each element of the RDD, but the output is flattened.
The function passed to flatMap can return a sequence of zero or more elements for each input element.

Example:
sc.parallelize([3, 4, 5]).map(lambda x: list(range(1, x))).collect()
Output (list() is needed in Python 3, where range() returns a range object rather than a list):
[[1, 2], [1, 2, 3], [1, 2, 3, 4]]

sc.parallelize([3,4,5]).flatMap(lambda x: range(1,x)).collect()
Output (notice that the result is flattened into a single list):
[1, 2, 1, 2, 3, 1, 2, 3, 4]
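The difference can be modelled in plain Python without Spark: map produces one output per input, while flatMap chains the per-element sequences together, so an element that yields an empty sequence simply disappears. This is only a sketch of the semantics on a local list, not how Spark executes the operations:

```python
from itertools import chain

data = [3, 4, 5]
f = lambda x: list(range(1, x))

# map: exactly one output per input element (here, one list per element)
mapped = [f(x) for x in data]

# flatMap: each input may yield zero or more outputs, flattened into one sequence
flat_mapped = list(chain.from_iterable(f(x) for x in data))

# flatMap returning zero elements for some inputs acts like a filter
evens_only = list(chain.from_iterable([x] if x % 2 == 0 else [] for x in data))

print(mapped)       # [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
print(flat_mapped)  # [1, 2, 1, 2, 3, 1, 2, 3, 4]
print(evens_only)   # [4]
```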

122. What are the common transformations in Apache Spark?
map
filter
flatMap
mapPartitions
mapPartitionsWithIndex
sample
union
intersection
distinct
cartesian
pipe
coalesce
repartition
repartitionAndSortWithinPartitions
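Several of these transformations work on whole partitions rather than single elements. The sketch below is a hypothetical plain-Python model (no Spark) in which an "RDD" is just a list of partitions; the helper names map_partitions, map_partitions_with_index and coalesce are illustrative, and the coalesce grouping rule is an assumption, not Spark's actual algorithm:

```python
# Hypothetical model: an "RDD" as a Python list of partitions (each a list)
rdd = [[1, 2], [3, 4], [5, 6]]

def map_partitions(parts, f):
    # f is called once per partition with an iterator over its elements
    return [list(f(iter(p))) for p in parts]

def map_partitions_with_index(parts, f):
    # same as map_partitions, but f also receives the partition index
    return [list(f(i, iter(p))) for i, p in enumerate(parts)]

def coalesce(parts, n):
    # merge adjacent partitions into n groups without redistributing elements;
    # assumes n <= len(parts), since coalesce without a shuffle cannot add partitions
    merged = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        merged[i * n // len(parts)].extend(p)
    return merged

sums = map_partitions(rdd, lambda it: [sum(it)])                               # [[3], [7], [11]]
tagged = map_partitions_with_index(rdd, lambda i, it: ((i, x) for x in it))   # index attached to each element
merged = coalesce(rdd, 2)                                                      # [[1, 2, 3, 4], [5, 6]]
```

The per-partition form is useful when some setup (for example, opening a connection) should happen once per partition instead of once per element.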

123. What are the common actions in Apache Spark?
reduce
collect
count
first
take
takeSample
takeOrdered
saveAsTextFile
countByKey
foreach
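Actions return a result to the driver (or write it out), and calling one is what triggers the actual computation. The plain-Python analogues below model what several of the listed actions return on a small dataset; they assume the data sits in one ordered list and do not use Spark:

```python
import heapq
from collections import Counter
from functools import reduce
from operator import add

data = [5, 1, 4, 2, 3]
pairs = [("a", 1), ("b", 2), ("a", 3)]

total = reduce(add, data)                     # like rdd.reduce(add)     -> 15
count = len(data)                             # like rdd.count()         -> 5
first = data[0]                               # like rdd.first()         -> 5
take2 = data[:2]                              # like rdd.take(2)         -> [5, 1]
smallest3 = heapq.nsmallest(3, data)          # like rdd.takeOrdered(3)  -> [1, 2, 3]
by_key = dict(Counter(k for k, _ in pairs))   # like rdd.countByKey()    -> {"a": 2, "b": 1}
```

In Spark, the function given to reduce should be commutative and associative, because each partition is reduced independently before the partial results are combined.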

124. What are the common key-value pair RDD operations in Apache Spark?
groupByKey
reduceByKey
aggregateByKey
sortByKey
join
cogroup
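The semantics of the main pair-RDD operations can be sketched on a local list of (key, value) tuples. The helpers group_by_key, reduce_by_key and join below are hypothetical plain-Python stand-ins, not PySpark functions, and output ordering here is deterministic only because the model is sequential (Spark does not guarantee join order):

```python
from collections import defaultdict
from functools import reduce

sales = [("apple", 3), ("pear", 1), ("apple", 2), ("fig", 5)]
prices = [("apple", 0.5), ("pear", 0.75)]

def group_by_key(pairs):
    # like groupByKey: collect all values for each key
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

def reduce_by_key(pairs, f):
    # like reduceByKey: merge the values of each key with f
    return {k: reduce(f, vs) for k, vs in group_by_key(pairs).items()}

def join(left, right):
    # like join: inner join, emitting (k, (v, w)) for every matching pair
    right_groups = group_by_key(right)
    return [(k, (v, w)) for k, v in left for w in right_groups.get(k, [])]

grouped = group_by_key(sales)                        # {"apple": [3, 2], "pear": [1], "fig": [5]}
totals = reduce_by_key(sales, lambda a, b: a + b)    # {"apple": 5, "pear": 1, "fig": 5}
joined = join(sales, prices)                         # "fig" has no price, so it is dropped
ordered = sorted(totals.items())                     # like sortByKey on the totals
```

In Spark, reduceByKey is usually preferred over groupByKey followed by a reduction, because it merges values within each partition before the shuffle; this local model does not capture that difference.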

125. What are some input and output functions in Apache Spark?
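wholeTextFiles
Pickle files (pickleFile, saveAsPickleFile)
Hadoop InputFormat (hadoopFile, newAPIHadoopFile)
Hadoop OutputFormat (saveAsHadoopFile, saveAsNewAPIHadoopFile)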
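wholeTextFiles differs from textFile in the shape of its records: it yields one (path, content) pair per file, whereas textFile yields one record per line. The plain-Python sketch below models that shape on a temporary directory and shows a pickle round trip as a stand-in for saveAsPickleFile / pickleFile. It uses bare file names where Spark would return full paths, and it does not reproduce Spark's on-disk format:

```python
import pickle
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    # Write two small text files into a scratch directory
    for name, text in [("a.txt", "hello\nworld"), ("b.txt", "spark")]:
        Path(d, name).write_text(text)

    # Like sc.wholeTextFiles(d): one (file, whole-file-content) pair per file
    whole = sorted((p.name, p.read_text()) for p in Path(d).iterdir())

# Like saveAsPickleFile followed by pickleFile: arbitrary Python objects survive the round trip
records = [{"id": 1, "tags": ["x"]}, {"id": 2, "tags": []}]
restored = pickle.loads(pickle.dumps(records))
```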

126. Which features help increase performance in Apache Spark?
Broadcast variables
Accumulators
Custom accumulators
Partitioning
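Two of these ideas can be modelled without Spark. Hash partitioning sends every record with the same key to the same partition, so later per-key work on that data needs no further shuffle. A custom accumulator needs a zero value and a merge function; in PySpark such a class would subclass pyspark.AccumulatorParam (which defines zero and addInPlace) and be passed to sc.accumulator. In the sketch below, zlib.crc32 is an assumed stand-in for the hash Spark uses, and MaxAccumulatorParam is a hypothetical class that only mirrors that interface:

```python
import zlib
from functools import reduce

def partition_for(key, num_partitions):
    # Hash partitioning: the same key always maps to the same partition.
    # zlib.crc32 is used so results are stable across Python processes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partitions = [[] for _ in range(3)]
for k, v in pairs:
    partitions[partition_for(k, 3)].append((k, v))

class MaxAccumulatorParam:
    # Hypothetical: mirrors the shape of pyspark.AccumulatorParam
    def zero(self, value):
        return float("-inf")

    def addInPlace(self, value1, value2):
        return max(value1, value2)

param = MaxAccumulatorParam()
# Each "task" folds its own partition into a partial result...
partials = [reduce(param.addInPlace, (v for _, v in p), param.zero(0)) for p in partitions]
# ...and the driver merges the partial results
overall_max = reduce(param.addInPlace, partials, param.zero(0))
```

Because addInPlace is applied both within tasks and when merging on the driver, it should be associative and commutative, as max is here.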

Author: user
