Apache Spark interview questions

PySpark @ Freshers.in

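154. What is textFile(name, minPartitions=None, use_unicode=True) in Apache Spark?
Reads a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported
file system URI, and returns it as an RDD of strings, one element per line. minPartitions is a
suggested minimum number of partitions for the resulting RDD. The file must be encoded as UTF-8;
if use_unicode is False, the strings are kept as UTF-8 encoded str rather than decoded to unicode,
which is faster and smaller.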

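155. What is union(rdds) in Apache Spark?
Builds the union of a list of RDDs. Duplicate elements are kept.
# sc is an existing SparkContext
rdd1 = sc.parallelize([1, 2, 3])
rdd2 = sc.parallelize([5, 6, 7, 4])
# Elements of rdd1 come first, followed by those of rdd2
x = sc.union([rdd1, rdd2])
print(x.collect())  # [1, 2, 3, 5, 6, 7, 4]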

Author: user
