Apache Spark interview questions


147. What will getLocalProperty(key) do in Apache Spark ?
Get a local property set in this thread, or None if it is missing. See setLocalProperty(key, value).
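A minimal sketch, assuming an active SparkContext sc; the property name and value are only illustrative:
>>> sc.setLocalProperty("spark.scheduler.pool", "batch_pool")   # set for this thread only
>>> sc.getLocalProperty("spark.scheduler.pool")
'batch_pool'
>>> sc.getLocalProperty("no.such.key") is None                  # missing keys return None
True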

148. What is sequenceFile in Apache Spark ? How to access ?
sequenceFile(path, keyClass=None, valueClass=None, keyConverter=None, valueConverter=None, minSplits=None, batchSize=0)
Read a Hadoop SequenceFile with arbitrary key and value Writable classes from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. The mechanism is as follows:
1. A Java RDD is created from the SequenceFile or other InputFormat, and the key and value Writable classes.
2. Serialization is attempted via Pyrolite pickling.
3. If this fails, the fallback is to call ‘toString’ on each key and value.
4. PickleSerializer is used to deserialize pickled objects on the Python side.
Parameters:
path – path to the SequenceFile
keyClass – fully qualified classname of the key Writable class (e.g. “org.apache.hadoop.io.Text”)
valueClass – fully qualified classname of the value Writable class (e.g. “org.apache.hadoop.io.LongWritable”)
keyConverter – (optional)
valueConverter – (optional)
minSplits – minimum splits in dataset (default min(2, sc.defaultParallelism))
batchSize – the number of Python objects represented as a single Java object (default 0: choose batchSize automatically)
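A minimal usage sketch; the HDFS path and the Writable classes below are assumptions for illustration:
>>> rdd = sc.sequenceFile("hdfs:///data/events.seq",
...                       keyClass="org.apache.hadoop.io.Text",
...                       valueClass="org.apache.hadoop.io.LongWritable")
>>> rdd.take(2)   # keys and values arrive as deserialized Python objects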

149. What is setCheckpointDir(dirName) in Apache Spark ?
Set the directory under which RDDs are going to be checkpointed. The directory must be an HDFS path if running on a cluster.
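A minimal sketch; the checkpoint directory below is an assumption for illustration:
>>> sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")
>>> rdd = sc.parallelize(range(100))
>>> rdd.checkpoint()          # mark the RDD for checkpointing
>>> rdd.count()               # the first action materializes the checkpoint
100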

150. What is sparkUser() in Apache Spark ?
Get SPARK_USER for the user who is running the SparkContext.
>>> sc.sparkUser()

151. What is startTime in Apache Spark ?
Return the epoch time when the Spark Context was started.
>>> sc._jsc.startTime()
1510567393189L

152. What is statusTracker() in Apache Spark ?
Return a StatusTracker object (low-level status reporting APIs for monitoring job and stage progress).
>>> status = sc.statusTracker()
>>> print(status)
<pyspark.status.StatusTracker object at 0x7f954a5c1dd0>
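A minimal sketch of the status APIs, assuming an active SparkContext sc and the default job group:
>>> status = sc.statusTracker()
>>> status.getActiveJobsIds()              # empty when no job is currently running
[]
>>> for job_id in status.getJobIdsForGroup():
...     info = status.getJobInfo(job_id)
...     print(info.status, info.stageIds)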

153. What is stop() in Apache Spark ?
Shut down the SparkContext.
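A minimal sketch; the master URL and application name are assumptions for illustration:
>>> sc.stop()                                        # release executors and driver resources
>>> from pyspark import SparkContext
>>> sc = SparkContext("local[2]", "restarted-app")   # only one active SparkContext per JVM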
