Apache Spark Interview Questions and Answers

Q1. What is Apache Spark?

Apache Spark is a lightning-fast cluster computing technology designed for fast computation. It builds on the Hadoop MapReduce programming model and extends it so that it can be used efficiently for more types of computation, including interactive queries and stream processing. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application.

Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming. By supporting all these workloads in a single system, it reduces the management burden of maintaining separate tools.

Q2. What is SparkContext?

SparkContext is the entry point to Spark's core functionality; it represents the connection to a Spark cluster. Using a SparkContext, you create RDDs, which provide various ways of processing data.

Q3. Why is Spark faster than MapReduce?

There are a few important reasons why Spark is faster than MapReduce; some of them are listed below:

  • Spark has no rigid map-then-reduce structure: a job can chain many transformations, and there is no mandatory rule that a reduce must come after a map.
  • Spark tries to keep data in memory as much as possible, so data that is reused does not have to be reread from disk.

In MapReduce, the output of each job is written to HDFS and must be read back from disk by the next job, which adds I/O latency. Spark avoids much of this cost by keeping reused data in memory.

Q4. Explain the Apache Spark architecture.

  • A Spark application consists of a driver program and a set of executor processes that run on worker nodes.
  • A cluster manager (such as Spark standalone, YARN, or Kubernetes) sits between them and allocates resources. The SparkContext in the driver connects to the cluster manager to acquire executors on the worker nodes.
  • The driver, through its SparkContext, acts as the coordinator (the master), and the workers carry out the tasks it schedules.
  • Worker nodes host the executors that run the job's tasks. If any dependencies or arguments have to be passed, SparkContext takes care of shipping them to the executors. RDD partitions are stored and processed on the executors.
  • You can also run a Spark application locally using one or more threads in a single process. To take advantage of a distributed environment, you typically read data from shared storage such as S3, HDFS, or another distributed storage system.
