Apache Storm interview questions

1. What is Apache Storm?
Apache Storm is a free and open-source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Apache Storm is simple and can be used with any programming language.

2. What are the different types of nodes on a Storm cluster?
There are two kinds of nodes on a Storm cluster: the master node and the worker nodes. The master node runs a daemon called “Nimbus” that is similar to Hadoop’s “JobTracker”. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. Each worker node runs a daemon called the “Supervisor”. The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.

3. What are topologies in Apache Storm?
A topology is a graph of computation. Each node in a topology contains processing logic, and links between nodes indicate how data should be passed around between nodes. To do realtime computation on Storm, we need to create “topologies”. Since topology definitions are just Thrift structs, and Nimbus is a Thrift service, you can create and submit topologies using any programming language.
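Below is a minimal sketch of building and submitting a topology in local mode. The WordSpout and PrintBolt classes are hypothetical placeholders, and the package names assume Storm 1.x or later (org.apache.storm):

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class DemoTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new WordSpout());                          // hypothetical spout
        builder.setBolt("print", new PrintBolt()).shuffleGrouping("words"); // hypothetical bolt
        LocalCluster cluster = new LocalCluster(); // simulates a Storm cluster in process
        cluster.submitTopology("demo", new Config(), builder.createTopology());
        Thread.sleep(10000);                       // let the topology run briefly
        cluster.shutdown();
    }
}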

4. What are streams in Apache Storm?
The core abstraction in Storm is the “stream”. A stream is an unbounded sequence of tuples. Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way.

5. What are spouts in Apache Storm?
A spout is a source of streams in a topology. Generally spouts will read tuples from an external source and emit them into the topology.
Spouts can emit more than one stream. To do so, declare multiple streams using the declareStream method of OutputFieldsDeclarer and specify the stream to emit to when using the emit method on SpoutOutputCollector.
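As a concrete illustration, here is a minimal sketch of a spout that declares and emits to two named streams; the stream ids and emitted values are hypothetical:

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class MultiStreamSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare two named streams instead of the single default stream.
        declarer.declareStream("words", new Fields("word"));
        declarer.declareStream("signals", new Fields("signal"));
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(100);
        // The first argument to emit selects the target stream.
        collector.emit("words", new Values("hello"));
        collector.emit("signals", new Values("tick"));
    }
}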

6. What are bolts in Apache Storm?
All processing in topologies is done in bolts. Bolts can do simple stream transformations. Doing complex stream transformations often requires multiple steps and thus multiple bolts. Bolts can emit more than one stream. To do so, declare multiple streams using the declareStream method of OutputFieldsDeclarer and specify the stream to emit to when using the emit method on OutputCollector.
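For example, here is a minimal sketch of a simple transformation bolt; it extends BaseBasicBolt, which handles acking automatically, and the field names are hypothetical:

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ExclamationBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // A simple stream transformation: append "!!!" to the incoming word.
        collector.emit(new Values(tuple.getString(0) + "!!!"));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}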

7. What are the advantages of using Apache Storm?
Distributed realtime processing
Stateless design; data is streamed rather than stored
Stream abstraction as the core primitive
Micro-batch processing (via Trident)

8. Explain the components of the Apache Storm system.
Nimbus – the master daemon, similar to Hadoop’s JobTracker; it distributes code across the cluster, assigns tasks to machines, and monitors for failures.
Zookeeper – the coordination service through which the nodes of a Storm cluster communicate and where cluster state is kept.
Supervisor – runs on each worker node; it interacts with Nimbus through Zookeeper and starts and stops worker processes as per Nimbus’s instructions.

9. How many categories are there to define stream grouping in Apache Storm?
There are seven built-in stream groupings (a wiring sketch follows the list):
Shuffle grouping
Fields grouping
None grouping
Local or shuffle grouping
Global grouping
All grouping
Direct grouping
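The following sketch shows a few of these groupings being wired into a topology; the spout and bolt classes are hypothetical placeholders:

import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new SentenceSpout(), 2);
// Shuffle grouping: tuples are distributed randomly and evenly across the bolt's tasks.
builder.setBolt("split", new SplitBolt(), 4).shuffleGrouping("sentences");
// Fields grouping: tuples with the same "word" value always go to the same task.
builder.setBolt("count", new CountBolt(), 4).fieldsGrouping("split", new Fields("word"));
// Global grouping: the entire stream goes to a single task.
builder.setBolt("report", new ReportBolt(), 1).globalGrouping("count");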

10. Explain the role of Zookeeper in Apache Storm.
Zookeeper coordinates the Storm cluster: Nimbus and the Supervisors communicate through it, and cluster state is stored in it. Zookeeper is not involved in message passing, so the workload it carries is very low.

11. When is the cleanup method called in Apache Storm?
The cleanup method is called when a bolt is being shut down and should clean up any resources that were opened. There’s no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there’s no way to invoke the method. The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in process), and you want to be able to run and kill many topologies without suffering any resource leaks.
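A minimal sketch of a bolt that overrides cleanup to release a resource; the JDBC URL is a hypothetical placeholder:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class DbWriterBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Connection conn;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            conn = DriverManager.getConnection("jdbc:h2:mem:demo"); // hypothetical URL
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        collector.ack(tuple); // the actual database write is omitted for brevity
    }

    @Override
    public void cleanup() {
        // Called on orderly shutdown; only guaranteed in local mode, so treat it as best-effort.
        try {
            if (conn != null) conn.close();
        } catch (SQLException ignored) {
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing.
    }
}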

12. How do you set up SSL for the Apache Storm UI?
To enable HTTPS for the UI, users need to set the following configs in storm.yaml. Generating keystores with the proper keys and certificates should be taken care of by the user before this step.
1. ui.https.port
2. ui.https.keystore.type (example “jks”)
3. ui.https.keystore.path (example “/etc/ssl/storm_keystore.jks”)
4. ui.https.keystore.password (keystore password)
5. ui.https.key.password (private key password)
Optional configs:
6. ui.https.truststore.path (example “/etc/ssl/storm_truststore.jks”)
7. ui.https.truststore.password (truststore password)
8. ui.https.truststore.type (example “jks”)
If users want to set up 2-way authentication:
9. ui.https.want.client.auth (if set to true, the server requests client certificate authentication, but keeps the connection if no authentication is provided)
10. ui.https.need.client.auth (if set to true, the server requires the client to provide authentication)
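Putting it together, a storm.yaml fragment might look like the following; the port, paths, and passwords are placeholder values:

ui.https.port: 8443
ui.https.keystore.type: "jks"
ui.https.keystore.path: "/etc/ssl/storm_keystore.jks"
ui.https.keystore.password: "changeit"
ui.https.key.password: "changeit"
# Optional truststore settings:
ui.https.truststore.path: "/etc/ssl/storm_truststore.jks"
ui.https.truststore.password: "changeit"
ui.https.truststore.type: "jks"
# Optional two-way (mutual) authentication:
ui.https.want.client.auth: true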

13. Apache Kafka vs Apache Storm
a. Data Security
i. Apache Kafka
Basically, Kafka does not guarantee zero data loss; it provides a very strong but not absolute guarantee. For example, at about 7 million message transactions per day, Netflix reported roughly 0.01% data loss.
ii. Apache Storm
In comparison with Kafka, Storm guarantees that every tuple will be fully processed, via its reliability (acking) mechanism.
b. Data Storage
i. Apache Kafka
Apache Kafka stores its data on the local filesystem, such as EXT4 or XFS.
ii. Apache Storm
Storm, on the other hand, is just a data processing framework: it doesn’t store data, it just passes it from input streams to output streams.
c. Real-time messaging system
i. Apache Kafka
Kafka stores incoming messages so that they can be processed later; it is a message broker rather than a processing system.
ii. Apache Storm
Storm, however, is a real-time system: it processes messages as they arrive.
d. Processing/ Transforming
i. Apache Kafka
We use Apache Kafka for moving and buffering real-time data.
ii. Apache Storm
Whereas we use Storm for transforming and processing that data.
e. Data Source
i. Apache Kafka
Basically, Kafka pulls its data from the actual source of the data.
ii. Apache Storm
Storm, on the other hand, often gets its data from Kafka itself for further processing.
f. Basic Task
i. Apache Kafka
When it comes to transferring real-time application data from one source application to another, we use Kafka.
ii. Apache Storm
We use Storm for aggregation and computation.
g. Zookeeper Dependency
i. Apache Kafka
Setting up Kafka requires Apache Zookeeper; it is mandatory.
ii. Apache Storm
Storm also depends on Zookeeper: Nimbus and the Supervisors coordinate through it (see question 10).
h. Fault-Tolerant
i. Apache Kafka
Kafka is fault-tolerant through partition replication across brokers, with Zookeeper tracking cluster state.
ii. Apache Storm
Storm’s daemons are fail-fast and stateless, and they can be restarted automatically under supervision (see question 17).
i. Inventor
i. Apache Kafka
Kafka was created at LinkedIn.
ii. Apache Storm
Whereas Storm originated at BackType and was open-sourced by Twitter after it acquired the company.
j. Language Support
i. Apache Kafka
Basically, Kafka has clients for many languages, but it works best with Java, which has the primary and most complete client.
ii. Apache Storm
Storm supports virtually any language, since topology components can be written in other languages via Storm’s multi-lang protocol.

14. Does the Apache Storm UI support a REST API?
The Storm UI daemon provides a REST API that allows you to interact with a Storm cluster, which includes retrieving metrics data and configuration information as well as management operations such as starting or stopping topologies.
The API base URL would thus be:
http://<ui-host>:<ui-port>/api/v1/…
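As an illustration, the cluster summary endpoint can be queried with plain JDK HTTP; the host, port, and endpoint path below assume a default local UI and the /api/v1/cluster/summary resource:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StormRestExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/api/v1/cluster/summary");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON summary of the cluster
            }
        }
    }
}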

15. What happens when a worker dies in Apache Storm?
When a worker dies, the supervisor will restart it. If it continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reschedule the worker.

16. What happens when a node dies in Apache Storm?
The tasks assigned to that machine will time-out and Nimbus will reassign those tasks to other machines.

17. What happens when Nimbus or Supervisor daemons die in Apache Storm?
The Nimbus and Supervisor daemons are designed to be fail-fast (the process self-destructs whenever any unexpected situation is encountered) and stateless (all state is kept in Zookeeper or on disk). The Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart like nothing happened. Most notably, no worker processes are affected by the death of Nimbus or the Supervisors. This is in contrast to Hadoop, where if the JobTracker dies, all the running jobs are lost.

18. Is Nimbus a single point of failure in Apache Storm?
If you lose the Nimbus node, the workers will still continue to function. Additionally, supervisors will continue to restart workers if they die. However, without Nimbus, workers won’t be reassigned to other machines when necessary (like if you lose a worker machine).

19. What is Apache Storm’s reliability API?
There are two things you have to do as a user to benefit from Storm’s reliability capabilities. First, you need to tell Storm whenever you’re creating a new link in the tree of tuples. Second, you need to tell Storm when you have finished processing an individual tuple. By doing both these things, Storm can detect when the tree of tuples is fully processed and can ack or fail the spout tuple appropriately. Storm’s API provides a concise way of doing both of these tasks.
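A minimal sketch of both steps in a bolt: anchoring each emitted tuple to the input (creating a new link in the tuple tree) and then acking the input. The field layout is hypothetical:

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        for (String word : tuple.getString(0).split(" ")) {
            // Anchoring: passing the input tuple links each new tuple into the tree.
            collector.emit(tuple, new Values(word));
        }
        // Acking: tells Storm this bolt has finished processing the input tuple.
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}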

20. What happens if a message is fully processed or fails to be fully processed in Apache Storm?
To understand this question, let’s take a look at the lifecycle of a tuple coming off of a spout. For reference, here is the interface that spouts implement (see the Javadoc for more information):

public interface ISpout extends Serializable {
    // Called once when the spout task is initialized within a worker.
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    // Called when the spout is being shut down.
    void close();
    // Requests that the spout emit the next tuple, if one is available.
    void nextTuple();
    // Called when a tuple emitted with the given message id was fully processed.
    void ack(Object msgId);
    // Called when a tuple emitted with the given message id failed or timed out.
    void fail(Object msgId);
}
First, Storm requests a tuple from the Spout by calling the nextTuple method on the Spout. The Spout uses the SpoutOutputCollector provided in the open method to emit a tuple to one of its output streams. When emitting a tuple, the Spout provides a “message id” that will be used to identify the tuple later. For example, the KestrelSpout reads a message off of the kestrel queue and emits as the “message id” the id provided by Kestrel for the message. Emitting a message to the SpoutOutputCollector looks like this:
_collector.emit(new Values("field1", "field2", 3), msgId);

Next, the tuple gets sent to consuming bolts and Storm takes care of tracking the tree of messages that is created. If Storm detects that a tuple is fully processed, Storm will call the ack method on the originating Spout task with the message id that the Spout provided to Storm. Likewise, if the tuple times out, Storm will call the fail method on the Spout. Note that a tuple will be acked or failed by the exact same Spout task that created it: even if a Spout is executing many tasks across the cluster, a tuple won’t be acked or failed by a different task than the one that created it.

Let’s use KestrelSpout again to see what a Spout needs to do to guarantee message processing. When KestrelSpout takes a message off the Kestrel queue, it “opens” the message. This means the message is not actually taken off the queue yet, but instead placed in a “pending” state waiting for acknowledgement that the message is completed. While in the pending state, a message will not be sent to other consumers of the queue. Additionally, if a client disconnects all pending messages for that client are put back on the queue. When a message is opened, Kestrel provides the client with the data for the message as well as a unique id for the message. The KestrelSpout uses that exact id as the “message id” for the tuple when emitting the tuple to the SpoutOutputCollector. Sometime later on, when ack or fail are called on the KestrelSpout, the KestrelSpout sends an ack or fail message to Kestrel with the message id to take the message off the queue or have it put back on.

21. How many workers should I use in Apache Storm?
The total number of available workers is set by the supervisors: each supervisor superintends some number of JVM slots. What you set on the topology is how many worker slots it will try to claim.
There’s no great reason to use more than one worker per topology per machine.
With one topology running on three 8-core nodes and a parallelism hint of 24, each bolt gets 8 executors per machine, i.e. one for each core. There are three big benefits to running three workers (with 8 assigned executors each) compared to running, say, 24 workers (with one assigned executor each).
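In code, the worker count is a topology config while executor counts are parallelism hints; here is a sketch for the three-node scenario above, with hypothetical component classes:

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class WorkerSizingExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new SentenceSpout(), 3);                    // hypothetical spout
        builder.setBolt("work", new WorkBolt(), 24).shuffleGrouping("spout"); // parallelism hint 24
        Config conf = new Config();
        conf.setNumWorkers(3); // one worker JVM per machine on a 3-node cluster
        StormSubmitter.submitTopology("sized-topology", conf, builder.createTopology());
    }
}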

22. How do you set the batch size in Apache Storm?
Trident doesn’t place its own limit on the batch size. In the case of the Kafka spout, the maximum fetch size in bytes divided by the average record size gives the effective number of records per partition sub-batch.

23. How does Apache Storm implement reliability in an efficient way?
A Storm topology has a set of special “acker” tasks that track the DAG of tuples for every spout tuple. When an acker sees that a DAG is complete, it sends a message to the spout task that created the spout tuple to ack the message. You can set the number of acker tasks for a topology in the topology configuration using Config.TOPOLOGY_ACKERS. Storm defaults TOPOLOGY_ACKERS to one task per worker.
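A small sketch of setting the acker count via the topology configuration; the value 4 is arbitrary:

import org.apache.storm.Config;

Config conf = new Config();
// Equivalent to setting Config.TOPOLOGY_ACKERS; these tasks track each spout tuple's DAG.
conf.setNumAckers(4);
// Setting it to 0 disables tracking: spout tuples are acked immediately when they are emitted.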
