Apache Storm interview questions

user March 21, 2021

11. When is the cleanup method called in Apache Storm?
The cleanup method is called when a bolt is being shut down, and it should release any resources the bolt opened. There is no guarantee that this method will be called on a cluster: for example, if the machine the task is running on fails, there is no way to invoke it. The cleanup method is intended for running topologies in local mode (where a Storm cluster is simulated in process), so that you can run and kill many topologies without suffering resource leaks.
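In a real topology, a bolt usually extends Storm's `BaseRichBolt` and overrides `prepare`, `execute` and `cleanup`. The sketch below is dependency-free so that it runs without Storm on the classpath. It only mimics that lifecycle through a hypothetical `SimpleBolt` interface and a hypothetical `BufferingBolt`: a resource is opened in `prepare` and released in `cleanup`. Because `cleanup` is best effort, anything that must survive a crash should not depend on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Storm's bolt lifecycle (prepare -> execute* -> cleanup).
// A real bolt would extend org.apache.storm.topology.base.BaseRichBolt instead.
interface SimpleBolt {
    void prepare();
    void execute(String tuple);
    void cleanup();
}

// Buffers tuples in memory; the buffer stands in for a file handle or DB connection.
class BufferingBolt implements SimpleBolt {
    private List<String> buffer;
    private boolean open = false;

    @Override
    public void prepare() {
        buffer = new ArrayList<>();
        open = true;
    }

    @Override
    public void execute(String tuple) {
        if (!open) {
            throw new IllegalStateException("bolt not prepared");
        }
        buffer.add(tuple.toUpperCase());
    }

    @Override
    public void cleanup() {
        // Best-effort release: on a real cluster this may never run,
        // because a worker can be killed with kill -9 or lose its machine.
        open = false;
        buffer = null;
    }

    boolean isOpen() {
        return open;
    }

    int buffered() {
        return buffer == null ? 0 : buffer.size();
    }
}

class LocalModeDemo {
    public static void main(String[] args) {
        BufferingBolt bolt = new BufferingBolt();
        bolt.prepare();
        bolt.execute("hello");
        bolt.execute("storm");
        System.out.println("buffered=" + bolt.buffered()); // buffered=2
        // Killing a topology in local mode is the case where cleanup() is reliably called.
        bolt.cleanup();
        System.out.println("open=" + bolt.isOpen());       // open=false
    }
}
```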

12. How to set up SSL for Apache Storm?
To serve the Storm UI over HTTPS, set the following properties in storm.yaml. Generating keystores with the proper keys and certificates is the user's responsibility and must be done before this step.
1. ui.https.port
2. ui.https.keystore.type (example “jks”)
3. ui.https.keystore.path (example “/etc/ssl/storm_keystore.jks”)
4. ui.https.keystore.password (keystore password)
5. ui.https.key.password (private key password)
Optional settings:
6. ui.https.truststore.path (example “/etc/ssl/storm_truststore.jks”)
7. ui.https.truststore.password (truststore password)
8. ui.https.truststore.type (example “jks”)
To set up 2-way (mutual) authentication:
9. ui.https.want.client.auth (if set to true, the server requests client certificate authentication but keeps the connection if none is provided)
10. ui.https.need.client.auth (if set to true, the server requires the client to provide authentication)
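The required keys can be checked mechanically before restarting the UI. `UiHttpsConfigCheck` below is a hypothetical helper, not part of Storm. It takes the `ui.https.*` settings as a plain map, as they would appear in storm.yaml. It reports which required keys are missing and describes the client-authentication mode implied by the two 2-way auth flags. The port and password values in `main` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the Storm UI HTTPS settings in storm.yaml.
// Storm itself does not ship this class; it only encodes the key list above.
class UiHttpsConfigCheck {
    static final List<String> REQUIRED = List.of(
        "ui.https.port",
        "ui.https.keystore.type",
        "ui.https.keystore.path",
        "ui.https.keystore.password",
        "ui.https.key.password");

    // Returns the required keys that are absent or blank, in the order listed above.
    static List<String> missingRequired(Map<String, String> conf) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = conf.get(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Summarizes the 2-way auth behaviour; "need" is the stricter of the two flags.
    static String clientAuthMode(Map<String, String> conf) {
        boolean need = Boolean.parseBoolean(conf.getOrDefault("ui.https.need.client.auth", "false"));
        boolean want = Boolean.parseBoolean(conf.getOrDefault("ui.https.want.client.auth", "false"));
        if (need) {
            return "required";   // the client must present a certificate
        }
        if (want) {
            return "requested";  // certificate asked for, connection kept without one
        }
        return "none";
    }

    public static void main(String[] args) {
        // Example values only; use your own port and secrets.
        Map<String, String> conf = Map.of(
            "ui.https.port", "8443",
            "ui.https.keystore.type", "jks",
            "ui.https.keystore.path", "/etc/ssl/storm_keystore.jks",
            "ui.https.keystore.password", "changeit",
            "ui.https.want.client.auth", "true");
        System.out.println(missingRequired(conf)); // [ui.https.key.password]
        System.out.println(clientAuthMode(conf));  // requested
    }
}
```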

13. Apache Kafka vs Apache Storm
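a. Data Reliability
i. Apache Kafka
Kafka does not guarantee zero data loss out of the box; how much loss is possible depends on settings such as replication and producer acknowledgements. For example, Netflix achieved a data loss rate of 0.01% for 7 million message transactions per day.
ii. Apache Storm
By comparison, Storm can guarantee that every tuple emitted by a spout is fully processed (at-least-once processing) through its acking mechanism, provided spouts emit tuples with message ids and bolts anchor and ack them. Its Trident API adds exactly-once semantics.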
b. Data Storage
i. Apache Kafka
Apache Kafka stores its data (the topic partition logs) on the broker's local filesystem, such as ext4 or XFS.
ii. Apache Storm
Storm, on the other hand, is purely a data processing framework: it does not store data, it only moves it from input streams to output streams.
c. Real-time messaging system
i. Apache Kafka
Kafka stores incoming messages durably before they are processed, so consumers can read them later.
ii. Apache Storm
Storm, by contrast, processes messages in real time as they arrive.
d. Processing/ Transforming
i. Apache Kafka
We use Apache Kafka to collect and buffer real-time data.
ii. Apache Storm
Whereas we use Storm to process and transform that data.
e. Data Source
i. Apache Kafka
Kafka receives data from the actual source: producer applications publish records to Kafka topics.
ii. Apache Storm
Storm, on the other hand, typically reads its input from a system such as Kafka (for example through a Kafka spout) for further processing.
f. Basic Task
i. Apache Kafka
Kafka is used to transfer real-time data from a source application to other applications.
ii. Apache Storm
Storm is used for aggregation and computation.
g. Zookeeper Dependency
i. Apache Kafka
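Kafka has traditionally required Apache ZooKeeper; newer releases can instead run in KRaft mode without it.
ii. Apache Storm
Storm also needs ZooKeeper: Nimbus and the supervisors coordinate cluster state through it (local mode starts an in-process ZooKeeper).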
h. Fault-Tolerant
i. Apache Kafka
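Kafka is fault tolerant because topic partitions are replicated across brokers, with ZooKeeper (or KRaft in newer releases) managing cluster metadata.
ii. Apache Storm
Storm recovers failed workers automatically: supervisors restart dead worker processes and Nimbus reassigns work when needed. The Nimbus and supervisor daemons are themselves fail-fast and stateless, so they are meant to be run under a process supervisor that restarts them.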
i. Inventor
i. Apache Kafka
Kafka was originally developed at LinkedIn.
ii. Apache Storm
Storm was created by Nathan Marz at BackType and open-sourced by Twitter after Twitter acquired BackType.
j. Language Support
i. Apache Kafka
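Kafka has client libraries for many languages, but the client shipped by the Apache Kafka project is the Java one, so it works most naturally with Java and other JVM languages.
ii. Apache Storm
Storm can be used with any programming language: topologies are defined as Thrift structures, and spouts and bolts written in non-JVM languages communicate with Storm through its multi-language protocol.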
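The reliability guarantee attributed to Storm in this comparison comes from its acker tasks. Storm's "Guaranteeing Message Processing" documentation describes how the acker tracks each spout tuple's tree of derived tuples with a single 64-bit "ack val". Every tuple id is XOR-ed in once when the tuple is created and once more when it is acked, so the value returns to zero when the whole tree has been acked. Storm uses random 64-bit ids, which makes an accidental zero extremely unlikely. The sketch below reproduces only that bookkeeping. `AckTracker` and its methods are hypothetical names, not Storm's API. Timeouts and replay are left out, and the small fixed ids are for readability.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical re-creation of the acker's XOR bookkeeping (not Storm's real classes).
class AckTracker {
    // spout tuple id -> "ack val": XOR of every tuple id created or acked in its tree
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Spout emits a root tuple: the tree starts with just the root id.
    void rootEmitted(long rootId) {
        ackVals.put(rootId, rootId);
    }

    // A bolt emits a new tuple anchored to the tree: XOR its id in.
    void anchored(long rootId, long childId) {
        ackVals.merge(rootId, childId, (a, b) -> a ^ b);
    }

    // A bolt acks a tuple in the tree: XOR the same id in again, cancelling it out.
    void acked(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // The tree is fully processed when its ack val is back to zero.
    boolean isComplete(long rootId) {
        Long v = ackVals.get(rootId);
        return v != null && v == 0L;
    }

    public static void main(String[] args) {
        AckTracker acker = new AckTracker();
        long root = 11L, split1 = 22L, split2 = 37L; // real Storm uses random 64-bit ids

        acker.rootEmitted(root);
        acker.anchored(root, split1);   // a bolt emits two tuples anchored to the root...
        acker.anchored(root, split2);
        acker.acked(root, root);        // ...and then acks the root tuple
        System.out.println(acker.isComplete(root)); // false: split1 and split2 pending
        acker.acked(root, split1);
        acker.acked(root, split2);
        System.out.println(acker.isComplete(root)); // true
    }
}
```

The design keeps the acker's memory constant per spout tuple no matter how large the tuple tree grows, which is why Storm can afford to track every tree.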

14. Does the Apache Storm UI support a REST API?
The Storm UI daemon provides a REST API that allows you to interact with a Storm cluster, which includes retrieving metrics data and configuration information as well as management operations such as starting or stopping topologies.
The REST API is served by the UI daemon on the same host and port as the UI, so the API base URL is:
http://<ui-host>:<ui-port>/api/v1/…
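A minimal sketch of calling that API from Java 11+ with the JDK's built-in `java.net.http.HttpClient`. It requests `/api/v1/cluster/summary`, one of the read-only endpoints documented for the Storm UI REST API. The host `localhost` and port `8080` are assumptions; 8080 is Storm's default `ui.port`. The JSON response is printed rather than parsed to avoid extra dependencies, and the helper class name `StormUiClient` is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch of a read-only call to the Storm UI REST API (Java 11+, no extra libraries).
class StormUiClient {
    // Builds http://<ui-host>:<ui-port>/api/v1/<path>; host and port are caller-supplied.
    static URI apiUri(String host, int port, String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return URI.create("http://" + host + ":" + port + "/api/v1/" + trimmed);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost";            // assumed UI host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;    // default ui.port
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
        HttpRequest request = HttpRequest.newBuilder(apiUri(host, port, "cluster/summary"))
            .GET()
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // JSON describing the cluster
    }
}
```

If the UI is configured for HTTPS, the scheme in `apiUri` would need to change to `https` and the client would need a trust store that accepts the UI's certificate.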

15. What happens when a worker dies in Apache Storm?
When a worker dies, the supervisor will restart it. If it continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reschedule the worker.
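The two-level recovery above can be modelled as a small decision function. This is a deliberate simplification with hypothetical names (`WorkerRecovery`, `RecoveryAction`). It assumes the supervisor restarts a dead worker locally, and that Nimbus only reassigns the worker once its last heartbeat is older than a timeout. In a real cluster the relevant timeouts come from settings such as `supervisor.worker.timeout.secs` and `nimbus.task.timeout.secs`; the 30-second value here is illustrative.

```java
// Hypothetical model of Storm's worker recovery; not Storm's real scheduler code.
enum RecoveryAction { NONE, RESTART_ON_SAME_SUPERVISOR, RESCHEDULE_BY_NIMBUS }

class WorkerRecovery {
    // Illustrative timeout, standing in for nimbus.task.timeout.secs.
    static final long NIMBUS_TIMEOUT_SECS = 30;

    /**
     * @param workerAlive        whether the worker process is currently running
     * @param secsSinceHeartbeat seconds since Nimbus last saw a heartbeat from this worker
     */
    static RecoveryAction decide(boolean workerAlive, long secsSinceHeartbeat) {
        if (secsSinceHeartbeat > NIMBUS_TIMEOUT_SECS) {
            // The worker keeps failing (or its node is gone): Nimbus moves it elsewhere.
            return RecoveryAction.RESCHEDULE_BY_NIMBUS;
        }
        if (!workerAlive) {
            // Normal case: the local supervisor simply restarts the process.
            return RecoveryAction.RESTART_ON_SAME_SUPERVISOR;
        }
        return RecoveryAction.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, 5));   // RESTART_ON_SAME_SUPERVISOR
        System.out.println(decide(false, 120)); // RESCHEDULE_BY_NIMBUS
        System.out.println(decide(true, 2));    // NONE
    }
}
```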

Copyright © 2024 Freshers.in