Hive: How can you configure job scheduling in Hive?

Hive @ Freshers.in

Hive compiles each query into one or more jobs that run on a Hadoop cluster, where YARN decides when they start and how many resources they receive. Configuring job scheduling for Hive therefore means telling YARN how to treat those jobs: which queue they run in and how they are prioritized relative to other jobs on the same cluster.

Configuring Job Scheduling in Hive:

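Hive does not include a job scheduler of its own. Its queries run as YARN applications, and YARN provides three schedulers: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler. The FIFO Scheduler runs applications in the order they are submitted. The Capacity Scheduler divides the cluster into queues, each with a guaranteed share of capacity. The Fair Scheduler balances resources across running applications so that, over time, each receives a roughly equal share, optionally adjusted by queue weights.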
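
To make the difference between first-in-first-out and fair sharing concrete, here is a toy model in Python. It is only a sketch: it treats the cluster as a single pool of 100 resource units and ignores queues, weights, and preemption, all of which a real YARN scheduler takes into account. The function names and the numbers are made up for illustration.

```python
def fifo_allocate(capacity, demands):
    """Give each job, in submission order, as much as is still free."""
    allocations = []
    remaining = capacity
    for demand in demands:
        granted = min(demand, remaining)
        allocations.append(granted)
        remaining -= granted
    return allocations


def fair_allocate(capacity, demands):
    """Max-min fair share: split evenly, handing unused share to the others."""
    allocations = [0.0] * len(demands)
    unsatisfied = set(range(len(demands)))
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Jobs that need no more than an equal share are fully satisfied.
        done = {i for i in unsatisfied if demands[i] - allocations[i] <= share}
        if not done:
            # Everyone wants more than an equal share: split what is left.
            for i in unsatisfied:
                allocations[i] += share
            remaining = 0.0
            break
        for i in done:
            remaining -= demands[i] - allocations[i]
            allocations[i] = float(demands[i])
        unsatisfied -= done
    return allocations


# Three jobs submitted in this order, asking for 80, 50 and 10 units.
fifo_result = fifo_allocate(100, [80, 50, 10])
fair_result = fair_allocate(100, [80, 50, 10])
print(fifo_result)  # [80, 20, 0]
print(fair_result)  # [45.0, 45.0, 10.0]
```

Under FIFO the small third job gets nothing until earlier jobs finish, while under fair sharing it is fully served and the two large jobs split the rest.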

To configure job scheduling in Hive, you can use the following steps:

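Step 1: Configure the Execution Engine and the Scheduler

The execution engine and the scheduler are separate settings. The execution engine is set in your Hive configuration file (hive-site.xml), or per session with SET:

hive.execution.engine=<engine>

Replace <engine> with “mr” (MapReduce, deprecated since Hive 2), “tez”, or “spark”. This property selects how Hive runs a query; it does not select a scheduler. The scheduler is chosen for the whole cluster in yarn-site.xml through the following property:

yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler

The value shown selects the Fair Scheduler. Stock Apache Hadoop uses the Capacity Scheduler by default, and some distributions ship with a different default, so check your cluster before changing it.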
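
Hadoop and Hive configuration files store each setting as a property element with a name and a value. The following Python sketch renders settings in that layout so you can see what the entries look like. The helper name render_site_xml is hypothetical, and the sketch assumes the execution engine goes in hive-site.xml and the scheduler class (yarn.resourcemanager.scheduler.class) in yarn-site.xml.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a dict of settings in the Hadoop *-site.xml layout."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Execution engine: a Hive setting.
hive_site = render_site_xml({"hive.execution.engine": "tez"})

# Scheduler: a YARN ResourceManager setting (Fair Scheduler shown here).
yarn_site = render_site_xml({
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
})

print(hive_site)
print(yarn_site)
```

The printed fragments are meant to be merged into existing files by hand; a change to the scheduler class takes effect only after the ResourceManager is restarted.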

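Step 2: Configure the Tez Session Pool and Queues

If you run Hive on Tez through HiveServer2, the following properties control the pool of Tez sessions that HiveServer2 keeps open in YARN queues. They are HiveServer2 settings and apply whichever YARN scheduler is in use:

  • hive.server2.tez.default.queues: a comma-separated list of YARN queues in which HiveServer2 maintains pooled Tez sessions.
  • hive.server2.tez.initialize.default.sessions: specifies whether HiveServer2 launches the pooled sessions when it starts, rather than creating a session when a query needs one.
  • hive.server2.tez.session.lifetime: specifies how long a pooled session lives before HiveServer2 replaces it with a fresh one.

Queue shares, weights, and limits for the Fair Scheduler are defined on the YARN side, in the scheduler's allocation file (fair-scheduler.xml by default).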
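
Each pooled Tez session holds an application master in its YARN queue, so it helps to know how many sessions a given configuration will start. The Python sketch below works that out. It relies on one property not mentioned above, hive.server2.tez.sessions.per.default.queue (the number of sessions per listed queue, assumed here to default to 1), and the queue names "etl" and "adhoc" are made up for the example.

```python
def pooled_sessions(conf):
    """Return {queue: session count} for the HiveServer2 Tez session pool."""
    raw_queues = conf.get("hive.server2.tez.default.queues", "")
    queues = [q.strip() for q in raw_queues.split(",") if q.strip()]
    # Assumed default of one session per queue when the property is unset.
    per_queue = int(conf.get("hive.server2.tez.sessions.per.default.queue", "1"))
    return {queue: per_queue for queue in queues}


# Hypothetical settings: two queues, two sessions in each.
conf = {
    "hive.server2.tez.default.queues": "etl, adhoc",
    "hive.server2.tez.sessions.per.default.queue": "2",
}

pool = pooled_sessions(conf)
total_sessions = sum(pool.values())
print(pool)            # {'etl': 2, 'adhoc': 2}
print(total_sessions)  # 4
```

With these values HiveServer2 would keep four sessions open, so each queue needs enough capacity for its two application masters in addition to the query work itself.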

Step 3: Prioritize Jobs

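To prioritize jobs that run on the MapReduce engine, you can assign them a priority level within your Hive session using the following syntax:

SET mapreduce.job.priority=<priority-level>;

Replace <priority-level> with “VERY_HIGH”, “HIGH”, “NORMAL”, “LOW”, or “VERY_LOW”. Older releases use the deprecated name mapred.job.priority for the same property. Priority is a hint to the YARN scheduler, not a resource guarantee: how much it influences the order in which jobs are served depends on the scheduler and its configuration. For dependable separation between workloads, submit jobs to different queues instead, using SET mapreduce.job.queuename=<queue>; on MapReduce or SET tez.queue.name=<queue>; on Tez.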
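
If priority statements are generated by a script, it is worth validating the level before it reaches Hive, since a misspelled value is easy to miss. Here is a minimal Python sketch. The helper name priority_statement is hypothetical, and the sketch assumes the property mapreduce.job.priority (the current name of mapred.job.priority) and the MapReduce engine.

```python
PRIORITY_LEVELS = ("VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW")


def priority_statement(level):
    """Build a Hive SET statement for a validated job priority level."""
    normalized = level.strip().upper()
    if normalized not in PRIORITY_LEVELS:
        raise ValueError(
            f"unknown priority {level!r}; expected one of {', '.join(PRIORITY_LEVELS)}"
        )
    return f"SET mapreduce.job.priority={normalized};"


statement = priority_statement("high")
print(statement)  # SET mapreduce.job.priority=HIGH;
```

The returned string can be placed at the top of a script passed to the Hive CLI or Beeline, ahead of the queries it should apply to.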

Step 4: Monitor Job Scheduling

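Hive itself has no commands for inspecting the scheduler, so monitoring is done through YARN. You can use the following commands from a shell on a cluster node:

  • yarn application -list -appStates RUNNING: lists running applications with their user, queue, state, and progress.
  • yarn application -status <application-id>: displays the details of a single application, including its queue and current state.
  • yarn queue -status <queue-name>: displays the state and capacity of a queue.

The ResourceManager web UI (port 8088 by default) shows the same information, along with the scheduler's queue layout.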
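
Running jobs can be summarized from the YARN side by counting applications per queue in the output of yarn application -list. The Python sketch below does that for a captured listing. The sample text is invented for illustration, and the tab-separated column order it assumes (application id first, queue fifth) may differ between Hadoop versions, so check it against your own output first.

```python
from collections import Counter

# Hypothetical sample of `yarn application -list -appStates RUNNING` output.
SAMPLE_LISTING = (
    "Total number of applications (application-types: [], states: [RUNNING] and tags: []):3\n"
    "Application-Id\tApplication-Name\tApplication-Type\tUser\tQueue\tState\tFinal-State\tProgress\tTracking-URL\n"
    "application_1700000000000_0001\tHIVE-query-a\tTEZ\talice\tetl\tRUNNING\tUNDEFINED\t40%\thttp://example.invalid/1\n"
    "application_1700000000000_0002\tHIVE-query-b\tTEZ\tbob\tetl\tRUNNING\tUNDEFINED\t10%\thttp://example.invalid/2\n"
    "application_1700000000000_0003\tHIVE-query-c\tTEZ\tcarol\tadhoc\tRUNNING\tUNDEFINED\t75%\thttp://example.invalid/3\n"
)


def running_apps_per_queue(listing):
    """Count applications per queue in a tab-separated YARN listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        # Data rows start with an application id; skip the summary and header.
        if fields[0].startswith("application_") and len(fields) >= 6:
            counts[fields[4]] += 1  # assumed position of the Queue column
    return dict(counts)


per_queue = running_apps_per_queue(SAMPLE_LISTING)
print(per_queue)  # {'etl': 2, 'adhoc': 1}
```

On a real cluster the listing could be captured with subprocess.run(["yarn", "application", "-list", "-appStates", "RUNNING"], capture_output=True, text=True).stdout and passed to the same function.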

Configuring job scheduling well helps your Hive jobs run smoothly and efficiently. By choosing the execution engine and the YARN scheduler, assigning jobs to queues and priorities, and monitoring running applications, you can make sure each workload receives the resources it needs. Whichever scheduler your cluster uses, taking the time to configure it can help you get the most out of your data warehousing solution.

Author: user
