Airflow : Triggering Python or Shell Scripts on a Remote Server


In this article, we provide a detailed, step-by-step guide on how to trigger a Python or shell script on a remote server using Airflow, including the access rights you need, the steps to perform, and how to get the script’s output back into the Airflow task logs.

Prerequisites:

  1. Access to both Airflow and the remote server: You should have administrative access to the Airflow server and appropriate permissions to run scripts on the remote server.
  2. SSH access: You should have SSH access to the remote server from the Airflow server. You need to be able to SSH into the server without requiring a password prompt, usually configured via SSH key pairs.
  3. Airflow SSH Hook: Airflow’s SSH Hook and the SSHOperator allow Airflow to connect to remote servers via SSH and run commands. They ship in the SSH provider package, which must be installed on the Airflow server (see the command below).
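
On most Airflow 2.x installations the SSH provider can be added with pip; the exact install method may differ if you run a managed or containerized Airflow:

pip install apache-airflow-providers-ssh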

Step-by-step Procedure:

1. Setting up SSH Key Pair:

The first step is to set up passwordless SSH from the Airflow server to the remote server.

a. Generate an SSH key pair on the Airflow server. You can use the following command:

ssh-keygen -t rsa -b 4096 -C "training@freshers.in"

b. Copy the public key to the remote server using the command:

ssh-copy-id -i ~/.ssh/id_rsa.pub myid@remote_host_ip
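
Before moving on, it is worth confirming that the Airflow user can now log in without a password prompt; using the same placeholder user and host as above:

ssh myid@remote_host_ip "echo 'passwordless SSH is working'"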

2. Configuring Airflow SSH Hook:

a. In the Airflow Web UI, navigate to Admin > Connections.

b. Click the “+” (Add a new record) button to create a new connection.

c. Fill in the details:

  • Conn Id: Enter a name for this connection.
  • Conn Type: Select SSH.
  • Host: Enter the hostname or IP address of the remote server.
  • Username: Enter the username that you will use for the SSH connection.
  • Password: Leave this blank since we’re using SSH keys.
  • Port: Enter the SSH port, usually 22.
  • Extra: A JSON field for additional SSH options. For key-based authentication you can point to the key file, e.g. {"key_file": "/home/airflow/.ssh/id_rsa"}, or paste the key contents under "private_key". (A command-line alternative to this form is sketched after this list.)
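
If you prefer not to use the web form, the same connection can be created from the Airflow CLI. A minimal sketch, assuming an Airflow 2.x CLI, the connection id ssh_connection, and a key file at /home/airflow/.ssh/id_rsa (adjust the host, user, and paths to your environment):

airflow connections add 'ssh_connection' \
    --conn-type 'ssh' \
    --conn-host 'remote_host_ip' \
    --conn-login 'myid' \
    --conn-port 22 \
    --conn-extra '{"key_file": "/home/airflow/.ssh/id_rsa"}'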

3. Creating an Airflow DAG to Run the Remote Script:

You can now create a DAG that will SSH into the remote server and run the script.

a. First, import the necessary modules in your Python DAG file. Since the SSH connection is managed by Airflow, the task uses the SSHOperator from the SSH provider rather than a plain BashOperator:

from airflow import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator
from datetime import datetime, timedelta

b. Define your default arguments and instantiate your DAG:

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 7, 17),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('remote_script', default_args=default_args, schedule_interval=timedelta(days=1))

c. Create a task that runs your remote script through the SSHOperator. Replace “ssh_connection” with the connection id you set up in step 2.

t1 = SSHOperator(
    task_id='viewership_job_script',
    ssh_conn_id='ssh_connection',
    command='python /mnt/views/daily/tot_views.py',
    dag=dag)
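
The command can just as easily be a shell script on the remote host. As a hypothetical illustration (the path /mnt/views/daily/refresh_views.sh is made up for this example), the remote script only needs to be executable and to exit non-zero on failure so that Airflow marks the task as failed:

#!/bin/bash
# /mnt/views/daily/refresh_views.sh (hypothetical example) - refresh daily view counts
set -e   # abort with a non-zero exit code if any command fails
echo "Starting daily view refresh on $(hostname)"
python /mnt/views/daily/tot_views.py   # reuse the existing Python job
echo "Daily view refresh finished"

In the DAG, you would then set command='bash /mnt/views/daily/refresh_views.sh'. Everything the script writes to stdout and stderr is captured in the Airflow task log.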

4. Checking the Results:

After the DAG has been executed, you can check the Airflow logs to see the results of your script.

a. In the Airflow Web UI, navigate to Browse > Task Instances.

b. Click on the task instance for the DAG and task you just ran.

c. Click on the “Log” button to view the logs for that task. The output of your remote script will be included in these logs.
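
You can also exercise a single task from the command line and see its output directly in your terminal, which is convenient while setting this up. A sketch, assuming an Airflow 2.x CLI and the DAG and task ids used above:

airflow tasks test remote_script viewership_job_script 2023-07-17

airflow tasks test runs the task without recording state in the metadata database, so it is safe to repeat while debugging the SSH setup.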

Triggering Python or shell scripts on a remote server with Airflow requires administrative access, passwordless SSH between the Airflow server and the remote server, configuration of the Airflow SSH connection, and an appropriate Airflow DAG. With this setup, you can manage and monitor your scripts efficiently, leveraging Airflow’s robust logging and error-handling capabilities.
