Initializing the Apache Airflow Database: A Comprehensive Guide

Apache Airflow

Before you can start scheduling workflows, the database needs to be initialized. In this article, we’ll explore the steps to initialize the Airflow database, the necessary access privileges, and ways to log in.

Prerequisites:

1. A working installation of Apache Airflow.

2. Database software (like PostgreSQL, MySQL) where Airflow’s metadata will reside.

3. Appropriate database drivers installed (e.g., psycopg2 for PostgreSQL).

Step-by-Step Guide to Initializing the Airflow Database:

Set up your database connection:

Before you initialize the database, configure the database connection string in the Airflow configuration file, typically located at ~/airflow/airflow.cfg.

Find the line that starts with sql_alchemy_conn and update it with your database connection string.

Example for PostgreSQL:

sql_alchemy_conn = postgresql+psycopg2://username:password@host:port/database

Assign necessary privileges:

Your database user needs certain privileges to initialize and use the Airflow metadata database. At a minimum, the user requires:

1. CREATE to make new tables.
2. DROP to drop tables if needed.
3. SELECT, INSERT, UPDATE, and DELETE for normal operations.
4. ALL PRIVILEGES on the Airflow metadata database.
Example SQL command for PostgreSQL:

GRANT ALL PRIVILEGES ON DATABASE airflow_metadata TO your_username;

Initialize the database:

Once your connection string is set and privileges assigned, run the following command to initialize the database:

airflow db init

This command sets up all the necessary tables required by Airflow.

Creating an Airflow user (optional, but recommended for UI access):

Airflow comes with a web-based UI that requires user authentication. To access the UI, you’ll need to create a user. Use the following command:

airflow users create \
    --username your_username \
    --firstname your_firstname \
    --lastname your_lastname \
    --role Admin \
    --email your_email@example.com

Login to the Airflow web UI:

By default, the Airflow web server runs on port 8080. Navigate to http://localhost:8080 in your browser. You can now login using the username and password you set in the previous step.

MySQL as your backend database

1. Database Connection Configuration:

In your Airflow configuration file (typically ~/airflow/airflow.cfg), you’ll need to set up the sql_alchemy_conn to point to your MySQL instance.

Example for MySQL:

sql_alchemy_conn = mysql+mysqlclient://username:password@host:port/database

Make sure you have the mysqlclient Python library installed. If not, install it using pip:

pip install mysqlclient

2. Assign Necessary Privileges:

Your MySQL user will need privileges similar to the PostgreSQL example, with some syntactic changes:

To grant privileges on MySQL, you can use:

GRANT ALL PRIVILEGES ON airflow_metadata.* TO 'your_username'@'localhost';

To apply the changes:

FLUSH PRIVILEGES;

3. Collation and Charset Considerations:

MySQL has multiple collations and charsets. When using Airflow with MySQL, ensure your database is set up with utf8mb4 charset and utf8mb4_unicode_ci collation to handle a wider range of characters and avoid potential issues with specific characters in task IDs, dag IDs, etc.

To create a database with these settings:

CREATE DATABASE airflow_metadata CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

4. Initialize the Database and Create a User:

The process of initializing the database with airflow db init and creating a user for the web UI remains the same as described in the PostgreSQL example.

Read more on Airflow here :

Author: user

Leave a Reply