Installation of Airflow® - Airflow 3.0.2 Documentation

This page describes the installation options available for Airflow®. Airflow consists of many components, often distributed across many physical or virtual machines, so installation can be quite complex depending on the options you choose.

You should also check the Prerequisites that must be fulfilled when installing Airflow, as well as Supported versions, which describes the support policies for Airflow, Python and Kubernetes.

Airflow requires additional Dependencies to be installed, which can be done via extras and providers.

When you install Airflow, you need to set up the database, which must also be kept up to date when Airflow is upgraded.

Using released sources

More details: Installing from Sources

When this option works best

Intended users

What are you expected to handle

What Apache Airflow Community provides for that method

Where to ask for help

Using PyPI

More details: Installation from PyPI

When this option works best

Intended users

What are you expected to handle

What Apache Airflow Community provides for that method

Where to ask for help

Using Production Docker Images

More details: Docker Image for Apache Airflow

When this option works best

This installation method is useful when you are familiar with the Container/Docker stack. It lets you run Airflow components in isolation from other software on the same physical or virtual machines, and it makes dependencies easy to maintain.

The images are built by Apache Airflow release managers and use officially released packages from PyPI and the official constraint files, the same ones used for installing Airflow from PyPI.
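To make the role of constraint files more concrete, here is a minimal sketch of how a constraint file URL and the matching pip command can be assembled for a given Airflow and Python version. The URL pattern is the one used for PyPI installation of Airflow; the helper names `constraint_url` and `pip_install_command` are hypothetical, and the versions and the `celery` extra are examples only.

```python
from typing import List, Sequence


def constraint_url(airflow_version: str, python_version: str) -> str:
    """Build the URL of the constraint file for one Airflow/Python pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(
    airflow_version: str,
    python_version: str,
    extras: Sequence[str] = (),
) -> List[str]:
    """Return the pip argument list for a constrained Airflow install.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    extras_part = f"[{','.join(extras)}]" if extras else ""
    requirement = f"apache-airflow{extras_part}=={airflow_version}"
    return [
        "pip",
        "install",
        requirement,
        "--constraint",
        constraint_url(airflow_version, python_version),
    ]


if __name__ == "__main__":
    # Example values only; pick the versions that match your environment.
    print(" ".join(pip_install_command("3.0.2", "3.12", extras=("celery",))))
```

Pinning both the Airflow version and the constraint file to the same release is what keeps the set of installed dependencies reproducible.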

Intended users

What are you expected to handle

What Apache Airflow Community provides for that method

Where to ask for help

Using Official Airflow Helm Chart

More details: Helm Chart for Apache Airflow

When this option works best

Intended users

What are you expected to handle

What Apache Airflow Community provides for that method

Where to ask for help

Using Managed Airflow Services

Follow the Ecosystem page to find all Managed Services for Airflow.

When this option works best

Intended users

What are you expected to handle

What Apache Airflow Community provides for that method

Where to ask for help

Using 3rd-party images, charts, deployments

Follow the Ecosystem page to find all 3rd-party deployment options.

When this option works best

Intended users

What are you expected to handle

What Apache Airflow Community provides for that method

Where to ask for help

Notes about minimum requirements

There are often questions about minimum requirements for Airflow for production systems, but it is not possible to give a simple answer to that question.

The requirements that Airflow might need depend on many factors, including (but not limited to):

The above “DAG” characteristics will change over time and will even change depending on the time of day or week, so you have to be prepared to continuously monitor the system and adjust the parameters to make it work smoothly.

While we can provide specific minimum requirements for a development “quick start”, such as our Running Airflow in Docker quick-start guide, it is not possible to provide minimum requirements for production systems.

The best way to think about resource allocation for an Airflow instance is in terms of process control theory, where there are two types of systems:

  1. Fully predictable systems, with few knobs and variables, where you can reliably set the values for the knobs and easily determine the behaviour of the system.
  2. Complex systems with multiple variables that are hard to predict, where you need to monitor the system and adjust the knobs continuously to make sure it is running smoothly.

Airflow (like most modern systems that run on cloud services, with multiple layers responsible for resources as well as multiple parameters to control their behaviour) is a complex system and falls much more into the second category. If you decide to run Airflow in production on your own, you should be prepared for the monitor/observe/adjust feedback loop to make sure the system is running smoothly.

Having a good monitoring system that allows you to observe the system and adjust the parameters is a must for putting this into practice.
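The monitor/observe/adjust loop can be sketched in a few lines of Python. This is only an illustration of the idea, not an Airflow API: the metric (task queue wait time in seconds), the knob (a worker count), and the names `Knob` and `adjust` are all hypothetical, and a real deployment would read metrics from its monitoring system and apply changes through its own configuration tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Knob:
    """A tunable setting with hard lower and upper bounds (hypothetical)."""
    value: int
    minimum: int
    maximum: int


def adjust(knob: Knob, observed: float, target: float, tolerance: float = 0.1) -> Knob:
    """Move the knob one step toward the target, staying within bounds.

    If the observed metric is above the tolerance band, add capacity;
    if it is below the band, remove capacity; otherwise leave it alone.
    """
    if observed > target * (1 + tolerance):
        new_value = min(knob.value + 1, knob.maximum)
    elif observed < target * (1 - tolerance):
        new_value = max(knob.value - 1, knob.minimum)
    else:
        new_value = knob.value
    return Knob(new_value, knob.minimum, knob.maximum)


def run_loop(knob: Knob, observations: List[float], target: float) -> List[int]:
    """Apply one adjustment per observation and record the knob values."""
    history = []
    for observed in observations:
        knob = adjust(knob, observed, target)
        history.append(knob.value)
    return history


if __name__ == "__main__":
    # Made-up queue wait times (seconds) sampled over successive intervals.
    workers = Knob(value=4, minimum=2, maximum=8)
    print(run_loop(workers, [120, 95, 62, 58, 30], target=60))
```

The small step size and the tolerance band are deliberate design choices in this sketch: they keep the loop from overreacting to a single noisy measurement.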

There are also a few guidelines you can use to optimize your resource usage. The Fine-tuning your Scheduler performance guide is a good starting point for tuning your scheduler, and you can also follow the Best Practices guide to make sure you are using Airflow in the most efficient way.

Also, one of the important things that Managed Services for Airflow provide is that they make a lot of opinionated choices and fine-tune the system for you, so you don’t have to worry about it too much. With such managed services, there are usually far fewer knobs to turn and choices to make. Part of what you pay for is that the Managed Service provider manages the system for you, provides paid support, and allows you to scale the system as needed and allocate the right resources, following the choices the provider has made about the kinds of deployment you might have.