Managed Airflow overview


This page provides a brief introduction to Airflow and DAGs, and describes the features and capabilities of Managed Airflow.

For more information about new features in Managed Airflow releases, see Release notes.

Managed Airflow is a fully managed workflow orchestration service that lets you create, schedule, monitor, and manage workflow pipelines spanning clouds and on-premises data centers.

Managed Airflow is built on the popular Apache Airflow open source project and uses the Python programming language.

By using Managed Airflow instead of a local instance of Apache Airflow, you get the benefits of Airflow without the overhead of installing and managing it. Managed Airflow lets you create managed Airflow environments quickly and use Airflow-native tools, such as the Airflow web interface and command-line tools, so you can focus on your workflows rather than your infrastructure.

Differences between Managed Airflow versions

For more information about differences between major versions of Managed Airflow, see Managed Service for Apache Airflow versioning overview.

Airflow and Airflow DAGs (workflows)

In data analytics, a workflow represents a series of tasks for ingesting, transforming, analyzing, or using data. In Airflow, workflows are defined as DAGs (Directed Acyclic Graphs).


Figure 1. Relationship between DAGs and tasks

A DAG is a collection of tasks that you want to schedule and run, organized in a way that reflects their relationships and dependencies. DAGs are created in Python files, which define the DAG structure using code. The DAG's purpose is to ensure that each task is executed at the right time and in the right order.
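The dependency idea above can be sketched in plain Python. This is a toy model, not the Airflow API: the Task and ToyDag classes are hypothetical, and only the >> operator loosely mimics how Airflow declares that one task runs after another. The standard library's graphlib.TopologicalSorter derives a valid run order from the declared dependencies and rejects a graph that contains a cycle, which is why a workflow graph must be acyclic.

```python
from graphlib import CycleError, TopologicalSorter


class ToyDag:
    """Hypothetical stand-in for an Airflow DAG: maps each task ID to its upstream task IDs."""

    def __init__(self) -> None:
        self.upstream: dict[str, set[str]] = {}

    def run_order(self) -> list[str]:
        # static_order() yields every task after all of its upstream tasks,
        # and raises CycleError if the dependencies form a cycle.
        return list(TopologicalSorter(self.upstream).static_order())


class Task:
    """Hypothetical stand-in for an Airflow task (not the Airflow API)."""

    def __init__(self, task_id: str, dag: ToyDag) -> None:
        self.task_id = task_id
        self.dag = dag
        dag.upstream.setdefault(task_id, set())

    def __rshift__(self, other: "Task") -> "Task":
        # Loosely mimics Airflow's `a >> b`: `other` runs only after `self`.
        self.dag.upstream[other.task_id].add(self.task_id)
        return other  # returning `other` allows chains such as a >> b >> c


if __name__ == "__main__":
    dag = ToyDag()
    extract, transform, load = (Task(t, dag) for t in ("extract", "transform", "load"))
    extract >> transform >> load
    print(dag.run_order())

    load >> extract  # introduces a cycle, so this is no longer a DAG
    try:
        dag.run_order()
    except CycleError:
        print("not a DAG: the dependencies contain a cycle")
```

In a real DAG file you would use Airflow operators instead; the point of the sketch is only that the declared dependencies, not the order in which tasks are written, determine the execution order.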

Each task in a DAG can represent almost anything. For example, a task might perform any of the following functions:

In addition to running a DAG on a schedule, you can trigger DAGs manually or in response to events, such as changes in a Cloud Storage bucket. For more information, see Schedule and trigger DAGs.

For more information about DAGs and tasks, see the Apache Airflow documentation.

Managed Airflow environments

Managed Airflow environments are self-contained Airflow deployments based on Google Kubernetes Engine. They work with other Google Cloud services using connectors built into Airflow. You can create one or more environments in a single Google Cloud project, in any supported region.

Managed Airflow provisions Google Cloud services that run your workflows and all Airflow components. The main components of an environment are:

For an in-depth look at the components of an environment, see Environment architecture.

Managed Airflow interfaces

Managed Airflow provides interfaces for managing environments, Airflow instances that run within environments, and individual DAGs.

For example, you can create and configure Managed Airflow environments in the Google Cloud console, the Google Cloud CLI, the Cloud Composer API, or Terraform.

As another example, you can manage DAGs from the Google Cloud console or the native Airflow UI, or by running Google Cloud CLI and Airflow CLI commands.
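As a sketch of the Cloud Composer API route for creating an environment, the following Python builds, but does not send, a create request using only the standard library. It assumes the v1 REST endpoint (https://composer.googleapis.com/v1) and sets only the environment name and image version; a real request typically sets more configuration. The project, location, environment name, image version, and access token are placeholders, and build_create_environment_request is a hypothetical helper, not part of a client library. An access token could come from, for example, gcloud auth print-access-token.

```python
import json
import urllib.request

# Assumption: Cloud Composer API v1 REST endpoint.
COMPOSER_API = "https://composer.googleapis.com/v1"


def build_create_environment_request(
    project: str,
    location: str,
    environment: str,
    image_version: str,
    access_token: str,
) -> urllib.request.Request:
    """Build, but do not send, an environments.create request with a minimal body."""
    parent = f"projects/{project}/locations/{location}"
    body = {
        "name": f"{parent}/environments/{environment}",
        "config": {"softwareConfig": {"imageVersion": image_version}},
    }
    return urllib.request.Request(
        url=f"{COMPOSER_API}/{parent}/environments",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All argument values are placeholders.
    req = build_create_environment_request(
        "example-project", "us-central1", "example-environment",
        "IMAGE_VERSION", "ACCESS_TOKEN",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would submit it. Environment creation is asynchronous, so the API responds with a long-running operation rather than a finished environment.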

Airflow features in Managed Airflow

When using Managed Airflow, you can manage and use Airflow features such as:

Access control in Managed Airflow

You manage security at the Google Cloud project level and can assign IAM roles that allow individual users to modify or create environments. If someone does not have access to your project or does not have an appropriate Managed Airflow IAM role, that person cannot access any of your environments.
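A project-level IAM policy is a JSON document made of role bindings, each pairing a role with a list of members. The sketch below is a hypothetical helper, not a Google Cloud client library call: it adds a member to a role such as roles/composer.user in a policy shaped like the output of gcloud projects get-iam-policy PROJECT_ID --format=json, and the result could be written back with gcloud projects set-iam-policy. For a single grant, gcloud projects add-iam-policy-binding does the same thing directly. The member addresses and etag below are placeholders.

```python
import copy
import json


def add_iam_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy in which `member` holds `role`.

    Hypothetical helper: conditional bindings are never modified; if only a
    conditional binding exists for `role`, a new unconditional one is added.
    """
    updated = copy.deepcopy(policy)  # the etag is kept so set-iam-policy can detect concurrent edits
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


if __name__ == "__main__":
    policy = {
        "bindings": [{"role": "roles/viewer", "members": ["user:alice@example.com"]}],
        "etag": "placeholder-etag",
    }
    print(json.dumps(add_iam_binding(policy, "roles/composer.user", "user:bob@example.com"), indent=2))
```

Returning a copy instead of editing in place keeps the original policy available if you decide not to apply the change.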

In addition to IAM, you can use Airflow UI access control, which is based on the Apache Airflow Access Control model.

For more information about security features in Managed Airflow, see Managed Airflow security overview.

Environment networking

Managed Airflow supports several networking configurations for environments, with many configuration options. For example, in a Private IP environment, DAGs and Airflow components are fully isolated from the public internet.

For more information about networking in Managed Airflow, see pages for individual networking features:

Other features of Managed Airflow

Other Managed Airflow features include:

Frequently Asked Questions

What version of Apache Airflow does Managed Airflow use?

Managed Airflow environments are based on Managed Airflow images. When you create an environment, you can select an image with a specific Airflow version:

You have control over the Apache Airflow version of your environment. You can upgrade your environment to a later version of the Managed Airflow image. Each Managed Airflow release supports several Apache Airflow versions.
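Because an image pins both a Managed Airflow version and an Apache Airflow version, it can help to split an image version string into its parts, for example when checking which Airflow version an environment runs before an upgrade. The sketch below assumes two string shapes used in Cloud Composer image names, such as composer-2.9.7-airflow-2.9.3 and composer-3-airflow-2.9.3-build.3; treat the pattern as an assumption and check it against the version list for your environment's generation. The parse_image_version function and ImageVersion type are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shapes (verify against the published version list):
#   composer-2.9.7-airflow-2.9.3        full Managed Airflow version + Airflow version
#   composer-3-airflow-2.9.3-build.3    major version only, plus a build number
_IMAGE_RE = re.compile(
    r"^composer-(?P<composer>\d+(?:\.\d+){0,2})"
    r"-airflow-(?P<airflow>\d+\.\d+\.\d+)"
    r"(?:-build\.(?P<build>\d+))?$"
)


class ImageVersion(NamedTuple):
    composer: str
    airflow: str
    build: Optional[int]


def parse_image_version(image: str) -> ImageVersion:
    """Split an image version string into its parts; raise ValueError for other shapes."""
    match = _IMAGE_RE.match(image)
    if match is None:
        raise ValueError(f"unrecognized image version: {image!r}")
    build = match.group("build")
    return ImageVersion(
        composer=match.group("composer"),
        airflow=match.group("airflow"),
        build=int(build) if build is not None else None,
    )


if __name__ == "__main__":
    for image in ("composer-2.9.7-airflow-2.9.3", "composer-3-airflow-2.9.3-build.3"):
        print(parse_image_version(image))
```

Strings in any other shape raise ValueError instead of being guessed at, so an unexpected format is caught early.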

Can I use native Airflow UI and CLI?

You can access the Apache Airflow web interface of your environment. Each of your environments has its own Airflow UI. For more information about accessing the Airflow UI, see Airflow web interface.
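If you need the Airflow UI address programmatically, one option is to read the config.airflowUri field from the environment description printed by gcloud composer environments describe ENVIRONMENT_NAME --location LOCATION --format=json. The airflow_ui_url function below is a hypothetical sketch that parses that JSON; the sample description and URL are made up.

```python
import json


def airflow_ui_url(describe_json: str) -> str:
    """Return config.airflowUri from `gcloud composer environments describe --format=json` output."""
    environment = json.loads(describe_json)
    try:
        return environment["config"]["airflowUri"]
    except KeyError as err:
        raise ValueError("environment description has no config.airflowUri field") from err


if __name__ == "__main__":
    sample = json.dumps({
        "name": "projects/example-project/locations/us-central1/environments/example-environment",
        "config": {"airflowUri": "https://example.composer.googleusercontent.com"},  # made-up value
    })
    print(airflow_ui_url(sample))
```

From a shell, gcloud can also print the field directly with --format="value(config.airflowUri)".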

To run Airflow CLI commands in your environments, use gcloud commands. For more information about running Airflow CLI commands in Managed Airflow environments, see Airflow command-line interface.
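Airflow CLI commands run through gcloud follow the pattern gcloud composer environments run ENVIRONMENT_NAME --location LOCATION SUBCOMMAND -- SUBCOMMAND_ARGUMENTS, where everything after -- is passed to the Airflow CLI subcommand. The sketch below only assembles that argument list, which avoids shell-quoting problems when it is passed to subprocess.run; composer_run_argv is a hypothetical helper, and the environment, location, and DAG ID are placeholders.

```python
import shlex
from typing import Sequence


def composer_run_argv(
    environment: str,
    location: str,
    subcommand: Sequence[str],
    args: Sequence[str] = (),
) -> list[str]:
    """Build argv for `gcloud composer environments run`; Airflow arguments go after `--`."""
    argv = ["gcloud", "composer", "environments", "run", environment,
            "--location", location, *subcommand]
    if args:
        # Everything after `--` is forwarded to the Airflow CLI subcommand.
        argv += ["--", *args]
    return argv


if __name__ == "__main__":
    argv = composer_run_argv(
        "example-environment", "us-central1", ["dags", "trigger"], ["example_dag"]
    )
    # Print a copy-pasteable command; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string means DAG IDs or arguments containing spaces are passed through unchanged.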

Can I use my own database as the Airflow database?

Managed Airflow uses a managed database service for the Airflow database. It is not possible to use a user-provided database as the Airflow database.

Can I use my own cluster as a Managed Airflow cluster?

Managed Airflow uses the Google Kubernetes Engine service to create, manage, and delete the environment clusters where Airflow components run. These clusters are fully managed by Managed Airflow.

It is not possible to build a Managed Airflow environment based on a self-managed Google Kubernetes Engine cluster.

Can I use my own container registry?

Managed Airflow uses the Artifact Registry service to manage the container image repositories used by Managed Airflow environments. It is not possible to replace it with a user-provided container registry.

Are Managed Airflow environments zonal or regional?

When you create an environment, you specify a region for it:

What's next