Installing Python dependencies - Amazon Managed Workflows for Apache Airflow

A Python dependency is any package or distribution that is not included in the Apache Airflow base install for your Apache Airflow version on your Amazon Managed Workflows for Apache Airflow environment. This topic describes the steps to install Apache Airflow Python dependencies on your Amazon MWAA environment using a requirements.txt file in your Amazon S3 bucket.

Prerequisites

Before you can complete the steps on this page, you'll need an Amazon MWAA environment, access to its Amazon S3 bucket, and permissions to update the environment on the Amazon MWAA console.

How it works

On Amazon MWAA, you install all Python dependencies by uploading a requirements.txt file to your Amazon S3 bucket, then specifying the version of the file on the Amazon MWAA console each time you update the file. Amazon MWAA runs pip3 install -r requirements.txt to install the Python dependencies on the Apache Airflow scheduler and each of the workers.

To install Python dependencies on your environment, you must do three things:

  1. Create a requirements.txt file locally.
  2. Upload the local requirements.txt to your Amazon S3 bucket.
  3. Specify the version of this file in the Requirements file field on the Amazon MWAA console.
Note

If this is the first time you're creating and uploading a requirements.txt to your Amazon S3 bucket, you also need to specify the path to the file on the Amazon MWAA console. You only need to complete this step once.
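
If you prefer to script this workflow, the same three steps can be driven from the AWS CLI. The following is a minimal sketch, not the only supported approach; the environment name MyEnvironment and the bucket name YOUR_S3_BUCKET_NAME are placeholders to replace with your own values.

# Upload the local requirements.txt to your environment's Amazon S3 bucket.  
aws s3 cp requirements.txt s3://YOUR_S3_BUCKET_NAME/requirements.txt  

# Point the environment at the file (first time only). On later updates,  
# pass --requirements-s3-object-version instead to pin a specific version.  
aws mwaa update-environment --name MyEnvironment \  
    --requirements-s3-path requirements.txt  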

Python dependencies overview

You can install Apache Airflow extras and other Python dependencies on your environment from the Python Package Index (PyPI.org), from Python wheels (.whl), or from a private PEP 503 compliant repository.

Python dependencies location and size limits

The Apache Airflow Scheduler and the Workers look for the packages listed in the requirements.txt file, and the packages are installed on the environment at /usr/local/airflow/.local/bin.

Creating a requirements.txt file

The following sections describe the steps we recommend to create a requirements.txt file locally.

Step one: Test Python dependencies using the Amazon MWAA CLI utility
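
Test your dependencies locally before deploying them. The aws-mwaa-local-runner repository on GitHub builds a Docker image that mirrors an Amazon MWAA environment, so you can confirm that your requirements.txt installs cleanly before you upload it. A minimal sketch, assuming the commands and directory layout documented in that repository's README:

# Clone the local runner and build the Docker image for your Airflow version.  
git clone https://github.com/aws/aws-mwaa-local-runner.git  
cd aws-mwaa-local-runner  
./mwaa-local-env build-image  

# Copy your requirements.txt into the runner, then verify that every  
# package resolves and installs without conflicts.  
cp ../requirements.txt requirements/requirements.txt  
./mwaa-local-env test-requirements  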

Step two: Create the requirements.txt

The following section describes how to specify Python dependencies from the Python Package Index in a requirements.txt file.

Apache Airflow v2

  1. Test locally. Add additional libraries iteratively to find the right combination of packages and their versions, before creating a requirements.txt file. To run the Amazon MWAA CLI utility, see the aws-mwaa-local-runner on GitHub.
  2. Review the Apache Airflow package extras. To view a list of the packages installed for Apache Airflow v2 on Amazon MWAA, see Amazon MWAA local runner requirements.txt on the GitHub website.
  3. Add a constraints statement. Add the constraints file for your Apache Airflow v2 environment at the top of your requirements.txt file. Apache Airflow constraints files specify the provider versions available at the time of an Apache Airflow release.
    Beginning with Apache Airflow v2.7.2, your requirements file must include a --constraint statement. If you do not provide a constraint, Amazon MWAA will specify one for you to ensure the packages listed in your requirements are compatible with the version of Apache Airflow you are using.
    In the following example, replace {Airflow-version} with your environment's Apache Airflow version number, and {Python-version} with the version of Python that's compatible with your environment.
    For information on the version of Python compatible with your Apache Airflow environment, see Apache Airflow Versions.
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-{Airflow-version}/constraints-{Python-version}.txt"  
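
For example, an environment running Apache Airflow v2.5.1 with Python 3.10 (an assumption here; check the versions for your own environment) would use:

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.10.txt"  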

If the constraints file determines that the xyz==1.0 package is not compatible with other packages in your environment, pip3 install will fail in order to prevent incompatible libraries from being installed to your environment. If installation fails for any packages, you can view error logs for each Apache Airflow component (the scheduler, worker, and web server) in the corresponding log stream on CloudWatch Logs. For more information on log types, see Viewing Airflow logs in Amazon CloudWatch.

  4. Apache Airflow packages. Add the package extras and the version (==). This helps to prevent packages of the same name, but a different version, from being installed on your environment.

apache-airflow[package-extra]==2.5.1  
  5. Python libraries. Add the package name and the version (==) in your requirements.txt file. This helps to prevent a future breaking update from PyPI.org from being automatically applied.
library == version  
Example Boto3 and psycopg2-binary

This example is provided for demonstration purposes. The boto3, boto, botocore, and psycopg2-binary libraries are included with the Apache Airflow v2 base install and don't need to be specified in a requirements.txt file.

boto3==1.17.54  
boto==2.49.0  
botocore==1.20.54  
psycopg2-binary==2.8.6  

If a package is specified without a version, Amazon MWAA installs the latest version of the package from PyPI.org. This version may conflict with other packages in your requirements.txt.
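
Putting these pieces together, a complete requirements.txt for an Apache Airflow v2.5.1 environment might look like the following sketch. The pinned versions are illustrative assumptions; substitute versions that satisfy the constraints file for your environment.

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.10.txt"  
apache-airflow[amazon]==2.5.1  
paramiko==2.12.0  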

Apache Airflow v1

  1. Test locally. Add additional libraries iteratively to find the right combination of packages and their versions, before creating a requirements.txt file. To run the Amazon MWAA CLI utility, see the aws-mwaa-local-runner on GitHub.
  2. Review the Apache Airflow package extras. View the list of packages available for Apache Airflow v1.10.12 at https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt.
  3. Add the constraints file. Add the constraints file for Apache Airflow v1.10.12 to the top of your requirements.txt file. If the constraints file determines that the xyz==1.0 package is not compatible with other packages on your environment, pip3 install will fail in order to prevent incompatible libraries from being installed to your environment.
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt"  
  4. Apache Airflow v1.10.12 packages. Add the Airflow package extras and the Apache Airflow v1.10.12 version (==). This helps to prevent packages of the same name, but a different version, from being installed on your environment.
apache-airflow[package]==1.10.12  
Example Secure Shell (SSH)

The following example requirements.txt file installs SSH for Apache Airflow v1.10.12.

apache-airflow[ssh]==1.10.12  
  5. Python libraries. Add the package name and the version (==) in your requirements.txt file. This helps to prevent a future breaking update from PyPI.org from being automatically applied.
library == version  
Example Boto3

The following example requirements.txt file installs the Boto3 library for Apache Airflow v1.10.12.

boto3 == 1.17.4  

If a package is specified without a version, Amazon MWAA installs the latest version of the package from PyPI.org. This version may conflict with other packages in your requirements.txt.

Uploading requirements.txt to Amazon S3

You can use the Amazon S3 console or the AWS Command Line Interface (AWS CLI) to upload a requirements.txt file to your Amazon S3 bucket.

Using the AWS CLI

The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command-line shell. To complete the steps in this section, you need the AWS CLI installed and configured with access to your environment's Amazon S3 bucket.

To upload using the AWS CLI
  1. Use the following command to list all of your Amazon S3 buckets.
aws s3 ls  
  2. Use the following command to list the files and folders in the Amazon S3 bucket for your environment.
aws s3 ls s3://YOUR_S3_BUCKET_NAME  
  3. Use the following command to upload a requirements.txt file to the Amazon S3 bucket for your environment.
aws s3 cp requirements.txt s3://YOUR_S3_BUCKET_NAME/requirements.txt  
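
Because Amazon MWAA tracks your requirements.txt by Amazon S3 object version, it can be useful to capture the version ID of the copy you just uploaded. A sketch using the s3api commands; this assumes versioning is enabled on the bucket, which Amazon MWAA requires.

# List the stored versions of requirements.txt. The latest entry is the  
# version you specify on the Amazon MWAA console or in the AWS CLI.  
aws s3api list-object-versions --bucket YOUR_S3_BUCKET_NAME \  
    --prefix requirements.txt \  
    --query 'Versions[?IsLatest].[VersionId,LastModified]'  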

Using the Amazon S3 console

The Amazon S3 console is a web-based user interface that allows you to create and manage the resources in your Amazon S3 bucket.

To upload using the Amazon S3 console
  1. Open the Environments page on the Amazon MWAA console.
  2. Choose an environment.
  3. Select the S3 bucket link in the DAG code in Amazon S3 pane to open your storage bucket on the Amazon S3 console.
  4. Choose Upload.
  5. Choose Add file.
  6. Select the local copy of your requirements.txt, then choose Upload.

Installing Python dependencies on your environment

This section describes how to install the dependencies you uploaded to your Amazon S3 bucket: first by specifying the path to the requirements.txt file, then by specifying the version of the file each time you update it.

Specifying the path to requirements.txt on the Amazon MWAA console (the first time)

If this is the first time you're creating and uploading a requirements.txt to your Amazon S3 bucket, you also need to specify the path to the file on the Amazon MWAA console. You only need to complete this step once.

  1. Open the Environments page on the Amazon MWAA console.
  2. Choose an environment.
  3. Choose Edit.
  4. On the DAG code in Amazon S3 pane, choose Browse S3 next to the Requirements file - optional field.
  5. Select the requirements.txt file on your Amazon S3 bucket.
  6. Choose Choose.
  7. Choose Next, Update environment.

You can begin using the new packages immediately after your environment finishes updating.

Specifying the requirements.txt version on the Amazon MWAA console

You need to specify the version of your requirements.txt file on the Amazon MWAA console each time you upload a new version of your requirements.txt in your Amazon S3 bucket.

  1. Open the Environments page on the Amazon MWAA console.
  2. Choose an environment.
  3. Choose Edit.
  4. On the DAG code in Amazon S3 pane, choose a requirements.txt version in the dropdown list.
  5. Choose Next, Update environment.

You can begin using the new packages immediately after your environment finishes updating.
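
The same update can be scripted. The following sketch assumes an environment named MyEnvironment and a version ID captured from aws s3api list-object-versions as shown earlier; both are placeholders.

# Pin the environment to a specific requirements.txt object version.  
aws mwaa update-environment --name MyEnvironment \  
    --requirements-s3-object-version "EXAMPLE_VERSION_ID"  

# Check the update's progress. The new packages are usable once the  
# environment returns to the AVAILABLE state.  
aws mwaa get-environment --name MyEnvironment \  
    --query 'Environment.Status' --output text  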

Viewing logs for your requirements.txt

You can view the Apache Airflow logs that the Scheduler generates as it schedules your workflows and parses your dags folder. The following steps describe how to open the log group for the Scheduler on the Amazon MWAA console, and view Apache Airflow logs on the CloudWatch Logs console.

To view logs for a requirements.txt
  1. Open the Environments page on the Amazon MWAA console.
  2. Choose an environment.
  3. Choose the Airflow scheduler log group on the Monitoring pane.
  4. Choose the requirements_install_ip log in Log streams.
  5. You should see the list of packages that were installed on the environment at /usr/local/airflow/.local/bin. For example:
Collecting appdirs==1.4.4 (from -r /usr/local/airflow/.local/bin (line 1))  
Downloading https://files.pythonhosted.org/packages/3b/00/2344469e2084fb28kjdsfiuyweb47389789vxbmnbjhsdgf5463acd6cf5e3db69324/appdirs-1.4.4-py2.py3-none-any.whl  
Collecting astroid==2.4.2 (from -r /usr/local/airflow/.local/bin (line 2))  
  1. Review the list of packages and whether any of these encountered an error during installation. If something went wrong, you may see an error similar to the following:
2021-03-05T14:34:42.731-07:00  
No matching distribution found for LibraryName==1.0.0 (from -r /usr/local/airflow/.local/bin (line 4))  
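
You can also pull these log streams from the command line. The following is a minimal sketch with the CloudWatch Logs CLI; the log group name airflow-MyEnvironment-Scheduler is an assumption based on the naming pattern for MWAA scheduler log groups, so adapt it to your environment.

# Fetch recent events from the requirements installation log streams and  
# filter for the "No matching distribution" errors shown above.  
aws logs filter-log-events \  
    --log-group-name airflow-MyEnvironment-Scheduler \  
    --log-stream-name-prefix requirements_install \  
    --filter-pattern '"No matching distribution"' \  
    --limit 25  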

What's next?