Using a startup script with Amazon MWAA

A startup script is a shell (.sh) script that you host in your environment's Amazon S3 bucket, similar to your DAGs, requirements, and plugins. Amazon MWAA runs this script during startup on every individual Apache Airflow component (worker, scheduler, and web server) before installing requirements and initializing the Apache Airflow process. Use a startup script to do the following:

The following topics describe how to configure a startup script to install Linux runtimes, set environment variables, and troubleshoot related issues using CloudWatch Logs.

Configure a startup script

To use a startup script with your existing Amazon MWAA environment, upload a .sh file to your environment's Amazon S3 bucket. Then, to associate the script with the environment, specify the following in your environment details:

To complete the steps in this section, use the following sample script. The script outputs the value assigned to MWAA_AIRFLOW_COMPONENT. This environment variable identifies each Apache Airflow component that the script runs on.

Copy the code and save it locally as startup.sh.

#!/bin/sh

echo "Printing Apache Airflow component"
echo "$MWAA_AIRFLOW_COMPONENT"
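
Before you upload the script, it can help to dry-run it locally. The following is a minimal sketch, assuming you set MWAA_AIRFLOW_COMPONENT by hand to mimic each component. The scratch directory and the loop are illustrative and are not part of Amazon MWAA, which sets the variable itself at startup.

```shell
#!/bin/sh
# Local dry run only: on Amazon MWAA the service sets MWAA_AIRFLOW_COMPONENT.
# Here the variable is set by hand (an assumption made for testing).

workdir=$(mktemp -d)

# Recreate the sample startup script in a scratch directory.
cat > "$workdir/startup.sh" <<'EOF'
#!/bin/sh

echo "Printing Apache Airflow component"
echo "$MWAA_AIRFLOW_COMPONENT"
EOF

# Run the script once per component name and collect the output.
dry_run_output=$(
    for component in scheduler worker webserver; do
        MWAA_AIRFLOW_COMPONENT="$component" sh "$workdir/startup.sh"
    done
)

echo "$dry_run_output"
rm -rf "$workdir"
```

Each iteration prints the banner line followed by the component name that was passed in, which is the same pair of lines you later look for in the log stream.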

Next, upload the script to your Amazon S3 bucket.

AWS Management Console

To upload a shell script (console)
  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  2. From the Buckets list, choose the name of the bucket associated with your environment.
  3. On the Objects tab, choose Upload.
  4. On the Upload page, drag and drop the shell script you created.
  5. Choose Upload.

The script appears in the list of Objects. Amazon S3 creates a new version ID for the file. If you update the script and upload it again using the same file name, a new version ID is assigned to the file.

AWS CLI

To create and upload a shell script (CLI)
  1. Open a new command prompt, and run the Amazon S3 ls command to list and identify the bucket associated with your environment.
$ aws s3 ls
  2. Navigate to the folder where you saved the shell script. Use cp in a new prompt window to upload the script to your bucket. Replace your-s3-bucket with your information.
$ aws s3 cp startup.sh s3://your-s3-bucket/startup.sh

If successful, Amazon S3 outputs the URL path to the object:
upload: ./startup.sh to s3://your-s3-bucket/startup.sh
  3. Use the following command to retrieve the latest version ID for the script.

$ aws s3api list-object-versions --bucket your-s3-bucket --prefix startup --query 'Versions[?IsLatest].[VersionId]' --output text  

BbdVMmBRjtestta1EsVnbybZp1Wqh1J4

You specify this version ID when you associate the script with an environment.

Now, associate the script with your environment.

AWS Management Console

To associate the script with an environment (console)
  1. Open the Environments page on the Amazon MWAA console.
  2. Select the row for the environment you want to update, then choose Edit.
  3. On the Specify details page, for Startup script file - optional, enter the Amazon S3 URL for the script, for example: s3://your-mwaa-bucket/startup.sh.
  4. Choose the latest version from the dropdown list, or choose Browse S3 to find the script.
  5. Choose Next, then proceed to the Review and save page.
  6. Review changes, then choose Save.

Environment updates can take between 10 and 30 minutes. Amazon MWAA runs the startup script as each component in your environment restarts.

AWS CLI

To associate the script with an environment (CLI)
$ aws mwaa update-environment \  
    --name your-mwaa-environment \  
    --startup-script-s3-path startup.sh \  
    --startup-script-s3-object-version BbdVMmBRjtestta1EsVnbybZp1Wqh1J4  

If successful, Amazon MWAA returns the Amazon Resource Name (ARN) for the environment:
arn:aws:airflow:us-west-2:123456789012:environment/your-mwaa-environment

Environment updates can take between 10 and 30 minutes. Amazon MWAA runs the startup script as each component in your environment restarts.
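
The CLI steps above can be chained so that the version ID is never copied by hand. The following is a sketch under stated assumptions: the function names latest_version_id and deploy_startup_script are hypothetical helpers, and the bucket, key, and environment names are placeholders that you supply. It uses only the commands and options already shown in this section.

```shell
#!/bin/sh
# Sketch: upload the script, look up its latest version ID, and associate it.
# Function names are hypothetical; all names passed in are placeholders.

# Print the latest S3 version ID for an object key.
latest_version_id() {
    bucket=$1
    key=$2
    aws s3api list-object-versions \
        --bucket "$bucket" \
        --prefix "$key" \
        --query 'Versions[?IsLatest].[VersionId]' \
        --output text
}

# Upload the script, then associate its newest version with the environment.
deploy_startup_script() {
    environment=$1
    bucket=$2
    key=$3
    aws s3 cp "$key" "s3://$bucket/$key" || return 1
    version_id=$(latest_version_id "$bucket" "$key") || return 1
    aws mwaa update-environment \
        --name "$environment" \
        --startup-script-s3-path "$key" \
        --startup-script-s3-object-version "$version_id"
}

# Example call (uncomment and replace the placeholders):
# deploy_startup_script your-mwaa-environment your-s3-bucket startup.sh
```

Because --prefix matches every key that begins with the given string, the lookup can return more than one ID if the bucket holds other objects with the same prefix. In that case, pass a more specific key.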

Finally, retrieve log events to verify that the script is working as expected. When you activate logging for each Apache Airflow component, Amazon MWAA creates a new log group and log stream. For more information, see Apache Airflow log types.

AWS Management Console

To check the Apache Airflow log stream (console)
  1. Open the Environments page on the Amazon MWAA console.
  2. Choose your environment.
  3. In the Monitoring pane, choose the log group for which you want to view logs, for example, Airflow scheduler log group.
  4. In the CloudWatch console, from the Log streams list, choose a stream with the following prefix: startup_script_exection_ip.
  5. On the Log events pane, you will see the output of the command printing the value for MWAA_AIRFLOW_COMPONENT. For example, for scheduler logs, you will see the following:
    Printing Apache Airflow component
    scheduler
    Finished running startup script. Execution time: 0.004s.
    Running verification
    Verification completed

You can repeat the previous steps to view worker and web server logs.
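
If you prefer the command line, the same check can be sketched with the CloudWatch Logs commands in the AWS CLI. This is an illustrative sketch, not a documented Amazon MWAA procedure: the function names are hypothetical, and the log group name is an argument that you copy from the CloudWatch console rather than a value assumed here.

```shell
#!/bin/sh
# Sketch: find startup-script log streams and print the events of one stream.
# Function names are hypothetical; the caller supplies the log group name.

# List log streams whose names begin with the startup script prefix.
list_startup_streams() {
    log_group=$1
    aws logs describe-log-streams \
        --log-group-name "$log_group" \
        --log-stream-name-prefix startup_script_exection_ip \
        --query 'logStreams[].logStreamName' \
        --output text
}

# Print the messages recorded in a single log stream.
print_stream_events() {
    log_group=$1
    log_stream=$2
    aws logs get-log-events \
        --log-group-name "$log_group" \
        --log-stream-name "$log_stream" \
        --query 'events[].message' \
        --output text
}

# Example calls (uncomment and replace the placeholders):
# list_startup_streams your-log-group-name
# print_stream_events your-log-group-name your-log-stream-name
```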

Install Linux runtimes using a startup script

Use a startup script to update the operating system of an Apache Airflow component, and install additional runtime libraries to use with your workflows. For example, the following script runs yum update to update the operating system.

When running yum update in a startup script, you must exclude Python using --exclude=python* as shown in the example. For your environment to run, Amazon MWAA installs a specific version of Python compatible with your environment. Therefore, you can't update the environment's Python version using a startup script.

#!/bin/sh

echo "Updating operating system"
sudo yum update -y --exclude=python*

To install runtimes on a specific Apache Airflow component, use MWAA_AIRFLOW_COMPONENT with an if ... fi conditional statement. This example runs a single command to install the libaio library on the scheduler and worker, but not on the web server.

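#!/bin/sh

# Use the POSIX [ ... ] test; [[ ... ]] is a bash extension and can fail under /bin/sh.
if [ "${MWAA_AIRFLOW_COMPONENT}" != "webserver" ]
then
    # Runs on the scheduler and worker only.
    sudo yum -y install libaio
fi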
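
When the per-component logic grows beyond a single condition, a case statement can be easier to read. The following is a minimal sketch: packages_for_component is a hypothetical helper, and the script only prints what it would install so that you can try it outside of Amazon MWAA.

```shell
#!/bin/sh
# Sketch: choose packages per component with a case statement.
# packages_for_component is a hypothetical helper, not an Amazon MWAA feature.

packages_for_component() {
    case "$1" in
        scheduler|worker) echo "libaio" ;;   # same library as the example above
        webserver)        echo "" ;;         # install nothing on the web server
        *)                echo "" ;;         # unknown or unset component
    esac
}

packages=$(packages_for_component "${MWAA_AIRFLOW_COMPONENT:-}")
if [ -n "$packages" ]; then
    echo "Would install: $packages"
    # In a real startup script: sudo yum -y install $packages
else
    echo "Nothing to install on this component"
fi
```

Keeping the mapping in one function means that adding a package for a single component is a one-line change.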

You can use a startup script to check the Python version.

#!/bin/sh

export PYTHON_VERSION_CHECK=$(python -c 'import sys; version=sys.version_info[:3]; print("{0}.{1}.{2}".format(*version))')
echo "Python version is $PYTHON_VERSION_CHECK"

Amazon MWAA does not support overriding the default Python version, as this may lead to incompatibilities with the installed Apache Airflow libraries.

Set environment variables using a startup script

Use startup scripts to set environment variables and modify Apache Airflow configurations. The following script defines a new variable, ENVIRONMENT_STAGE. You can reference this variable in a DAG or in your custom modules.

#!/bin/sh

export ENVIRONMENT_STAGE="development"
echo "$ENVIRONMENT_STAGE"
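
The export keyword matters in the script above. As a general shell illustration (not specific to Amazon MWAA), the following sketch compares an exported variable with a plain assignment by reading both from a child process. ENVIRONMENT_STAGE_LOCAL is a hypothetical name used only for the comparison.

```shell
#!/bin/sh
# Sketch: only exported variables are passed to child processes.
# ENVIRONMENT_STAGE_LOCAL is a hypothetical name used for comparison.

ENVIRONMENT_STAGE_LOCAL="development"     # plain assignment, not exported
export ENVIRONMENT_STAGE="development"    # exported to child processes

# Read both variables from a child shell; unset values print as "unset".
child_view=$(sh -c 'echo "exported=${ENVIRONMENT_STAGE:-unset} local=${ENVIRONMENT_STAGE_LOCAL:-unset}"')
echo "$child_view"
```

The child shell sees ENVIRONMENT_STAGE but not ENVIRONMENT_STAGE_LOCAL, so a variable that other processes need to read should be exported.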

Use startup scripts to overwrite common Apache Airflow or system variables. For example, you can set LD_LIBRARY_PATH to instruct Python to look for binaries in the path you specify. This lets you provide custom binaries for your workflows using plugins:

#!/bin/sh

export LD_LIBRARY_PATH=/usr/local/airflow/plugins/your-custom-binary
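
The script above replaces any existing value of LD_LIBRARY_PATH. If you want to keep an existing value, one option is to prepend your directory instead. The following is a minimal sketch of that variation; prepend_library_path is a hypothetical helper, and the directory is the same placeholder path used above.

```shell
#!/bin/sh
# Sketch: prepend a directory to LD_LIBRARY_PATH instead of replacing it.
# prepend_library_path is a hypothetical helper.

custom_lib_dir=/usr/local/airflow/plugins/your-custom-binary

prepend_library_path() {
    # ${2:+:$2} appends ":<existing>" only when the existing value is non-empty,
    # which avoids a trailing colon when LD_LIBRARY_PATH was unset.
    echo "$1${2:+:$2}"
}

LD_LIBRARY_PATH=$(prepend_library_path "$custom_lib_dir" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```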

Reserved environment variables

Amazon MWAA reserves a set of critical environment variables. If you overwrite a reserved variable, Amazon MWAA restores it to its default. The following lists the reserved variables:

Unreserved environment variables

You can use a startup script to overwrite unreserved environment variables. The following lists some of these common variables: