Simplifying Offline Python Deployments With Docker

When a production server does not have access to the Internet or to the internal network, you will need to bundle the Python dependencies (as wheel files) and the interpreter along with the source code.

This post looks at how to use Docker to package a Python project for internal distribution to a machine cut off from the Internet.

Objectives

By the end of this post, you will be able to…

  1. Describe the difference between a Python wheel and an egg
  2. Explain why you may want to build Python wheel files within a Docker container
  3. Spin up a custom environment for building Python wheels using Docker
  4. Bundle and deploy a Python project to an environment without access to the Internet
  5. Explain how this deployment setup can be considered immutable

Scenario

The genesis for this post came from a scenario where I had to distribute a legacy Python 2.7 Flask app to a CentOS 5 box that did not have access to the Internet for security reasons.

Python wheels (rather than eggs) are the way to go here.

Python wheel files are similar to eggs in that both are just zip archives used for distributing code. Wheels differ in that they are purely a distribution format: they are meant to be installed, not imported directly the way an egg can be. They are also pre-built, so any C extensions are already compiled, which saves the user from having to build the packages themselves and, thus, speeds up installation. Think of them as lighter, pre-compiled versions of Python eggs. They’re particularly great for packages that need to be compiled, like lxml or NumPy.
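Since a wheel is just a zip archive, you can look inside one with nothing more than the standard library. The Python 3 sketch below (helper names are our own, not part of any packaging tool) reads the WHEEL metadata file that every wheel carries in its .dist-info directory and checks Root-Is-Purelib, which is false for wheels that ship compiled code. The demo package in the example is hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def read_wheel_metadata(wheel_path):
    """Return the key/value pairs from a wheel's .dist-info/WHEEL file.

    A wheel is an ordinary zip archive, so zipfile is enough to read it.
    """
    with zipfile.ZipFile(str(wheel_path)) as whl:
        # The WHEEL file lives in the "<name>-<version>.dist-info" directory.
        names = [n for n in whl.namelist() if n.endswith(".dist-info/WHEEL")]
        if len(names) != 1:
            raise ValueError("expected exactly one .dist-info/WHEEL file")
        text = whl.read(names[0]).decode("utf-8")

    metadata = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            # "Tag" can appear several times, so every key maps to a list.
            metadata.setdefault(key.strip(), []).append(value.strip())
    return metadata


def is_pure_python(wheel_path):
    """True if the wheel declares Root-Is-Purelib: true (no compiled code)."""
    values = read_wheel_metadata(wheel_path).get("Root-Is-Purelib", ["false"])
    return values[0].lower() == "true"


if __name__ == "__main__":
    # Build a tiny, hypothetical wheel just to exercise the helpers.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo-0.1-py2.py3-none-any.whl"
        with zipfile.ZipFile(str(path), "w") as whl:
            whl.writestr("demo/__init__.py", "")
            whl.writestr(
                "demo-0.1.dist-info/WHEEL",
                "Wheel-Version: 1.0\nRoot-Is-Purelib: true\n"
                "Tag: py2-none-any\nTag: py3-none-any\n",
            )
        print(read_wheel_metadata(path)["Tag"])  # ['py2-none-any', 'py3-none-any']
        print(is_pure_python(path))  # True
```

Running the same check against a wheel for lxml or NumPy should report that it is not pure Python, which is exactly why those wheels are tied to the environment they were built in.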

For more on Python wheels, check out Python on Wheels and The Story of Wheel.

Because wheels for packages with compiled extensions are tied to a specific platform and Python ABI, they should be built in the same environment in which they will be run, so building them across many platforms with multiple versions of Python can be a huge pain.
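You can see this platform dependence in the wheel's filename, which PEP 427 defines as {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. The Python 3 sketch below splits a filename into those parts; the example_pkg filenames are hypothetical, and the matches helper is a deliberately rough, exact-tag check (pip's real logic compares against a ranked list of tags the running interpreter supports).

```python
from typing import NamedTuple, Optional, Tuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: Tuple[str, ...]
    abi: Tuple[str, ...]
    platform: Tuple[str, ...]


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    # Compressed tag sets such as "py2.py3" are separated by dots.
    return WheelTags(dist, version, build,
                     tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")))


def matches(filename, python_tag, abi_tag, platform_tag):
    """Rough check: is this exact (python, abi, platform) triple in the filename?"""
    tags = parse_wheel_filename(filename)
    return (python_tag in tags.python
            and abi_tag in tags.abi
            and platform_tag in tags.platform)


if __name__ == "__main__":
    compiled = "example_pkg-1.0-cp27-cp27mu-linux_x86_64.whl"  # hypothetical file
    print(parse_wheel_filename(compiled).abi)  # ('cp27mu',)
    print(matches(compiled, "cp27", "cp27mu", "linux_x86_64"))  # True
    print(matches(compiled, "cp36", "cp36m", "linux_x86_64"))  # False
```

A wheel built for CPython 2.7 on one Linux setup says nothing about CPython 3.6, so every interpreter and platform combination needs its own build.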

This is where Docker comes into play.

Bundle

Before beginning, it’s important to note that we will be using Docker simply to spin up an environment for building the wheels. In other words, we’ll be using Docker as a build tool rather than as a deploy environment.

Also, keep in mind that this process is not just for legacy apps; it can be used for any Python application.

Stack: Docker (used only to build the wheels), CentOS 5.11, Python 2.7.14, and Flask.

Want a challenge? Replace one of the pieces of the stack above: use Python 3.6, for example, or a different version of CentOS.

If you’d like to follow along, clone down the base repo:

Again, we need to bundle the application code along with the Python interpreter and dependency wheel files. cd into the “deploy” directory and then run:

Review the deploy/build_tarball.sh script, taking note of the code comments:

Here, we:

  1. Created a temporary working directory
  2. Copied over the application files to that directory, removing any .pyc and .DS_Store files
  3. Built (using Docker) and copied over the wheel files
  4. Added the Python interpreter
  5. Created a tarball, ready for deployment
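To make those five steps concrete, here is a rough Python 3 equivalent of what build_tarball.sh does. It is a sketch, not the actual script: it assumes the wheels have already been built into a local directory (the Docker step is left out) and that the interpreter is a single archive file on disk. The function name and the bundle layout (app/, wheels/, and the interpreter archive at the top level) are our own choices.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def build_tarball(app_dir, wheel_dir, python_archive, output_path):
    """Bundle app code, pre-built wheels and an interpreter archive into a .tar.gz.

    Hypothetical helper mirroring the five steps; the Docker wheel build
    (step 3) is assumed to have already filled wheel_dir.
    """
    # 1. Create a temporary working directory (removed automatically afterwards).
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "bundle"

        # 2. Copy the application, skipping .pyc files and macOS .DS_Store files.
        shutil.copytree(
            str(app_dir),
            str(staging / "app"),
            ignore=shutil.ignore_patterns("*.pyc", ".DS_Store"),
        )

        # 3. Copy over the wheel files.
        shutil.copytree(str(wheel_dir), str(staging / "wheels"))

        # 4. Add the Python interpreter (here: a single archive file).
        shutil.copy2(str(python_archive), str(staging / Path(python_archive).name))

        # 5. Create the tarball, with everything under a top-level "bundle/" folder.
        with tarfile.open(str(output_path), "w:gz") as tar:
            tar.add(str(staging), arcname="bundle")

    return Path(output_path)
```

Staging everything in a temporary directory keeps the working tree untouched and leaves no stray staging files behind once the tarball is written.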

Then, take note of the Dockerfile within the “wheels” directory:

After extending the base CentOS 5.11 image, we configured a Python 2.7.14 environment and then generated the wheel files from the list of dependencies in the requirements file.
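Because the target box can't reach PyPI, a missing wheel only shows up at install time. One way to catch that earlier is to compare the pinned requirements against the files the Docker build produced. The Python 3 sketch below uses hypothetical helper names, handles only simple name==version lines, and compares names case-insensitively with -, _ and . treated alike (roughly in the spirit of PEP 503 normalization). It checks only the packages listed in the requirements file, not their transitive dependencies, and it does not normalize versions (1.0 vs 1.0.0).

```python
import re
from pathlib import Path


def normalize(name):
    """Lowercase a project name and collapse runs of -, _ and . into '_'."""
    return re.sub(r"[-_.]+", "_", name).lower()


def pinned_requirements(requirements_text):
    """Yield (name, version) for 'name==version' lines, skipping comments and blanks."""
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError("only exact pins are handled in this sketch: " + line)
        yield normalize(name.strip()), version.strip()


def missing_wheels(requirements_text, wheel_dir):
    """Return the pinned requirements that have no matching wheel in wheel_dir."""
    available = set()
    for whl in Path(wheel_dir).glob("*.whl"):
        # Wheel filenames start with "{distribution}-{version}-".
        dist, version = whl.name.split("-")[:2]
        available.add((normalize(dist), version))
    return [req for req in pinned_requirements(requirements_text) if req not in available]
```

Running this on the build machine right after the Docker build, and refusing to create the tarball if the list is non-empty, moves the failure from the offline server to a machine where it is easy to fix.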


With that, let’s configure a server for deployment.

Environment Setup

In this section we will download and install dependencies over the network. In a real offline scenario, you normally won’t need to set up the server yourself; it should already be configured.

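Since the wheels were built on CentOS 5.11, any compiled extensions in them are linked against its old glibc (version 2.5). Binaries built against an older glibc generally run on systems with a newer one, so the wheels should work on nearly any modern glibc-based Linux distribution. So, again, if you’d like to follow along, spin up a DigitalOcean droplet with the latest version of CentOS.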

Review PEP 513 for more information on building broadly compatible Linux wheels (manylinux1).
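PEP 513 builds manylinux1 around CentOS 5 and glibc 2.5, and the same reasoning applies to the wheels built here. A quick sanity check on a target box is to compare its glibc version with 2.5. The Python 3 sketch below does that with platform.libc_ver(); the helper names are hypothetical, and it looks only at glibc, ignoring the other manylinux1 requirements such as the CPU architecture and the allowed set of system libraries.

```python
import platform
import re

# PEP 513: manylinux1 targets glibc 2.5, the version CentOS 5 shipped.
MANYLINUX1_MIN_GLIBC = (2, 5)


def glibc_version_tuple(version_string):
    """'2.17' -> (2, 17); returns None if the string isn't 'major.minor...'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def can_run_manylinux1(version_string):
    """Rough check: is this glibc at least as new as CentOS 5's?"""
    version = glibc_version_tuple(version_string)
    return version is not None and version >= MANYLINUX1_MIN_GLIBC


if __name__ == "__main__":
    lib, version = platform.libc_ver()  # e.g. ('glibc', '2.31') on many Linux systems
    if lib != "glibc":
        print("glibc not detected:", lib or "unknown")
    else:
        print("glibc", version, "is new enough:", can_run_manylinux1(version))
```

Note that libc_ver() can come back empty on non-glibc systems such as Alpine (musl), which is also a hint that wheels built this way won’t work there.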

SSH into the box, as a root user, and add the dependencies necessary for installing Python before continuing with this tutorial:

Next, install and then run Nginx:

Navigate to the server’s IP address in your browser. You should see the default Nginx test page.

Next, update the Nginx config in /etc/nginx/conf.d/default.conf to proxy incoming traffic to the app server:

Restart Nginx:

You should now see a 502 Bad Gateway error in the browser, since Nginx is now forwarding requests to an app server that isn’t running yet.

Create a regular user on the box:

Exit the environment when done.

Deploy

To deploy, first manually copy the tarball along with the setup script, setup.sh, to the remote box using scp (secure copy):

Take a quick look at the setup script:

This should be fairly straightforward: the script sets up a new Python environment from the bundled interpreter and then installs the dependencies from the bundled wheel files into a new virtual environment.
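The key to installing without network access is pip’s --no-index flag (never contact a package index) combined with --find-links (look for packages in a local directory). The Python 3 sketch below only assembles and runs that command. The virtual environment and directory names are assumptions, and the pip path assumes a POSIX virtualenv layout (bin/pip). On a box where only the bundled Python 2.7 is available, the same two flags can be passed to pip directly from setup.sh.

```python
import subprocess
from pathlib import Path


def offline_install_command(venv_dir, wheel_dir, requirements_file):
    """Build a pip command that installs only from local wheel files."""
    pip = Path(venv_dir) / "bin" / "pip"  # POSIX virtualenv layout (assumption)
    return [
        str(pip), "install",
        "--no-index",                    # never contact PyPI or any other index
        "--find-links", str(wheel_dir),  # look for packages in this directory
        "-r", str(requirements_file),
    ]


def install_offline(venv_dir, wheel_dir, requirements_file):
    """Run the install; raises CalledProcessError if pip fails (e.g. a missing wheel)."""
    subprocess.run(
        offline_install_command(venv_dir, wheel_dir, requirements_file),
        check=True,
    )
```

With --no-index set, a missing wheel makes pip fail immediately with an error instead of trying to reach the network.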

SSH into the box and run the setup script:

This will take a few minutes. Once done, cd into the app directory and activate the virtual environment:

Run the tests:

Once complete, fire up gunicorn as a daemon:

Feel free to use a process manager, like Supervisor, to manage gunicorn.
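If you’d rather not pass daemon flags on the command line, gunicorn can also read its settings from a Python config file passed with -c. The file below is a hypothetical example, not taken from the repo: the bind address is an assumption and must match whatever upstream the Nginx config points to, the worker count uses the (2 x cores) + 1 starting point suggested in the gunicorn docs, and the logs/ directory must already exist. It uses only syntax that also runs under Python 2.7.

```python
# gunicorn.conf.py: hypothetical settings, loaded with
#   gunicorn -c gunicorn.conf.py <module>:<app>
import multiprocessing

# Address Nginx forwards requests to (assumption: match it to default.conf).
bind = "127.0.0.1:8000"

# Starting point suggested by the gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Detach from the terminal, same effect as the --daemon command-line flag.
daemon = True

# A daemon has no console, so send logs and the PID to files (logs/ must exist).
accesslog = "logs/access.log"
errorlog = "logs/error.log"
pidfile = "gunicorn.pid"
```

Keeping these settings in a file also makes it easy to hand the same configuration to Supervisor later.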


Conclusion

In this article we looked at how to package up a Python project with Docker and Python wheels for deployment on a machine cut off from the Internet.

With this setup, since we’re packaging the code, dependencies, and interpreter together, our deployments can be considered immutable: each bundle is built once and never modified in place. For each new deploy, we’ll spin up a new environment and test it to ensure it’s working before bringing down the old environment. This avoids the issues that can arise from repeatedly deploying on top of legacy code. Plus, if you uncover issues with the new deploy, you can easily roll back.
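One common way to get this "new environment, then switch, with easy rollback" behavior on a single box is to unpack each bundle into its own release directory and point a current symlink at the active one. This is a technique of our own choosing, not something the original scripts do. The Python 3 sketch below assumes release directories whose names sort chronologically (for example, timestamps) and swaps the symlink with os.replace, which on POSIX systems renames over the old link in a single step.

```python
import os
from pathlib import Path


def activate_release(releases_root, release_name):
    """Point releases_root/current at releases_root/release_name in one rename."""
    root = Path(releases_root)
    target = root / release_name
    if not target.is_dir():
        raise FileNotFoundError(str(target))
    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()  # leftover from an interrupted switch
    tmp_link.symlink_to(target)
    # Renaming over the old link means "current" never dangles mid-switch.
    os.replace(str(tmp_link), str(root / "current"))


def rollback(releases_root):
    """Re-activate the release that sorts just before the current one."""
    root = Path(releases_root)
    releases = sorted(p.name for p in root.iterdir()
                      if p.is_dir() and not p.is_symlink())
    current = Path(os.readlink(str(root / "current"))).name
    index = releases.index(current)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    activate_release(root, releases[index - 1])
    return releases[index - 1]
```

Gunicorn would then be started from the current directory, and the old release directory stays on disk until you are confident the new one works.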

Looking for some challenges?

  1. At this point, the Dockerfile and each of the scripts are tied to a Python 2.7.14 environment on CentOS 5.11. What if you also had to deploy a Python 3.6.1 build to a different version of CentOS? Think about how you could automate this process given a configuration file.
    For example:
    Alternatively, check out the cibuildwheel project, which automates building wheels across platforms and Python versions.
  2. You probably only need to bundle the Python interpreter for the first deploy. Update the build_tarball.sh script so that it asks the user whether Python is needed before bundling it.
  3. How about logs? Logging could be handled either locally or at the system level. If locally, how would you handle log rotation? Configure this on your own.
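As a possible starting point for the first challenge, the Python 3 sketch below reads a JSON config listing Python/CentOS pairs and produces one docker build command per pair. It assumes the Dockerfile has been changed to accept ARG PYTHON_VERSION and ARG CENTOS_VERSION; those build-arg names, the image tag format, and the config layout are all hypothetical.

```python
import json


def build_commands(config_text):
    """Turn a JSON list of build targets into one `docker build` command each.

    Hypothetical config layout:
      {"targets": [{"python": "2.7.14", "centos": "5.11"}, ...]}
    """
    config = json.loads(config_text)
    commands = []
    for target in config["targets"]:
        python, centos = target["python"], target["centos"]
        # Image tags may contain letters, digits, '_', '.' and '-'.
        tag = "wheels:py{}-centos{}".format(python, centos)
        commands.append([
            "docker", "build",
            "--build-arg", "PYTHON_VERSION=" + python,
            "--build-arg", "CENTOS_VERSION=" + centos,
            "-t", tag,
            "wheels",  # build context: the "wheels" directory
        ])
    return commands
```

Each command could then be run with subprocess.run(cmd, check=True), and the resulting wheels copied out of each image before bundling.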

Grab the code from the repo.