Tensorflow GPU MNIST Model with GKE — seldon-core documentation

Please note: This tutorial uses Tensorflow-gpu=1.13.1, CUDA 10.0 and cuDNN 7.6

Requirements: Ubuntu 18.04+ and Python 3.6

In this tutorial we will run a deep MNIST Tensorflow example with GPU.

The tutorial will be broken down into the following sections:

  1. Install all dependencies to run Tensorflow-GPU
    1.1 Installing CUDA 10.0
    1.2 Installing cuDNN 7.6
    1.3 Configure CUDA and cuDNN
    1.4 Install Tensorflow GPU
  2. Train the MNIST model locally
  3. Push the image to your project's Container Registry
  4. Deploy the model on GKE using Seldon Core

Local Testing Environment

For the development of this example, a GCE Virtual Machine with an attached GPU was used.

1) Installing all dependencies to run Tensorflow-GPU

Check Nvidia drivers >= 3.0
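You can check the installed driver version with nvidia-smi (this assumes the Nvidia drivers are already installed on the VM):

$ nvidia-smi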

1.1) Install CUDA 10.0

!wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux

! chmod +x cuda_10.0.130_410.48_linux
! ./cuda_10.0.130_410.48_linux --extract=$HOME

From the terminal, run the following command

$ sudo ./cuda-linux.10.0.130-24817639.run

Hold ‘d’ to scroll to the bottom of the license agreement.

Accept the licensing agreement and all of the default settings.

$ sudo ./cuda-samples.10.0.130-24817639-linux.run

Again, accept the agreement and all default settings

$ sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
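After adding the CUDA library path, refreshing the linker cache is a standard follow-up step (an assumption here, not part of the original command list, but harmless to run):

$ sudo ldconfig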

$ sudo vim /etc/environment

Add ‘:/usr/local/cuda/bin’ to the end of the PATH (inside quotes)
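For illustration only, the edited line in /etc/environment might end up looking something like the line below; the existing value on your machine will differ, and only the appended :/usr/local/cuda/bin part matters:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda/bin"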

$ cd /usr/local/cuda-10.0/samples

$ sudo make

If you run into an error involving the GCC version:

$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-6 10

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 10

Then run sudo make again; otherwise, skip this step.

%%bash
cd /usr/local/cuda/samples/bin/x86_64/linux/release
./deviceQuery

Remember to clean up by removing all of the downloaded runtime packages
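For example, assuming the downloaded installer and the extracted .run files are in your home directory as above:

$ rm ~/cuda_10.0.130_410.48_linux ~/cuda-linux.10.0.130-24817639.run ~/cuda-samples.10.0.130-24817639-linux.run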

1.2) Install cuDNN 7.6

You will have to create an Nvidia developer account for this and go to the archive section of the cuDNN downloads page.

Ensure you download all 3 files: the Runtime library, the Developer library, and the Code Samples.

Unpackage the three files in this order:

%%bash
sudo dpkg -i ~/libcudnn7_7.6.0.64-1+cuda10.0_amd64.deb
sudo dpkg -i ~/libcudnn7-dev_7.6.0.64-1+cuda10.0_amd64.deb
sudo dpkg -i ~/libcudnn7-doc_7.6.0.64-1+cuda10.0_amd64.deb

Copy the code samples to a location where you have write access:

! cp -r /usr/src/cudnn_samples_v7/ ~

Go to the MNIST example code, then compile and run it:

%%bash
cd ~/cudnn_samples_v7/mnistCUDNN
sudo make
sudo ./mnistCUDNN

Remember to clean up by removing all of the downloaded runtime packages
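For example, assuming the .deb packages are still in your home directory:

$ rm ~/libcudnn7_7.6.0.64-1+cuda10.0_amd64.deb ~/libcudnn7-dev_7.6.0.64-1+cuda10.0_amd64.deb ~/libcudnn7-doc_7.6.0.64-1+cuda10.0_amd64.deb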

1.3) Configure CUDA and cuDNN

Add LD_LIBRARY_PATH to your .bashrc file by appending the following line at the end:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

And source it with:

$ source ~/.bashrc

1.4) Install Tensorflow GPU

Tensorflow-gpu 1.13.1 is required, as it is built against CUDA 10.0.

! pip3 install --upgrade tensorflow-gpu==1.13.1

import tensorflow as tf

# Creating a session with log_device_placement=True logs which device each
# operation runs on, confirming that Tensorflow can see the GPU.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
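If you want explicit confirmation that a GPU is visible, a quick sanity check using standard Tensorflow 1.13 APIs:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Returns True if a CUDA-enabled GPU is visible to this Tensorflow build.
print("GPU available:", tf.test.is_gpu_available())

# List every device Tensorflow can see (a correct setup includes /device:GPU:0).
print([d.name for d in device_lib.list_local_devices()])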

2) Train the MNIST model locally

Dependencies
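The local wrapping and testing steps below use the seldon-core-tester CLI, which ships with the seldon-core Python package. A hedged install command, assuming pip3 as used elsewhere in this tutorial:

! pip3 install seldon-core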

Train locally

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Download and load the MNIST dataset (images flattened to 784 floats each).
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

if __name__ == "__main__":

    # Placeholder for the input images and the softmax regression parameters.
    x = tf.placeholder(tf.float32, [None, 784], name="x")

    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

    y = tf.nn.softmax(tf.matmul(x, W) + b, name="y")

    # Placeholder for the one-hot encoded labels.
    y_ = tf.placeholder(tf.float32, [None, 10])

    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])
    )

    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    init = tf.initialize_all_variables()

    sess = tf.Session()
    sess.run(init)

    # Train with mini-batches of 100 images for 1000 steps.
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    # Evaluate accuracy on the test set.
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

    # Save the trained graph and weights so the Seldon wrapper can restore them.
    saver = tf.train.Saver()
    saver.save(sess, "model/deep_mnist_model")

Wrap model using s2i
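The s2i Python wrapper expects a Python file in this directory defining a class that exposes a predict method, alongside a requirements file and an .s2i/environment file naming that class. The sketch below shows what such a wrapper (for example, a hypothetical DeepMnist.py) might look like, assuming the model was saved to model/deep_mnist_model as in the training script above:

import numpy as np
import tensorflow as tf


class DeepMnist(object):
    def __init__(self):
        # Class names returned alongside predictions (class:0 ... class:9).
        self.class_names = ["class:{}".format(str(i)) for i in range(10)]
        # Restore the graph and weights saved by the training script.
        self.sess = tf.Session()
        saver = tf.train.import_meta_graph("model/deep_mnist_model.meta")
        saver.restore(self.sess, "model/deep_mnist_model")
        graph = tf.get_default_graph()
        # The input and output tensors were explicitly named "x" and "y" above.
        self.x = graph.get_tensor_by_name("x:0")
        self.y = graph.get_tensor_by_name("y:0")

    def predict(self, X, feature_names):
        # X arrives as a numpy array of flattened 28x28 images (784 floats each).
        predictions = self.sess.run(self.y, feed_dict={self.x: X})
        return predictions.astype(np.float64)

The s2i build command below then packages this wrapper, the saved model directory, and its requirements into the GPU-enabled Python wrapper image.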

!s2i build . seldonio/seldon-core-s2i-python3-tf-gpu:0.1 deep-mnist-gpu:0.1

!docker run --name "mnist_predictor" -d --rm -p 5000:5000 deep-mnist-gpu:0.1

Send some random features that conform to the contract

!seldon-core-tester contract.json 0.0.0.0 5000 -p

!docker rm mnist_predictor --force

3) Push the image to Google Container Registry

Configure access to container registry (follow the configuration to link to your own project).

$ gcloud auth configure-docker

Tag Image with your project’s registry path (Edit the command below)

!docker tag deep-mnist-gpu:0.1 gcr.io//deep-mnist-gpu:0.1

Push the Image to the Container Registry (Again edit command below)

!docker push gcr.io//deep-mnist-gpu:0.1

4) Deploy in GKE

Spin up a GKE Cluster

For this example only one node is needed within the cluster. The node should have an NVIDIA GPU attached and use the Ubuntu node image (to match the driver installer used below).

Leave the rest of the config as default.
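As an illustrative sketch (not the exact configuration used for this example), a single-node GPU cluster with the Ubuntu node image could be created along these lines; the cluster name, zone, machine type and GPU type below are placeholders to adjust to your own project and quota:

$ gcloud container clusters create seldon-gpu-cluster \
    --zone us-central1-a \
    --num-nodes 1 \
    --machine-type n1-standard-4 \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-type UBUNTU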

Connect to your cluster and check the context.

!gcloud config set project
!gcloud container clusters get-credentials
!kubectl config current-context

Installing NVIDIA GPU device drivers

(The below command is for the Ubuntu Node Image - if using a COS image, please see the Google Cloud Documentation for the correct command).

!kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
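Once the driver installer has run on the node, the node should advertise an allocatable nvidia.com/gpu resource. A quick way to check:

!kubectl describe nodes | grep -i "nvidia.com/gpu"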

Setup Seldon Core

Use the setup notebook to Setup Cluster with Ambassador Ingress and Install Seldon Core. Instructions are also available online.

Build the Seldon Graph

First, let's look at the Seldon graph definition file (deep_mnist_gpu.json):

{ "apiVersion": "machinelearning.seldon.io/v1alpha2", "kind": "SeldonDeployment", "metadata": { "labels": { "app": "seldon" }, "name": "deep-mnist-gpu" }, "spec": { "annotations": { "project_name": "Tensorflow MNIST", "deployment_version": "v1" }, "name": "deep-mnist-gpu", "predictors": [ { "componentSpecs": [{ "spec": { "containers": [ { "image": "gcr.io//deep-mnist-gpu:0.1", "imagePullPolicy": "IfNotPresent", "name": "classifier", "resources": { "limits": { "nvidia.com/gpu": 1 } } } ], "terminationGracePeriodSeconds": 20 } }], "graph": { "children": [], "name": "classifier", "endpoint": { "type" : "REST" }, "type": "MODEL" }, "name": "single-model", "replicas": 1, "annotations": { "predictor_version" : "v1" } } ] } }

Change the image name in this file (line 24, the image field under containers) to match the path to the image in your container registry.

Next, we are ready to build the seldon graph.

!kubectl create -f deep_mnist_gpu.json

seldondeployment.machinelearning.seldon.io/deep-mnist-gpu created

!kubectl rollout status deploy/deep-mnist-gpu-single-model-8969cc0

Error from server (NotFound): deployments.extensions "deep-mnist-gpu-single-model-8969cc0" not found

Check the deployment is running:

!kubectl get pods

NAME                                                   READY   STATUS    RESTARTS   AGE
ambassador-865c877494-2td9s                            1/1     Running   0          101m
ambassador-865c877494-2vsk2                            1/1     Running   0          101m
ambassador-865c877494-qzh4c                            1/1     Running   0          101m
deep-mnist-gpu-single-model-0588ac2-865d745b7d-kqcp9   2/2     Running   0          71m
seldon-operator-controller-manager-0                   1/1     Running   1          101m

Test the deployment with test data

Change the IP address in the command below to the External IP of your Ambassador deployment, shown in the EXTERNAL-IP column of the kubectl get svc output:

NAME                                         TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                      AGE
ambassador                                   LoadBalancer   10.76.8.138    104.197.71.69   80:30783/TCP,443:32277/TCP   101m
ambassador-admins                            ClusterIP      10.76.12.144   <none>          8877/TCP                     101m
deep-mnist-gpu-deep-mnist-gpu                ClusterIP      10.76.5.205    <none>          8000/TCP,5001/TCP            71m
kubernetes                                   ClusterIP      10.76.0.1      <none>          443/TCP                      107m
seldon-87fe3957f4554e9b5af993717a0b9327      ClusterIP      10.76.14.160   <none>          9000/TCP                     71m
seldon-operator-controller-manager-service   ClusterIP      10.76.8.100    <none>          443/TCP                      101m
webhook-server-service                       ClusterIP      10.76.7.151    <none>          443/TCP                      101m

!seldon-core-api-tester contract.json `kubectl get svc ambassador -o jsonpath='{.spec.ports[0].port}'` \
    deep-mnist-gpu --namespace default -p


SENDING NEW REQUEST:

[[0.798 0.827 0.034 0.384 0.938 0.036 0.135 0.555 0.86 0.263 0.411 0.894 0.327 0.865 0.906 0.914 0.133 0.565 0.803 0.417 0.825 0.678 0.805 0.206 0.017 0.698 0.41 0.503 0.984 0.214 0.468 0.366 0.132 0.973 0.472 0.346 0.001 0.662 0.412 0.537 0.522 0.242 0.289 0.676 0.379 0.542 0.452 0.467 0.392 1. 0.771 0.442 0.352 0.505 0.259 0.505 0.664 0.942 0.457 0.417 0.895 0.42 0.322 0.885 0.578 0.528 0.222 0.283 0.137 0.605 0.915 0.182 0.42 0.94 0.262 0.599 0.552 0.437 0.179 0.928 0.831 0.193 0.391 0.416 0.315 0.012 0.815 0.925 0.52 0.773 0.93 0.673 0.757 0.979 0.151 0.459 0.621 0.553 0.605 0.176 0.702 0.814 0.784 0.952 0.513 0.125 0.68 0.043 0.377 0.67 0.466 0.824 0.245 0.221 0.324 0.749 0.182 0.992 0.243 0.855 0.477 0.176 0.262 0.537 0.69 0.717 0.059 0.711 0.26 0.149 0.34 0.71 0.041 0.623 0.447 0.319 0.089 0.954 0.435 0.267 0.416 0.275 0.923 0.254 0.542 0.995 0.782 0.337 0.991 0.187 0.183 0.479 0.73 0.288 0.6 0.583 0.392 0.389 0.572 0.281 0.016 0.097 0.745 0.161 0.053 0.994 0.998 0.21 0.348 0.531 0.423 0.894 0.153 0.759 0.277 0.002 0.113 0.236 0.171 0.979 0.315 0.171 0.217 0.328 0.995 0.231 0.134 0.69 0.468 0.437 0.536 0.198 0.412 0.15 0.465 0.402 0.975 0.698 0.057 0.885 0.433 0.463 0.73 0.285 0.429 0.068 0.942 0.367 0.96 0.042 0.383 0.498 0.563 0.606 0.139 0.148 0.151 0.4 0.946 0.805 0.954 0.739 0.925 0.305 0.909 0.222 0.475 0.729 0.679 0.43 0.7 0.085 0.103 0.3 0.073 0.263 0.472 0.998 0.615 0.218 0.677 0.555 0.155 0.093 0.36 0.149 0.343 0.801 0.896 0.106 0.253 0.875 0.245 0.853 0.909 0.958 0.362 0.663 0.674 0.298 0.139 0.118 0.242 0.282 0.095 0.755 0.635 0.168 0.259 0.515 0.77 0.196 0.185 0.659 0.379 0.64 0.351 0.184 0.723 0.639 0.893 0.132 0.833 0.377 0.486 0.262 0.091 0.694 0.043 0.957 0.927 0.469 0.47 0.407 0.166 0.673 0.065 0.582 0.403 0.795 0.39 0.991 0.723 0.863 0.347 0.612 0.63 0.628 0.298 0.398 0.788 0.491 0.497 0.669 0.016 0.609 0.778 0.379 0.454 0.113 0.4 0.649 0.155 0.687 0.317 0.248 0.044 0.933 0.615 0.335 0.022 0.661 0.582 0.418 0.053 0.924 0.69 0.723 0.007 0.149 0.703 0.1 0.799 0.991 0.877 0.626 0.191 0.829 0.07 0.814 0.989 0.664 0.192 0.849 0.611 0.78 0.397 0.281 0.688 0.876 0.423 0.185 0.036 0.476 0.417 0.804 0.336 0.498 0.653 0.585 0.339 0.155 0.438 0.781 0.321 0.462 0.595 0.324 0.463 0.065 0.655 0.534 0.01 0.906 0.836 0.389 0.457 0.629 0.831 0.145 0.082 0.889 0.231 0.075 0.404 0.408 0.035 0.226 0.371 0.961 0.907 0.366 0.937 0.818 0.373 0.813 0.645 0.009 0.16 0.797 0.81 0.48 0.76 0.464 0.127 0.842 0.531 0.362 0.546 0.95 0.788 0.069 0.276 0.79 0.287 0.64 0.797 0.262 0.132 0.317 0.766 0.759 0.714 0.642 0.601 0.482 0.529 0.43 0.934 0.07 0.137 0.794 0.5 0.065 0.157 0.672 0.858 0.336 0.991 0.054 0.352 0.163 0.981 0.481 0.29 0.3 0.38 0.136 0.911 0.231 0.556 0.798 0.496 0.407 0.237 0.474 0.676 0.356 0.757 0.954 0.217 0.165 0.948 0.746 0.986 0.501 0.216 0.638 0.398 0.863 0.462 0.924 0.889 0.448 0.325 0.922 0.895 0.331 0.491 0.626 0.207 0.133 0.68 0.304 0.126 0.835 0.233 0.485 0.217 0.405 0.44 0.124 0.71 0.332 0.546 0.58 0.151 0.447 0.104 0.206 0.257 0.053 0.716 0.804 0.67 0.789 0.804 0.473 0.008 0.318 0.033 0.381 0.634 0.407 0.659 0.62 0.497 0.689 0.83 0.384 0.67 0.911 0.101 0.668 0.355 0.579 0.111 0.446 0.596 0.814 0.318 0.355 0.07 0.542 0.017 0.21 0.327 0.599 0.059 0.252 0.951 0.56 0.367 0.813 0.074 0.964 0.079 0.68 0.446 0.019 0.7 0.903 0.918 0.74 0.22 0.241 0.656 0.283 0.625 0.209 0.154 0.862 0.254 0.151 0.323 0.789 0.393 0.023 0.668 0.55 0.408 0.54 0.207 0.064 0.844 0.323 0.216 0.688 0.273 0.71 0.542 0.32 0.277 0.535 0.621 0.014 0.272 0.235 0.959 0.067 0.027 0.585 0.001 0.853 0.189 
0.687 0.059 0.284 0.419 0.995 0.151 0.391 0.184 0.741 0.752 0.956 0.646 0.84 0.619 0.993 0.37 0.499 0.491 0.318 0.782 0.724 0.748 0.552 0.485 0.667 0.206 0.813 0.511 0.128 0.936 0.33 0.937 0.484 0.157 0.878 0.834 0.133 0.809 0.977 0.567 0.366 0.964 0.535 0.678 0.64 0.076 0.866 0.211 0.853 0.619 0.103 0.433 0.667 0.73 0.136 0.519 0.612 0.184 0.044 0.448 0.233 0.885 0.38 0.172 0.804 0.106 0.724 0.107 0.619 0.554 0.548 0.812 0.587 0.577 0.417 0.962 0.774 0.364 0.485 0.881 0.533 0.714 0.52 0.963 0.718 0.651 0.375 0.889 0.239 0.148 0.715 0.551 0.768 0.073 0.599 0.671 0.947 0.059 0.453 0.356 0.271 0.156 0.096 0.975 0.454 0.594 0.605 0.689 0.151 0.823 0.286 0.107 0.031 0.59 0.801 0.847 0.291 0.516 0.977 0.883 0.169 0.848 0.954 0.371 0.632 0.313 0.397 0.944 0.937 0.051 0.193 0.221 0.446 0.327 0.456 0.619 0.924 0.326 0.848 0.496 0.515 0.668 0.703 0.942 0.712 0.533 0.656 0.691 0.669 0.407 0.42 0.659 0.933 1. 0.244 0.566 0.613 0.747 0.896 0.236 0.355 0.338 0.243 0.069 0.416 0.684 0.923 0.392 0.654 0.523 0.38 0.319 0.327 0.522 0.985 0.01 0.316 0.938 0.907]] RECEIVED RESPONSE: meta { puid: "14k74obmqhus06jl6pai9hcg7r" requestPath { key: "classifier" value: "gcr.io/dev-joel/deep-mnist-gpu:0.1" } } data { names: "class:0" names: "class:1" names: "class:2" names: "class:3" names: "class:4" names: "class:5" names: "class:6" names: "class:7" names: "class:8" names: "class:9" ndarray { values { list_value { values { number_value: 0.0025008211378008127 } values { number_value: 7.924897005295861e-08 } values { number_value: 0.057240355759859085 } values { number_value: 0.21792393922805786 } values { number_value: 6.878228759887861e-06 } values { number_value: 0.5588285326957703 } values { number_value: 0.0005614690016955137 } values { number_value: 0.0004520844086073339 } values { number_value: 0.161981999874115 } values { number_value: 0.0005038614035584033 } } } } }
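Alternatively, you can POST to the Ambassador endpoint directly. A hedged sketch using the Python requests library, assuming the Seldon/Ambassador REST path convention of this Seldon Core version and a deployment named deep-mnist-gpu in the default namespace (replace the IP with your own Ambassador EXTERNAL-IP):

import json

import numpy as np
import requests

# Replace with your Ambassador EXTERNAL-IP from `kubectl get svc ambassador`.
AMBASSADOR_IP = "104.197.71.69"

# The path below is an assumption based on the Seldon/Ambassador convention for
# this era; on some configurations the namespace segment is omitted, so adjust
# the URL if you receive a 404.
url = "http://{}/seldon/default/deep-mnist-gpu/api/v0.1/predictions".format(AMBASSADOR_IP)

# One random 784-dimensional "image", matching the contract used by the tester above.
payload = {"data": {"ndarray": np.random.rand(1, 784).tolist()}}

response = requests.post(url, json=payload)
print(json.dumps(response.json(), indent=2))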

Clean up

Make sure you delete the cluster once you have finished with it to avoid any ongoing charges.

!gcloud container clusters delete