Python Model Deployment Using TensorFlow Serving

Last Updated : 27 Jan, 2022

Model deployment is one of the most important parts of the machine learning pipeline. Deployment is the process of integrating a machine learning model into an existing production environment so that it can be used for practical purposes in real time.

There are many ways to deploy a model. One way is to integrate the model into a Django/Flask application with a script that takes input, loads the model, and generates results. We can then pass image data to the model and display the results once the model has generated its output. This approach has the following limitations:

Depending upon the size of the model, it will take some time to process input and generate results.
The model cannot be used in other applications (assuming we do not write a REST/gRPC API for it).
I/O processing is slow in Flask as compared to Node.
Training the model is also resource-intensive and time-consuming (since it requires a lot of I/O and computation).

The other way is to deploy the model using TensorFlow Serving. Because it exposes an API (both REST and gRPC), the model is portable and can be used from different devices through that API. It is easy to deploy and works well even for larger models.

Advantages of TensorFlow Serving:

Part of TensorFlow Extended (TFX) ecosystem.
Works well for large models (up to 2 GB).
Provides consistent API structures for the RESTful and gRPC client requests.
Can manage model versioning.
Used internally at Google.

RESTful API:

The TensorFlow Serving RESTful API supports two types of client request:

Classify and Regress API
Predict API (For Prediction task)

Here, we will use the Predict API. Its URL format is:

POST http://{host}:{port}/v1/models/${MODEL_NAME}[/versions/${VERSION}|/labels/${LABEL}]:predict

The request body is a JSON object of the form:

    {
      // (Optional) Serving signature to use.
      // Default: 'serving_default'
      "signature_name": <string>,

      // "instances" for row format (list, array, etc.), "inputs" for columnar format.
      // A request can contain only one of them.
      "instances": <value>|<(nested)list>|<list-of-objects>
      "inputs": <value>|<(nested)list>|<object>
    }
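As a rough illustration of the two layouts, the sketch below builds the same two-example request once in row format ("instances") and once in columnar format ("inputs"), using only the standard library. The tiny 2x2 inputs and the helper names are made up for this example; a real CIFAR-10 request would carry 32x32x3 arrays.

```python
import json


def row_format_body(examples, signature_name="serving_default"):
    """Row format: one list entry per example, under "instances"."""
    return json.dumps({"signature_name": signature_name, "instances": examples})


def columnar_format_body(examples, signature_name="serving_default"):
    """Columnar format: the whole batch as a single tensor, under "inputs"."""
    return json.dumps({"signature_name": signature_name, "inputs": examples})


# Two hypothetical 2x2 inputs standing in for real images.
batch = [[[0.0, 0.5], [1.0, 0.25]], [[0.1, 0.2], [0.3, 0.4]]]

row_body = row_format_body(batch)
col_body = columnar_format_body(batch)
print(row_body)
print(col_body)
```

For a model with a single input the two bodies differ only in the key. The server answers a row-format request under a "predictions" key and a columnar request under an "outputs" key.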

gRPC API:

To use the gRPC API, we install a package called tensorflow-serving-api using pip. More details about the gRPC API endpoint are provided in the code.

Implementation:

We will now demonstrate TensorFlow Serving in practice. First, we import (or install) the necessary modules, then we train the model on the CIFAR-10 dataset for 100 epochs. For production use, this code can be saved as train.py.

Code:

    # General imports
    !pip install -Uq grpcio==1.26.0
    import numpy as np
    import matplotlib.pyplot as plt
    import os
    import subprocess
    import requests
    import json

    # TensorFlow imports
    from tensorflow import keras
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
    from tensorflow.keras.models import Sequential, save_model
    from tensorflow.keras.optimizers import SGD
    from tensorflow.keras.utils import to_categorical
    from tensorflow.keras.datasets import cifar10

    class_names = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

    # Load and preprocess the dataset
    def load_and_preprocess():
        (x_train, y_train), (x_test, y_test) = cifar10.load_data()
        # One-hot encode the labels
        y_train = to_categorical(y_train)
        y_test = to_categorical(y_test)
        # Scale pixel values to the range [0, 1]
        x_train = x_train.astype('float32') / 255
        x_test = x_test.astype('float32') / 255
        return (x_train, y_train), (x_test, y_test)

    # Define the model architecture
    def get_model():
        model = Sequential([
            Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
            Conv2D(32, (3, 3), activation='relu', padding='same'),
            MaxPooling2D((2, 2)),
            Dropout(0.2),
            Conv2D(64, (3, 3), activation='relu', padding='same'),
            Conv2D(64, (3, 3), activation='relu', padding='same'),
            MaxPooling2D((2, 2)),
            Dropout(0.2),
            Flatten(),
            Dense(64, activation='relu'),
            Dense(10, activation='softmax')
        ])

        model.compile(
            optimizer=SGD(learning_rate=0.01, momentum=0.1),
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )
        model.summary()
        return model

    # Train the model
    (x_train, y_train), (x_test, y_test) = load_and_preprocess()
    model = get_model()
    model.fit(
        x_train,
        y_train,
        epochs=100,
        validation_data=(x_test, y_test),
    )

    Model: "sequential_1"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    conv2d_4 (Conv2D)            (None, 32, 32, 32)        896
    _________________________________________________________________
    conv2d_5 (Conv2D)            (None, 32, 32, 32)        9248
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32)        0
    _________________________________________________________________
    dropout_2 (Dropout)          (None, 16, 16, 32)        0
    _________________________________________________________________
    conv2d_6 (Conv2D)            (None, 16, 16, 64)        18496
    _________________________________________________________________
    conv2d_7 (Conv2D)            (None, 16, 16, 64)        36928
    _________________________________________________________________
    max_pooling2d_3 (MaxPooling2 (None, 8, 8, 64)          0
    _________________________________________________________________
    dropout_3 (Dropout)          (None, 8, 8, 64)          0
    _________________________________________________________________
    flatten_1 (Flatten)          (None, 4096)              0
    _________________________________________________________________
    dense_2 (Dense)              (None, 64)                262208
    _________________________________________________________________
    dense_3 (Dense)              (None, 10)                650
    =================================================================
    Total params: 328,426
    Trainable params: 328,426
    Non-trainable params: 0

    Epoch 1/100
    1563/1563 [==============================] - 7s 5ms/step - loss: 2.0344 - accuracy: 0.2537 - val_loss: 1.7737 - val_accuracy: 0.3691
    Epoch 2/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 1.6704 - accuracy: 0.4036 - val_loss: 1.5645 - val_accuracy: 0.4289
    Epoch 3/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 1.4688 - accuracy: 0.4723 - val_loss: 1.3854 - val_accuracy: 0.4999
    Epoch 4/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 1.3209 - accuracy: 0.5288 - val_loss: 1.2357 - val_accuracy: 0.5540
    Epoch 5/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 1.2046 - accuracy: 0.5699 - val_loss: 1.1413 - val_accuracy: 0.5935
    Epoch 6/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 1.1088 - accuracy: 0.6082 - val_loss: 1.2331 - val_accuracy: 0.5572
    Epoch 7/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 1.0248 - accuracy: 0.6373 - val_loss: 1.0139 - val_accuracy: 0.6389
    Epoch 8/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.9613 - accuracy: 0.6605 - val_loss: 0.9723 - val_accuracy: 0.6577
    . . . . .
    Epoch 90/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0775 - accuracy: 0.9734 - val_loss: 1.3356 - val_accuracy: 0.7473
    Epoch 91/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0739 - accuracy: 0.9740 - val_loss: 1.2990 - val_accuracy: 0.7681
    Epoch 92/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0743 - accuracy: 0.9739 - val_loss: 1.2629 - val_accuracy: 0.7655
    Epoch 93/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0740 - accuracy: 0.9743 - val_loss: 1.3276 - val_accuracy: 0.7635
    Epoch 94/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0724 - accuracy: 0.9746 - val_loss: 1.3179 - val_accuracy: 0.7656
    Epoch 95/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0737 - accuracy: 0.9740 - val_loss: 1.3039 - val_accuracy: 0.7677
    Epoch 96/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0736 - accuracy: 0.9734 - val_loss: 1.3243 - val_accuracy: 0.7653
    Epoch 97/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0704 - accuracy: 0.9756 - val_loss: 1.3264 - val_accuracy: 0.7660
    Epoch 98/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0693 - accuracy: 0.9757 - val_loss: 1.3284 - val_accuracy: 0.7658
    Epoch 99/100
    1563/1563 [==============================] - 7s 4ms/step - loss: 0.0668 - accuracy: 0.9764 - val_loss: 1.3649 - val_accuracy: 0.7636
    Epoch 100/100
    1563/1563 [==============================] - 7s 5ms/step - loss: 0.0710 - accuracy: 0.9749 - val_loss: 1.3206 - val_accuracy: 0.7682
    <tensorflow.python.keras.callbacks.History at 0x7f36a042e7f0>

Then we save the model in the temp folder using TensorFlow's save_model() and export it as a tar.gz archive for downloading.

Code:

    import tempfile

    MODEL_DIR = tempfile.gettempdir()
    version = 1
    export_path = os.path.join(MODEL_DIR, str(version))
    print('export_path = {}\n'.format(export_path))

    save_model(
        model,
        export_path,
        overwrite=True,
        include_optimizer=True
    )

    print('\nSaved model:')
    !ls -l {export_path}

    # This command displays the input and output layers with their signature and data type.
    # These details are required when we make a gRPC API call.
    !saved_model_cli show --dir {export_path} --all

    # Create a compressed archive from the SavedModel.
    !tar -cz -f model.tar.gz --owner=0 --group=0 -C {export_path}/ .
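TensorFlow Serving looks for numbered version subdirectories under the model base path, which is why the model above is exported to a folder named 1. The helper below is a small sketch (its name, next_export_path, is made up for this example) that picks the next free version number so a retrained model could be saved next to the old one.

```python
import os
import tempfile


def next_export_path(model_dir):
    """Return model_dir/<n>, where n is one above the highest numeric subdirectory."""
    versions = [
        int(name)
        for name in os.listdir(model_dir)
        if name.isdigit() and os.path.isdir(os.path.join(model_dir, name))
    ]
    return os.path.join(model_dir, str(max(versions, default=0) + 1))


# Demonstrate on a throwaway directory instead of the real model folder.
demo_dir = tempfile.mkdtemp()
first = next_export_path(demo_dir)    # no versions exist yet
os.makedirs(first)
second = next_export_path(demo_dir)   # version 1 now exists
print(os.path.basename(first), os.path.basename(second))
```

Using a dedicated model directory rather than the shared temp folder would keep unrelated numeric folders from being mistaken for model versions.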

Now we will host the model using TensorFlow Serving. We will demonstrate hosting using two methods.
First, we will take advantage of the Colab environment and install TensorFlow Serving in that environment.
Then, we will use a Docker environment to host the model and use both the gRPC and REST APIs to call the model and get predictions. We start with the first method.

Code:

    # Install TensorFlow Serving using apt (for Debian-based systems)
    !echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
    !curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
    !apt update

    # Install the TensorFlow Model Server
    !apt-get install tensorflow-model-server

    # Make the model directory visible to the shell
    os.environ["MODEL_DIR"] = MODEL_DIR

    # Run TensorFlow Serving in the background (this must be its own notebook cell)
    %%bash --bg
    nohup tensorflow_model_server \
      --rest_api_port=8501 \
      --model_name=cifar_10 \
      --model_base_path="${MODEL_DIR}" >server.log 2>&1

Output:

    Starting job # 0 in a separate thread.

Next, we define a function to make REST API requests to the server. It takes random images from the test dataset and sends them to the model in JSON format. The model then returns its predictions, also in JSON format.
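The request function itself is not reproduced here, so the following is only a sketch of what such a helper could look like, written with the standard library rather than requests. The function names and the default URL (which assumes the server started above, listening on port 8501 with the model name cifar_10) are assumptions for illustration.

```python
import json
import random
import urllib.request


def pick_random_indices(n_items, k, seed=None):
    """Choose k distinct indices from range(n_items)."""
    return random.Random(seed).sample(range(n_items), k)


def build_request(images, url="http://localhost:8501/v1/models/cifar_10:predict"):
    """Wrap a batch of images (nested lists) in a Predict API POST request."""
    body = json.dumps({"signature_name": "serving_default",
                       "instances": images}).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def predict(images, **kwargs):
    """Send the request and return the "predictions" list (needs a running server)."""
    with urllib.request.urlopen(build_request(images, **kwargs)) as resp:
        return json.loads(resp.read())["predictions"]


# Example with a hypothetical all-zero 32x32x3 image; no request is sent here.
dummy = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
req = build_request([dummy])
print(req.get_method(), req.full_url)
```

In the notebook, something like `idx = pick_random_indices(len(x_test), 3)` followed by `predict(x_test[idx].tolist())` would send three random test images, and `np.argmax` over each returned row gives the predicted class index.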

Now, we will run our model in a Docker environment. It supports both CPU and GPU architectures and is the approach recommended by the TensorFlow developers. For serving the model, it provides a REST endpoint (default port: 8501) and a gRPC (Remote Procedure Call) endpoint (default port: 8500) to communicate with models. We will host our model on Docker using the following commands.

Code:

    # Pull the TensorFlow Serving image from Docker Hub
    !docker pull tensorflow/serving

    # Run our model (expects the SavedModel under ./cifar_10/<version>/)
    !docker run -p 8500:8500 -p 8501:8501 \
      --mount type=bind,source=$(pwd)/cifar_10/,target=/models/cifar_10 \
      -e MODEL_NAME=cifar_10 -t tensorflow/serving
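Before sending predictions it can help to confirm that the container has finished loading the model. TensorFlow Serving's REST API reports model status at /v1/models/{model_name}; the sketch below polls that endpoint with the standard library. The helper names, the retry settings and the hand-written sample response are assumptions for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def is_available(status):
    """True if any version in a model-status response is in state AVAILABLE."""
    return any(v.get("state") == "AVAILABLE"
               for v in status.get("model_version_status", []))


def wait_until_ready(model_name="cifar_10", host="127.0.0.1", port=8501,
                     retries=10, delay=1.0):
    """Poll the model status endpoint until the model is loaded or retries run out."""
    url = "http://{}:{}/v1/models/{}".format(host, port, model_name)
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_available(json.loads(resp.read())):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet
        time.sleep(delay)
    return False


# Hand-written sample in the shape the status endpoint returns.
sample = {"model_version_status": [
    {"version": "1", "state": "AVAILABLE",
     "status": {"error_code": "OK", "error_message": ""}}]}
print(is_available(sample))
```

Calling `wait_until_ready()` once after `docker run` would block until the model reports AVAILABLE, or return False after the retries are used up.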

Now, we need to write a script to communicate with our model using the REST and gRPC endpoints. Below is the script for the REST endpoint. Save this code and run it in a terminal using python. Download some images from the CIFAR-10 dataset and test the results.

Code:

    import json
    import requests
    import sys
    from PIL import Image
    import numpy as np


    def get_rest_url(model_name, host='127.0.0.1', port='8501', task='predict', version=None):
        """Build the REST API URL from the host name, port, task
        (predict or classify) and optional version."""
        # Our REST URL should be http://127.0.0.1:8501/v1/models/cifar_10:predict
        url = "http://{host}:{port}/v1/models/{model_name}".format(
            host=host, port=port, model_name=model_name)
        if version:
            url += '/versions/{version}'.format(version=version)
        url += ':{task}'.format(task=task)
        return url


    def get_model_prediction(model_input, model_name='cifar_10', signature_name='serving_default'):
        """Send a request to the URL and return the prediction from the response."""
        url = get_rest_url(model_name)
        image = Image.open(model_input)
        # Convert the image to an array
        im = np.asarray(image)
        # Add the 4th (batch) dimension
        im = np.expand_dims(im, axis=0)
        # Scale pixel values to [0, 1], as during training
        im = im / 255
        print("Image shape: ", im.shape)
        data = json.dumps({"signature_name": signature_name,
                           "instances": im.tolist()})
        headers = {"content-type": "application/json"}
        # Send the POST request and get the response
        rv = requests.post(url, data=data, headers=headers)
        return rv.json()['predictions']


    if __name__ == '__main__':
        class_names = ["airplane", "automobile", "bird", "cat", "deer",
                       "dog", "frog", "horse", "ship", "truck"]
        print("\nGenerate REST url ...")
        url = get_rest_url(model_name='cifar_10')
        print(url)

        while True:
            print("\nEnter the image path [:q for Quit]")
            path = input()
            if path == ':q':
                break
            model_prediction = get_model_prediction(path)
            print("The model predicted ...")
            print(class_names[np.argmax(model_prediction)])
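The script prints only the top class. As a sketch of how the raw response could be inspected further, the helper below ranks the class probabilities returned under "predictions". The function name top_k and the probabilities in the sample response are invented for this example.

```python
import json

CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def top_k(probabilities, k=3, class_names=CLASS_NAMES):
    """Return the k most likely (class name, probability) pairs, best first."""
    ranked = sorted(zip(class_names, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up response body in the shape returned by the Predict API (row format).
sample_response = json.loads(
    '{"predictions": [[0.01, 0.02, 0.05, 0.60, 0.02, 0.25, 0.02, 0.01, 0.01, 0.01]]}')

for name, prob in top_k(sample_response["predictions"][0]):
    print("{:<10s} {:.2f}".format(name, prob))
```

Looking at the runners-up is a quick way to spot images the model is unsure about, for instance when two classes receive similar scores.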

The code below makes the gRPC request. Here, it is important to use the right signature name (by default it is 'serving_default') and the names of the input and output layers.

Code:

    import sys
    import grpc
    import tensorflow as tf
    from PIL import Image
    import numpy as np

    # Import prediction service functions from the TF-Serving API
    from tensorflow_serving.apis import predict_pb2
    from tensorflow_serving.apis import get_model_metadata_pb2
    from tensorflow_serving.apis import prediction_service_pb2_grpc


    def get_stub(host='127.0.0.1', port='8500'):
        """Open an insecure gRPC channel and return a prediction service stub."""
        channel = grpc.insecure_channel('{}:{}'.format(host, port))
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
        return stub


    def get_model_prediction(model_input, stub, model_name='cifar_10', signature_name='serving_default'):
        """input  => (image path, stub, model_name, signature)
        output => the class scores from the model's final layer"""
        image = Image.open(model_input)
        im = np.asarray(image, dtype=np.float64)
        im = im / 255
        im = np.expand_dims(im, axis=0)

        print("Image shape: ", im.shape)
        # We are using the prediction task, so we build a PredictRequest from predict_pb2
        request = predict_pb2.PredictRequest()
        request.model_spec.name = model_name
        request.model_spec.signature_name = signature_name
        # Pass the image to the input layer (here it is named 'conv2d_4_input')
        request.inputs['conv2d_4_input'].CopyFrom(tf.make_tensor_proto(im, dtype=tf.float32))

        # Send the request with a 5 second timeout
        response = stub.Predict.future(request, 5.0)
        # Get the results from the final layer (dense_3)
        return response.result().outputs["dense_3"].float_val


    def get_model_version(model_name, stub):
        """Return the version number of the model currently being served."""
        request = get_model_metadata_pb2.GetModelMetadataRequest()
        request.model_spec.name = model_name
        request.metadata_field.append("signature_def")
        response = stub.GetModelMetadata(request, 10)
        # The signature of the loaded model is available in response.metadata['signature_def']
        return response.model_spec.version.value


    if __name__ == '__main__':
        class_names = ["airplane", "automobile", "bird", "cat", "deer",
                       "dog", "frog", "horse", "ship", "truck"]
        print("\nCreate RPC connection ...")
        stub = get_stub()
        while True:
            print("\nEnter the image path [:q for Quit]")
            # raw_input() on Python 2, input() on Python 3
            path = raw_input() if sys.version_info[0] < 3 else input()
            if path == ':q':
                break
            model_input = str(path)
            model_prediction = get_model_prediction(model_input, stub)
            print("Prediction from model ...")
            print(class_names[np.argmax(model_prediction)])
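The float_val field of a tensor in the gRPC response is a flat sequence of numbers, so when more than one image is sent in a batch the values have to be split back into one row of 10 class scores per image. The helper below sketches that step in plain Python. The names split_rows and argmax and the scores are made up for this example.

```python
def split_rows(flat_values, row_length=10):
    """Split a flat sequence into consecutive rows of row_length items."""
    flat_values = list(flat_values)
    if len(flat_values) % row_length != 0:
        raise ValueError("length is not a multiple of row_length")
    return [flat_values[i:i + row_length]
            for i in range(0, len(flat_values), row_length)]


def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=row.__getitem__)


# Made-up scores for a batch of two images (20 values in one flat list).
flat = [0.0] * 20
flat[3] = 0.9        # first image: highest score at class index 3
flat[10 + 8] = 0.7   # second image: highest score at class index 8
print([argmax(row) for row in split_rows(flat)])
```

With a single image, as in the script above, the flat list already holds exactly one row, which is why np.argmax can be applied to it directly.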
