Python Model Deployment Using TensorFlow Serving (original) (raw)

Last Updated : 27 Jan, 2022

The most important part of the machine learning pipeline is the model deployment. Model Deployment means Deployment is the method by which you integrate a machine learning model into an existing production environment to allow it to use for practical purposes in real-time.

There are many ways to deploy a model. One way is to integrate a model with Django/Flask application with a script that takes input, load the model, and generates results. So, we can easily pass image data to the model and display results after the model generates the output. Below is the limitation of the above method:

The other way is to deploy a model using TensorFlow serving. Since it also provides API (in form of REST and gRPC), so it is portable and can be used in different devices by using its API. It is easy to deploy and works well even for larger models.

Advantages of TensorFlow Serving:

RESTful API:

TensorFlow Serving supports two types of client request format in the form of RESTful API.

Here, we will use predict API, the URL format for this will be:

POST http://{host}:{port}/v1/models/${MODEL_NAME}[/versions/${VERSION}|/labels/${LABEL}]:predict

and the request body contains a JSON object in the form of :

{ // (Optional) Serving signature to use. // default : 'serving-default' "signature_name": ,

// Instance : for row format (list, array etc.), inputs: for columns format. // can have any one of them "instances": |<(nested)list>| "inputs": |<(nested)list>| }

gRPC API:

To use gRPC API, we install a package call tensorflow-serving-api using pip. More details about gRPC API endpoint are provided in code.

Implementation:

Code:

python3 `

General import

!pip install -Uq grpcio==1.26.0 import numpy as np import matplotlib.pyplot as plt import os import subprocess import requests import json

TensorFlow Imports

from tensorflow import keras from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten,Dense, Dropout from tensorflow.keras.models import Sequential,save_model from tensorflow.keras.optimizers import SGD from tensorflow.keras.utils import to_categorical from tensorflow.keras.datasets import cifar10

class_names =["airplane","automobile","bird","cat","deer","dog", "frog","horse", "ship","truck"]

load and preprocessdataset

def load_and_preprocess(): (x_train, y_train), (x_test,y_test) = cifar_10.load_data() y_train = to_categorical(y_train) y_test = to_categorical(y_test) x_train = x_train.astype('float32') x_test = x_test.astype('float32') x_train = x_train/255 x_test = x_test/255 return (x_train, y_train), (x_test,y_test)

define model architecture

def get_model(): model = Sequential([ Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)), Conv2D(32, (3, 3), activation='relu', padding='same'), MaxPooling2D((2, 2)), Dropout(0.2), Conv2D(64, (3, 3), activation='relu', padding='same'), Conv2D(64, (3, 3), activation='relu', padding='same'), MaxPooling2D((2, 2)), Dropout(0.2), Flatten(), Dense(64, activation='relu'), Dense(10, activation='softmax') ])

model.compile(
  optimizer=SGD(learning_rate= 0.01 , momentum=0.1), 
  loss='categorical_crossentropy',
  metrics=['accuracy']
)
model.summary()
return model

train model

model = get_model() model.fit( x_train, y_train, epochs=100, validation_data=(x_test, y_test),

`

Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param #

conv2d_4 (Conv2D) (None, 32, 32, 32) 896
_________________________________________________________________ conv2d_5 (Conv2D) (None, 32, 32, 32) 9248
_________________________________________________________________ max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32) 0
_________________________________________________________________ dropout_2 (Dropout) (None, 16, 16, 32) 0
_________________________________________________________________ conv2d_6 (Conv2D) (None, 16, 16, 64) 18496
_________________________________________________________________ conv2d_7 (Conv2D) (None, 16, 16, 64) 36928
_________________________________________________________________ max_pooling2d_3 (MaxPooling2 (None, 8, 8, 64) 0
_________________________________________________________________ dropout_3 (Dropout) (None, 8, 8, 64) 0
_________________________________________________________________ flatten_1 (Flatten) (None, 4096) 0
_________________________________________________________________ dense_2 (Dense) (None, 64) 262208
_________________________________________________________________ dense_3 (Dense) (None, 10) 650

Total params: 328,426 Trainable params: 328,426 Non-trainable params: 0


Epoch 1/100 1563/1563 [==============================] - 7s 5ms/step - loss: 2.0344 - accuracy: 0.2537 - val_loss: 1.7737 - val_accuracy: 0.3691 Epoch 2/100 1563/1563 [==============================] - 7s 4ms/step - loss: 1.6704 - accuracy: 0.4036 - val_loss: 1.5645 - val_accuracy: 0.4289 Epoch 3/100 1563/1563 [==============================] - 7s 4ms/step - loss: 1.4688 - accuracy: 0.4723 - val_loss: 1.3854 - val_accuracy: 0.4999 Epoch 4/100 1563/1563 [==============================] - 7s 4ms/step - loss: 1.3209 - accuracy: 0.5288 - val_loss: 1.2357 - val_accuracy: 0.5540 Epoch 5/100 1563/1563 [==============================] - 7s 4ms/step - loss: 1.2046 - accuracy: 0.5699 - val_loss: 1.1413 - val_accuracy: 0.5935 Epoch 6/100 1563/1563 [==============================] - 7s 4ms/step - loss: 1.1088 - accuracy: 0.6082 - val_loss: 1.2331 - val_accuracy: 0.5572 Epoch 7/100 1563/1563 [==============================] - 7s 4ms/step - loss: 1.0248 - accuracy: 0.6373 - val_loss: 1.0139 - val_accuracy: 0.6389 Epoch 8/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.9613 - accuracy: 0.6605 - val_loss: 0.9723 - val_accuracy: 0.6577 . . . . .

Epoch 90/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0775 - accuracy: 0.9734 - val_loss: 1.3356 - val_accuracy: 0.7473 Epoch 91/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0739 - accuracy: 0.9740 - val_loss: 1.2990 - val_accuracy: 0.7681 Epoch 92/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0743 - accuracy: 0.9739 - val_loss: 1.2629 - val_accuracy: 0.7655 Epoch 93/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0740 - accuracy: 0.9743 - val_loss: 1.3276 - val_accuracy: 0.7635 Epoch 94/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0724 - accuracy: 0.9746 - val_loss: 1.3179 - val_accuracy: 0.7656 Epoch 95/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0737 - accuracy: 0.9740 - val_loss: 1.3039 - val_accuracy: 0.7677 Epoch 96/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0736 - accuracy: 0.9734 - val_loss: 1.3243 - val_accuracy: 0.7653 Epoch 97/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0704 - accuracy: 0.9756 - val_loss: 1.3264 - val_accuracy: 0.7660 Epoch 98/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0693 - accuracy: 0.9757 - val_loss: 1.3284 - val_accuracy: 0.7658 Epoch 99/100 1563/1563 [==============================] - 7s 4ms/step - loss: 0.0668 - accuracy: 0.9764 - val_loss: 1.3649 - val_accuracy: 0.7636 Epoch 100/100 1563/1563 [==============================] - 7s 5ms/step - loss: 0.0710 - accuracy: 0.9749 - val_loss: 1.3206 - val_accuracy: 0.7682 <tensorflow.python.keras.callbacks.History at 0x7f36a042e7f0>

Code:

python3 `

import tempfile

MODEL_DIR = tempfile.gettempdir() version = 1 export_path = os.path.join(MODEL_DIR, str(version)) print('export_path = {}\n'.format(export_path))

save_model( model, export_path, overwrite=True, include_optimizer=True )

print('\nSaved model:') !ls -l {export_path}

The command display input and output kayers with signature and data type

These details are required when we make gRPC API call

!saved_model_cli show --dir {export_path} --all

Create a compressed model from the savedmodel .

!tar -cz -f model.tar.gz --owner=0 --group=0 -C /tmp/1/ .

`

Code:

python3 `

Install TensorFlow Serving using Aptitude [For Debian]

!echo "deb http://storage.googleapis.com/tensorflow-serving-apt" "stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list &&
!curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add - !apt update

Install TensorFlow Server

!apt-get install tensorflow-model-server

Run TensorFlow Serving on a new thread

os.environ["MODEL_DIR"] = MODEL_DIR

%%bash --bg nohup tensorflow_model_server
--rest_api_port=8501
--model_name=cifar_10
--model_base_path="${MODEL_DIR}" >server.log 2>&1

`

Starting job # 0 in a separate thread.

Code:

python3 ``

To TensorFlow Image from Docker Hub

!docker pull tensorflow/serving

Run our model

!docker run -p 8500:8500 -p 8501:8501
--mount type=bind,source=pwd/cifar_10/,target=/models/cifar_10
-e MODEL_NAME=cifar_10 -t tensorflow/serving

``

Code:

python3 `

import json import requests import sys from PIL import Image import numpy as np

def get_rest_url(model_name, host='127.0.0.1', port='8501', task='predict', version=None): """ This function takes hostname, port, task (b/w predict and classify) and version to generate the URL path for REST API""" # Our REST URL should be http://127.0.0.1:8501/v1/models/cifar_10/predict url = "http://{host}:{port}/v1/models/{model_name}".format(host=host, port=port, model_name=model_name) if version: url += 'versions/{version}'.format(version=version) url += ':{task}'.format(task=task) return url

def get_model_prediction(model_input, model_name='cifar_10', signature_name='serving_default'): """ This function sends request to the URL and get prediction in the form of response""" url = get_rest_url(model_name) image = Image.open(model_input) # convert image to array im = np.asarray(image) # add the 4th dimension im = np.expand_dims(im, axis=0) im= im/255 print("Image shape: ",im.shape) data = json.dumps({"signature_name": "serving_default", "instances": im.tolist()}) headers = {"content-type": "application/json"} # Send the post request and get response
rv = requests.post(url, data=data, headers=headers) return rv.json()['predictions']

if name == 'main': class_names =["airplane","automobile","bird","cat","deer" ,"dog","frog","horse", "ship","truck"] print("\nGenerate REST url ...") url = get_rest_url(model_name='cifar_10') print(url)

while True:
    print("\nEnter the image path [:q for Quit]")
    if sys.version_info[0] >= 3:
        path = str(input())
    if path == ':q':
        break
    model_prediction = get_model_prediction(path)
    print("The model predicted ...")
    print(class_names[np.argmax(model_prediction)])

`

Code:

python3 `

import sys import grpc from grpc.beta import implementations import tensorflow as tf from PIL import Image import numpy as np

import prediction service functions from TF-Serving API

from tensorflow_serving.apis import predict_pb2 from tensorflow_serving.apis import prediction_service_pb2, get_model_metadata_pb2 from tensorflow_serving.apis import prediction_service_pb2_grpc

def get_stub(host='127.0.0.1', port='8500'): channel = grpc.insecure_channel('127.0.0.1:8500') stub = prediction_service_pb2_grpc.PredictionServiceStub(channel) return stub

def get_model_prediction(model_input, stub, model_name='cifar_10', signature_name='serving_default'): """ input => (image path, url, model_name, signature) output the results in the form of tf.array"""

image = Image.open(model_input)
im =  np.asarray(image, dtype=np.float64)
im = (im/255)
im = np.expand_dims(im, axis=0)

print("Image shape: ",im.shape)
# We will be using Prediction Task so it uses predictRequest function from predict_pb2

request = predict_pb2.PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = signature_name
#pass Image input to input layer (Here it is named 'conv2d_4_input')
request.inputs['conv2d_4_input'].CopyFrom(tf.make_tensor_proto(im, dtype = tf.float32))

response = stub.Predict.future(request, 5.0)
# get results from final layer(dense_3)
return response.result().outputs["dense_3"].float_val

def get_model_version(model_name, stub): request = get_model_metadata_pb2.GetModelMetadataRequest() request.model_spec.name = 'cifar_10' request.metadata_field.append("signature_def") response = stub.GetModelMetadata(request, 10) # signature of loaded model is available here: response.metadata['signature_def'] return response.model_spec.version.value

if name == 'main': class_names =["airplane","automobile","bird","cat","deer","dog","frog","horse", "ship","truck"] print("\nCreate RPC connection ...") stub = get_stub() while True: print("\nEnter the image path [:q for Quit]") if sys.version_info[0] <= 3: path = raw_input() if sys.version_info[0] < 3 else input() if path == ':q': break model_input = str(path) model_prediction = get_model_prediction(model_input, stub) print(" Predictiom from Model ...") print(class_names[np.argmax(model_prediction)])

`

References: