Python Wrapper Benchmarking

Prerequisites

Port forward to istio

kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:8080

Tests

TLDR

from IPython.core.magic import register_line_cell_magic


@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, "w") as f:
        f.write(cell.format(**globals()))

VERSION = !cat ../../../version.txt
VERSION = VERSION[0]
VERSION

!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists

!helm upgrade --install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --version 1.9.0 --namespace seldon-system --set istio.enabled="true" --set istio.gateway="seldon-gateway.istio-system.svc.cluster.local"

Release "seldon-core" has been upgraded. Happy Helming!
NAME: seldon-core
LAST DEPLOYED: Thu Jul 1 14:03:55 2021
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 2
TEST SUITE: None

Test with Predict method on Large Batch Size

The seldontest_predict model has a single predict method that runs a loop for a configurable number of iterations (default 1) to simulate work. The iteration count can be set as a Seldon parameter, but here the goal is to benchmark the serialization/deserialization cost, so the model should do as little work as possible.
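To make the description above concrete, here is a minimal sketch of what such a model could look like. The class name LoopingPredictModel and its constructor argument are hypothetical and chosen for illustration; this is not the actual seldontest_predict source, only an assumption about its general shape.

```python
class LoopingPredictModel:
    """Hypothetical stand-in for a benchmark model that only simulates work."""

    def __init__(self, iterations=1):
        # Number of loop passes per request; 1 keeps the simulated work minimal.
        self.iterations = int(iterations)

    def predict(self, X, features_names=None):
        # Run the configured number of iterations, ignoring the input values.
        total = 0
        for i in range(self.iterations):
            total += i
        # Return a tiny constant result so the response is cheap to serialize.
        return [1]


model = LoopingPredictModel()
print(model.predict([[0, 1, 2]]))
```

With the work inside predict kept this small, most of the measured latency should come from transport and payload encoding/decoding rather than from the model itself.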

%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:

!kubectl apply -f model.yaml

seldondeployment.machinelearning.seldon.io/seldon-model created

!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon

pod/seldon-model-default-0-classifier-5445bd4ccf-c2vdr condition met

Create payloads and associated vegeta configurations for

  1. ndarray
  2. tensor
  3. tftensor

We will create an array of 100,000 consecutive integers.

import json

sz = 100000
vals = list(range(sz))
valStr = f"{vals}"
payload = '{"data": {"ndarray": [' + valStr + "]}}"
with open("data_ndarray.json", "w") as f:
    f.write(payload)
payload_tensor = (
    '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + valStr + "}}}"
)
with open("data_tensor.json", "w") as f:
    f.write(payload_tensor)
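As a sanity check, the size of each JSON payload can be computed and compared with the mean "Bytes Out" figure that vegeta reports for the REST runs. The sketch below rebuilds the ndarray and tensor payloads in memory; the helper name build_payloads is hypothetical.

```python
import json


def build_payloads(sz):
    """Rebuild the ndarray and tensor JSON payloads for `sz` consecutive integers."""
    vals = list(range(sz))
    val_str = f"{vals}"
    ndarray_payload = '{"data": {"ndarray": [' + val_str + "]}}"
    tensor_payload = (
        '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + val_str + "}}}"
    )
    return ndarray_payload, tensor_payload


ndarray_payload, tensor_payload = build_payloads(100000)

# Both strings should parse as valid JSON.
json.loads(ndarray_payload)
json.loads(tensor_payload)

# Size in bytes of each request body (the payloads are pure ASCII).
print(len(ndarray_payload.encode("ascii")))  # 688915
print(len(tensor_payload.encode("ascii")))   # 688940
```

The two JSON encodings differ by only 25 bytes, so any latency gap between ndarray and tensor is unlikely to come from payload size alone.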

import numpy as np
import tensorflow as tf
from google.protobuf import json_format

array = np.array(vals)
tftensor = tf.make_tensor_proto(array)
jStrTensor = json_format.MessageToJson(tftensor)
jTensor = json.loads(jStrTensor)
payload_tftensor = (
    '{"data":{"tftensor":' + json.dumps(jTensor, separators=(",", ":")) + "}}"
)
with open("data_tftensor.json", "w") as f:
    f.write(payload_tftensor)

import base64
import json

sample_string_bytes = payload_tensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_ndarray.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload_tftensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tftensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")
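The three target files above are built the same way and differ only in the payload and the file name. One possible way to factor that out is sketched below; the helper write_vegeta_target and the constant PREDICT_URL are hypothetical names, and the target layout simply mirrors the dictionaries used above.

```python
import base64
import json

# Same endpoint as in the cells above (port-forwarded Istio gateway).
PREDICT_URL = "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions"


def write_vegeta_target(payload, path, url=PREDICT_URL):
    """Write one JSON-format vegeta target whose body is the base64-encoded payload."""
    body = base64.b64encode(payload.encode("ascii")).decode("ascii")
    target = {
        "method": "POST",
        "url": url,
        "body": body,
        "header": {"Content-Type": ["application/json"]},
    }
    with open(path, "w") as f:
        # One compact JSON object per line, as in the original cells.
        f.write(json.dumps(target, separators=(",", ":")))
        f.write("\n")
    return target
```

It could then be called once per payload, for example write_vegeta_target(payload_tensor, "vegeta_tensor.json").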

Smoke test port-forward to check everything is working

!curl -X POST -H 'Content-Type: application/json' \
    -d '@./data_ndarray.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions

{"data":{"names":[],"ndarray":[1]},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}

!curl -X POST -H 'Content-Type: application/json' \
    -d '@./data_tensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions

{"data":{"names":[],"tensor":{"shape":[1],"values":[1]}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}

!curl -X POST -H 'Content-Type: application/json' \
    -d '@./data_tftensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions

{"data":{"names":[],"tftensor":{"dtype":"DT_INT64","int64Val":["1"],"tensorShape":{"dim":[{"size":"1"}]}}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}

Test REST

  1. ndarray
  2. tensor
  3. tftensor

These tests can be run locally: the results are meant to show the relative differences between the payload types rather than precise timings.

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | vegeta report -type=text

Requests      [total, rate, throughput]         518, 51.76, 51.66
Duration      [total, attack, wait]             10.027s, 10.008s, 19.333ms
Latencies     [min, mean, 50, 90, 95, 99, max]  17.337ms, 19.355ms, 19.136ms, 20.336ms, 21.214ms, 24.886ms, 27.831ms
Bytes In      [total, mean]                     59570, 115.00
Bytes Out     [total, mean]                     356857970, 688915.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:518
Error Set:

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | vegeta report -type=text

Requests      [total, rate, throughput]         504, 50.35, 50.25
Duration      [total, attack, wait]             10.03s, 10.01s, 19.353ms
Latencies     [min, mean, 50, 90, 95, 99, max]  17.885ms, 19.897ms, 19.616ms, 21.1ms, 22.205ms, 25.498ms, 34.99ms
Bytes In      [total, mean]                     69048, 137.00
Bytes Out     [total, mean]                     347225760, 688940.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:504
Error Set:

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | vegeta report -type=text

Requests      [total, rate, throughput]         636, 63.55, 63.45
Duration      [total, attack, wait]             10.023s, 10.008s, 14.782ms
Latencies     [min, mean, 50, 90, 95, 99, max]  13.646ms, 15.756ms, 15.461ms, 17.41ms, 18.729ms, 20.628ms, 23.465ms
Bytes In      [total, mean]                     118932, 187.00
Bytes Out     [total, mean]                     678466356, 1066771.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:636
Error Set:
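To compare runs without reading each report by eye, the Latencies line of a vegeta text report can be pulled apart with a small parser. The sketch below is an assumption about how one might do this; parse_vegeta_latencies is a hypothetical helper, and it only handles single-unit durations such as 19.355ms or 10s.

```python
import re

# Conversion factors from a duration unit to milliseconds.
_UNIT_TO_MS = {"ns": 1e-6, "µs": 1e-3, "us": 1e-3, "ms": 1.0, "s": 1e3}
_LABELS = ["min", "mean", "p50", "p90", "p95", "p99", "max"]


def parse_vegeta_latencies(report_text):
    """Return the seven latency figures of a vegeta text report in milliseconds."""
    match = re.search(r"Latencies\s+\[[^\]]*\]\s+(.*)", report_text)
    if match is None:
        raise ValueError("no Latencies line found")
    # Collect (number, unit) pairs and keep the first seven.
    pairs = re.findall(r"([\d.]+)(ns|µs|us|ms|s)\b", match.group(1))[: len(_LABELS)]
    if len(pairs) != len(_LABELS):
        raise ValueError("expected seven latency values")
    return {
        label: float(number) * _UNIT_TO_MS[unit]
        for label, (number, unit) in zip(_LABELS, pairs)
    }


sample = (
    "Latencies [min, mean, 50, 90, 95, 99, max] "
    "17.337ms, 19.355ms, 19.136ms, 20.336ms, 21.214ms, 24.886ms, 27.831ms"
)
print(parse_vegeta_latencies(sample)["mean"])
```

The sample line is taken from the ndarray report above; applying the same function to the tensor and tftensor reports gives directly comparable mean and percentile values.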

Example results

ndarray   tensor   tftensor
19.8ms    19.7ms   16.2ms

Test gRPC

  1. ndarray
  2. tensor
  3. tftensor

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        24
  Total:        10.13 s
  Slowest:      278.81 ms
  Fastest:      242.25 ms
  Average:      244.06 ms
  Requests/sec: 2.37

Response time histogram:
  242.253 [1] |∎∎∎∎∎∎∎
  245.909 [2] |∎∎∎∎∎∎∎∎∎∎∎∎∎
  249.564 [4] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  253.219 [6] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  256.874 [4] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  260.530 [1] |∎∎∎∎∎∎∎
  264.185 [1] |∎∎∎∎∎∎∎
  267.840 [2] |∎∎∎∎∎∎∎∎∎∎∎∎∎
  271.496 [0] |
  275.151 [1] |∎∎∎∎∎∎∎
  278.806 [1] |∎∎∎∎∎∎∎

Latency distribution:
  10 % in 247.44 ms
  25 % in 249.47 ms
  50 % in 252.85 ms
  75 % in 260.70 ms
  90 % in 272.55 ms
  95 % in 278.81 ms
  0 % in 0 ns

Status code distribution:
  [OK]         23 responses
  [Canceled]   1 responses

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        92
  Total:        10.10 s
  Slowest:      21.23 ms
  Fastest:      4.91 ms
  Average:      7.58 ms
  Requests/sec: 9.11

Response time histogram:
  4.906 [1]  |∎
  6.539 [55] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  8.171 [17] |∎∎∎∎∎∎∎∎∎∎∎∎
  9.804 [4]  |∎∎∎
  11.436 [0] |
  13.069 [4] |∎∎∎
  14.701 [3] |∎∎
  16.334 [3] |∎∎
  17.966 [0] |
  19.599 [2] |∎
  21.232 [2] |∎

Latency distribution:
  10 % in 5.51 ms
  25 % in 5.70 ms
  50 % in 6.14 ms
  75 % in 7.09 ms
  90 % in 14.14 ms
  95 % in 18.77 ms
  0 % in 0 ns

Status code distribution:
  [OK]         91 responses
  [Canceled]   1 responses

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        425
  Total:        10.04 s
  Slowest:      16.38 ms
  Fastest:      3.97 ms
  Average:      5.33 ms
  Requests/sec: 42.31

Response time histogram:
  3.970 [1]   |
  5.211 [281] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.452 [91]  |∎∎∎∎∎∎∎∎∎∎∎∎∎
  7.692 [25]  |∎∎∎∎
  8.933 [8]   |∎
  10.174 [6]  |∎
  11.415 [7]  |∎
  12.656 [2]  |
  13.896 [1]  |
  15.137 [1]  |
  16.378 [1]  |

Latency distribution:
  10 % in 4.34 ms
  25 % in 4.54 ms
  50 % in 4.89 ms
  75 % in 5.52 ms
  90 % in 6.79 ms
  95 % in 8.30 ms
  99 % in 11.71 ms

Status code distribution:
  [OK]         424 responses
  [Canceled]   1 responses

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing
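The three ghz invocations above differ only in the --data-file argument. As a sketch of how the command lines could be generated rather than repeated, the helper below builds the argument list from the flags already used in this notebook; the function name ghz_command is hypothetical, and nothing is executed here.

```python
import shlex


def ghz_command(data_file, host="localhost:8003"):
    """Build the ghz argument list used in this notebook for one payload file."""
    return [
        "ghz",
        "--insecure",
        "--proto", "../../../proto/prediction.proto",
        "--call", "seldon.protos.Seldon/Predict",
        f"--data-file={data_file}",
        "--qps=0",
        "--cpus=1",
        "--concurrency=1",
        "--duration=10s",
        "--format", "summary",
        '--metadata={"seldon": "seldon-model", "namespace": "seldon"}',
        host,
    ]


# Print a shell-quoted command line for each payload type.
for name in ("ndarray", "tensor", "tftensor"):
    print(shlex.join(ghz_command(f"./data_{name}.json")))
```

The returned list can be passed to subprocess.run directly, which avoids shell quoting issues with the JSON metadata value.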

Example results

ndarray   tensor   tftensor
253ms     8.4ms    5.5ms
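Since the aim is to compare payload types rather than to record absolute timings, it can help to express each example result relative to the fastest one. The sketch below does that for the REST and gRPC example results listed in this section; relative_to_fastest is a hypothetical helper name.

```python
def relative_to_fastest(latencies_ms):
    """Express each latency as a multiple of the fastest entry."""
    fastest = min(latencies_ms.values())
    return {name: ms / fastest for name, ms in latencies_ms.items()}


# Example results from this section, in milliseconds.
rest_ms = {"ndarray": 19.8, "tensor": 19.7, "tftensor": 16.2}
grpc_ms = {"ndarray": 253.0, "tensor": 8.4, "tftensor": 5.5}

for label, results in (("REST", rest_ms), ("gRPC", grpc_ms)):
    for name, ratio in relative_to_fastest(results).items():
        print(f"{label} {name}: {ratio:.1f}x")
```

For these figures, gRPC with ndarray comes out at about 46 times the tftensor latency, while the three REST payload types stay within roughly 1.2 times of each other.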

Conclusions

!kubectl delete -f model.yaml

seldondeployment.machinelearning.seldon.io "seldon-model" deleted

Test Predict Raw

%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:

!kubectl apply -f model.yaml

seldondeployment.machinelearning.seldon.io/seldon-model created

!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon

pod/seldon-model-default-0-classifier-5dc8fbd597-kk7td condition met

Smoke test port-forward to check everything is working

!curl -X POST -H 'Content-Type: application/json' \
    -d '@./data_tftensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions

Test REST

  1. ndarray
  2. tensor
  3. tftensor

These tests can be run locally: the results are meant to show the relative differences between the payload types rather than precise timings.

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | vegeta report -type=text

Requests      [total, rate, throughput]         724, 72.35, 72.25
Duration      [total, attack, wait]             10.021s, 10.007s, 14.458ms
Latencies     [min, mean, 50, 90, 95, 99, max]  12.228ms, 13.838ms, 13.683ms, 14.641ms, 15.489ms, 17.888ms, 22.263ms
Bytes In      [total, mean]                     2896, 4.00
Bytes Out     [total, mean]                     498774460, 688915.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:724
Error Set:

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | vegeta report -type=text

Requests      [total, rate, throughput]         724, 72.32, 72.22
Duration      [total, attack, wait]             10.025s, 10.011s, 14.307ms
Latencies     [min, mean, 50, 90, 95, 99, max]  12.362ms, 13.844ms, 13.701ms, 14.655ms, 15.493ms, 17.976ms, 18.802ms
Bytes In      [total, mean]                     2896, 4.00
Bytes Out     [total, mean]                     498792560, 688940.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:724
Error Set:

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | vegeta report -type=text

Requests      [total, rate, throughput]         901, 90.04, 89.93
Duration      [total, attack, wait]             10.018s, 10.007s, 11.64ms
Latencies     [min, mean, 50, 90, 95, 99, max]  8.955ms, 11.116ms, 10.994ms, 12.099ms, 12.721ms, 15.208ms, 19.918ms
Bytes In      [total, mean]                     3604, 4.00
Bytes Out     [total, mean]                     961160671, 1066771.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:901
Error Set:

Example results

ndarray   tensor   tftensor
13.3ms    13.3ms   11.1ms

Test gRPC

  1. ndarray
  2. tensor
  3. tftensor

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        44
  Total:        10.04 s
  Slowest:      69.07 ms
  Fastest:      44.44 ms
  Average:      46.03 ms
  Requests/sec: 4.38

Response time histogram:
  44.440 [1]  |∎
  46.904 [31] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  49.367 [6]  |∎∎∎∎∎∎∎∎
  51.831 [2]  |∎∎∎
  54.294 [2]  |∎∎∎
  56.758 [0]  |
  59.221 [0]  |
  61.684 [0]  |
  64.148 [0]  |
  66.611 [0]  |
  69.075 [1]  |∎

Latency distribution:
  10 % in 45.05 ms
  25 % in 45.40 ms
  50 % in 46.30 ms
  75 % in 47.34 ms
  90 % in 50.16 ms
  95 % in 53.38 ms
  0 % in 0 ns

Status code distribution:
  [OK]         43 responses
  [Canceled]   1 responses

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        92
  Total:        10.10 s
  Slowest:      19.81 ms
  Fastest:      4.93 ms
  Average:      7.91 ms
  Requests/sec: 9.11

Response time histogram:
  4.932 [1]  |∎
  6.419 [53] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  7.907 [12] |∎∎∎∎∎∎∎∎∎
  9.395 [5]  |∎∎∎∎
  10.882 [4] |∎∎∎
  12.370 [1] |∎
  13.858 [3] |∎∎
  15.346 [3] |∎∎
  16.833 [2] |∎∎
  18.321 [3] |∎∎
  19.809 [4] |∎∎∎

Latency distribution:
  10 % in 5.21 ms
  25 % in 5.68 ms
  50 % in 6.04 ms
  75 % in 8.27 ms
  90 % in 15.77 ms
  95 % in 19.04 ms
  0 % in 0 ns

Status code distribution:
  [OK]         91 responses
  [Canceled]   1 responses

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        426
  Total:        10.03 s
  Slowest:      11.74 ms
  Fastest:      3.67 ms
  Average:      5.02 ms
  Requests/sec: 42.48

Response time histogram:
  3.668 [1]   |
  4.475 [174] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.282 [141] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.089 [43]  |∎∎∎∎∎∎∎∎∎∎
  6.897 [30]  |∎∎∎∎∎∎∎
  7.704 [16]  |∎∎∎∎
  8.511 [6]   |∎
  9.318 [8]   |∎∎
  10.126 [2]  |
  10.933 [1]  |
  11.740 [3]  |∎

Latency distribution:
  10 % in 4.08 ms
  25 % in 4.27 ms
  50 % in 4.61 ms
  75 % in 5.30 ms
  90 % in 6.62 ms
  95 % in 7.66 ms
  99 % in 10.26 ms

Status code distribution:
  [OK]         425 responses
  [Canceled]   1 responses

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing

Example results

ndarray   tensor   tftensor
46ms      7.9ms    5.0ms

Conclusions

Test with Predict method on Small Batch Size

The seldontest_predict model has a single predict method that runs a loop for a configurable number of iterations (default 1) to simulate work. The iteration count can be set as a Seldon parameter, but here the goal is to benchmark the serialization/deserialization cost, so the model should do as little work as possible.

%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:

!kubectl apply -f model.yaml

seldondeployment.machinelearning.seldon.io/seldon-model configured

!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon

pod/seldon-model-default-0-classifier-5445bd4ccf-bgkcm condition met

Create payloads and associated vegeta configurations for

  1. ndarray
  2. tensor
  3. tftensor

We will create an array containing a single integer.

import json

sz = 1
vals = list(range(sz))
valStr = f"{vals}"
payload = '{"data": {"ndarray": [' + valStr + "]}}"
with open("data_ndarray.json", "w") as f:
    f.write(payload)
payload_tensor = (
    '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + valStr + "}}}"
)
with open("data_tensor.json", "w") as f:
    f.write(payload_tensor)

import numpy as np
import tensorflow as tf
from google.protobuf import json_format

array = np.array(vals)
tftensor = tf.make_tensor_proto(array)
jStrTensor = json_format.MessageToJson(tftensor)
jTensor = json.loads(jStrTensor)
payload_tftensor = (
    '{"data":{"tftensor":' + json.dumps(jTensor, separators=(",", ":")) + "}}"
)
with open("data_tftensor.json", "w") as f:
    f.write(payload_tftensor)

import base64
import json

sample_string_bytes = payload_tensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_ndarray.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload_tftensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tftensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

Smoke test port-forward to check everything is working

!curl -X POST -H 'Content-Type: application/json' \
    -d '@./data_tensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions

{"data":{"names":[],"tensor":{"shape":[1],"values":[1]}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}

Test REST

  1. ndarray
  2. tensor
  3. tftensor

These tests can be run locally: the results are meant to show the relative differences between the payload types rather than precise timings.

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | vegeta report -type=text

Requests      [total, rate, throughput]         5538, 553.80, 553.67
Duration      [total, attack, wait]             10.002s, 10s, 2.364ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.569ms, 1.804ms, 1.739ms, 1.984ms, 2.198ms, 2.861ms, 6.62ms
Bytes In      [total, mean]                     636870, 115.00
Bytes Out     [total, mean]                     155064, 28.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5538
Error Set:

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | vegeta report -type=text

Requests      [total, rate, throughput]         5557, 555.65, 555.55
Duration      [total, attack, wait]             10.003s, 10.001s, 1.753ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.578ms, 1.798ms, 1.74ms, 1.925ms, 2.119ms, 2.981ms, 5.968ms
Bytes In      [total, mean]                     761309, 137.00
Bytes Out     [total, mean]                     266736, 48.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5557
Error Set:

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | vegeta report -type=text

Requests      [total, rate, throughput]         4548, 454.75, 454.65
Duration      [total, attack, wait]             10.003s, 10.001s, 2.141ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.937ms, 2.197ms, 2.138ms, 2.351ms, 2.482ms, 3.215ms, 9.424ms
Bytes In      [total, mean]                     850476, 187.00
Bytes Out     [total, mean]                     436608, 96.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:4548
Error Set:

Example results

ndarray   tensor   tftensor
1.8ms     1.8ms    2.1ms

Test gRPC

  1. ndarray
  2. tensor
  3. tftensor

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        6506
  Total:        10.01 s
  Slowest:      18.58 ms
  Fastest:      1.26 ms
  Average:      1.46 ms
  Requests/sec: 650.23

Response time histogram:
  1.260 [1]    |
  2.992 [6465] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.724 [30]   |
  6.456 [5]    |
  8.187 [2]    |
  9.919 [1]    |
  11.651 [0]   |
  13.382 [0]   |
  15.114 [0]   |
  16.846 [0]   |
  18.578 [1]   |

Latency distribution:
  10 % in 1.33 ms
  25 % in 1.36 ms
  50 % in 1.39 ms
  75 % in 1.45 ms
  90 % in 1.58 ms
  95 % in 1.79 ms
  99 % in 2.50 ms

Status code distribution:
  [OK]            6505 responses
  [Unavailable]   1 responses

Error distribution:
  [1]   rpc error: code = Unavailable desc = transport is closing

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        6429
  Total:        10.01 s
  Slowest:      16.30 ms
  Fastest:      1.29 ms
  Average:      1.49 ms
  Requests/sec: 642.56

Response time histogram:
  1.287 [1]    |
  2.789 [6375] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.290 [36]   |
  5.792 [11]   |
  7.293 [2]    |
  8.795 [1]    |
  10.296 [0]   |
  11.798 [1]   |
  13.299 [0]   |
  14.801 [0]   |
  16.303 [1]   |

Latency distribution:
  10 % in 1.36 ms
  25 % in 1.38 ms
  50 % in 1.42 ms
  75 % in 1.48 ms
  90 % in 1.60 ms
  95 % in 1.80 ms
  99 % in 2.67 ms

Status code distribution:
  [OK]            6428 responses
  [Unavailable]   1 responses

Error distribution:
  [1]   rpc error: code = Unavailable desc = transport is closing

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003

Summary:
  Count:        6066
  Total:        10.01 s
  Slowest:      9.38 ms
  Fastest:      1.39 ms
  Average:      1.57 ms
  Requests/sec: 606.20

Response time histogram:
  1.387 [1]    |
  2.187 [5945] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.986 [84]   |∎
  3.785 [20]   |
  4.585 [7]    |
  5.384 [2]    |
  6.183 [4]    |
  6.983 [0]    |
  7.782 [0]    |
  8.582 [1]    |
  9.381 [1]    |

Latency distribution:
  10 % in 1.46 ms
  25 % in 1.48 ms
  50 % in 1.52 ms
  75 % in 1.57 ms
  90 % in 1.66 ms
  95 % in 1.81 ms
  99 % in 2.61 ms

Status code distribution:
  [OK]            6065 responses
  [Unavailable]   1 responses

Error distribution:
  [1]   rpc error: code = Unavailable desc = transport is closing

Example results

ndarray   tensor   tftensor
1.46ms    1.49ms   1.57ms
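With example results for both a 100,000-element and a single-element request, a very rough marginal cost per element can be estimated by dividing the latency difference by the difference in element count. The sketch below does this for a few of the example figures from the Predict-method sections; it assumes latency scales linearly with payload size, which these single runs do not establish, and per_element_cost_us is a hypothetical helper name.

```python
def per_element_cost_us(large_ms, small_ms, n_large=100_000, n_small=1):
    """Rough marginal cost per element in microseconds, assuming linear scaling."""
    return (large_ms - small_ms) * 1000.0 / (n_large - n_small)


# (large-batch latency, small-batch latency) in ms, from the example results.
examples = {
    "REST ndarray": (19.8, 1.8),
    "REST tftensor": (16.2, 2.1),
    "gRPC ndarray": (253.0, 1.46),
    "gRPC tftensor": (5.5, 1.57),
}

for name, (large_ms, small_ms) in examples.items():
    print(f"{name}: {per_element_cost_us(large_ms, small_ms):.3f} us per element")
```

Treat the output as an order-of-magnitude indication only: the figures come from short local runs with a single worker.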

Conclusions

!kubectl delete -f model.yaml

seldondeployment.machinelearning.seldon.io "seldon-model" deleted