breaking: Change Model server to Torchserve for PyTorch Inference by dk19y · Pull Request #79 · aws/sagemaker-pytorch-inference-toolkit

Use TorchServe in place of MMS for PyTorch inference.

This PR depends on sagemaker-inference-toolkit PR #58, which must be merged before this change can be supported. Until then, the existing integration tests are expected to fail.

Tested in SageMaker local mode, following the steps in the buildspec.yaml file.

TOX_PARALLEL_NO_SPINNER=1
PY_COLORS=0
AWS_ACCESS_KEY_ID=XYZ
AWS_SECRET_ACCESS_KEY=XYZ
AWS_SESSION_TOKEN=XYZ

tox -e py36 -- test/integration/local --build-image -s

This run fails with the error below, but it also builds the Docker image sagemaker-pytorch-inference:1.5.0-cpu-py3:

Attaching to tmpce8xadei_algo-1-j3bfm_1
algo-1-j3bfm_1  | Traceback (most recent call last):
algo-1-j3bfm_1  |   File "/usr/local/bin/dockerd-entrypoint.py", line 21, in <module>
algo-1-j3bfm_1  |     from sagemaker_pytorch_serving_container import serving
algo-1-j3bfm_1  |   File "/opt/conda/lib/python3.7/site-packages/sagemaker_pytorch_serving_container/serving.py", line 18, in <module>
algo-1-j3bfm_1  |     from sagemaker_inference import torchserve
algo-1-j3bfm_1  | ImportError: cannot import name 'torchserve' from 'sagemaker_inference' (/opt/conda/lib/python3.7/site-packages/sagemaker_inference/__init__.py)
tmpce8xadei_algo-1-j3bfm_1 exited with code 1
Aborting on container exit...
Exception in thread Thread-1:
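
The ImportError above indicates that the sagemaker-inference package installed in the image does not yet provide a `torchserve` module. As a rough sketch (the helper name `has_submodule` is hypothetical and not part of either toolkit), a check like the following could be run with the container's Python to confirm whether the dependency has been picked up:

```python
import importlib.util


def has_submodule(package: str, submodule: str) -> bool:
    """Return True if `package.submodule` can be located.

    find_spec imports the parent package to search it, but does not
    import the submodule itself.
    """
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ModuleNotFoundError:
        # The parent package itself is not installed.
        return False


if __name__ == "__main__":
    # Assumption: in the freshly built image this prints False until the
    # dependent sagemaker-inference-toolkit change is installed.
    print(has_submodule("sagemaker_inference", "torchserve"))
```

This only checks that the module can be found; it does not verify that TorchServe itself starts correctly.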

Now manually install the changes from the sagemaker-inference-toolkit PR (#89) into this container and commit it, then run the tests again, this time without the --build-image flag.

tox -e py36 -- test/integration/local -s

ubuntu@ip-172-31-65-0:~/ts/sagemaker-pytorch-inference-toolkit$ tox -e py36 -- test/integration/local -s
GLOB sdist-make: /home/ubuntu/ts/sagemaker-pytorch-inference-toolkit/setup.py
py36 inst-nodeps: /home/ubuntu/ts/sagemaker-pytorch-inference-toolkit/.tox/dist/sagemaker_pytorch_inference-1.5.2.dev0.zip
py36 installed: apipkg==1.5,attrs==19.3.0,bcrypt==3.1.7,boto3==1.14.19,botocore==1.17.19,certifi==2020.6.20,cffi==1.14.0,chardet==3.0.4,click==7.1.2,coverage==5.2,cryptography==2.9.2,docutils==0.15.2,execnet==1.7.1,Flask==1.1.1,future==0.18.2,gevent==20.6.2,greenlet==0.4.16,gunicorn==20.0.4,idna==2.7,importlib-metadata==1.7.0,inotify-simple==1.2.1,itsdangerous==1.1.0,Jinja2==2.11.2,jmespath==0.10.0,MarkupSafe==1.1.1,mock==4.0.2,more-itertools==8.4.0,numpy==1.19.0,packaging==20.4,paramiko==2.7.1,Pillow==7.2.0,pkg-resources==0.0.0,pluggy==0.13.1,protobuf==3.12.2,protobuf3-to-dict==0.1.5,psutil==5.7.0,py==1.9.0,pycparser==2.20,PyNaCl==1.4.0,pyparsing==2.4.7,pytest==5.4.3,pytest-cov==2.10.0,pytest-forked==1.2.0,pytest-xdist==1.32.0,python-dateutil==2.8.1,PyYAML==5.3.1,requests==2.20.0,retrying==1.3.3,s3transfer==0.3.3,sagemaker==1.68.0,sagemaker-containers==2.8.6.post2,sagemaker-inference==1.3.2.post1,sagemaker-pytorch-inference @ file:///home/ubuntu/ts/sagemaker-pytorch-inference-toolkit/.tox/dist/sagemaker_pytorch_inference-1.5.2.dev0.zip,scipy==1.5.1,six==1.15.0,smdebug-rulesconfig==0.1.4,torch==1.5.1,torchvision==0.6.1,typing==3.7.4.1,urllib3==1.22,wcwidth==0.2.5,Werkzeug==1.0.1,zipp==3.1.0,zope.event==4.4,zope.interface==5.1.0
py36 runtests: PYTHONHASHSEED='2603046058'
py36 runtests: commands[0] | coverage run --rcfile .coveragerc --source sagemaker_pytorch_serving_container -m pytest test/integration/local -s
WARNING:root:pandas failed to import. Analytics features will be impaired or broken.
=========================================================================================================== test session starts ============================================================================================================
platform linux -- Python 3.6.9, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 -- /home/ubuntu/ts/sagemaker-pytorch-inference-toolkit/.tox/py36/bin/python3.6
cachedir: .pytest_cache
rootdir: /home/ubuntu/ts/sagemaker-pytorch-inference-toolkit, inifile: setup.cfg
plugins: forked-1.2.0, cov-2.10.0, xdist-1.32.0
collected 4 items

test/integration/local/test_serving.py::test_serve_json_npy WARNING:sagemaker:Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
WARNING:sagemaker:No framework_version specified, defaulting to version 0.4. framework_version will be required in SageMaker Python SDK v2. This is not the latest supported version. If you would like to use version 1.5.0, please add framework_version=1.5.0 to your constructor.
INFO:botocore.credentials:Found credentials in environment variables.
WARNING:sagemaker.local.image:Using the short-lived AWS credentials found in session. They might expire while running.
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ef1a18d0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ef1a1e10>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ef1a17b8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
Attaching to tmphbv2jo9s_algo-1-xr0n7_1
algo-1-xr0n7_1  | Model server started.
algo-1-xr0n7_1  | 2020-07-10 13:03:33,174 [INFO ] pool-1-thread-17 ACCESS_LOG - /172.19.0.1:50306 "GET /ping HTTP/1.1" 200 4
!algo-1-xr0n7_1  | 2020-07-10 13:03:33,674 [INFO ] W-9005-model_1 ACCESS_LOG - /172.19.0.1:50310 "POST /invocations HTTP/1.1" 200 132
algo-1-xr0n7_1  | 2020-07-10 13:03:34,124 [INFO ] W-9006-model_1 ACCESS_LOG - /172.19.0.1:50310 "POST /invocations HTTP/1.1" 200 125
algo-1-xr0n7_1  | 2020-07-10 13:03:34,551 [INFO ] W-9009-model_1 ACCESS_LOG - /172.19.0.1:50310 "POST /invocations HTTP/1.1" 200 122
algo-1-xr0n7_1  | 2020-07-10 13:03:34,718 [INFO ] W-9015-model_1 ACCESS_LOG - /172.19.0.1:50310 "POST /invocations HTTP/1.1" 200 37
algo-1-xr0n7_1  | 2020-07-10 13:03:34,889 [INFO ] W-9000-model_1 ACCESS_LOG - /172.19.0.1:50310 "POST /invocations HTTP/1.1" 200 43
algo-1-xr0n7_1  | 2020-07-10 13:03:35,052 [INFO ] W-9010-model_1 ACCESS_LOG - /172.19.0.1:50310 "POST /invocations HTTP/1.1" 200 28
Gracefully stopping... (press Ctrl+C again to force)
PASSED
test/integration/local/test_serving.py::test_serve_csv WARNING:sagemaker:Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
WARNING:sagemaker:No framework_version specified, defaulting to version 0.4. framework_version will be required in SageMaker Python SDK v2. This is not the latest supported version. If you would like to use version 1.5.0, please add framework_version=1.5.0 to your constructor.
WARNING:sagemaker.local.image:Using the short-lived AWS credentials found in session. They might expire while running.
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec9c8080>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec9e3240>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec9e3c50>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
Attaching to tmp1evpzg0n_algo-1-1us1t_1
algo-1-1us1t_1  | Model server started.
algo-1-1us1t_1  | 2020-07-10 13:03:56,114 [INFO ] pool-1-thread-17 ACCESS_LOG - /172.19.0.1:50328 "GET /ping HTTP/1.1" 200 3
!algo-1-1us1t_1  | 2020-07-10 13:03:56,305 [INFO ] W-9009-model_1 ACCESS_LOG - /172.19.0.1:50332 "POST /invocations HTTP/1.1" 200 30
algo-1-1us1t_1  | 2020-07-10 13:03:56,493 [INFO ] W-9000-model_1 ACCESS_LOG - /172.19.0.1:50332 "POST /invocations HTTP/1.1" 200 47
algo-1-1us1t_1  | 2020-07-10 13:03:56,678 [INFO ] W-9013-model_1 ACCESS_LOG - /172.19.0.1:50332 "POST /invocations HTTP/1.1" 200 23
Gracefully stopping... (press Ctrl+C again to force)
PASSED
test/integration/local/test_serving.py::test_serve_cpu_model_on_gpu WARNING:sagemaker:Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
WARNING:sagemaker:No framework_version specified, defaulting to version 0.4. framework_version will be required in SageMaker Python SDK v2. This is not the latest supported version. If you would like to use version 1.5.0, please add framework_version=1.5.0 to your constructor.
WARNING:sagemaker.local.image:Using the short-lived AWS credentials found in session. They might expire while running.
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec9e6b38>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec9e6f28>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec9e20b8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
Attaching to tmpwuf8pafw_algo-1-eqkyb_1
algo-1-eqkyb_1  | Model server started.
algo-1-eqkyb_1  | 2020-07-10 13:04:17,696 [INFO ] pool-1-thread-17 ACCESS_LOG - /172.19.0.1:50350 "GET /ping HTTP/1.1" 200 4
!algo-1-eqkyb_1  | 2020-07-10 13:04:17,886 [INFO ] W-9015-model_1 ACCESS_LOG - /172.19.0.1:50354 "POST /invocations HTTP/1.1" 200 30
Gracefully stopping... (press Ctrl+C again to force)
PASSED
test/integration/local/test_serving.py::test_serving_calls_model_fn_once WARNING:sagemaker:Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
WARNING:sagemaker:No framework_version specified, defaulting to version 0.4. framework_version will be required in SageMaker Python SDK v2. This is not the latest supported version. If you would like to use version 1.5.0, please add framework_version=1.5.0 to your constructor.
WARNING:sagemaker.local.image:Using the short-lived AWS credentials found in session. They might expire while running.
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec870048>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec870320>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20ec8700f0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
Attaching to tmpzfzvz320_algo-1-61bac_1
algo-1-61bac_1  | Model server started.
algo-1-61bac_1  | 2020-07-10 13:04:38,944 [INFO ] pool-1-thread-3 ACCESS_LOG - /172.19.0.1:50372 "GET /ping HTTP/1.1" 200 4
!algo-1-61bac_1  | 2020-07-10 13:04:38,968 [INFO ] W-9000-model_1 ACCESS_LOG - /172.19.0.1:50376 "POST /invocations HTTP/1.1" 200 3
algo-1-61bac_1  | 2020-07-10 13:04:38,975 [INFO ] W-9001-model_1 ACCESS_LOG - /172.19.0.1:50376 "POST /invocations HTTP/1.1" 200 1
algo-1-61bac_1  | 2020-07-10 13:04:38,981 [INFO ] W-9000-model_1 ACCESS_LOG - /172.19.0.1:50376 "POST /invocations HTTP/1.1" 200 1
Gracefully stopping... (press Ctrl+C again to force)
PASSED

============================================================================================================= warnings summary =============================================================================================================
test/integration/local/test_serving.py:69
  /home/ubuntu/ts/sagemaker-pytorch-inference-toolkit/test/integration/local/test_serving.py:69: PytestUnknownMarkWarning: Unknown pytest.mark.skip_cpu - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
    @pytest.mark.skip_cpu

-- Docs: https://docs.pytest.org/en/latest/warnings.html
================================================================================================= 4 passed, 1 warning in 91.55s (0:01:31) ==================================================================================================
Coverage.py warning: Module sagemaker_pytorch_serving_container was never imported. (module-not-imported)
Coverage.py warning: No data was collected. (no-data-collected)
py36 runtests: commands[1] | coverage report --fail-under=90 --include *sagemaker_pytorch_serving_container*
No data to report.
ERROR: InvocationError: '/home/ubuntu/ts/sagemaker-pytorch-inference-toolkit/.tox/py36/bin/coverage report --fail-under=90 --include *sagemaker_pytorch_serving_container*'
_________________________________________________________________________________________________________________ summary __________________________________________________________________________________________________________________```

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.