
Scale Launch


Simple, scalable, high-performance ML service deployment in Python.

Example


```python
import os
import time
from launch import LaunchClient
from launch import EndpointRequest
from pydantic import BaseModel
from rich import print


class MyRequestSchema(BaseModel):
    x: int
    y: str

class MyResponseSchema(BaseModel):
    __root__: int


def my_load_predict_fn(model):
    def returns_model_of_x_plus_len_of_y(x: int, y: str) -> int:
        """MyRequestSchema -> MyResponseSchema"""
        assert isinstance(x, int) and isinstance(y, str)
        return model(x) + len(y)

    return returns_model_of_x_plus_len_of_y


def my_load_model_fn():
    def my_model(x):
        return x * 2

    return my_model

BUNDLE_PARAMS = {
    "model_bundle_name": "test-bundle",
    "load_predict_fn": my_load_predict_fn,
    "load_model_fn": my_load_model_fn,
    "request_schema": MyRequestSchema,
    "response_schema": MyResponseSchema,
    "requirements": ["pytest==7.2.1", "numpy"],  # list your requirements here
    "pytorch_image_tag": "1.7.1-cuda11.0-cudnn8-runtime",
}

ENDPOINT_PARAMS = {
    "endpoint_name": "demo-endpoint",
    "model_bundle": "test-bundle",
    "cpus": 1,
    "min_workers": 0,
    "endpoint_type": "async",
    "update_if_exists": True,
    "labels": {
        "team": "MY_TEAM",
        "product": "launch",
    },
}

def predict_on_endpoint(request: MyRequestSchema) -> MyResponseSchema:
    # Wait for the endpoint to be ready before submitting a task
    endpoint = client.get_model_endpoint(endpoint_name="demo-endpoint")
    while endpoint.status() != "READY":
        time.sleep(10)

    endpoint_request = EndpointRequest(args=request.dict(), return_pickled=False)

    future = endpoint.predict(request=endpoint_request)
    raw_response = future.get()

    response = MyResponseSchema.parse_raw(raw_response.result)
    return response


client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))
client.create_model_bundle_from_callable_v2(**BUNDLE_PARAMS)
endpoint = client.create_model_endpoint(**ENDPOINT_PARAMS)

request = MyRequestSchema(x=5, y="hello")
response = predict_on_endpoint(request)
print(response)
"""
MyResponseSchema(__root__=15)
"""
```

What's going on here:

Notice that we specified `min_workers=0`, meaning that the endpoint will scale down to zero workers when it's not being used.
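
Scale-to-zero keeps idle costs down, but it also means the first request after a quiet period has to wait for a fresh worker to start. As a minimal sketch of the trade-off (the exact cold-start behavior is deployment-specific, so treat this as illustrative), you could keep one worker warm by raising `min_workers`, reusing `ENDPOINT_PARAMS` from the example above:

```python
# Illustrative variant: keep at least one worker warm to avoid cold starts.
# Because ENDPOINT_PARAMS sets update_if_exists=True, calling
# create_model_endpoint again updates the existing "demo-endpoint" in place.
WARM_ENDPOINT_PARAMS = {
    **ENDPOINT_PARAMS,
    "min_workers": 1,  # never scale below one running worker
}
endpoint = client.create_model_endpoint(**WARM_ENDPOINT_PARAMS)
```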

Installation

To use Scale Launch, first install it using pip:


```shell
pip install -U scale-launch
```
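
To check that the install worked, a quick smoke test is to import the client and construct it with your API key (this assumes you have set the `LAUNCH_API_KEY` environment variable, as in the usage example above):

```python
import os
from launch import LaunchClient

# A successful import plus client construction confirms the package is
# installed; the API key is read from the environment as in the example.
client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))
print("Launch client ready:", client)
```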