Run distributed Services


BentoML provides a flexible framework for deploying machine learning models as Services. While a single Service suffices for most use cases, more complex scenarios benefit from multiple Services running in a distributed way.

This document provides guidance on creating and deploying a BentoML project with distributed Services.

Single and distributed Services

Using a single BentoML Service in service.py is sufficient for most use cases. This approach is straightforward, easy to manage, and works well when you only need to deploy a single model and the API logic is simple.

In deployment, a BentoML Service runs as multiple processes in a container. If you define multiple Services, they run as processes in different containers. This distributed approach is useful in more complex scenarios, for example when Services have different hardware requirements or need to be scaled independently.

Interservice communication

Distributed Services support complex, modular architectures through interservice communication. Different Services can interact with each other using the bentoml.depends() function, which allows direct method calls between Services as if they were local class functions.

Basic usage

The following service.py file contains two Services with different hardware requirements. To declare a dependency, use the bentoml.depends() function by passing the dependent Service class as an argument. This creates a direct link between Services for easy method invocation:

service.py

import bentoml
import numpy as np


@bentoml.service(resources={"cpu": "200m", "memory": "512Mi"})
class Preprocessing:
    # A dummy preprocessing Service
    @bentoml.api
    def preprocess(self, input_series: np.ndarray) -> np.ndarray:
        return input_series


@bentoml.service(resources={"cpu": "1", "memory": "2Gi"})
class IrisClassifier:
    # Load the model from the Model Store
    iris_model = bentoml.models.BentoModel("iris_sklearn:latest")
    # Declare the preprocessing Service as a dependency
    preprocessing = bentoml.depends(Preprocessing)

    def __init__(self):
        import joblib

        self.model = joblib.load(self.iris_model.path_of("model.pkl"))

    @bentoml.api
    def classify(self, input_series: np.ndarray) -> np.ndarray:
        input_series = self.preprocessing.preprocess(input_series)
        return self.model.predict(input_series)

Once a dependency is declared, invoking methods on the dependent Service is similar to calling a local method. In other words, Service A can call Service B as if it were invoking a class-level method on Service B. This abstracts away the complexities of network communication, serialization, and deserialization.

Using bentoml.depends() is the recommended way to create a BentoML project with distributed Services. It enhances modularity, as you can develop reusable, loosely coupled Services that can be maintained and scaled independently.
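
For example, here is a minimal sketch of calling the composed Service from a client once the service.py above is served locally (assuming the default port 3000; the sample input values are purely illustrative). The client only calls the top-level classify endpoint, and the internal call to the Preprocessing Service is handled by BentoML:

import numpy as np
import bentoml

# Connect to the locally served IrisClassifier Service
with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    # classify() internally calls Preprocessing.preprocess() before predicting
    result = client.classify(input_series=np.array([[5.1, 3.5, 1.4, 0.2]]))
    print(result)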

Depend on an external deployment

BentoML also allows you to set an external deployment as a dependency for a Service. This means the Service can call a remote model and its exposed API endpoints. To specify an external deployment, use the bentoml.depends() function and provide either the name of a Deployment on BentoCloud or its URL if it is already running.

Specify the Deployment name on BentoCloud

Pass the Deployment's name to bentoml.depends(). You can also pass the cluster parameter to specify the cluster where your Deployment is running.

import bentoml
import numpy as np


@bentoml.service
class MyService:
    # cluster is optional if your Deployment is in a non-default cluster
    iris = bentoml.depends(deployment="iris-classifier-x6dewa", cluster="my_cluster_name")

    @bentoml.api
    def predict(self, input: np.ndarray) -> int:
        # Call the predict function from the remote Deployment
        return int(self.iris.predict(input)[0][0])

Specify the URL

If the external deployment is already running and its API is exposed via a public URL, you can reference it by specifying the url parameter. Note that url and deployment/cluster are mutually exclusive.

import bentoml
import numpy as np


@bentoml.service
class MyService:
    # Call the model deployed on BentoCloud by specifying its URL
    iris = bentoml.depends(url="https://<iris.example-url.bentoml.ai>")

    # Call the model served elsewhere
    # iris = bentoml.depends(url="http://192.168.1.1:3000")

    @bentoml.api
    def predict(self, input: np.ndarray) -> int:
        # Make a request to the external Service hosted at the specified URL
        return int(self.iris.predict(input)[0][0])

Tip

We recommend you specify the class of the external Service when using bentoml.depends(). This makes it easier to validate the types and methods available on the remote Service.

import bentoml

@bentoml.service
class MyService:
    # Specify the external Service class for type-safe integration
    iris = bentoml.depends(IrisClassifier, deployment="iris-classifier-x6dewa", cluster="my_cluster")

Deploy distributed Services

To deploy a project with distributed Services to BentoCloud, we recommend you use a separate configuration file and reference it in the BentoML CLI command or Python API for deployment.

Here is an example:

config-file.yaml

name: "deployment-name" bento: . description: "This project creates an AI agent application" envs: # Optional. If you specify environment variables here, they will be applied to all Services

Inference: # Service two instance_type: "cpu.1" scaling: max_replicas: 5 min_replicas: 1

To deploy these Services to BentoCloud, you can choose either the BentoML CLI or Python API:

BentoML CLI

bentoml deploy -f config-file.yaml

Python API

import bentoml

bentoml.deployment.create(config_file="config-file.yaml")

Refer to Configure Deployments to see the available configuration fields.