Deploy Large Language Models on Amazon EKS using vLLM Deep Learning Containers

In this tutorial, you will learn how to deploy Large Language Models (LLMs) on Amazon Elastic Kubernetes Service (Amazon EKS) using vLLM Deep Learning Containers (DLCs)! 🎉🤗🚀✨

Organizations today face significant challenges when deploying LLMs efficiently at scale. These challenges include optimizing GPU resource utilization, managing network infrastructure, and providing efficient access to model weights. This tutorial addresses these challenges by leveraging AWS DLCs for vLLM, which provide pre-configured, optimized Docker environments that eliminate the complexity of building inference environments from scratch.

In this tutorial, you will build a scalable, high-performance inference system for serving models such as Qwen 2.5 0.5B Instruct using AWS-optimized containers and modern cloud-native technologies.

Quick Start

1. Setup (5 mins)

git clone https://github.com/aws-samples/sample-vllm-on-eks-with-dlc
cd sample-vllm-on-eks-with-dlc/bash
chmod +x config.sh && ./config.sh

2. Configure AWS Profile

Detailed IAM setup instructions:

Navigate to IAM in the AWS console. In the left navigation panel, select Users and Create user.

Name it eks-admin-cli (or any name you prefer), click Next twice, and then click Create user.

In the left panel, select Policies and then Create policy. Click on the JSON tab.

Paste the following JSON policy, and then click Next.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKSAndInfra",
      "Effect": "Allow",
      "Action": [
        "eks:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "fsx:*",
        "cloudformation:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "IAMManagement",
      "Effect": "Allow",
      "Action": [
        "iam:*"
      ],
      "Resource": "*"
    }
  ]
}

You can name it EKS-Infra-Admin-Policy or something similar. Click on Create policy.

Go back to Users and click on eks-admin-cli. In the Permissions tab, look for Add permissions, and then Attach policies directly. Search for EKS-Infra-Admin-Policy. Click on Next, and then on Add permissions.

Go back to the eks-admin-cli user, and navigate to the Security credentials tab. Click on Create access key.

Click on Command Line Interface (CLI), confirm that you want to use CLI access, and click on Next.

On your local machine or EC2 instance, run the following command. When prompted, enter the access key ID and secret access key you just created:

aws configure --profile vllm-profile

Warning
This workshop was designed to run in us-west-2. Please define your profile in that AWS region.
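Before deploying anything, it can help to confirm that the profile resolves to the user you created and points at the expected region. The following is a minimal sketch, not part of the repository's scripts: `region_ok` is a hypothetical helper, and the AWS calls only run if the `aws` CLI is installed.

```shell
# Profile name from the step above; the expected region comes from the warning.
PROFILE="vllm-profile"
EXPECTED_REGION="us-west-2"

# Hypothetical helper: succeeds when the given region is the one
# this workshop was designed for.
region_ok() {
  [ "$1" = "$EXPECTED_REGION" ]
}

# Only query AWS when the CLI is available on this machine.
if command -v aws >/dev/null 2>&1; then
  # Prints the account and ARN that the profile's credentials map to.
  aws sts get-caller-identity --profile "$PROFILE" \
    || echo "Could not authenticate with profile $PROFILE" >&2

  # Reads the region stored in the profile's configuration.
  actual_region="$(aws configure get region --profile "$PROFILE" || true)"
  if region_ok "$actual_region"; then
    echo "Region OK: $actual_region"
  else
    echo "Region is '$actual_region', expected $EXPECTED_REGION" >&2
  fi
fi
```

If the region is wrong, rerunning `aws configure --profile vllm-profile` and entering us-west-2 at the region prompt should correct it.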

3. Deploy Infrastructure (15-20 mins)

Deploy EKS cluster

chmod +x create_cluster.sh && ./create_cluster.sh

Deploy GPU node group

chmod +x create_node_group.sh && ./create_node_group.sh

Setup high-performance storage

chmod +x storage.sh && ./storage.sh
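Once the infrastructure scripts finish, a quick check that the worker nodes have joined the cluster can save debugging time later. This sketch assumes your kubeconfig already points at the new cluster (something the scripts may or may not set up for you); `count_ready_nodes` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints how many nodes have a STATUS column of exactly "Ready".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Only query the cluster when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  ready="$(kubectl get nodes --no-headers 2>/dev/null | count_ready_nodes)"
  echo "Ready nodes: ${ready}"
fi
```

A count of zero right after node group creation usually just means the instances are still booting, so retrying after a minute or two is reasonable.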

4. Install Controllers (5 mins)

chmod +x controllers.sh && ./controllers.sh

5. Deploy vLLM Application (10-15 mins)

chmod +x application.sh && ./application.sh
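The load balancer and the model server can take a few minutes to become reachable after the script returns. The sketch below polls the vLLM server's `/health` route until it answers with HTTP 200. It assumes the ALB forwards that path to the vLLM container and that you have exported the endpoint as `ALB_ENDPOINT`; both the variable and the `wait_for_200` helper are hypothetical names, not part of the repository.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200.
# Arguments: URL, max attempts (default 30), delay in seconds (default 10).
wait_for_200() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the status code; connection errors yield a non-200 value.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if [ "$code" = "200" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Only poll when the endpoint has been provided.
if [ -n "${ALB_ENDPOINT:-}" ]; then
  if wait_for_200 "http://${ALB_ENDPOINT}/health"; then
    echo "vLLM is ready"
  else
    echo "vLLM did not become ready in time" >&2
  fi
fi
```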

Test Your Deployment

Replace <YOUR_ALB_ENDPOINT> with the endpoint from step 5:

curl -X POST http://<YOUR_ALB_ENDPOINT>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "Qwen/Qwen2.5-0.5B-Instruct", "messages": [{"role": "user", "content": "Hello, how are you?"}], "max_tokens": 100 }'
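For repeated testing, the same request can be wrapped in small helper functions. This is a sketch under a few assumptions: `ALB_ENDPOINT` is a hypothetical environment variable holding your endpoint, `build_payload` and `extract_reply` are illustrative names, `python3` is available for JSON parsing, and the response follows the OpenAI-style chat completions shape with the text under `choices[0].message.content`.

```shell
MODEL="Qwen/Qwen2.5-0.5B-Instruct"

# Build the request body for a single user message.
# Arguments: prompt, max_tokens (default 100).
# Assumes the prompt contains no characters that need JSON escaping
# (double quotes, backslashes, newlines).
build_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %d}' \
    "$MODEL" "$1" "${2:-100}"
}

# Read a chat completions response on stdin and print only the reply text.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Only send the request when the endpoint has been provided.
if [ -n "${ALB_ENDPOINT:-}" ]; then
  curl -sS -X POST "http://${ALB_ENDPOINT}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "Hello, how are you?")" | extract_reply
fi
```

Running it as `ALB_ENDPOINT=<YOUR_ALB_ENDPOINT> sh test_request.sh` (the script name is arbitrary) should print just the model's reply rather than the full JSON response.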

Architecture Overview

TODO: Fill this README out!

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.