Serverless AI with Gemma 3 on Cloud Run

Today, we introduced Gemma 3, a family of lightweight, open models built with the cutting-edge technology behind Gemini 2.0. The Gemma 3 family has been designed for speed and portability, empowering developers to build sophisticated AI applications at scale. Combined with Cloud Run, Gemma 3 makes it easier than ever to deploy serverless workloads that use AI models.

In this post, we’ll explore what Gemma 3 offers and how you can run it on Cloud Run.

Gemma 3: Power and efficiency for Cloud deployments

Gemma 3 is engineered for exceptional performance with a lower memory footprint, making it ideal for cost-effective inference workloads.

Serverless inference with Gemma 3 and Cloud Run

Gemma 3 is a great fit for inference workloads on Cloud Run using NVIDIA L4 GPUs. Cloud Run is Google Cloud's fully managed serverless platform, which lets developers run containers without having to manage the underlying infrastructure. Services scale to zero when inactive and scale dynamically with demand. This keeps both cost and performance in line with actual traffic, because you only pay for what you use.

For example, you could host an LLM on one Cloud Run service and a chat agent on another, so that each can be scaled and managed independently. With GPU acceleration, a Cloud Run service can return its first AI inference results in under 30 seconds, of which only about 5 seconds are spent starting the instance. This fast startup helps your applications deliver responsive user experiences. We also reduced the GPU price in Cloud Run to ~$0.6/hr. And of course, if your service isn't receiving requests, it scales down to zero.
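To make the pricing point concrete, here is a rough back-of-the-envelope sketch. It assumes the approximate ~$0.6/hr GPU price mentioned above and that GPU time is billed only while instances are active. The function name and parameters are hypothetical, and the sketch ignores CPU, memory, and any idle time before an instance shuts down, so treat the numbers as illustrative rather than as a pricing calculator.

```python
# Rough GPU cost estimate for a scale-to-zero service.
# Assumption: ~$0.60 per GPU-hour, taken from the approximate figure in this post.
GPU_PRICE_PER_HOUR = 0.60


def estimate_gpu_cost(active_seconds_per_day: float,
                      days: int = 30,
                      price_per_hour: float = GPU_PRICE_PER_HOUR) -> float:
    """Return the estimated GPU cost in USD over `days` days.

    Only time during which an instance is active is counted; CPU and
    memory charges are deliberately left out of this sketch.
    """
    if active_seconds_per_day < 0 or days < 0 or price_per_hour < 0:
        raise ValueError("inputs must be non-negative")
    active_hours = active_seconds_per_day * days / 3600
    return active_hours * price_per_hour


if __name__ == "__main__":
    always_on = estimate_gpu_cost(24 * 3600)   # one GPU busy around the clock
    two_hours = estimate_gpu_cost(2 * 3600)    # one GPU busy two hours a day
    print(f"Always on:       ${always_on:.2f} per 30 days")
    print(f"Two hours a day: ${two_hours:.2f} per 30 days")
```

Under these assumptions, a service that is busy two hours a day costs a small fraction of one that keeps a GPU running continuously, which is the practical benefit of scaling to zero.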

Get started today

Cloud Run and Gemma 3 combine to create a powerful, cost-effective, and scalable solution for deploying advanced AI applications. Gemma 3 is supported by a variety of tools and frameworks, such as Hugging Face Transformers, Ollama, and vLLM.

To get started, visit this guide, which shows you how to build a service with Gemma 3 on Cloud Run with Ollama.
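Once such a service is running, a client could call it over HTTP. The following is a minimal sketch, not the guide's own code: it assumes the service exposes Ollama's standard `/api/generate` endpoint, that the model was pulled under the tag `gemma3:4b`, and that the service URL is the placeholder shown. The helper names are hypothetical. If your service requires authentication, you would pass an identity token as the optional `token` argument.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the URL of your own Cloud Run service.
SERVICE_URL = "https://YOUR-SERVICE-URL.run.app"
# Assumed Ollama model tag; use whichever Gemma 3 variant your service serves.
MODEL = "gemma3:4b"


def build_generate_request(prompt, service_url=SERVICE_URL, model=MODEL, token=None):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks Ollama for a single JSON object instead of a stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when the service does not allow unauthenticated calls.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt to the service and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    # A generous timeout leaves room for a cold start after scaling from zero.
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

Calling `generate("Why is the sky blue?")` would then return the model's answer as a string. The first call after a period of inactivity may take longer, because the service has to start an instance and load the model.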
