Using OpenAI libraries with Vertex AI (original) (raw)

The Chat Completions API works as an Open AI-compatible endpoint, designed to make it easier to interface with Gemini on Vertex AI by using the OpenAI libraries for Python and REST. If you're already using the OpenAI libraries, you can use this API as a low-cost way to switch between calling OpenAI models and Vertex AI hosted models to compare output, cost, and scalability, without changing your existing code. If you aren't already using the OpenAI libraries, we recommend that youuse the Google Gen AI SDK.

Supported models

The Chat Completions API supports both Gemini models and select self-deployed models from Model Garden.

Gemini models

The following models provide support for the Chat Completions API:

Self-deployed models from Model Garden

TheHugging Face Text Generation Interface (HF TGI)andVertex AI Model Garden prebuilt vLLMcontainers support the Chat Completions API. However, not every model deployed to these containers supports the Chat Completions API. The following table includes the most popular supported models by container:

HF TGI	vLLM
gemma-2-9b-it gemma-2-27b-it Meta-Llama-3.1-8B-Instruct Meta-Llama-3-8B-Instruct Mistral-7B-Instruct-v0.3 Mistral-Nemo-Instruct-2407	Gemma Llama 2 Llama 3 Mistral-7B Mistral Nemo

Supported parameters

For Google models, the Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation onCreating chat completions. Parameter support for third-party models varies by model. To see which parameters are supported, consult the model's documentation.

messages	System message User message: The text andimage_url types are supported. Theimage_url type supports images stored a Cloud Storage URI or a base64 encoding in the form"data:;base64,". To learn how to create a Cloud Storage bucket and upload a file to it, seeDiscover object storage. The detail option is not supported. Assistant message Tool message Function message: This field is deprecated, but supported for backwards compatibility.
model
max_completion_tokens	Alias for max_tokens.
max_tokens
n
frequency_penalty
presence_penalty
reasoning_effort	Configures how much time and how many tokens are used on a response. low: 1024 medium: 8192 high: 24576 As no thoughts are included in the response, only one ofreasoning_effort or extra_body.google.thinking_config may be specified.
response_format	json_object: Interpreted as passing "application/json" to the Gemini API. json_schema. Fully recursive schemas are not supported. additional_properties is supported. text: Interpreted as passing "text/plain" to the Gemini API. Any other MIME type is passed as is to the model, such as passing "application/json" directly.
seed	Corresponds to GenerationConfig.seed.
stop
stream
temperature
top_p
tools	type function name description parameters: Specify parameters by using theOpenAPI specification. This differs from the OpenAI parameters field, which is described as a JSON Schema object. To learn about keyword differences between OpenAPI and JSON Schema, see theOpenAPI guide.
tool_choice	none auto required: Corresponds to the mode ANY in theFunctionCallingConfig. validated: Corresponds to the mode VALIDATED in the FunctionCallingConfig. This is Google-specific.
web_search_options	Corresponds to the GoogleSearch tool. No sub-options are supported.
function_call	This field is deprecated, but supported for backwards compatibility.
functions	This field is deprecated, but supported for backwards compatibility.

If you pass any unsupported parameter, it is ignored.

Multimodal input parameters

The Chat Completions API supports select multimodal inputs.

input_audio	data: Any URI or valid blob format. We support all blob types, including image, audio, and video. Anything supported by GenerateContent is supported (HTTP, Cloud Storage, etc.). format: OpenAI supports both wav (audio/wav) and mp3 (audio/mp3). Using Gemini, all valid MIME types are supported.
image_url	data: Like input_audio, any URI or valid blob format is supported. Note that image_url as a URL will default to the image/* MIME-type and image_url as blob data can be used as any multimodal input. detail: Similar tomedia resolution, this determines the maximum tokens per image for the request. Note that while OpenAI's field is per-image, Gemini enforces the same detail across the request, and passing multiple detail types in one request will throw an error.

input_audio

data: Any URI or valid blob format. We support all blob types, including image, audio, and video. Anything supported by GenerateContent is supported (HTTP, Cloud Storage, etc.). format: OpenAI supports both wav (audio/wav) and mp3 (audio/mp3). Using Gemini, all valid MIME types are supported.

image_url

data: Like input_audio, any URI or valid blob format is supported. Note that image_url as a URL will default to the image/* MIME-type and image_url as blob data can be used as any multimodal input. detail: Similar tomedia resolution, this determines the maximum tokens per image for the request. Note that while OpenAI's field is per-image, Gemini enforces the same detail across the request, and passing multiple detail types in one request will throw an error.

In general, the data parameter can be a URI or a combination of MIME type and base64 encoded bytes in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>". For a full list of MIME types, see GenerateContent. For more information on OpenAI's base64 encoding, see their documentation.

For usage, see our multimodal input examples.

Gemini-specific parameters

There are several features supported by Gemini that are not available in OpenAI models. These features can still be passed in as parameters, but must be contained within anextra_content or extra_body or they will be ignored.

`extra_body` features

safety_settings	This corresponds to Gemini's SafetySetting.
cached_content	This corresponds to Gemini's GenerateContentRequest.cached_content.
thinking_config	This corresponds to Gemini's GenerationConfig.ThinkingConfig.
thought_tag_marker	Used to separate a model's thoughts from its responses for models with Thinking available. If not specified, no tags will be returned around the model's thoughts. If present, subsequent queries will strip the thought tags and mark the thoughts appropriately for context. This helps preserve the appropriate context for subsequent queries.

extra_part lets you specify additional settings at a per-Part level.

extra_content	A field for adding Gemini-specific content that shouldn't be ignored.
thought	This will explicitly mark if a field is a thought (and take precedence overthought_tag_marker). This should be used to specify whether a tool call is part of a thought or not.

What's next

Learn more aboutauthentication and credentialingwith the OpenAI-compatible syntax.
See examples of calling theChat Completions APIwith the OpenAI-compatible syntax.
See examples of calling theInference APIwith the OpenAI-compatible syntax.
See examples of calling theFunction Calling APIwith OpenAI-compatible syntax.
Learn more about the Gemini API.
Learn more about migrating from Azure OpenAI to the Gemini API.