Function Calling - NVIDIA NIM for Large Language Models (LLMs)
You can connect NIM to external tools and services using function calling (also known as tool calling). When you provide a list of available functions, NIM can choose to output arguments for the relevant function(s). You can then execute those functions and add their results to the conversation, augmenting the prompt with relevant external information.
Function calling is controlled using the tool_choice, tools, and parallel_tool_calls parameters. Only the following models support function calling, and only a subset of those models support parallel tool calling.
Supported Models#
Parameters#
To use function calling, modify the tool_choice, tools, and parallel_tool_calls parameters.
tool_choice options#
- "none": Disables the use of tools.
- "auto": Enables the model to decide whether to use tools and which ones to use.
- "required": Forces the model to use a tool, but the model chooses which one.
- Named tool choice: Forces the model to use a specific tool. It must be in the following format:
{
    "type": "function",
    "function": {
        "name": "name of the tool goes here"
    }
}
Note: tool_choice can only be set when tools is also set, and vice versa. These parameters work together to define and control the use of tools in the model's responses. For further information on these parameters and their usage, see the OpenAI API documentation.
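The rules above (the three string options, the named tool choice format, and the requirement that tool_choice and tools are set together) can be checked on the client before a request is sent. The following is a minimal sketch only: named_tool_choice and check_tool_params are hypothetical helper names, not part of NIM or the OpenAI client, and the check that a named tool must appear in tools is an assumption made for illustration.

```python
from typing import Optional, Union

# String values of tool_choice listed above.
ALLOWED_STRINGS = {"none", "auto", "required"}


def named_tool_choice(name: str) -> dict:
    """Build a named tool choice object in the format shown above."""
    return {"type": "function", "function": {"name": name}}


def check_tool_params(tools: Optional[list],
                      tool_choice: Optional[Union[str, dict]]) -> None:
    """Raise ValueError if the pair breaks the rules described above.

    Hypothetical client-side check; the server performs its own validation.
    """
    # tool_choice and tools must be set together.
    if (tools is None) != (tool_choice is None):
        raise ValueError("tool_choice and tools must be set together")
    if tool_choice is None:
        return
    if isinstance(tool_choice, str):
        if tool_choice not in ALLOWED_STRINGS:
            raise ValueError(f"unknown tool_choice: {tool_choice!r}")
        return
    # Assumption: a named tool choice should refer to a tool present in `tools`.
    name = tool_choice.get("function", {}).get("name")
    known = {tool["function"]["name"] for tool in tools}
    if tool_choice.get("type") != "function" or name not in known:
        raise ValueError(f"named tool choice does not match a provided tool: {name!r}")
```

Calling check_tool_params(tools, named_tool_choice("get_current_weather")) before client.chat.completions.create is one way to catch a mistyped tool name early.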
Example Workflows#
These examples showcase various ways to use function calling with NIM:
- Basic Function Calling: Demonstrates how to use a single function with automatic tool choice.
- Multiple Tools: Shows how to provide multiple tools, including one without parameters.
- Named Tool Usage: Illustrates how to force the model to use a specific tool.
- Parallel Tool Calling: Exemplifies how to use parallel tool calling with a supporting model.
1. Basic Function Calling#
This example shows how to use a single function with automatic tool choice.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
MODEL_NAME = "meta/llama-3.1-70b-instruct"

# Define available function
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location."
                }
            },
            "required": ["location", "format"]
        }
    }
}

messages = [
    {"role": "user", "content": "Is it hot in Pittsburgh, PA right now?"}
]

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
messages.append(assistant_message)

print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"location": "Pittsburgh, PA", "format": "fahrenheit"}', name='get_current_weather'), type='function')])

# Simulate external function call
tool_call_result = 88
tool_call_id = assistant_message.tool_calls[0].id
tool_function_name = assistant_message.tool_calls[0].function.name
messages.append({
    "role": "tool",
    "content": str(tool_call_result),
    "tool_call_id": tool_call_id,
    "name": tool_function_name
})

chat_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content='Based on the current temperature of 88°F (31°C) in Pittsburgh, PA, it is indeed quite hot right now. This temperature is generally considered warm to hot, especially if accompanied by high humidity, which is common in Pittsburgh during summer months.', role='assistant', function_call=None, tool_calls=None)
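The example above hard-codes the tool result. In an application, the arguments string returned by the model would typically be parsed and passed to a local function. The sketch below shows one way to do that, under stated assumptions: the local get_current_weather implementation, the TOOL_REGISTRY mapping, and the run_tool_call helper are all hypothetical, and a SimpleNamespace object stands in for the tool call so the snippet runs without a NIM endpoint.

```python
import json
from types import SimpleNamespace


def get_current_weather(location: str, format: str) -> int:
    """Hypothetical stand-in for a real weather lookup; always returns 88."""
    return 88


# Maps tool names (as declared in `tools`) to local Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and return the matching "tool" role message."""
    name = tool_call.function.name
    # The model returns the arguments as a JSON-encoded string.
    arguments = json.loads(tool_call.function.arguments or "{}")
    result = TOOL_REGISTRY[name](**arguments)
    return {
        "role": "tool",
        "content": str(result),
        "tool_call_id": tool_call.id,
        "name": name,
    }


# Stand-in with the same attributes as the tool call in the example output.
example_call = SimpleNamespace(
    id="call_abc123",
    function=SimpleNamespace(
        name="get_current_weather",
        arguments='{"location": "Pittsburgh, PA", "format": "fahrenheit"}',
    ),
)
tool_message = run_tool_call(example_call)
print(tool_message)
```

With a live response, run_tool_call(assistant_message.tool_calls[0]) would produce the message that is appended to messages before the second request.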
2. Multiple Tools#
You can also pass more than one tool in tools, including tools that take no parameters, like the time_tool below.
weather_tool = {
    # ... (same as in the previous example)
}

time_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time_nyc",
        "description": "Get the current time in NYC.",
        "parameters": {}
    }
}

messages = [
    {"role": "user", "content": "What's the current time in New York?"}
]

chat_response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[
#     ChatCompletionMessageToolCall(id='call_ghi789', function=Function(arguments='{}', name='get_current_time_nyc'), type='function')
# ])

# Process tool calls and generate final response as in the previous example
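When several tools are defined, a small builder can keep their definitions uniform. The following is only a sketch: make_tool is a hypothetical helper, not part of NIM or the OpenAI client. It mirrors the shapes used in the examples above, including the empty parameters object that time_tool uses for a tool without arguments.

```python
from typing import Optional


def make_tool(name: str, description: str,
              properties: Optional[dict] = None,
              required: Optional[list] = None) -> dict:
    """Build a tool definition in the shape used by the examples above."""
    properties = properties or {}
    required = list(required or [])
    # Every required name must be a declared property.
    unknown = [key for key in required if key not in properties]
    if unknown:
        raise ValueError(f"required names without a property: {unknown}")
    if properties:
        parameters = {
            "type": "object",
            "properties": properties,
            "required": required,
        }
    else:
        # Tools without arguments use an empty parameters object, as time_tool does.
        parameters = {}
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


time_tool = make_tool("get_current_time_nyc", "Get the current time in NYC.")
```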
3. Named Tool Usage#
This example forces the model to use a specific tool.
chat_response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in New York City like?"}],
    tools=[weather_tool],
    tool_choice={
        "type": "function",
        "function": {
            "name": "get_current_weather"
        }
    },
    stream=False
)

assistant_message = chat_response.choices[0].message
print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_jkl012', function=Function(arguments='{"location": "New York, NY", "format": "fahrenheit"}', name='get_current_weather'), type='function')])

# Process tool call and generate final response as in the previous examples
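Even when a specific tool is requested, the arguments arrive as a JSON string produced by the model, so it can be worth checking them against the tool's declared parameters before executing anything. The sketch below uses a hypothetical check_arguments helper that covers only the required and enum keywords used in weather_tool; a full JSON Schema validator would cover more.

```python
import json


def check_arguments(tool: dict, arguments_json: str) -> dict:
    """Parse arguments and check them against the tool's declared parameters.

    Covers only `required` and `enum`; this is an illustrative subset.
    """
    arguments = json.loads(arguments_json)
    schema = tool["function"].get("parameters") or {}
    missing = [key for key in schema.get("required", []) if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, value in arguments.items():
        allowed = schema.get("properties", {}).get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {allowed}")
    return arguments


# Trimmed copy of weather_tool from the first example (descriptions omitted).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}

args = check_arguments(
    weather_tool, '{"location": "New York, NY", "format": "fahrenheit"}'
)
print(args)
```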
4. Parallel Tool Calling#
Some models are able to respond with multiple tool calls in one message. This example demonstrates parallel tool calling using a model that supports it.
messages = [
    {"role": "user", "content": "What's the weather and time in New York?"}
]

chat_response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
    parallel_tool_calls=True,
    stream=False
)

assistant_message = chat_response.choices[0].message
# Keep the assistant turn in the history so the tool results can refer to it
messages.append(assistant_message)

print(assistant_message)
# Example output:
# ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[
#     ChatCompletionMessageToolCall(id='call_mno345', function=Function(arguments='{"location": "New York, NY", "format": "fahrenheit"}', name='get_current_weather'), type='function'),
#     ChatCompletionMessageToolCall(id='call_pqr678', function=Function(arguments='{}', name='get_current_time_nyc'), type='function')
# ])

# Process the multiple tool calls returned in one message
tool_results = []
for tool_call in assistant_message.tool_calls:
    if tool_call.function.name == "get_current_weather":
        # Simulate weather API call
        weather_result = "75°F"
        tool_results.append({"role": "tool", "content": weather_result, "tool_call_id": tool_call.id, "name": tool_call.function.name})
    elif tool_call.function.name == "get_current_time_nyc":
        # Simulate time API call
        time_result = "2:30 PM EDT"
        tool_results.append({"role": "tool", "content": time_result, "tool_call_id": tool_call.id, "name": tool_call.function.name})

# Add tool results to messages
messages.extend(tool_results)

# Generate final response based on all tool call results
# Note that not all models support parallel tool calls
chat_response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=messages,
    tools=[weather_tool, time_tool],
    tool_choice="auto",
    stream=False
)

final_response = chat_response.choices[0].message
print(final_response)
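The loop above handles the returned tool calls one after another. When the underlying functions are slow (for example, network requests), they could instead be executed concurrently on the client. The following is a sketch under stated assumptions: the local functions, TOOL_REGISTRY, run_tool_call, and run_tool_calls_concurrently are hypothetical names, the tool names follow the weather_tool and time_tool definitions from the earlier examples, and SimpleNamespace objects stand in for the tool calls so the snippet runs without a NIM endpoint.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


# Hypothetical local implementations; real ones would call external services.
def get_current_weather(location: str, format: str) -> str:
    return "75°F"


def get_current_time_nyc() -> str:
    return "2:30 PM EDT"


TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_current_time_nyc": get_current_time_nyc,
}


def run_tool_call(tool_call) -> dict:
    """Run one tool call and build its "tool" role message."""
    arguments = json.loads(tool_call.function.arguments or "{}")
    result = TOOL_REGISTRY[tool_call.function.name](**arguments)
    return {
        "role": "tool",
        "content": str(result),
        "tool_call_id": tool_call.id,
        "name": tool_call.function.name,
    }


def run_tool_calls_concurrently(tool_calls, max_workers: int = 4) -> list:
    """Run all tool calls on a thread pool, keeping the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, whichever finishes first.
        return list(pool.map(run_tool_call, tool_calls))


# Stand-ins shaped like the tool calls in the example output above.
example_calls = [
    SimpleNamespace(id="call_mno345", function=SimpleNamespace(
        name="get_current_weather",
        arguments='{"location": "New York, NY", "format": "fahrenheit"}')),
    SimpleNamespace(id="call_pqr678", function=SimpleNamespace(
        name="get_current_time_nyc", arguments="{}")),
]
tool_results = run_tool_calls_concurrently(example_calls)
```

Because each result carries its own tool_call_id, the resulting list can be passed to messages.extend in the same way as the sequential version.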