Introduction to external functions | Snowflake Documentation (original) (raw)

This topic describes external functions, which call executable code that is developed, maintained, stored, and executed outside Snowflake.

This topic helps you:

What is an external function?

An external function calls code that is executed outside Snowflake.

The remotely executed code is known as a remote service.

Information sent to a remote service is usually relayed through a proxy service.

Snowflake stores security-related external function information in an API integration.

The diagram below shows the basic information flow from a client program, through Snowflake, and to the remote service:

../_images/external-functions-overview-07.png

Each of the key components is described in more detail below.

External Function:

An external function is a type of UDF. Unlike other UDFs, an external function does not contain its own code; instead, the external function calls code that is stored and executed outside Snowflake.

Inside Snowflake, the external function is stored as a database object that contains information that Snowflake uses to call the remote service. This stored information includes the URL of theproxy servicethat relays information to and from the remote service. This information is specified as part of the CREATE EXTERNAL FUNCTION command.

The database object that represents the external function is created in a specific database and schema. The external function can be called using dot notation to represent the fully-qualified name. For example:

select my_database.my_schema.my_external_function(col1) from table1;

Remote Service:

The remotely executed code is known as a remote service.

The remote service must act like a function. For example, it must return a value.

Snowflake supports scalar external functions; the remote service must return exactly one row for each row received.

To be called by the Snowflake external function feature, the remote service must:

For example, a remote service can be implemented as:

Proxy Service:

Snowflake does not call aremote service directly. Instead, Snowflake calls a proxy service, which relays the data to the remote service.

The proxy service can increase security by authenticating requests to the remote service.

The proxy service can support subscription-based billing for a remote service. For example, the proxy service can verify that a caller to the remote service is a paid subscriber.

The proxy service also relays the response from the remote service back to Snowflake.

Examples of proxy services include:

API Integration:

An integration is a Snowflake object that provides an interface between Snowflake and third-party services. An API integration stores information, such as security information, that is needed to work with a proxy service or remote service.

An API integration is created with the CREATE API INTEGRATION command.

Users can write and call their own remote services, or call remote services written by third parties. These remote services can be written using any HTTP server stack, including cloud serverless compute services such as AWS Lambda.

From the perspective of a user running a SQL statement, an external function behaves like any otherUDF . External functions follow these rules:

How external functions work

Snowflake does not call a remote service directly. Instead, Snowflake calls the remote service through a cloud provider’s native HTTPS proxy service, for example API Gateway on AWS.

The main steps to call an external function are:

  1. A user’s client program passes Snowflake a SQL statement that calls an external function.
  2. When evaluating the external function as part of the query execution, Snowflake reads the external function definition and the corresponding API integration information.
    • The information from the external function definition includes:
      * The URL of the proxy service.
      * The name of the corresponding API integration.
    • The information from the API integration includes:
      * The proxy service resource to use. The resource contains information about the remote service, such as the location of that service.
      * The authentication information for that proxy service resource.

Snowflake then composes an HTTP POST command that includes:

  1. The proxy service receives the POST and then processes and forwards the request to the actual remote service. You can loosely think of the proxy service and resource as a “relay function” that calls the remote service.
  2. The remote service processes the data and returns the result, which is passed back through the chain to the original SQL statement.
  3. If the remote service responds with an HTTP code to signalasynchronous processing, then Snowflake sends one or more HTTP GET requests to retrieve the result from the remote service. Snowflake continues to send GET requests as long as it receives the response code to keep requesting, or until the external function times out or returns an error.

Typically, when a query has a large number of rows to send to a remote service, the rows are split into batches. Batches typically allow more parallelism and faster queries. In some cases, batches reduce overloading of the remote service.

A remote service returns 1 batch of rows for each batch received. For a scalar external function, the number of rows in the returned batch is equal to the number of rows in the received batch.

Each batch has a unique batch ID, which is included in each request sent from Snowflake to the remote service.

Retry operations (e.g. due to timeouts) are typically done at the batch level.

Advantages of external functions

External functions have the following advantages over other UDFs:

Limitations of external functions

External functions have the following limitations, which can be loosely grouped into creation-time limitations and execution-time limitations.

Creation-time limitations and requirements

Execution-time limitations and issues

Billing for external functions usage

Using external functions incurs normal costs associated with:

In addition, you might need to pay indirect or third-party charges, including charges by the provider of the remote service. Charges can vary from vendor to vendor.

Note

Data sent via Amazon API Gateway Private Endpoints incurs AWS PrivateLink charges for both ingress and egress.

Supported platforms

Platforms that support calling an external function

In general, an external function can be called from a Snowflake account on any cloud platform that Snowflake supports:

Exceptions are listed below:

The SQL syntax for calling an external function is the same on all platforms.

The SQL statements (CREATE EXTERNAL FUNCTION andCREATE API INTEGRATION) that configure access to these services are the same for all platforms. However, the clauses within these statements vary, depending upon the platforms hosting the proxy service and the remote service.

Platforms that support creating an external function’s remote service and proxy service

Although an external function can be called from any platform, the external function’s remote service and proxy service must each be created on specific supported platforms.

In many cases, the platform and account for the remote service are the same as the platform and account for the proxy service. However, that is not required. For example, a SQL query could call an Azure Function (remote service) via an AWS API Gateway (proxy service). The SQL query itself could be running on a Snowflake instance running on GCP.

Platforms that support a remote service

You need an HTTP server stack to host the remote service. Any HTTP server stack that can support the remote service should be compatible with external functions.

To create your remote service, you typically need:

Snowflake provides instructions for creating a remote service as:

Platforms that support a proxy service

You need an instance of a native HTTP proxy service on a cloud platform.

To configure your proxy service, you typically need:

The following HTTPS proxy services are supported:

The sections below contain platform-specific information that users should be aware of before choosing a platform.

Platform-specific restrictions

AWS: