Remote service input and output data formats (original) (raw)

When Snowflake sends data to a remote service, or receives data from a remote service, the data must be formatted correctly. This topic provides information about the proper data formats. Data received from and returned to Snowflake must also be of an appropriate data type.

When executing an external function, for example, Snowflake sends and expects data in the format described here. It sends the data to a proxy service, not directly to the remote service (for more, see Introduction to external functions). Therefore, the proxy service must receive (and return) data in a Snowflake-compatible format. Although typically the proxy service passes data through unchanged, the proxy can reformat data (both sending and receiving) to meet the needs of both the remote service and Snowflake.

For simplicity, and to help illustrate the formats that Snowflake expects to send and receive, most of the examples in this section assume that the remote service reads and writes data in the same format as Snowflake expects, and the proxy service passes data through unchanged in both directions.

Data format sent by Snowflake

Each HTTP request from Snowflake is a POST or a GET.

Body format

The body of the POST request contains the data, serialized in JSON format.

The schema for the JSON is:

For examples of extracting data in a remote service on each platform, see:

Optionally, the JSON can be compressed for transmission over the network. Compression is documented in CREATE EXTERNAL FUNCTION.

Body example

Here’s an example of a serialized request for an external function with the signaturef(integer, varchar, timestamp). Note that the first column is the row number within the batch, and the next three values are the arguments to the external function.

{ "data": [ [0, 10, "Alex", "2014-01-01 16:00:00"], [1, 20, "Steve", "2015-01-01 16:00:00"], [2, 30, "Alice", "2016-01-01 16:00:00"], [3, 40, "Adrian", "2017-01-01 16:00:00"] ] }

Data format received by Snowflake

Body format

When a remote service finishes processing a batch, the remote service should send data back to Snowflake in a JSON format similar to the format of the data sent by Snowflake.

The JSON response returned to Snowflake should contain one row for each row sent by Snowflake. Each returned row contains two values:

So that Snowflake can correlate the response with the request, the row numbers in the returned data must correspond to the row numbers in the data that Snowflake sent and must be returned in the same order as they were received.

Body access example

The following JSON example shows two rows containing an OBJECT value, each preceded by a row number:

{ "data": [ [ 0, { "City" : "Warsaw", "latitude" : 52.23, "longitude" : 21.01 } ], [ 1, { "City" : "Toronto", "latitude" : 43.65, "longitude" : -79.38 } ] ] }

To compose one of these returned rows with Python, you might use the following code:

... row_number = 0 output_value = {}

output_value["city"] = "Warsaw" output_value["latitude"] = 21.01 output_value["longitude"] = 52.23 row_to_return = [row_number, output_value] ...

To access the OBJECT value of a returned row with SQL, use the notation described inTraversing Semi-structured Data. For example:

select val:city, val:latitude, val:longitude from (select ext_func_city_lat_long(city_name) as val from table_of_city_names);

Header format

The response can also contain the following optional HTTP headers:

Optionally, the JSON can be compressed for transmission over the network. Compression is documented in CREATE EXTERNAL FUNCTION.

For information about timeouts and retries, see Account for timeout errors andDo not assume that the remote service is passed each row exactly once.

Status code

The response also contains an HTTP status code. Snowflake recognizes the following HTTP status codes:

Code Description
200 Batch processed successfully.
202 Batch received and still being processed.

Other values are treated as errors.

Response creation example

The example Python code below returns a proper response, including the HTTP response code, the processed data, and an MD5 header (which is optional).

This example is based on an AWS Lambda function. Some code might need customization for different platforms.

import json import hashlib import base64

def handler(event, context):

# The return value should contain an array of arrays (one inner array
# per input row for a scalar function).
array_of_rows_to_return = [ ]

...

json_compatible_string_to_return = json.dumps({"data" : array_of_rows_to_return})

# Calculate MD5 checksum for the response
md5digest = hashlib.md5(json_compatible_string_to_return.encode('utf-8')).digest()
response_headers = {
    'Content-MD5' : base64.b64encode(md5digest)
}

# Return the HTTP status code, the processed data, and the headers
# (including the Content-MD5 header).
return {
    'statusCode': 200,
    'body': json_compatible_string_to_return,
    'headers': response_headers
}