14. Using VECTOR Data — python-oracledb 3.2.0b1 documentation (original) (raw)

Oracle Database 23ai introduced a new data type VECTOR for artificial intelligence and machine learning search operations. The VECTOR data type is a homogeneous array of 8-bit signed integers, 8-bit unsigned integers, 32-bit floating-point numbers, or 64-bit floating-point numbers.

With the VECTOR data type, you can define the number of dimensions for the data and the storage format for each dimension value in the vector. The possible storage formats include:

Vectors can also be defined with an arbitrary number of dimensions and formats. This allows you to specify vectors of different dimensions with the various storage formats mentioned above. For example:

CREATE TABLE vector_table ( vec_data vector )

14.1. Using FLOAT32, FLOAT64, and INT8 Vectors

To create a table with three columns for vector data:

CREATE TABLE vector_table ( v32 vector(3, float32), v64 vector(3, float64), v8 vector(3, int8) )

In this example, each column can store vector data of three dimensions where each dimension value is of the specified format.

14.1.1. Inserting FLOAT32, FLOAT64, and INT8 Vectors

With python-oracledb, vector data can be inserted usingPython array.array()objects. Python arrays of type float (32-bit), double (64-bit), or int8_t (8-bit signed integer) are used as bind values when inserting vector columns. For example:

import array

vector_data_32 = array.array("f", [1.625, 1.5, 1.0]) # 32-bit float vector_data_64 = array.array("d", [11.25, 11.75, 11.5]) # 64-bit float vector_data_8 = array.array("b", [1, 2, 3]) # 8-bit signed integer

cursor.execute( "insert into vector_table (v32, v64, v8) values (:1, :2, :3)", [vector_data_32, vector_data_64, vector_data_8] )

14.1.2. Fetching FLOAT32, FLOAT64, and INT8 Vectors

With python-oracledb, vector columns of int8, float32, and float64 format are fetched as Python array.array() types. For example:

cursor.execute("select * from vector_table") for row in cursor: print(row)

This prints an output such as:

(array("f", [1.625, 1.5, 1.0]), array("d", [11.25, 11.75, 11.5]), array("b", [1, 2, 3]))

The FetchInfo object that is returned as part of the fetched metadata contains attributes FetchInfo.vector_dimensions andFetchInfo.vector_format which return the number of dimensions of the vector column and the format of each dimension value in the vector column respectively.

You can convert the vector data fetched from array.array() to a Python list by using the following output type handler:

def output_type_handler(cursor, metadata): if metadata.type_code is oracledb.DB_TYPE_VECTOR: return cursor.var(metadata.type_code, arraysize=cursor.arraysize, outconverter=list)

connection.outputtypehandler = output_type_handler

cursor.execute("select * from vector_table") for row in cursor: print(row)

For each vector column, the database will now return a Python list representation of each row’s value as shown below:

([1.625, 1.5, 1.0], [11.25, 11.75, 11.5], [1, 2, 3])

See Inserting Vectors with NumPy for an example of using an input type handler.

If you are using python-oracledb Thick mode with older versions of Oracle Client libraries than 23ai, see thissection.

See vector.py for a runnable example.

14.2. Using BINARY Vectors

A Binary vector format is supported when you are using Oracle Database 23.5, or later. The binary format represents each dimension value as a binary value (0 or 1). Binary vectors require less memory storage. For example, a 16 dimensional vector with binary format requires only 2 bytes of storage while a 16 dimensional vector with int8 format requires 16 bytes of storage.

Binary vectors are represented as 8-bit unsigned integers. For the binary format, you must define the number of dimensions as a multiple of 8.

To create a table with one column for vector data:

CREATE TABLE vector_binary_table ( vb vector(24, binary) )

In this example, the VB column can store vector data of 24 dimensions where each dimension value is represented as a single bit. Note that the number of dimensions 24 is a multiple of 8.

If you specify a vector dimension that is not a multiple of 8, then you will get ORA-51813.

14.2.1. Inserting BINARY Vectors

Python arrays of type uint8_t (8-bit unsigned integer) are used as bind values when inserting vector columns. The length of uint8_t arrays must be equal to the number of dimensions divided by 8. For example, if the number of dimensions for a vector column is 24, then the length of the array must be 3. The values in uint8_t arrays can range from 0 to 255. For example:

import array

vector_data_vb = array.array("B", [180, 150, 100]) # 8-bit unsigned integer

cursor.execute( "insert into vector_binary_table values (:1)", [vector_data_vb] )

14.2.2. Fetching BINARY Vectors

With python-oracledb, vector columns of binary format are fetched as Python array.array() types. For example:

cursor.execute("select * from vector_binary_table") for row in cursor: print(row)

This prints an output such as:

(array("B", [180, 150, 100]))

The FetchInfo object that is returned as part of the fetched metadata contains attributes FetchInfo.vector_dimensions andFetchInfo.vector_format which return the number of dimensions of the vector column and the format of each dimension value in the vector column respectively.

You can convert the vector data fetched from a connection to a Python list by using this output type handler. For each vector column, the database will now return a Python list representation of each row’s value.

If you are using python-oracledb Thick mode with older versions of Oracle Client libraries than 23ai, see thissection.

14.3. Using SPARSE Vectors

A Sparse vector is a vector which has zero value for most of its dimensions. This vector only physically stores the non-zero values. For more information on sparse vectors, see the Oracle AI Vector search User’s Guide.

Sparse vectors are supported when you are using Oracle Database 23.7 or later.

Sparse vectors are represented by the total number of vector dimensions, an array of indices, and an array of values where each value’s location in the vector is indicated by the corresponding indices array position. All other vector values are treated as zero. The storage formats that can be used with sparse vectors are float32, float64, and int8. Note that the binary storage format cannot be used with sparse vectors.

For example, a string representation could be:

[25, [5, 8, 11], [25.25, 6.125, 8.25]]

In this example, the sparse vector has 25 dimensions. Only indices 5, 8, and 11 have values which are 25.25, 6.125, and 8.25 respectively. All of the other values are zero.

In Oracle Database, you can define a column for a sparse vector using the following format:

VECTOR(number_of_dimensions, dimension_storage_format, sparse)

For example, to create a table with three columns for sparse vectors:

CREATE TABLE vector_sparse_table ( float32sparsecol vector(25, float32, sparse), float64sparsecol vector(30, float64, sparse), int8sparsecol vector(35, int8, sparse) )

In this example:

14.3.1. Inserting SPARSE Vectors

With python-oracledb, sparse vector data can be inserted usingSparseVector objects. The SparseVector objects are used when fetching vectors, and as bind values when inserting sparse vector columns. For example to insert data:

import array

32-bit float sparse vector

float32_val = oracledb.SparseVector( 25, [6, 10, 18], array.array('f', [26.25, 129.625, 579.875]) )

64-bit float sparse vector

float64_val = oracledb.SparseVector( 30, [9, 16, 24], array.array('d', [19.125, 78.5, 977.375]) )

8-bit signed integer sparse vector

int8_val = oracledb.SparseVector( 35, [10, 20, 30], array.array('b', [26, 125, -37]) )

cursor.execute( "insert into vector_sparse_table values (:1, :2, :3)", [float32_val, float64_val, int8_val] )

14.3.2. Fetching Sparse Vectors

With python-oracledb, sparse vector columns are fetched as SparseVector objects:

cursor.execute("select * from vector_sparse_table") for row in cursor: print(row)

This prints:

(oracledb.SparseVector(25, array('I', [6, 10, 18]), array('f', [26.25, 129.625, 579.875])), oracledb.SparseVector(30, array('I', [9, 16, 24]), array('d', [19.125, 78.5, 977.375])), oracledb.SparseVector(35, array('I', [10, 20, 30]), array('b', [26, 125, -37])))

Depending on context, the SparseVector type will be treated as a string:

cursor.execute("select * from vector_sparse_table") for float32_val, float64_val, int8_val in cursor: print("float32:", float32_val) print("float64:", float64_val) print("int8:", int8_val)

This prints:

float32: [25, [6, 10, 18], [26.25, 129.625, 579.875]] float64: [30, [9, 16, 24], [19.125, 78.5, 977.375]] int8: [35, [10, 20, 30], [26, 125, -37]]

Values can also be explicitly passed to str(), if needed.

SPARSE Vector Metadata

The FetchInfo object that is returned as part of the fetched metadata contains attributes FetchInfo.vector_dimensions,FetchInfo.vector_format, and FetchInfo.vector_is_sparse which return the number of dimensions of the vector column, the format of each dimension value in the vector column, and a boolean which determines whether the vector is sparse or not.

14.4. Using python-oracledb Thick Mode with Older Versions of Oracle Client Libraries

If you are using python-oracledb Thick mode with versions of Oracle Client libraries older than 23ai, then you must use strings when inserting vectors. The vector columns are fetched as Python lists.

14.4.1. Inserting Vectors with Older Oracle Client Versions

To insert vectors of int8, float32, float64, and unit8 format when using Oracle Client versions older than 23ai, you must use strings as shown below:

vector_data_32 = "[1.625, 1.5, 1.0]" vector_data_64 = "[11.25, 11.75, 11.5]" vector_data_8 = "[1, 2, 3]" vector_data_vb = "[180, 150, 100]"

cursor.execute( "insert into vector_table (v32, v64, v8, vb) values (:1, :2, :3, :4)", [vector_data_32, vector_data_64, vector_data_8, vector_data_vb] )

14.4.2. Fetching Vectors with Older Oracle Client Versions

With Oracle Client versions older than 23ai, the vector columns are fetched as Python lists. For example:

cursor.execute("select * from vector_table") for row in cursor: print(row)

This prints an output such as:

([1.625, 1.5, 1.0], [11.25, 11.75, 11.5], [1, 2, 3], [180, 150, 100])

See vector_string.py for a runnable example.

14.5. Using NumPy

Vector data can be used with Python’s NumPy package types. To use NumPy’s ndarray type, install NumPy, for example withpip install numpy, and import the module in your code.

14.5.1. Inserting Vectors with NumPy

To insert vectors, you must convert NumPy ndarray types to array types. This conversion can be done by using an input type handler. For example:

def numpy_converter_in(value): if value.dtype == numpy.float64: dtype = "d" elif value.dtype == numpy.float32: dtype = "f" elif value.dtype == numpy.uint8: dtype = "B" else: dtype = "b" return array.array(dtype, value)

def input_type_handler(cursor, value, arraysize): if isinstance(value, numpy.ndarray): return cursor.var( oracledb.DB_TYPE_VECTOR, arraysize=arraysize, inconverter=numpy_converter_in, )

Using it in an INSERT statement:

vector_data_32 = numpy.array([1.625, 1.5, 1.0], dtype=numpy.float32) vector_data_64 = numpy.array([11.25, 11.75, 11.5], dtype=numpy.float64) vector_data_8 = numpy.array([1, 2, 3], dtype=numpy.int8) vector_data_vb = numpy.array([180, 150, 100], dtype=numpy.uint8)

connection.inputtypehandler = input_type_handler

cursor.execute( "insert into vector_table (v32, v64, v8, vb) values (:1, :2, :3, :4)", [vector_data_32, vector_data_64, vector_data_8, vector_data_vb], )

14.5.2. Fetching Vectors with NumPy

To fetch vector data as an ndarray type, you can convert the array type to an ndarray type by using an output type handler. For example:

def numpy_converter_out(value): return numpy.array(value, copy=False, dtype=value.typecode)

def output_type_handler(cursor, metadata): if metadata.type_code is oracledb.DB_TYPE_VECTOR: return cursor.var( metadata.type_code, arraysize=cursor.arraysize, outconverter=numpy_converter_out, )

Using it to query the columns:

connection.outputtypehandler = output_type_handler

cursor.execute("select * from vector_table") for row in cursor: print(row)

This prints an output such as:

(array([1.625, 1.5, 1.0], dtype=float32), array([11.25, 11.75, 11.5], dtype=float64), array([1, 2, 3], dtype=int8), array([180, 150, 100], dtype=uint8))

See vector_numpy.py for a runnable example.