10. Tuning python-oracledb — python-oracledb 3.2.0b1 documentation (original) (raw)

Some general tuning tips are:

10.1. Tuning Fetch Performance

To tune queries, you can adjust python-oracledb’s internal buffer sizes to improve the speed of fetching rows across the network from the database, and to optimize memory usage. This can reduce round-trips which helps performance and scalability. Tune “array fetching” withCursor.arraysize and tune “row prefetching” withCursor.prefetchrows. Set these before callingCursor.execute(). The value used for prefetching can also be set in anoraaccess.xml file, see Optional Oracle Client Configuration File. In python-oracledb Thick mode, the internal buffers allocated for prefetchrows and arraysize are separate, so increasing both settings will require more Python process memory. Queries that return LOBs and similar types will never prefetch rows, so theprefetchrows value is ignored in those cases.

The internal buffer sizes do not affect how or when rows are returned to your application regardless of which python-oracledb method is used to fetch query results. They do not affect the minimum or maximum number of rows returned by a query.

The difference between row prefetching and array fetching is when the internal buffering occurs. Internally python-oracledb performs separate “execute SQL statement” and “fetch data” steps. Prefetching allows query results to be returned to the application when the acknowledgment of successful statement execution is returned from the database. This means that the subsequent internal “fetch data” operation does not always need to make a round-trip to the database because rows are already buffered in python-oracledb or in the Oracle Client libraries. An overhead of prefetching when using the python-oracledb Thick mode is the need for additional data copies from Oracle Client’s prefetch buffer when fetching the first batch of rows. This cost may outweigh the benefits of using prefetching in some cases.

10.1.1. Choosing values for arraysize and prefetchrows

The best Cursor.arraysize and Cursor.prefetchrows values can be found by experimenting with your application under the expected load of normal application use. The reduction of round-trips may help performance and overall system scalability. The documentation in round-trips shows how to measure round-trips.

Here are some suggestions for tuning:

The following table shows the number of round-trips required to fetch various numbers of rows with different prefetchrows and arraysize values.

Table 10.1 Effect of prefetchrows and arraysize on the number of round-trips

Number of rows prefetchrows arraysize Round-trips
1 2 100 1
100 2 100 2
1000 2 100 11
10000 2 100 101
10000 2 1000 11
10000 1000 1000 11
20 20 20 2
20 21 20 1

10.1.1.1. Application Default Prefetchrows and Arraysize Values

Application-wide defaults can be set using defaults.prefetchrows anddefaults.arraysize, for example:

import oracledb

oracledb.defaults.prefetchrows = 1000 oracledb.defaults.arraysize = 1000

When using python-oracledb in the Thick mode, prefetching can also be tuned in an external oraaccess.xml file, which may be useful for tuning an application when modifying its code is not feasible.

Setting the sizes with oracledb.defaults attributes or withoraaccess.xml will affect the whole application, so it should not be the first tuning choice.

10.1.1.2. Changing Prefetchrows and Arraysize for Re-executed Statements

In python-oracledb, the arraysize and prefetchrows values are only examined when a statement is executed the first time. To change the values for a re-executed statement, create a new cursor. For example, to changearraysize:

array_sizes = (10, 100, 1000) for size in array_sizes: cursor = connection.cursor() cursor.arraysize = size start = time.time() cursor.execute(sql).fetchall() elapsed = time.time() - start print("Time for", size, elapsed, "seconds")

10.1.1.3. Avoiding Premature Prefetching

There are two cases that will benefit from setting prefetchrows to zero:

10.1.2. Tuning Fetching from REF CURSORS

The internal buffering and performance of fetching data from REF CURSORS can be tuned by setting the value of arraysize before rows are fetched from the cursor. The prefetchrows value is ignored when fetching from REF CURSORS.

For example:

ref_cursor = connection.cursor() cursor.callproc("myrefcursorproc", [ref_cursor])

ref_cursor.arraysize = 1000 print("Sum of IntCol for", num_rows, "rows:") for row in ref_cursor: sum_rows += row[0] print(sum_rows)

The arraysize value can also be set before calling the procedure:

ref_cursor = connection.cursor() ref_cursor.arraysize = 1000

cursor.callproc("myrefcursorproc", [ref_cursor]) for row in ref_cursor: . . .

Also see Avoiding Premature Prefetching.

10.1.3. Tuning Fetching for DataFrames

When fetching data frames withConnection.fetch_df_all() or Connection.fetch_df_batches(), tuning of data transfer across the network is controlled by the methodsarraysize or size parameters, respectively.

Any defaults.prefetchrows value is ignored since these methods always set the internal prefetch size to the relevant arraysize or size value.

10.2. Database Round-trips

A round-trip is defined as the travel of a message from python-oracledb to the database and back. Calling each python-oracledb function, or accessing each attribute, will require zero or more round-trips. For example, inserting a simple row involves sending data to the database and getting a success response back. This is a round-trip. Along with tuning an application’s architecture and tuning its SQL statements, a general performance and scalability goal is to minimize round-trips because they impact application performance and overall system scalability.

Some general tips for reducing round-trips are:

10.2.1. Finding the Number of Round-Trips

Oracle’s Automatic Workload Repository(AWR) reports show ‘SQL*Net roundtrips to/from client’ and are useful for finding the overall behavior of a system.

Sometimes you may wish to find the number of round-trips used for a specific application. Snapshots of the V$SESSTAT view taken before and after doing some work can be used for this:

Get the connection's session id

def get_session_id(connection): sql = "select sys_context('userenv','sid') from dual" result, = connection.cursor().execute(sql).fetchone() return result

Get the number of round-trips a session has made so far

def get_round_trips(systemconn, sid): sql = """select ss.value from v$sesstat ss, v$statname sn where ss.sid = :sid and ss.statistic# = sn.statistic# and sn.name like '%roundtrip%client%'""" round_trips, = systemconn.cursor().execute(sql, [sid]).fetchone() return round_trips

systemconn = oracledb.connect(user="system", password=spw, dsn=cs) connection = oracledb.connect(user=un, password=pw, dsn=cs)

sid = get_session_id(connection) round_trips_before = get_round_trips(systemconn, sid)

Do some "work"

cursor.execute("select ...") rows = cursor.fetchall()

round_trips_after = get_round_trips(systemconn, sid)

print(f"Round-trips required for query: {round_trips_after - round_trips_before}")

Note that V$SESSTAT is not accurate for pipelined database operations.

10.3. Statement Caching

Python-oracledb’s Cursor.execute() and Cursor.executemany()methods use statement caching to make re-execution of statements efficient. Statement caching lets Oracle Database cursors be used without re-parsing the statement. Statement caching also reduces metadata transfer costs between python-oracledb and the database. Performance and scalability are improved.

The python-oracledb Thick mode uses Oracle Call Interface statement caching, whereas the Thin mode uses its own implementation.

Each standalone or pooled connection has its own cache of statements with a default size of 20. The default size of the statement cache can be changed using the defaults.stmtcachesize attribute. The size can be set when creating connection pools or standalone connections. In general, set the statement cache size to the size of the working set of statements being executed by the application. To manually tune the cache, monitor the general application load and the Automatic Workload Repository (AWR) “bytes sent via SQL*Net to client” values. The latter statistic should benefit from not shipping statement metadata to python-oracledb. Adjust the statement cache size to your satisfaction. With Oracle Database 12c (or later), the Thick mode statement cache size can be automatically tuned using an oraaccess.xml file.

10.3.1. Setting the Statement Cache

The statement cache size can be set globally withdefaults.stmtcachesize:

import oracledb

oracledb.defaults.stmtcachesize = 40

The value can be overridden in an oracledb.connect() call, or when creating a pool with oracledb.create_pool(). For example:

oracledb.create_pool(user="scott", password=userpwd, dsn="dbhost.example.com/orclpb", min=2, max=5, increment=1, stmtcachesize=50)

When python-oracledb Thick mode uses Oracle Client 21 (or later), changing the cache size with ConnectionPool.reconfigure() does not immediately affect connections previously acquired and currently in use. When those connections are subsequently released to the pool and re-acquired, they will then use the new value. When the Thick mode uses Oracle Client prior to version 21, changing the pool’s statement cache size has no effect on connections that already exist in the pool but will affect new connections that are subsequently created, for example when the pool grows.

10.3.2. Tuning the Statement Cache

In general, set the statement cache to the size of the working set of statements being executed by the application. SODAinternally makes SQL calls, so tuning the cache is also beneficial for SODA applications.

In python-oracledb Thick mode with Oracle Client Libraries 12c (or later), the statement cache size can be automatically tuned with the Oracle Client Configuration oraaccess.xml file.

For manual tuning use views like V$SYSSTAT:

SELECT value FROM V$SYSSTAT WHERE name = 'parse count (total)'

Find the value before and after running application load to give the number of statement parses during the load test. Alter the statement cache size and repeat the test until you find a minimal number of parses.

If you have Automatic Workload Repository (AWR) reports you can monitor general application load and the “bytes sent via SQL*Net to client” values. The latter statistic should benefit from not shipping statement metadata to python-oracledb. Adjust the statement cache size and re-run the test to find the best cache size.

10.3.3. Disabling the Statement Cache

Statement caching can be disabled by setting the cache size to 0:

oracledb.defaults.stmtcachesize = 0

Disabling the cache may be beneficial when the quantity or order of statements causes cache entries to be flushed before they get a chance to be reused. For example if there are more distinct statements than cache slots, and the order of statement execution causes older statements to be flushed from the cache before the statements are re-executed.

Disabling the statement cache may also be helpful in test and development environments. The statement cache can become invalid if connections remain open and database schema objects are recreated. Applications can then receive errors such as ORA-3106. However, after a statement execution error is returned once to the application, python-oracledb automatically drops that statement from the cache. This lets subsequent re-executions of the statement on that connection to succeed.

When it is inconvenient to pass statement text through an application, theCursor.prepare() call can be used to avoid statement re-parsing. If the cache_statement parameter in the Cursor.prepare() method is True and the statement cache size is greater than 0, then the statements will be added to the cache, if not already present. If the cache_statementparameter in the Cursor.prepare() method is False and the statement cache size is greater than 0, then the statement will be removed from the statement cache (if present) or will not be cached (if not present). The subsequent execute() calls use the value None instead of the SQL text.

This feature can prevent a rarely executed statement from flushing a potential more frequently executed one from a full cache. For example, if a statement will only ever be executed once:

cursor.prepare("select user from dual", cache_statement = False) cursor.execute(None)

Alternatively,

sql = "select user from dual" cursor.prepare(sql, cache_statement=False) cursor.execute(sql)

Statements passed to prepare() are also stored in the statement cache.

10.4. Client Result Caching (CRC)

Python-oracledb applications can use Oracle Database’s Client Result Cache. The CRC enables client-side caching of SQL query (SELECT statement) results in client memory for immediate use when the same query is re-executed. This is useful for reducing the cost of queries for small, mostly static, lookup tables, such as for postal codes. CRC reduces network round-trips, and also reduces database server CPU usage.

The cache is at the application process level. Access and invalidation is managed by the Oracle Client libraries. This removes the need for extra application logic, or external utilities, to implement a cache.

CRC can be enabled by setting the database parameters CLIENT_RESULT_CACHE_SIZE and CLIENT_RESULT_CACHE_LAG, and then restarting the database, for example:

SQL> ALTER SYSTEM SET CLIENT_RESULT_CACHE_LAG = 3000 SCOPE=SPFILE; SQL> ALTER SYSTEM SET CLIENT_RESULT_CACHE_SIZE = 64K SCOPE=SPFILE; SQL> STARTUP FORCE

CRC can alternatively be configured in an oraaccess.xmlor sqlnet.ora file on the Python host, see Client Configuration Parameters.

Tables can then be created, or altered, so repeated queries use CRC. This allows existing applications to use CRC without needing modification. For example:

SQL> CREATE TABLE cities (id number, name varchar2(40)) RESULT_CACHE (MODE FORCE); SQL> ALTER TABLE locations RESULT_CACHE (MODE FORCE);

Alternatively, hints can be used in SQL statements. For example:

SELECT /*+ result_cache */ postal_code FROM locations