Debezium connector for PostgreSQL :: Debezium Documentation

Enumerates a comma-separated list of the symbolic names of the custom converter instances that the connector can use. For example,

isbn

You must set the converters property to enable the connector to use a custom converter.

For each converter that you configure for a connector, you must also add a .type property, which specifies the fully-qualified name of the class that implements the converter interface. The .type property uses the following format:

<converterSymbolicName>.type

For example,

isbn.type: io.debezium.test.IsbnConverter

If you want to further control the behavior of a configured converter, you can add one or more configuration parameters to pass values to the converter. To associate any additional configuration parameter with a converter, prefix the parameter names with the symbolic name of the converter.
For example,

isbn.schema.name: io.debezium.postgresql.type.Isbn
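
Putting the preceding examples together, a complete configuration for the hypothetical isbn converter might look like this:

converters: isbn
isbn.type: io.debezium.test.IsbnConverter
isbn.schema.name: io.debezium.postgresql.type.Isbn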

Specifies the transaction isolation level and the type of locking, if any, that the connector applies when it reads data during an initial snapshot or ad hoc blocking snapshot.

Each isolation level strikes a different balance between optimizing concurrency and performance on the one hand, and maximizing data consistency and accuracy on the other. Snapshots that use stricter isolation levels result in higher quality, more consistent data, but the cost of the improvement is decreased performance due to longer lock times and fewer concurrent transactions. Less restrictive isolation levels can increase efficiency, but at the expense of data consistency. For more information about transaction isolation levels in PostgreSQL, see the PostgreSQL documentation.

Specify one of the following isolation levels (a configuration example follows the list):

serializable

The default, and most restrictive isolation level. This option prevents serialization anomalies and provides the highest degree of data integrity.

To ensure the data consistency of captured tables, a snapshot runs in a transaction that uses the serializable isolation level, blocking concurrent DDL changes on the tables, and locking the database against concurrent index creation. When this option is set, users or administrators cannot perform certain operations, such as creating a table index, until the snapshot concludes. The entire range of table keys remains locked until the snapshot completes. This option matches the snapshot behavior that was available in the connector before the introduction of this property.

repeatable_read

Prevents other transactions from updating table rows during the snapshot. Records that are added while the snapshot is in progress can appear twice: first as part of the initial snapshot, and then again in the streaming phase. However, this level of consistency is tolerable for database mirroring. This level ensures data consistency between the tables being scanned, and blocks DDL on the selected tables and concurrent index creation throughout the database, but it allows serialization anomalies.

read_committed

In PostgreSQL, there is no difference between the behavior of the Read Uncommitted and Read Committed isolation modes. As a result, for this property, the read_committed option effectively provides the least restrictive level of isolation. Setting this option sacrifices some consistency for initial and ad hoc blocking snapshots, but provides better database performance for other users during the snapshot.

In general, this transaction consistency level is appropriate for data mirroring. However, because this isolation level does not prevent other transactions from updating table rows during the snapshot, minor data inconsistencies can occur, such as when a record that is added during the initial snapshot is captured again by the connector after the streaming phase begins.

read_uncommitted

Nominally, this option offers the least restrictive level of isolation. However, as explained in the description of the read_committed option, for the Debezium PostgreSQL connector, this option provides the same level of isolation as the read_committed option.
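
For example, a minimal sketch that selects a less restrictive isolation level, assuming that this property is exposed as snapshot.isolation.mode (the property name is not shown in this excerpt):

"snapshot.isolation.mode": "read_committed"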

Specifies the criteria for performing a snapshot when the connector starts:

always

The connector performs a snapshot every time that it starts. The snapshot includes the structure and data of the captured tables. Specify this value to populate topics with a complete representation of the data from the captured tables every time that the connector starts. After the snapshot completes, the connector begins to stream event records for subsequent database changes.

initial

The connector performs a snapshot only when no offsets have been recorded for the logical server name.

initial_only

The connector performs an initial snapshot and then stops, without processing any subsequent changes.

no_data

The connector never performs snapshots. When a connector is configured this way, after it starts, it behaves as follows:

If there is a previously stored LSN in the Kafka offsets topic, the connector continues streaming changes from that position. If no LSN is stored, the connector starts streaming changes from the point in time when the PostgreSQL logical replication slot was created on the server. Use this snapshot mode only when you know all data of interest is still reflected in the WAL.

never

Deprecated. See no_data.

when_needed

After the connector starts, it performs a snapshot only if it detects one of the following circumstances:

It cannot detect any topic offsets.

A previously recorded offset specifies a log position that is not available on the server.

configuration_based

With this option, you control snapshot behavior through a set of connector properties that have the prefix 'snapshot.mode.configuration.based'.

custom

The connector performs a snapshot according to the implementation specified by the snapshot.mode.custom.name property, which defines a custom implementation of the io.debezium.spi.snapshot.Snapshotter interface.

For more information, see the table of snapshot.mode options.
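
For example, to select the custom mode described above, combine snapshot.mode with the snapshot.mode.custom.name property (the implementation name my-snapshotter is hypothetical):

"snapshot.mode": "custom",
"snapshot.mode.custom.name": "my-snapshotter"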

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector includes table data when it performs a snapshot.

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector includes the table schema when it performs a snapshot.

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector begins to stream change events after a snapshot completes.

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector includes table schema in a snapshot if the schema history topic is not available.

If the snapshot.mode is set to configuration_based, this property specifies whether the connector attempts to snapshot table data if it does not find the last committed offset in the transaction log.
Set the value to true to instruct the connector to perform a new snapshot.
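
A sketch of a configuration-based setup, assuming that the properties described above follow the snapshot.mode.configuration.based prefix with suffixes such as snapshot.schema, snapshot.data, and start.stream (the exact property names are not shown in this excerpt):

"snapshot.mode": "configuration_based",
"snapshot.mode.configuration.based.snapshot.schema": "true",
"snapshot.mode.configuration.based.snapshot.data": "true",
"snapshot.mode.configuration.based.start.stream": "true"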

When snapshot.mode is set to custom, use this setting to specify the name of the custom implementation provided in the name() method that is defined by the 'io.debezium.spi.snapshot.Snapshotter' interface. The provided implementation is called after a connector restart to determine whether to perform a snapshot. For more information, see custom snapshotter SPI.

Specifies how the connector holds locks on tables while performing a schema snapshot.
Set one of the following options:

shared

The connector holds a table lock that prevents exclusive table access during the initial phase of the snapshot, in which database schemas and other metadata are read. After the initial phase, the snapshot no longer requires table locks.

none

The connector avoids locks entirely.

Important: Do not use this mode if schema changes might occur during the snapshot.

custom

The connector performs a snapshot according to the implementation specified by the snapshot.locking.mode.custom.name property, which is a custom implementation of the io.debezium.spi.snapshot.SnapshotLock interface.

When snapshot.locking.mode is set to custom, use this setting to specify the name of the custom implementation provided in the name() method that is defined by the 'io.debezium.spi.snapshot.SnapshotLock' interface. For more information, see custom snapshotter SPI.
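
For example, to avoid snapshot locks entirely:

"snapshot.locking.mode": "none"

Or, to delegate locking to a custom SnapshotLock implementation (the name my-lock-strategy is hypothetical):

"snapshot.locking.mode": "custom",
"snapshot.locking.mode.custom.name": "my-lock-strategy"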

Specifies how the connector queries data while performing a snapshot.
Set one of the following options:

select_all

The connector performs a select all query by default, optionally adjusting the columns selected based on the column include and exclude list configurations.

custom

The connector performs a snapshot query according to the implementation specified by the snapshot.query.mode.custom.name property, which defines a custom implementation of the io.debezium.spi.snapshot.SnapshotQuery interface.

This setting enables you to manage snapshot content in a more flexible manner compared to using the snapshot.select.statement.overrides property.

When snapshot.query.mode is set to custom, use this setting to specify the name of the custom implementation provided in the name() method that is defined by the 'io.debezium.spi.snapshot.SnapshotQuery' interface. For more information, see custom snapshotter SPI.
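
For example (the implementation name my-query-strategy is hypothetical):

"snapshot.query.mode": "custom",
"snapshot.query.mode.custom.name": "my-query-strategy"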

All tables specified in table.include.list

An optional, comma-separated list of regular expressions that match the fully-qualified names (<schemaName>.<tableName>) of the tables to include in a snapshot. The specified items must be named in the connector’s table.include.list property. This property takes effect only if the connector’s snapshot.mode property is set to a value other than never.
This property does not affect the behavior of incremental snapshots.

To match the name of a table, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the table; it does not match substrings that might be present in a table name.

Positive integer value that specifies the maximum amount of time (in milliseconds) to wait to obtain table locks when performing a snapshot. If the connector cannot acquire table locks in this time interval, the snapshot fails. How the connector performs snapshots provides details.

Specifies the table rows to include in a snapshot. Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.

The property contains a comma-separated list of fully-qualified table names in the form <schemaName>.<tableName>. For example,

"snapshot.select.statement.overrides": "inventory.products,customers.orders"

For each table in the list, add a further configuration property that specifies the SELECT statement for the connector to run on the table when it takes a snapshot. The specified SELECT statement determines the subset of table rows to include in the snapshot. Use the following format to specify the name of this SELECT statement property:

snapshot.select.statement.overrides.<schemaName>.<tableName>. For example, snapshot.select.statement.overrides.customers.orders.

Example:

From a customers.orders table that includes the soft-delete column, delete_flag, add the following properties if you want a snapshot to include only those records that are not soft-deleted:

"snapshot.select.statement.overrides": "customer.orders", "snapshot.select.statement.overrides.customer.orders": "SELECT * FROM customers.orders WHERE delete_flag = 0 ORDER BY id DESC"

In the resulting snapshot, the connector includes only the records for which delete_flag = 0.

Specifies how the connector should react to exceptions during processing of events:

fail propagates the exception, indicates the offset of the problematic event, and causes the connector to stop.

warn logs the offset of the problematic event, skips that event, and continues processing.

skip skips the problematic event and continues processing.
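
For example, to log and skip problematic events, assuming that the property described here is event.processing.failure.handling.mode (the name is not shown in this excerpt):

"event.processing.failure.handling.mode": "warn"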

Positive integer value that specifies the maximum size of each batch of events that the connector processes.

Positive integer value that specifies the maximum number of records that the blocking queue can hold. When Debezium reads events streamed from the database, it places the events in the blocking queue before it writes them to Kafka. The blocking queue can provide backpressure for reading change events from the database in cases where the connector ingests messages faster than it can write them to Kafka, or when Kafka becomes unavailable. Events that are held in the queue are disregarded when the connector periodically records offsets. Always set the value of max.queue.size to be larger than the value of max.batch.size.

A long integer value that specifies the maximum volume of the blocking queue in bytes. By default, volume limits are not specified for the blocking queue. To specify the number of bytes that the queue can consume, set this property to a positive long value.
If max.queue.size is also set, writing to the queue is blocked when the size of the queue reaches the limit specified by either property. For example, if you set max.queue.size=1000, and max.queue.size.in.bytes=5000, writing to the queue is blocked after the queue contains 1000 records, or after the volume of the records in the queue reaches 5000 bytes.

Positive integer value that specifies the number of milliseconds the connector should wait for new change events to appear before it starts processing a batch of events. Defaults to 500 milliseconds.

Specifies connector behavior when the connector encounters a field whose data type is unknown. The default behavior is that the connector omits the field from the change event and logs a warning.

Set this property to true if you want the change event to contain an opaque binary representation of the field. This lets consumers decode the field. You can control the exact representation by setting the binary handling mode property.

Note: Consumers risk backward compatibility issues when include.unknown.datatypes is set to true. Not only may the database-specific binary representation change between releases, but if the data type is eventually supported by Debezium, the data type will be sent downstream as a logical type, which would require adjustments by consumers. In general, when encountering unsupported data types, create a feature request so that support can be added.

A semicolon-separated list of SQL statements that the connector executes when it establishes a JDBC connection to the database. To use a semicolon as a character and not as a delimiter, specify two consecutive semicolons, ;;.

The connector may establish JDBC connections at its own discretion. Consequently, this property is useful for configuration of session parameters only, and not for executing DML statements.

The connector does not execute these statements when it creates a connection for reading the transaction log.
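
A sketch, assuming that this is the database.initial.statements property (the name is not shown in this excerpt); the session parameters shown are illustrative only:

"database.initial.statements": "SET search_path TO 'inventory';SET statement_timeout TO '60s'"

Because the two statements are separated by a single semicolon, the connector executes them as separate statements; a literal semicolon inside a statement would instead be written as ;;.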

Frequency for sending replication connection status updates to the server, given in milliseconds.
The property also controls how frequently the database status is checked to detect a dead connection in case the database was shut down.

Controls how frequently the connector sends heartbeat messages to a Kafka topic. The default behavior is that the connector does not send heartbeat messages.

Heartbeat messages are useful for monitoring whether the connector is receiving change events from the database. Heartbeat messages might help decrease the number of change events that need to be re-sent when a connector restarts. To send heartbeat messages, set this property to a positive integer, which indicates the number of milliseconds between heartbeat messages.

Heartbeat messages are needed when there are many updates in a database that is being tracked but only a tiny number of updates are related to the table(s) and schema(s) for which the connector is capturing changes. In this situation, the connector reads from the database transaction log as usual but rarely emits change records to Kafka. This means that no offset updates are committed to Kafka and the connector does not have an opportunity to send the latest retrieved LSN to the database. The database retains WAL files that contain events that have already been processed by the connector. Sending heartbeat messages enables the connector to send the latest retrieved LSN to the database, which allows the database to reclaim disk space being used by no longer needed WAL files.

Specifies a query that the connector executes on the source database when the connector sends a heartbeat message.

This is useful for resolving the situation described in WAL disk space consumption, where capturing changes from a low-traffic database on the same host as a high-traffic database prevents Debezium from processing WAL records and thus acknowledging WAL positions with the database. To address this situation, create a heartbeat table in the low-traffic database, and set this property to a statement that inserts records into that table, for example:

INSERT INTO test_heartbeat_table (text) VALUES ('test_heartbeat')

This allows the connector to receive changes from the low-traffic database and acknowledge their LSNs, which prevents unbounded WAL growth on the database host.
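
A combined sketch, assuming the standard heartbeat.interval.ms and heartbeat.action.query property names for the two settings that this section describes:

"heartbeat.interval.ms": "10000",
"heartbeat.action.query": "INSERT INTO test_heartbeat_table (text) VALUES ('test_heartbeat')"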

Specify the conditions that trigger a refresh of the in-memory schema for a table.

columns_diff is the safest mode. It ensures that the in-memory schema stays in sync with the database table’s schema at all times.

columns_diff_exclude_unchanged_toast instructs the connector to refresh the in-memory schema cache if there is a discrepancy with the schema derived from the incoming message, unless unchanged TOASTable data fully accounts for the discrepancy.

This setting can significantly improve connector performance if there are frequently updated tables whose TOASTed data is rarely part of updates. However, it is possible for the in-memory schema to become outdated if TOASTable columns are dropped from the table.

An interval in milliseconds that the connector should wait before performing a snapshot when the connector starts. If you are starting multiple connectors in a cluster, this property is useful for avoiding snapshot interruptions, which might cause re-balancing of connectors.

Specifies the time, in milliseconds, that the connector delays the start of the streaming process after it completes a snapshot. Setting a delay interval helps to prevent the connector from restarting snapshots in the event that a failure occurs immediately after the snapshot completes, but before the streaming process begins. Set a delay value that is higher than the value of the offset.flush.interval.ms property that is set for the Kafka Connect worker.

During a snapshot, the connector reads table content in batches of rows. This property specifies the maximum number of rows in a batch.

Semicolon-separated list of parameters to pass to the configured logical decoding plug-in. For example, add-tables=public.table,public.table2;include-lsn=true.

If connecting to a replication slot fails, this is the maximum number of consecutive attempts to connect.

The number of milliseconds to wait between retry attempts when the connector fails to connect to a replication slot.

__debezium_unavailable_value

Specifies the constant that the connector provides to indicate that the original value is a toasted value that is not provided by the database. If the setting of unavailable.value.placeholder starts with the hex: prefix, it is expected that the rest of the string represents hexadecimally encoded octets. For more information, see toasted values.
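
For example, to replace unavailable toasted values with the raw octets of the string missing, hex-encode those octets:

"unavailable.value.placeholder": "hex:6d697373696e67"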

Determines whether the connector generates events with transaction boundaries and enriches change event envelopes with transaction metadata. Specify true if you want the connector to do this. For more information, see Transaction metadata.

Determines whether the connector should commit the LSN of the processed records in the source PostgreSQL database so that the WAL logs can be deleted. Specify false if you do not want the connector to commit the LSN of processed records.

Important: If you set the value of this property to false, Debezium does not acknowledge the LSN. Failure to acknowledge the LSN can lead to uncontrolled growth of the WAL logs, which stresses storage capacity, and could result in degraded performance, and even data loss. To maintain normal service, if you set this property to false, you must configure some other mechanism to commit the LSN.

The number of milliseconds to wait before restarting a connector after a retriable error occurs.

A comma-separated list of the operation types that you want the connector to skip during streaming. You can configure the connector to skip the following types of operations:

c (insert/create)

u (update)

d (delete)

t (truncate)

Set the value to none if you do not want the connector to skip any operations.
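
For example, to skip update and delete events, assuming that the property described here is skipped.operations (the name is not shown in this excerpt):

"skipped.operations": "u,d"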

Fully-qualified name of the data collection that is used to send signals to the connector.
Use the following format to specify the collection name:
<schemaName>.<tableName>

List of the signaling channel names that are enabled for the connector. By default, the following channels are available: source, kafka, file, and jmx.

List of notification channel names that are enabled for the connector. By default, the following channels are available: sink, log, and jmx.

The maximum number of rows that the connector fetches and reads into memory during an incremental snapshot chunk. Increasing the chunk size provides greater efficiency, because the snapshot runs fewer snapshot queries of a greater size. However, larger chunk sizes also require more memory to buffer the snapshot data. Adjust the chunk size to a value that provides the best performance in your environment.

Specifies the watermarking mechanism that the connector uses during an incremental snapshot to deduplicate events that might be captured by an incremental snapshot and then recaptured after streaming resumes.
You can specify one of the following options:

insert_insert

When you send a signal to initiate an incremental snapshot, for every chunk that Debezium reads during the snapshot, it writes an entry to the signaling data collection to record the signal to open the snapshot window. After the snapshot completes, Debezium inserts a second entry to record the closing of the window.

insert_delete

When you send a signal to initiate an incremental snapshot, for every chunk that Debezium reads, it writes a single entry to the signaling data collection to record the signal to open the snapshot window. After the snapshot completes, this entry is removed. No entry is created for the signal to close the snapshot window. Set this option to prevent rapid growth of the signaling data collection.
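
A sketch, assuming that this strategy is configured through the incremental.snapshot.watermarking.strategy property (the name is not shown in this excerpt):

"incremental.snapshot.watermarking.strategy": "insert_delete"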

Specifies whether a connector writes watermarks to the signal data collection to track the progress of an incremental snapshot. Set the value to true to enable a connector that has a read-only connection to the database to use an incremental snapshot watermarking strategy that does not require writing to the signal data collection.

How often, in milliseconds, the XMIN value is read from the replication slot. The XMIN value provides the lower bound of where a new replication slot could start from. The default value of 0 disables XMIN tracking.

io.debezium.schema.SchemaTopicNamingStrategy

The name of the TopicNamingStrategy class that the connector uses to determine topic names for data change, schema change, transaction, heartbeat, and other events. Defaults to SchemaTopicNamingStrategy.

Specifies the delimiter for topic names. The default is . (a period).

The size of the bounded concurrent hash map that is used to hold topic names. This cache helps to determine the topic name that corresponds to a given data collection.

Controls the name of the topic to which the connector sends heartbeat messages. The topic name has this pattern:

<topic.heartbeat.prefix>.<topic.prefix>

For example, if the topic prefix is fulfillment, the default topic name is __debezium-heartbeat.fulfillment.

Controls the name of the topic to which the connector sends transaction metadata messages. The topic name has this pattern:

<topic.prefix>.<topic.transaction>

For example, if the topic prefix is fulfillment, the default topic name is fulfillment.transaction.

Specifies the number of threads that the connector uses when performing an initial snapshot. To enable parallel initial snapshots, set the property to a value greater than 1. In a parallel initial snapshot, the connector processes multiple tables concurrently.

Important: When you enable parallel initial snapshots, the threads that perform each table snapshot can require varying times to complete their work. If a snapshot for one table requires significantly more time to complete than the snapshots for other tables, threads that have completed their work sit idle. In some environments, a network device, such as a load balancer or firewall, terminates connections that remain idle for an extended interval. After the snapshot completes, the connector is unable to close the connection, resulting in an exception, and an incomplete snapshot, even in cases where the connector successfully transmitted all snapshot data. If you experience this problem, revert the value of snapshot.max.threads to 1, and retry the snapshot.

Defines tags that customize MBean object names by adding metadata that provides contextual information. Specify a comma-separated list of key-value pairs. Each key represents a tag for the MBean object name, and the corresponding value represents a value for the key, for example,
k1=v1,k2=v2

The connector appends the specified tags to the base MBean object name. Tags can help you to organize and categorize metrics data. You can define tags to identify particular application instances, environments, regions, versions, and so forth. For more information, see Customized MBean names.

Specifies how the connector responds after an operation that results in a retriable error, such as a connection error.
Set one of the following options:

-1

No limit. The connector always restarts automatically, and retries the operation, regardless of the number of previous failures.

0

Disabled. The connector fails immediately, and never retries the operation. User intervention is required to restart the connector.

> 0

The connector restarts automatically until it reaches the specified maximum number of retries. After the next failure, the connector stops, and user intervention is required to restart it.

Specifies the time, in milliseconds, that the connector waits for a query to complete. Set the value to 0 (zero) to remove the timeout limit.