IBM DB2 Batch Source - CDAP Documentation

The IBM DB2 Batch Source plugin is available in the Hub.

Plugin version: 1.7.0

Reads from a DB2 database using a configurable SQL query and outputs one record for each row returned by the query. For example, you can use this source to create daily snapshots of a database table and write them to a partitioned table in BigQuery.

Configuration

Property | Macro Enabled? | Version Introduced | Description
Reference Name | No | | Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.
Driver Name | No | | Required. Name of the JDBC driver to use. Default is db211.
Host | Yes | | Required. Host that DB2 is running on. Default is localhost.
Port | Yes | | Required. Port that DB2 is listening on. Default is 50000.
Database | Yes | | Required. DB2 database name.
Import Query | Yes | | Required. The SELECT query used to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The query should contain the ‘$CONDITIONS’ string, for example, ‘SELECT * FROM table WHERE $CONDITIONS’. The ‘$CONDITIONS’ string is replaced by limits on the Split-By Field Name field, derived from the bounding query. The ‘$CONDITIONS’ string is not required if Number of Splits to Generate is set to 1.
Bounding Query | Yes | | Required. Query that returns the minimum and maximum values of the Split-By Field Name field, for example, SELECT MIN(id),MAX(id) FROM table. Not required if Number of Splits to Generate is set to 1.
Split-By Field Name | Yes | | Name of the field used to generate splits. Not required if Number of Splits to Generate is set to 1.
Number of Splits to Generate | Yes | | Number of splits to generate. Default is 1.
Fetch Size | Yes | 6.6.0/1.7.0 | Optional. The number of rows to fetch at a time per split. A larger fetch size can speed up the import at the cost of higher memory usage. Default is 1000.
Username | Yes | | Optional. User identity for connecting to the specified database.
Password | Yes | | Optional. Password to use to connect to the specified database.
Connection Arguments | Yes | | Optional. A list of arbitrary string key/value pairs passed to the JDBC driver as connection arguments, for drivers that need additional configuration.
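To make the interplay of Import Query, Bounding Query, Split-By Field Name, and Number of Splits to Generate concrete, here is a simplified sketch of how a numeric split-by range could be divided and substituted for ‘$CONDITIONS’. It assumes an integer split-by column and inclusive, roughly equal ranges. The class and method names (SplitQuerySketch, buildSplitQueries) are hypothetical, and the plugin's actual splitting logic may differ, for example in how it handles other column types or null values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of split generation; NOT the plugin's actual implementation.
class SplitQuerySketch {

    /**
     * Divides the inclusive range [min, max] (as returned by the bounding query)
     * into at most numSplits ranges and substitutes each range condition for
     * $CONDITIONS in the import query.
     */
    static List<String> buildSplitQueries(String importQuery, String splitBy,
                                          long min, long max, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        // Number of distinct integer values in [min, max]; assumes no overflow.
        long span = max - min + 1;
        // Never create more splits than there are values to read.
        int splits = (int) Math.min(numSplits, span);
        long size = span / splits;
        long remainder = span % splits;

        List<String> queries = new ArrayList<>();
        long lo = min;
        for (int i = 0; i < splits; i++) {
            // The first `remainder` splits get one extra value, so sizes differ by at most 1.
            long hi = lo + size + (i < remainder ? 1 : 0) - 1;
            String condition = splitBy + " >= " + lo + " AND " + splitBy + " <= " + hi;
            // String.replace substitutes the literal text, so '$' needs no escaping.
            queries.add(importQuery.replace("$CONDITIONS", condition));
            lo = hi + 1;
        }
        return queries;
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM users WHERE $CONDITIONS";
        // Suppose the bounding query "SELECT MIN(id),MAX(id) FROM users" returned 1 and 10.
        for (String q : buildSplitQueries(query, "id", 1, 10, 3)) {
            System.out.println(q);
        }
        // Prints:
        // SELECT * FROM users WHERE id >= 1 AND id <= 4
        // SELECT * FROM users WHERE id >= 5 AND id <= 7
        // SELECT * FROM users WHERE id >= 8 AND id <= 10
    }
}
```

Because each generated query covers a disjoint range of the split-by field, the splits can be read independently of one another.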

Data Type Mappings

DB2 Data Type | CDAP Schema Data Type
SMALLINT | int
INTEGER | int
BIGINT | long
DECIMAL(p,s) or NUMERIC(p,s) | decimal
DECFLOAT | string
REAL | float
DOUBLE | double
CHAR | string
VARCHAR | string
CHAR(n) FOR BIT DATA | bytes
VARCHAR(n) FOR BIT DATA | bytes
BINARY | bytes
VARBINARY | bytes
GRAPHIC | string
VARGRAPHIC | string
CLOB | string
BLOB | bytes
DBCLOB | string
DATE | date
TIME | time
TIMESTAMP | timestamp
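The mapping table can be read as a simple lookup keyed on the base DB2 type name. The sketch below is a hypothetical helper (Db2TypeMappingSketch and toCdapType are illustrative names, not a plugin API) that mirrors the table: it treats any "FOR BIT DATA" column as bytes, strips length or precision suffixes such as (10,2), and, as an assumption of this sketch only, rejects types the table does not list.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup mirroring the DB2-to-CDAP mapping table; not the plugin's code.
class Db2TypeMappingSketch {

    private static final Map<String, String> DB2_TO_CDAP = Map.ofEntries(
        Map.entry("SMALLINT", "int"),
        Map.entry("INTEGER", "int"),
        Map.entry("BIGINT", "long"),
        Map.entry("DECIMAL", "decimal"),
        Map.entry("NUMERIC", "decimal"),
        Map.entry("DECFLOAT", "string"),
        Map.entry("REAL", "float"),
        Map.entry("DOUBLE", "double"),
        Map.entry("CHAR", "string"),
        Map.entry("VARCHAR", "string"),
        Map.entry("BINARY", "bytes"),
        Map.entry("VARBINARY", "bytes"),
        Map.entry("GRAPHIC", "string"),
        Map.entry("VARGRAPHIC", "string"),
        Map.entry("CLOB", "string"),
        Map.entry("BLOB", "bytes"),
        Map.entry("DBCLOB", "string"),
        Map.entry("DATE", "date"),
        Map.entry("TIME", "time"),
        Map.entry("TIMESTAMP", "timestamp"));

    /** Returns the CDAP schema type name for a DB2 column type declaration. */
    static String toCdapType(String db2Type) {
        String t = db2Type.trim().toUpperCase(Locale.ROOT);
        // CHAR(n) FOR BIT DATA and VARCHAR(n) FOR BIT DATA hold raw bytes.
        if (t.endsWith("FOR BIT DATA")) {
            return "bytes";
        }
        // Drop any length/precision suffix, e.g. DECIMAL(10,2) -> DECIMAL.
        int paren = t.indexOf('(');
        String base = (paren >= 0 ? t.substring(0, paren) : t).trim();
        String cdap = DB2_TO_CDAP.get(base);
        if (cdap == null) {
            // Assumption of this sketch: unlisted types are rejected.
            throw new IllegalArgumentException("Unmapped DB2 type: " + db2Type);
        }
        return cdap;
    }

    public static void main(String[] args) {
        System.out.println(toCdapType("DECIMAL(10,2)"));            // decimal
        System.out.println(toCdapType("VARCHAR(64) FOR BIT DATA")); // bytes
        System.out.println(toCdapType("varchar(64)"));              // string
    }
}
```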

Example

Suppose you want to read data from a DB2 database named “prod” that is running on “localhost”, port 50000, as user “sa” with password “Test11”. Ensure that the DB2 JDBC driver is installed. You can also specify the name of a particular driver; otherwise, “db211” is used. Configure the plugin with:

Property | Value
Reference Name | src1
Driver Name | db211
Host | localhost
Port | 50000
Database | prod
Import Query | "select id, name, email, phone from users;"
Number of Splits to Generate | 1
Username | sa
Password | Test11
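For orientation, the example values correspond roughly to a plain JDBC connection like the one sketched below. It assumes the standard IBM Data Server Driver type 4 URL format, jdbc:db2://host:port/database; how the plugin assembles its URL internally is not described here. The class and method names are hypothetical. In this sketch, Username and Password become the standard JDBC "user" and "password" properties, Connection Arguments are copied into the same Properties object, and Fetch Size is applied with Statement.setFetchSize. The countRows method needs the DB2 driver on the classpath and a reachable database, so main does not call it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the example configuration as raw JDBC; not the plugin's code.
class Db2ExampleConnection {

    /** Builds a DB2 type 4 JDBC URL of the form jdbc:db2://host:port/database. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:db2://" + host + ":" + port + "/" + database;
    }

    /** Combines extra connection arguments and optional credentials into JDBC properties. */
    static Properties connectionProperties(String user, String password,
                                           Map<String, String> connectionArgs) {
        Properties props = new Properties();
        props.putAll(connectionArgs);
        // Credentials are set last, so they take precedence over same-named arguments.
        if (user != null) {
            props.setProperty("user", user);
        }
        if (password != null) {
            props.setProperty("password", password);
        }
        return props;
    }

    /** Runs a query with the given fetch size and counts the rows; needs a live database. */
    static int countRows(String url, Properties props, String query, int fetchSize)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {
            // Hint for rows per round trip: larger values trade memory for fewer round trips.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery(query)) {
                int n = 0;
                while (rs.next()) {
                    n++;
                }
                return n;
            }
        }
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 50000, "prod");
        Properties props = connectionProperties("sa", "Test11", Map.of());
        System.out.println(url);                       // jdbc:db2://localhost:50000/prod
        System.out.println(props.getProperty("user")); // sa
    }
}
```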

For example, if the ‘id’ column is a primary key of type int and the other columns are non-nullable VARCHARs, the output records will have this schema:

Field Name | Type
id | int
name | string
email | string
phone | string