dask.dataframe.read_sql_table — Dask documentation
dask.dataframe.read_sql_table
dask.dataframe.read_sql_table(table_name, con, index_col, divisions=None, npartitions=None, limits=None, columns=None, bytes_per_chunk='256 MiB', head_rows=5, schema=None, meta=None, engine_kwargs=None, **kwargs)
Read SQL database table into a DataFrame.
If neither divisions nor npartitions is given, the memory footprint of the first few rows will be determined, and partitions of size ~256 MiB will be used.
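As a minimal sketch of that default behaviour (the sqlite URI, table name, and id column here are hypothetical), a call that gives only the required arguments lets the partition sizes be derived from bytes_per_chunk:

>>> import dask.dataframe as dd
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id')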
Parameters
table_name: str
Name of SQL table in database.
con: str
Full sqlalchemy URI for the database connection
index_col: str
Column which becomes the index, and defines the partitioning. Should be an indexed column in the SQL server, and any orderable type. If the type is number or time, then partition boundaries can be inferred from npartitions or bytes_per_chunk; otherwise must supply explicit divisions.
columns: sequence of str or SqlAlchemy column or None
Which columns to select; if None, gets all. Note that this can be a mix of str and SqlAlchemy columns.
schema: str or None
Pass this to sqlalchemy to select which DB schema to use within the URI connection
divisions: sequence
Values of the index column to split the table by. If given, this will override npartitions and bytes_per_chunk. The divisions are the value boundaries of the index column used to define the partitions. For example, divisions=list('acegikmoqsuwz') could be used to partition a string column lexicographically into 12 partitions, with the implicit assumption that each partition contains similar numbers of records (see the Examples below for a sketch).
npartitions: int
Number of partitions, if divisions is not given. Will split the values of the index column linearly between limits, if given, or the column max/min. The index column must be numeric or time for this to work.
limits: 2-tuple or None
Manually give upper and lower range of values for use with npartitions; if None, first fetches max/min from the DB. Upper limit, if given, is inclusive.
bytes_per_chunk: str or int
If both divisions and npartitions are None, this is the target size of each partition, in bytes.
head_rows: int
How many rows to load for inferring the data-types, and memory per row
meta: empty DataFrame or None
If provided, do not attempt to infer dtypes, but use these, coercing all chunks on load
engine_kwargs: dict or None
Specific db engine parameters for sqlalchemy
kwargs: dict
Additional parameters to pass to pd.read_sql()
Returns
dask.dataframe
Examples
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        npartitions=10, index_col='id')
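A sketch of other partitioning options, assuming the same hypothetical accounts table also has an orderable surname text column and integer ids spanning roughly 0 to 40000:

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='surname',
...                        divisions=list('acegikmoqsuwz'))
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=4,
...                        limits=(0, 40000))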