dask.dataframe.read_sql_table — Dask documentation
dask.dataframe.read_sql_table
dask.dataframe.read_sql_table(table_name, con, index_col, divisions=None, npartitions=None, limits=None, columns=None, bytes_per_chunk='256 MiB', head_rows=5, schema=None, meta=None, engine_kwargs=None, **kwargs)
Read SQL database table into a DataFrame.
If neither divisions nor npartitions is given, the memory footprint of the first few rows will be determined, and partitions of size ~256 MiB will be used.
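As a minimal sketch of that default behaviour (the sqlite URI, table name, and id column here are hypothetical), a call that gives only the required arguments lets the partition sizes be derived from bytes_per_chunk:

>>> import dask.dataframe as dd
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id')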
Parameters
table_name: str
Name of SQL table in database.
con: str
Full sqlalchemy URI for the database connection
index_col: str
Column which becomes the index, and defines the partitioning. Should be an indexed column in the SQL server, and any orderable type. If the type is number or time, then partition boundaries can be inferred from npartitions or bytes_per_chunk; otherwise must supply explicit divisions.
columns: sequence of str or SqlAlchemy column or None
Which columns to select; if None, gets all. Note that this can be a mix of str and SqlAlchemy columns.
schema: str or None
Pass this to sqlalchemy to select which DB schema to use within the URI connection
divisions: sequence
Values of the index column to split the table by. If given, this will override npartitions and bytes_per_chunk. The divisions are the value boundaries of the index column used to define the partitions. For example, divisions=list('acegikmoqsuwz') could be used to partition a string column lexicographically into 12 partitions, with the implicit assumption that each partition contains similar numbers of records (see the Examples below for a sketch).
npartitions: int
Number of partitions, if divisions is not given. Will split the values of the index column linearly between limits, if given, or the column max/min. The index column must be numeric or time for this to work.
limits: 2-tuple or None
Manually give upper and lower range of values for use with npartitions; if None, first fetches max/min from the DB. Upper limit, if given, is inclusive.
bytes_per_chunk: str or int
If both divisions and npartitions are None, this is the target size of each partition, in bytes.
head_rows: int
How many rows to load for inferring the data-types, and memory per row
meta: empty DataFrame or None
If provided, do not attempt to infer dtypes, but use these, coercing all chunks on load
engine_kwargs: dict or None
Specific db engine parameters for sqlalchemy
kwargs: dict
Additional parameters to pass to pd.read_sql()
Returns
dask.dataframe
Examples
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        npartitions=10, index_col='id')
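A sketch of other partitioning options, assuming the same hypothetical accounts table also has an orderable surname text column and integer ids spanning roughly 0 to 40000:

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='surname',
...                        divisions=list('acegikmoqsuwz'))
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=4,
...                        limits=(0, 40000))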