dask.dataframe.read_hdf — Dask documentation
dask.dataframe.read_hdf
dask.dataframe.read_hdf(pattern, key, start=0, stop=None, columns=None, chunksize=1000000, sorted_index=False, lock=True, mode='r')
Read HDF files into a Dask DataFrame
Read HDF files into a Dask DataFrame. This function is like pandas.read_hdf, except it can read from a single large file, from multiple files, or from multiple keys of the same file.
Parameters:
pattern : string, pathlib.Path, list
File pattern (string), pathlib.Path, buffer to read from, or list of file paths. Can contain wildcards.
key : string
Group identifier in the store. Can contain wildcards.
start : integer, optional
Row number to start at (default is 0).
stop : integer, optional
Row number to stop at (default is None, i.e. the last row).
columns : list of columns, optional
A list of columns that, if not None, limits the columns returned (default is None).
chunksize : positive integer, optional
Maximal number of rows per partition (default is 1000000).
sorted_index : boolean, optional
Whether the input HDF files have a sorted index (default is False).
lock : boolean, optional
Whether to use a lock to prevent concurrency issues (default is True).
mode : {‘a’, ‘r’, ‘r+’}, default ‘r’
Mode to use when opening file(s).
‘r’
Read-only; no data can be modified.
‘a’
Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
‘r+’
Similar to ‘a’, but the file must already exist.
Returns:
dask.DataFrame
Examples
Load single file
>>> dd.read_hdf('myfile.1.hdf5', '/x')
Load multiple files
>>> dd.read_hdf('myfile.*.hdf5', '/x')
>>> dd.read_hdf(['myfile.1.hdf5', 'myfile.2.hdf5'], '/x')
Load multiple datasets
>>> dd.read_hdf('myfile.1.hdf5', '/*')