ENH Enable streaming from S3 by stephen-hoover · Pull Request #11073 · pandas-dev/pandas
File reading from AWS S3: Modify the get_filepath_or_buffer function such that it only opens the connection to S3, rather than reading the entire file at once. This allows partial reads (e.g. through the nrows argument) or chunked reading (e.g. through the chunksize argument) without needing to download the entire file first.
I wasn't sure what the best place was to put the OnceThroughKey class. (Suggestions for better names welcome.) I don't like putting an entire class inside a function like that, but this keeps the boto dependency contained.
Adding the readline function, and modifying next so that it returns lines, was necessary to allow the Python engine to read uncompressed CSVs.
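To make the idea concrete, here is a minimal, standard-library-only sketch of a forward-only wrapper whose readline and next return whole lines. It is not the OnceThroughKey class from this PR: the name OnceThroughStream, the chunk_size parameter, and the bytes_fetched counter are all hypothetical, and an in-memory io.BytesIO stands in for the S3 connection.

```python
import io
from itertools import islice


class OnceThroughStream:
    """Hypothetical forward-only wrapper; NOT the PR's OnceThroughKey."""

    def __init__(self, raw, chunk_size=8192):
        self._raw = raw                # any object with read(n) -> bytes
        self._chunk_size = chunk_size
        self._buffer = b""
        self._eof = False
        self.bytes_fetched = 0         # illustration only: bytes pulled so far

    def _fill(self):
        # Pull one more chunk from the underlying source.
        chunk = self._raw.read(self._chunk_size)
        if chunk:
            self._buffer += chunk
            self.bytes_fetched += len(chunk)
        else:
            self._eof = True

    def readline(self):
        # Fetch chunks until the buffer holds a full line or the source ends.
        while b"\n" not in self._buffer and not self._eof:
            self._fill()
        line, sep, rest = self._buffer.partition(b"\n")
        self._buffer = rest
        return line + sep              # b"" once everything is consumed

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    next = __next__                    # Python 2 spelling of the same method


# Stand-in for a large remote file; only the first chunk is ever read.
payload = b"a,b\n" + b"1,2\n" * 10000
stream = OnceThroughStream(io.BytesIO(payload), chunk_size=64)
first_three = list(islice(stream, 3))
```

Because lines are produced on demand, taking three lines pulls a single 64-byte chunk from the source rather than the whole payload, which is the behavior that partial and chunked reads rely on.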
The Python 2 standard library's gzip module needs seek and tell functions on its inputs, so I reverted to the old behavior there.
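As a rough illustration of that fallback, the sketch below buffers a forward-only stream fully in memory so that gzip receives an object with seek and tell. It assumes the "old behavior" means reading the whole file up front; the helper name open_gzip_stream is hypothetical and is not code from this PR.

```python
import gzip
import io


def open_gzip_stream(raw):
    """Hypothetical helper: buffer a forward-only stream for gzip."""
    # Reading everything up front gives up streaming, but io.BytesIO
    # supports both seek() and tell().
    buffered = io.BytesIO(raw.read())
    return gzip.GzipFile(fileobj=buffered, mode="rb")


# An in-memory stand-in for a gzip-compressed file fetched from S3.
compressed = gzip.compress(b"a,b\n1,2\n")
with open_gzip_stream(io.BytesIO(compressed)) as fh:
    text = fh.read()
```

The cost is that compressed files are still downloaded in full before parsing starts, so the streaming benefit applies only to uncompressed input in this sketch.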
Partially addresses #11070.