Uniform file IO API and consolidated codebase · Issue #15008 · pandas-dev/pandas
There are at least three things that many of the IO methods must deal with: reading from a URL, reading from or writing to a compressed format, and handling different text encodings. It would be great if all IO functions where these factors are relevant could use the same code (consolidated codebase) and expose the same options (uniform API).
In #14576, we consolidated the codebase, but more consolidation is possible. In pandas/io/common.py, there are three functions that must be called sequentially to get a file-like object: get_filepath_or_buffer, _infer_compression, and _get_handle. These should be consolidated into a single function, which can then delegate to sub-functions.
Currently, pandas supports the following io methods. First for reading:
- read_csv
- read_excel
- read_hdf
- read_feather
- read_sql
- read_json
- read_msgpack (experimental)
- read_html
- read_gbq (experimental)
- read_stata
- read_sas
- read_clipboard
- read_pickle
And then for writing:
- to_csv
- to_excel
- to_hdf
- to_feather
- to_sql
- to_json
- to_msgpack (experimental)
- to_html
- to_gbq (experimental)
- to_stata
- to_clipboard
- to_pickle
Some of these should definitely use the consolidated/uniform API, such as read_csv, read_html, read_pickle, and read_excel.
Some functions perhaps should be kept separate, such as read_feather or read_clipboard.