ENH: support defaultdict in read_csv dtype parameter · Issue #41574 · pandas-dev/pandas
I have a large CSV file with 15k columns with non-obvious names, of which 14990 are floating point numbers. I'd like to load them as floats without read_csv having to spend the time divining their types. dtype allows providing a dict, but making one with all the column names is tedious and not always possible. The obvious solution is to provide a defaultdict with a default of np.float32 and entries for the other column types. Unfortunately, the default is currently silently ignored by read_csv. Presumably read_csv is not directly querying the dictionary, but rather checking first whether an item is there.
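To illustrate the suspected cause, here is a minimal pure-Python sketch. It does not use pandas, and it is only an assumption about how read_csv behaves internally. The column names and the two helper functions are hypothetical. It shows that a membership test (or `.get`) on a defaultdict never triggers the default factory, while direct indexing does:

```python
from collections import defaultdict

# Hypothetical dtype mapping: float32 by default, one explicit override.
dtypes = defaultdict(lambda: "float32", {"sample_id": "str"})


def lookup_with_membership_check(mapping, column):
    """Mimic a parser that tests `column in mapping` before reading it."""
    if column in mapping:
        return mapping[column]
    return None  # the parser would fall back to type inference here


def lookup_by_indexing(mapping, column):
    """Mimic a parser that indexes the mapping directly."""
    return mapping[column]  # triggers the default factory for missing keys


# The membership check runs first, because indexing a defaultdict
# inserts the missing key as a side effect.
membership_result = lookup_with_membership_check(dtypes, "col_00017")
get_result = dtypes.get("col_00018")
indexing_result = lookup_by_indexing(dtypes, "col_00017")

print(membership_result)  # None: the default was never consulted
print(get_result)         # None: .get() also bypasses the default factory
print(indexing_result)    # float32
```

If read_csv does something like the first helper, the defaultdict's default can never take effect, which would match the behaviour described above.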
If this is not possible, it would be helpful to include a warning to the user, or at least some mention in the documentation, that defaultdict is not supported. It took me a long time to figure out why my floats weren't being treated as float32 and why read_csv was still trying to determine the types of these columns.
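In the meantime, one possible workaround is to read only the header row and build the explicit dict from it. This is a sketch that assumes the file has a single header row with unique column names; `build_dtype_map`, the sample data, and the column names are hypothetical:

```python
import csv
import io


def build_dtype_map(csv_file, overrides, default="float32"):
    """Read only the header row and map every column to a dtype.

    Columns listed in `overrides` keep their explicit dtype; all other
    columns get `default`.
    """
    header = next(csv.reader(csv_file))
    return {name: overrides.get(name, default) for name in header}


# Small in-memory stand-in for the real file.
sample = io.StringIO("sample_id,a,b\nx1,1.0,2.0\n")
dtype_map = build_dtype_map(sample, {"sample_id": "str"})
print(dtype_map)

# The resulting plain dict can then be passed as the dtype argument, e.g.
# pd.read_csv(path, dtype=dtype_map)
```

This costs one extra pass over the first line of the file, and it does not help when the column names cannot be read up front.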