ENH: support defaultdict in read_csv dtype parameter · Issue #41574 · pandas-dev/pandas

I have a large CSV file with 15k columns with non-obvious names, of which 14,990 are floating point numbers. I'd like to load them as floats without read_csv having to spend time inferring their types.

dtype accepts a dict, but building one that lists every column name is tedious and not always possible. The obvious solution is to pass a defaultdict whose default is np.float32, with explicit entries for the columns of other types. Unfortunately, read_csv currently ignores the default silently. Presumably read_csv does not index the dictionary directly, but first checks whether the column name is present, and a membership check never triggers the default factory.
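To illustrate the behaviour I suspect, here is a minimal sketch using only the standard library. It does not show read_csv internals, which I have not inspected; it only shows how a defaultdict responds to a membership check versus direct indexing. The column names and the string dtype names ("float32", "int64", "object") are made up for the example.

```python
from collections import defaultdict

# Default every unlisted column to float32; list the exceptions explicitly.
# (Column names here are hypothetical.)
dtypes = defaultdict(lambda: "float32", {"id": "int64", "label": "object"})

# Membership tests and .get() never call the default factory ...
assert "x0" not in dtypes
assert dtypes.get("x0") is None

# ... only indexing does, and it also inserts the key as a side effect.
assert dtypes["x0"] == "float32"
assert "x0" in dtypes
```

If read_csv guards its lookup with something like a membership check, the default would be skipped in exactly this way.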

If this is not possible, it would be helpful to include a warning to the user, or at least some mention in the documentation, that defaultdict is not supported. It took me a long time to figure out why my floats weren't being treated as float32 and why read_csv was still trying to determine the types of these columns.
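In the meantime, one possible workaround is to read only the header line and expand the defaultdict into a plain dict with an entry for every column. This is a sketch under assumptions: `expand_dtypes` is a hypothetical helper, not part of pandas, the in-memory sample stands in for the real file, and the column names are invented.

```python
import csv
import io
from collections import defaultdict


def expand_dtypes(header, dtypes):
    """Return a plain dict with one dtype entry per column name in header."""
    # Indexing (not .get or `in`) is what triggers the defaultdict factory.
    return {name: dtypes[name] for name in header}


# Stand-in for the real file; in practice, open the CSV path instead.
sample = io.StringIO("id,label,x0,x1\n1,a,0.5,1.5\n")
header = next(csv.reader(sample))

dtypes = defaultdict(lambda: "float32", {"id": "int64", "label": "object"})
explicit = expand_dtypes(header, dtypes)
```

The resulting plain dict can then be passed as dtype=explicit to read_csv. This assumes the column names in the header are unique.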