BUG/TST: Fix infer_dtype for Period array-likes and general ExtensionArrays by jorisvandenbossche · Pull Request #37367 · pandas-dev/pandas

So I did a rough search / inventory of the different internal use cases of infer_dtype. The main groups I see:

So from that, I think it would actually be good to change infer_dtype(EA) so that it never raises an error (as it currently does on master for unknown array types).

The question is then which value to return. Infer by converting to an object dtype numpy array, return the existing value "mixed", or return a new value like "unknown-array"? Or let the EA dtype register something?

Given the potentially expensive nature of coercing to object dtype, that might be something to avoid.
Given that "mixed" is already being used and has some use cases (e.g. the validation for the str accessor), it might be better not to re-use that.
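
To make the concern about re-using "mixed" concrete, here is a toy sketch in plain Python (not the actual pandas code). The `ALLOWED_FOR_STR` set and the `validate_str_accessor` helper are hypothetical stand-ins for the str accessor validation. The point is only that a check keyed on "mixed" would start accepting arbitrary EAs if they reported that value too.

```python
# Hypothetical stand-in for the str accessor validation; the real set of
# accepted inferred types in pandas may differ.
ALLOWED_FOR_STR = {"string", "empty", "bytes", "mixed", "mixed-integer"}


def validate_str_accessor(inferred_type: str) -> bool:
    """Return True if data with this inferred type may use the str accessor."""
    return inferred_type in ALLOWED_FOR_STR


# If an unknown EA reported "mixed", it would pass validation by accident...
ea_reports_mixed = validate_str_accessor("mixed")
# ...whereas a dedicated value is rejected, because no existing check knows it.
ea_reports_unknown = validate_str_accessor("unknown-array")

print(ea_reports_mixed, ea_reports_unknown)  # True False
```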

So two ideas:

  1. if we get an EA, can just return values.dtype.name or something like that
  2. part of the register_dtype process could add stuff to _TYPE_MAP (may just be a more complicated version of 1?)

If we let the EA control this, I think it can only make sense if it returns one of the existing categories? (What would we otherwise ever do with the value, except ignore it?)
So that would rule out the first option, I think?

Long term, it might be useful to let the dtype register its "inferred_dtype", but then I think we should first have a better idea of some specific use cases for which this would be used/useful (currently, many of the use cases I checked do some dtype inference when there is not yet an array-like to start with).

So in the short term, maybe we can use the "unknown-array" return value? No existing code would act on that value, so in practice it would basically be ignored, but at least no error would be raised.
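
A minimal sketch of that short-term behaviour, as a toy model rather than the real implementation. `TYPE_MAP`, `infer_dtype_sketch`, and the `SimpleNamespace` array-likes are all hypothetical names made up for illustration. The real `_TYPE_MAP` lookup in pandas is more involved.

```python
from types import SimpleNamespace

# Toy subset of a dtype-name -> inferred-type mapping (assumed for
# illustration; the real _TYPE_MAP in pandas is larger).
TYPE_MAP = {"int64": "integer", "float64": "floating", "bool": "boolean"}


def infer_dtype_sketch(values) -> str:
    """Toy model: never raise for an array-like that carries a dtype."""
    dtype = getattr(values, "dtype", None)
    if dtype is None:
        raise TypeError("this sketch only handles objects with a .dtype")
    # Known dtypes map to an existing category; anything else gets the new
    # sentinel instead of an error, without converting to object dtype.
    return TYPE_MAP.get(dtype.name, "unknown-array")


# Hypothetical array-likes, modelled only by their dtype name.
known = SimpleNamespace(dtype=SimpleNamespace(name="int64"))
unknown = SimpleNamespace(dtype=SimpleNamespace(name="my_custom_dtype"))

print(infer_dtype_sketch(known))    # integer
print(infer_dtype_sketch(unknown))  # unknown-array
```

Callers that branch on the existing categories would simply fall through for "unknown-array", which matches the "basically ignored" behaviour described above.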