DOC: Add info on dtype strings · Issue #30590 · pandas-dev/pandas
Problem description
I've been studying the new `string`, `boolean` and `Intxx` dtypes, and I think it would be worthwhile to document which strings can be used to specify the dtypes of extension arrays. This could be an additional column in the dtypes table here:
https://dev.pandas.io/docs/getting_started/basics.html#dtypes
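For context, here is a minimal sketch of how these strings are used in practice. It assumes pandas 1.0 or later (where `string` and `boolean` exist). Wherever a dtype argument is accepted, such as the `Series` constructor or `astype`, the string alias can be passed in place of a dtype object:

```python
import pandas as pd

# Extension dtypes requested by their string aliases instead of dtype objects.
ints = pd.Series([1, None, 3], dtype="Int64")            # nullable integer
flags = pd.Series([True, None, False], dtype="boolean")  # nullable boolean
text = pd.Series(["a", None, "c"], dtype="string")       # dedicated string dtype
cats = pd.Series(["x", "y", "x"]).astype("category")     # alias via astype()

for name, s in [("ints", ints), ("flags", flags), ("text", text), ("cats", cats)]:
    # s.array is the extension array backing each Series
    print(name, s.dtype, type(s.array).__name__)
```

The printed names depend on the pandas version, so treat the output as illustrative.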
I think the following table is correct:
| Data Type | Array | Possible Strings |
|---|---|---|
| DatetimeTZDtype | DatetimeArray | `'datetime64[ns, <tz>]'` |
| CategoricalDtype | Categorical | `'category'` |
| PeriodDtype | PeriodArray | `'period[<freq>]'` or `'Period[<freq>]'` |
| SparseDtype | SparseArray | `'Sparse'`, `'Sparse[int]'`, `'Sparse[int32, 0]'`, `'Sparse[int64, 0]'`, `'Sparse[float64, nan]'`, `'Sparse[float32, nan]'` |
| IntervalDtype | IntervalArray | `'interval'`, `'Interval'`, `'Interval[<np.numeric>]'`, `'Interval[datetime64[ns, <tz>]]'`, `'Interval[timedelta64[<freq>]]'` |
| Int64Dtype (and others) | IntegerArray | `'Int8'`, `'Int16'`, `'Int32'`, `'Int64'`, `'UInt8'`, `'UInt16'`, `'UInt32'`, `'UInt64'` |
| StringDtype | StringArray | `'string'` |
| BooleanDtype | BooleanArray | `'boolean'` |
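One rough way to check the table is to resolve each alias with `pandas.api.types.pandas_dtype`, the public helper that turns a dtype argument into a dtype object. This is a sketch assuming pandas 1.0 or later: it checks only a subset of each row's strings, and `UTC` and `D` are example values chosen here to stand in for the `<tz>` and `<freq>` placeholders.

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# A subset of the aliases from the table, mapped to the dtype class each
# should resolve to. 'UTC' and 'D' are example fillers for <tz> and <freq>.
ALIASES = {
    "datetime64[ns, UTC]": pd.DatetimeTZDtype,
    "category": pd.CategoricalDtype,
    "period[D]": pd.PeriodDtype,
    "Period[D]": pd.PeriodDtype,
    "Sparse": pd.SparseDtype,
    "Sparse[int]": pd.SparseDtype,
    "interval": pd.IntervalDtype,
    "Interval[int64]": pd.IntervalDtype,
    "Int8": pd.Int8Dtype,
    "UInt64": pd.UInt64Dtype,
    "string": pd.StringDtype,
    "boolean": pd.BooleanDtype,
}

for alias, expected in ALIASES.items():
    dtype = pandas_dtype(alias)
    print(f"{alias!r:24} -> {type(dtype).__name__}")
    assert isinstance(dtype, expected), alias
```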
I also think we should make it clear that any string not in that table must be one that NumPy accepts as a dtype.
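To illustrate that fallback, here is a small sketch, again using `pandas_dtype`. The loop and the deliberately invalid `"not_a_dtype"` string are only for demonstration. Strings that no pandas extension dtype claims are handed on to NumPy, so valid NumPy dtype strings come back as `numpy.dtype` objects, and anything else raises `TypeError`:

```python
import numpy as np
from pandas.api.types import pandas_dtype

# Strings that are not extension-dtype aliases fall through to NumPy.
for alias in ["int64", "float32", "bool", "datetime64[ns]", "not_a_dtype"]:
    try:
        dtype = pandas_dtype(alias)
    except TypeError as err:
        # Neither pandas nor NumPy understands the string.
        print(f"{alias!r} rejected: {err}")
    else:
        print(f"{alias!r} -> {dtype!r} (NumPy dtype: {isinstance(dtype, np.dtype)})")
```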
If people like @TomAugspurger and @jorisvandenbossche think this is useful, I'll add a column to that table in the docs (or possibly a separate table, since the last column above is long).
Also, should we consider allowing `'Boolean'`, `'String'` and `'Category'`, i.e. type names with a leading capital letter? We're inconsistent about which case is allowed for dtype strings in different places (see `period`/`Period` and `interval`/`Interval`).
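The inconsistency can be probed directly. This is a sketch: `accepts` is a hypothetical helper defined here, not a pandas API, and results may differ between pandas versions. It reports whether each lowercase and capitalized spelling resolves to a dtype:

```python
from pandas.api.types import pandas_dtype

def accepts(alias: str) -> bool:
    """Hypothetical helper: True if pandas resolves `alias` to a dtype."""
    try:
        pandas_dtype(alias)
    except TypeError:
        return False
    return True

# Lowercase / capitalized spellings mentioned in this issue.
PAIRS = [
    ("period[D]", "Period[D]"),
    ("interval", "Interval"),
    ("category", "Category"),
    ("boolean", "Boolean"),
    ("string", "String"),
]

for lower, upper in PAIRS:
    print(f"{lower!r}: {accepts(lower)}  {upper!r}: {accepts(upper)}")
```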