convert numeric column to dedicated pd.StringDtype() · Issue #31204 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
pd.Series(range(5, 10), dtype="Int64").astype("string")
raises TypeError: data type not understood
while
pd.Series(range(5, 10)).astype("string")
raises ValueError: StringArray requires a sequence of strings or missing values.
If you first do astype(str):
pd.Series(range(5, 10)).astype(str).astype("string")
and
pd.Series(range(5, 10), dtype="Int64").astype(str).astype("string")
work as expected:
0 5
1 6
2 7
3 8
4 9
dtype: string
Going through astype(object) first, i.e. astype(object).astype("string"), raises in both cases ValueError: StringArray requires a sequence of strings or missing values.
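The cases above can be collected into one script for comparison. This is only a sketch: try_astype is a hypothetical helper (not part of pandas) that applies each conversion in turn and reports either the resulting dtype or the exception raised. The outcome of the direct astype("string") calls depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd


def try_astype(series, *dtypes):
    """Hypothetical helper: apply each astype in turn and describe the outcome."""
    try:
        for dtype in dtypes:
            series = series.astype(dtype)
        return f"ok, dtype: {series.dtype}"
    except (TypeError, ValueError) as exc:
        # Report the error instead of stopping, so all cases can be compared.
        return f"{type(exc).__name__}: {exc}"


inputs = {
    "int64": pd.Series(range(5, 10)),
    "Int64": pd.Series(range(5, 10), dtype="Int64"),
}
# Conversion chains discussed in this report.
chains = [("string",), (str, "string"), (object, "string")]

results = {}
for label, series in inputs.items():
    for chain in chains:
        results[(label, chain)] = try_astype(series, *chain)
        print(label, chain, "->", results[(label, chain)])
```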
Problem description
I can understand the ValueError, since you don't feed strings to the StringArray. Ideally, astype("string") would convert the values to strings, or astype(str) would return a StringArray. In any case, I would expect both pd.Series(range(5, 10), dtype="Int64").astype("string") and pd.Series(range(5, 10)).astype("string") to raise the same error.
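Until the behaviour is consistent, a user-side workaround might look like the sketch below. to_string_dtype is a hypothetical name, not a pandas function. It assumes the astype(str) detour described above works for the input, and it restores missing values afterwards, because astype(str) would otherwise turn them into literal text.

```python
import pandas as pd


def to_string_dtype(series):
    """Hypothetical helper: convert a Series to the dedicated string dtype."""
    try:
        # Direct conversion; this is the call that raises in this report.
        return series.astype("string")
    except (TypeError, ValueError):
        # Fallback: go through plain Python strings first.
        converted = series.astype(str).astype("string")
        # astype(str) stringifies missing values, so mask them back to missing.
        return converted.mask(series.isna())


result = to_string_dtype(pd.Series(range(5, 10), dtype="Int64"))
print(result)
```

The same helper can be applied to a plain int64 series, e.g. to_string_dtype(pd.Series(range(5, 10))).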
Expected Output
0 5
1 6
2 7
3 8
4 9
dtype: string
or
ValueError: StringArray requires a sequence of strings or missing values.
Output of pd.show_versions()
[paste the output of pd.show_versions() here below this line]