convert numeric column to dedicated pd.StringDtype() · Issue #31204 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

pd.Series(range(5, 10), dtype="Int64").astype("string")

raises TypeError: data type not understood

while

pd.Series(range(5, 10)).astype("string")

raises ValueError: StringArray requires a sequence of strings or missing values.
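
To compare the two cases side by side, a small script along these lines may help. It is only a sketch: `try_astype_string` is a hypothetical helper, not part of pandas, and the outcome it reports depends on the installed pandas version, so no particular result is assumed here.

```python
import pandas as pd


def try_astype_string(s):
    """Attempt s.astype("string") and report what happened (hypothetical helper)."""
    try:
        result = s.astype("string")
    except (TypeError, ValueError) as exc:
        # Record the exception class and message instead of propagating it.
        return type(exc).__name__, str(exc)
    return "ok", str(result.dtype)


outcomes = {
    "Int64": try_astype_string(pd.Series(range(5, 10), dtype="Int64")),
    "int64": try_astype_string(pd.Series(range(5, 10))),
}
for name, (status, detail) in outcomes.items():
    print(f"{name}: {status}: {detail}")
```

If the two lines printed differ in exception type, that is the inconsistency described in this report.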

If you first do astype(str):

pd.Series(range(5, 10)).astype(str).astype("string")

and

pd.Series(range(5, 10), dtype="Int64").astype(str).astype("string")

work as expected:

0    5
1    6
2    7
3    8
4    9
dtype: string

If you first do astype(object) instead, both cases raise ValueError: StringArray requires a sequence of strings or missing values.
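
As a stopgap, the astype(str) route can be wrapped in a small helper. This is a minimal sketch rather than pandas API: the name `to_string_dtype` is hypothetical, and the masking step is an added assumption, included because astype(str) may turn missing entries into literal text such as "nan" or "<NA>".

```python
import pandas as pd


def to_string_dtype(s):
    """Convert a numeric Series to the dedicated "string" dtype (hypothetical helper)."""
    # Go through plain str first, as in the astype(str) examples in this report.
    as_text = s.astype(str)
    out = as_text.astype("string")
    # Assumption: entries that were missing in the input should stay missing,
    # rather than becoming text like "nan" or "<NA>".
    return out.mask(s.isna())


converted = to_string_dtype(pd.Series(range(5, 10), dtype="Int64"))
print(converted)
```

The same helper can be called on a plain int64 Series, for example `to_string_dtype(pd.Series(range(5, 10)))`.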

Problem description

I can understand the ValueError, since the StringArray is not being fed strings. Ideally, astype("string") would convert the values to strings, or astype(str) would return a StringArray. In any case, I would expect pd.Series(range(5, 10), dtype="Int64").astype("string") and pd.Series(range(5, 10)).astype("string") to raise the same error.

Expected Output

0    5
1    6
2    7
3    8
4    9
dtype: string

or

ValueError: StringArray requires a sequence of strings or missing values.
