BUG: series.astype(str) vs Index.astype(str) inconsistency · Issue #45326 · pandas-dev/pandas

@jbrockmendel

ser = pd.Series([b'foo'], dtype=object)
idx = pd.Index(ser)

In [3]: ser.astype(str)[0]
Out[3]: "b'foo'"

In [4]: idx.astype(str)[0]
Out[4]: 'foo'

Series goes through astype_nansafe and from there through ensure_string_array. Index just calls the ndarray's .astype method. The Index behavior is specifically tested in test_astype_str_from_bytes (xref #38607).
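
As a rough illustration, the two results above can be reproduced in plain Python. This is only a model: it assumes the Series path effectively applies `str()` to each element and the Index path effectively decodes bytes as ASCII. The helper names `series_like` and `index_like` are hypothetical and are not pandas functions.

```python
def series_like(value):
    # Assumed model of the Series path: stringify the object as-is,
    # so bytes keep their b'...' repr.
    return str(value)


def index_like(value):
    # Assumed model of the Index path: bytes are decoded (ASCII assumed),
    # everything else is stringified.
    if isinstance(value, bytes):
        return value.decode("ascii")
    return str(value)


print(series_like(b"foo"))  # b'foo'
print(index_like(b"foo"))   # foo
```

Under this model, the only point of disagreement is whether bytes are decoded or shown via their repr.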

Index.astype can raise on non-ASCII bytestrings:

val =  'あ'
bval = val.encode("UTF-8")

ser2 = pd.Series([bval], dtype=object)
idx2 = pd.Index(ser2)

In [21]: ser2.astype(str)[0]
Out[21]: "b'\\xe3\\x81\\x82'"

In [22]: idx2.astype(str)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
[...]
TypeError: Cannot cast Index to dtype <U0
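
The failure can be sketched without pandas. The snippet below assumes the underlying UnicodeDecodeError comes from a strict ASCII decode of the bytes (the codec is an assumption; the traceback above is elided). The names `as_repr` and `decode_failed` are only illustrative.

```python
val = "あ"
bval = val.encode("UTF-8")  # three bytes: e3 81 82

# str() on bytes never decodes; it returns the repr, so it cannot fail.
as_repr = str(bval)

# A strict ASCII decode (assumed here) rejects any byte >= 0x80.
try:
    bval.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(as_repr)
print(decode_failed)
```

This suggests why the repr-based result is always available, while the decoding result depends on the contents of the bytes.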

These behaviors should match. I don't really have a preference between them.