BUG: Regression cannot create Series with dtype '|S' or 'bytes' · Issue #39474 · pandas-dev/pandas


Code Sample, a copy-pastable example

import pandas as pd

s = pd.Series(['foo', 'bar', 'baz'])
# Works in 1.1.x and 1.2.0; fails in 1.2.1, where the kind is 'O'.
assert s.astype(bytes).dtype.kind == 'S', "kind is not S!"

Problem description

Casting a Series of str to bytes converts the elements to bytes instances but leaves the dtype as object. This looks like a regression: in prior versions the dtype became a fixed-width bytes dtype (bytesXX, for example bytes24). In pandas 1.2.1:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '1.2.1'

In [3]: s = pd.Series(['foo', 'bar', 'baz'])

In [4]: s.astype('|S')
Out[4]:
0    b'foo'
1    b'bar'
2    b'baz'
dtype: object

In [5]: s.astype('bytes')
Out[5]:
0    b'foo'
1    b'bar'
2    b'baz'
dtype: object

In [6]: s.astype(bytes)
Out[6]:
0    b'foo'
1    b'bar'
2    b'baz'
dtype: object

Expected Output

I expected this output, produced with 1.2.0:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '1.2.0'

In [3]: s = pd.Series(['foo', 'bar', 'baz'])

In [4]: s.astype('|S')
Out[4]:
0    b'foo'
1    b'bar'
2    b'baz'
dtype: bytes24

In [5]: s.astype('bytes')
Out[5]:
0    b'foo'
1    b'bar'
2    b'baz'
dtype: bytes24

In [6]: s.astype(bytes)
Out[6]:
0    b'foo'
1    b'bar'
2    b'baz'
dtype: bytes24
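
For reference, the bytes24 shown in the expected output is NumPy's name for a 3-byte fixed-width bytes dtype (S3, i.e. 3 bytes × 8 bits). The sketch below uses NumPy only, so it does not depend on the pandas version and is not a fix for the regression. It just illustrates which dtype the cast was expected to produce. The variable name `arr` is illustrative.

```python
import numpy as np

# Cast a small array of str to a fixed-width bytes dtype.
# With 'S' and no explicit width, NumPy sizes the dtype to the longest element.
arr = np.array(['foo', 'bar', 'baz']).astype('S')

print(arr.dtype.kind)      # 'S' is the kind the assertion in the code sample checks for
print(arr.dtype.itemsize)  # bytes per element
print(arr.dtype.name)      # name is expressed in bits: itemsize * 8
```

In 1.2.1 the Series instead reports kind 'O' (object), even though the individual elements are bytes instances.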

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9d598a5
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.7.17-1rodete4-amd64
Version : #1 SMP Debian 5.7.17-1rodete4 (2020-10-01)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.1
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.2.1
Cython : 0.29.21
pytest : 4.6.11
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.20
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None