Transposing dataframe loses dtype and ExtensionArray · Issue #22120 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

from cyberpandas import IPArray
import cyberpandas
import pandas as pd

df = pd.DataFrame({"address": IPArray(['192.168.1.1', '192.168.1.10']),
                   "address1": IPArray(['192.168.1.1', '192.168.1.10'])})
print(df.info())
print(df.T.T.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
address     2 non-null ip
address1    2 non-null ip
dtypes: ip(2)
memory usage: 144.0 bytes
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
address     2 non-null object
address1    2 non-null object
dtypes: object(2)
memory usage: 112.0+ bytes
None

print(type(df.address.values))
print(type(df.T.T.address.values))

<class 'cyberpandas.ip_array.IPArray'>
<class 'numpy.ndarray'>

#################
df = pd.DataFrame({"A": ["a", "b", "c", "a"]})
df["B"] = df["A"].astype('category')
print(df.info())
print(df.T.T.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
A    4 non-null object
B    4 non-null category
dtypes: category(1), object(1)
memory usage: 220.0+ bytes
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
A    4 non-null object
B    4 non-null object
dtypes: object(2)
memory usage: 144.0+ bytes
None

Problem description

When transposing a DataFrame that contains a non-standard dtype, the dtype is lost and the ExtensionArray becomes a plain ndarray. I believe this occurs because the ExtensionArray is converted to a NumPy array while the DataFrame is being transposed, and that conversion does not preserve the dtype or the ExtensionArray.
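Until the transpose itself preserves extension dtypes, one possible workaround is to record the original dtypes and cast back after the round trip. The sketch below uses only pandas and the categorical example from above (not cyberpandas). The helper name `restore_dtypes` is hypothetical, introduced here for illustration, and it assumes every column's values can be cast back to their original dtype.

```python
import pandas as pd


def restore_dtypes(frame, reference):
    """Hypothetical helper: cast `frame` back to the per-column dtypes of `reference`.

    Assumes both frames share the same column labels and that the values in
    `frame` are valid for the dtypes recorded in `reference`.
    """
    return frame.astype(reference.dtypes.to_dict())


df = pd.DataFrame({"A": ["a", "b", "c", "a"]})
df["B"] = df["A"].astype("category")

# Mixed-dtype columns are combined into a common dtype during the transpose,
# so the categorical dtype of "B" does not survive the round trip.
roundtrip = df.T.T

# Re-apply the dtypes that were recorded on the original frame.
restored = restore_dtypes(roundtrip, df)

print(df.dtypes)
print(roundtrip.dtypes)
print(restored.dtypes)
```

This only hides the symptom: the values still pass through a generic array in between, so anything the intermediate representation cannot hold faithfully would not come back.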

Expected Output

print(df.info()) to be the same as
print(df.T.T.info())

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+369.gbb451e89f
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.3
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None