wide_to_long should verify uniqueness · Issue #16382 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
This code produces stacked tables in 0.19.2. In 0.20.1 the first call raises an error and the second returns an empty table.
import pandas as pd

pd.__version__

seed = []
for i in range(14):
    seed.append([1, 2, 3, 4, 5])
seed.append([1] * 5)
test_df = pd.DataFrame(seed).T
test_df.columns = ["A_A1", "B_B1", "A_A2", "B_B2", "A_A3", "B_B3", "A_A4", "B_B4", "A_A5", "B_B5", "A_A6", "B_B6", "A_A7", "B_B7", "x"]
test_df
# 0.19.2: a table with the 'i' field all as '1'. 0.20.1: NotImplementedError
pd.wide_to_long(test_df, ["A_A", "B_B"], i="x", j="colname")

# 0.19.2: a table. 0.20.1: an empty table
# This line "clears" the error above in 0.20.1 by assigning each row a unique identifier.
test_df["x"] = test_df.apply(lambda row: row["A_A1"], axis=1)
pd.wide_to_long(test_df, ["A_", "B_"], i="x", j="colname")
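For anyone hitting the same thing, here is a minimal sketch of the first call with a unique identifier in place of the constant "x". The column name `row_id` is made up for illustration, and the call relies on the default numeric suffix matching, so it uses the "A_A" / "B_B" stubs rather than "A_" / "B_".

```python
import pandas as pd

# Rebuild the frame from the report: 14 value columns plus a constant "x".
seed = [[1, 2, 3, 4, 5] for _ in range(14)]
seed.append([1] * 5)
test_df = pd.DataFrame(seed).T
test_df.columns = [
    f"{stub}{n}" for n in range(1, 8) for stub in ("A_A", "B_B")
] + ["x"]

# "x" holds the same value in every row, so it cannot identify rows by itself.
has_duplicate_ids = bool(test_df["x"].duplicated().any())

# Assumption for illustration: use the row position as the identifier.
test_df["row_id"] = range(len(test_df))
long_df = pd.wide_to_long(test_df, ["A_A", "B_B"], i="row_id", j="colname")
```

On a recent pandas this should give one row per (row_id, colname) pair, with "x" carried along as an ordinary column. I have not checked it against 0.20.1.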
Problem description
The changelog lists "performance improvements" for pd.wide_to_long, but this is not an improvement for me: for these corner cases (a non-unique `i` column, non-numeric suffixes) I would rather have the old behavior. Are these not mainstream enough to support?
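As a sketch of the check the title asks for, a small wrapper can refuse to reshape when the `i` column(s) contain duplicates. The function name `checked_wide_to_long` and the error message are hypothetical, not part of pandas.

```python
import pandas as pd


def checked_wide_to_long(df, stubnames, i, j, **kwargs):
    """Hypothetical wrapper: refuse to reshape when `i` has duplicate keys."""
    id_cols = [i] if isinstance(i, str) else list(i)
    duplicated = df.duplicated(subset=id_cols)
    if duplicated.any():
        raise ValueError(
            f"{int(duplicated.sum())} row(s) repeat an earlier value of "
            f"{id_cols}; the id columns must uniquely identify each row"
        )
    return pd.wide_to_long(df, stubnames, i=i, j=j, **kwargs)


# Small frame with a constant id column, similar in spirit to the report.
frame = pd.DataFrame({"A_A1": [1, 2], "A_A2": [3, 4], "x": [1, 1]})
try:
    checked_wide_to_long(frame, ["A_A"], i="x", j="colname")
    outcome = "reshaped"
except ValueError:
    outcome = "rejected"
```

Failing loudly like this at least makes the cause visible, instead of a NotImplementedError or an empty table.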
Expected Output
1)
A_A B_B
x colname
1 1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 3 1
1 3 2
1 3 3
1 3 4
1 3 5
1 4 1
1 4 2
1 4 3
1 4 4
1 4 5
1 5 1
1 5 2
1 5 3
1 5 4
1 5 5
2 1 1
2 1 2
2 1 3
2 1 4
2 1 5
... ... ...
6 5 1
6 5 2
6 5 3
6 5 4
6 5 5
7 1 1
7 1 2
7 1 3
7 1 4
7 1 5
7 2 1
7 2 2
7 2 3
7 2 4
7 2 5
7 3 1
7 3 2
7 3 3
7 3 4
7 3 5
7 4 1
7 4 2
7 4 3
7 4 4
7 4 5
7 5 1
7 5 2
7 5 3
7 5 4
7 5 5
175 rows x 2 columns
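The 175-row count is consistent with a many-to-many join on the non-unique (x, colname) key: 5 "A_A" values times 5 "B_B" values for each of the 7 suffixes. The sketch below only illustrates that arithmetic with a plain merge. Whether 0.19.2 built its result exactly this way is an assumption on my part.

```python
import pandas as pd

# Two long-format pieces sharing the same non-unique key (x=1, colname=1).
a_part = pd.DataFrame({"x": 1, "colname": 1, "A_A": [1, 2, 3, 4, 5]})
b_part = pd.DataFrame({"x": 1, "colname": 1, "B_B": [1, 2, 3, 4, 5]})

# Every A_A row pairs with every B_B row because the key never differs.
pairs = a_part.merge(b_part, how="outer", on=["x", "colname"])
rows_per_suffix = len(pairs)

# Seven suffixes (1..7) in the original frame.
total_rows = 7 * rows_per_suffix
```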
2)
A_ B_
x colname
1 A1 1.0 NaN
2 A1 2.0 NaN
3 A1 3.0 NaN
4 A1 4.0 NaN
5 A1 5.0 NaN
1 A2 1.0 NaN
2 A2 2.0 NaN
3 A2 3.0 NaN
4 A2 4.0 NaN
5 A2 5.0 NaN
1 A3 1.0 NaN
2 A3 2.0 NaN
3 A3 3.0 NaN
4 A3 4.0 NaN
5 A3 5.0 NaN
1 A4 1.0 NaN
2 A4 2.0 NaN
3 A4 3.0 NaN
4 A4 4.0 NaN
5 A4 5.0 NaN
1 A5 1.0 NaN
2 A5 2.0 NaN
3 A5 3.0 NaN
4 A5 4.0 NaN
5 A5 5.0 NaN
1 A6 1.0 NaN
2 A6 2.0 NaN
3 A6 3.0 NaN
4 A6 4.0 NaN
5 A6 5.0 NaN
... ... ... ...
1 B2 NaN 1.0
2 B2 NaN 2.0
3 B2 NaN 3.0
4 B2 NaN 4.0
5 B2 NaN 5.0
1 B3 NaN 1.0
2 B3 NaN 2.0
3 B3 NaN 3.0
4 B3 NaN 4.0
5 B3 NaN 5.0
1 B4 NaN 1.0
2 B4 NaN 2.0
3 B4 NaN 3.0
4 B4 NaN 4.0
5 B4 NaN 5.0
1 B5 NaN 1.0
2 B5 NaN 2.0
3 B5 NaN 3.0
4 B5 NaN 4.0
5 B5 NaN 5.0
1 B6 NaN 1.0
2 B6 NaN 2.0
3 B6 NaN 3.0
4 B6 NaN 4.0
5 B6 NaN 5.0
1 B7 NaN 1.0
2 B7 NaN 2.0
3 B7 NaN 3.0
4 B7 NaN 4.0
5 B7 NaN 5.0
70 rows x 2 columns
Output of pd.show_versions()
The output below was captured after upgrading.
INSTALLED VERSIONS
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None