wide_to_long should verify uniqueness · Issue #16382 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

This code produces stacked tables in 0.19.2. In 0.20.1, the first call raises an error and the second returns a blank table.

import pandas as pd

pd.__version__

# 14 value rows plus one constant row; transposing gives 5 rows x 15 columns.
seed = []
for i in range(14):
    seed.append([1, 2, 3, 4, 5])
seed.append([1] * 5)

test_df = pd.DataFrame(seed).T
test_df.columns = ["A_A1", "B_B1", "A_A2", "B_B2", "A_A3", "B_B3", "A_A4", "B_B4", "A_A5", "B_B5", "A_A6", "B_B6", "A_A7", "B_B7", "x"]
test_df

0.19.2: a table in which the 'i' field ("x") is 1 in every row. 0.20.1: NotImplementedError.

pd.wide_to_long(test_df, ["A_A", "B_B"], i="x", j="colname")

0.19.2: a table. 0.20.1: an empty table.

The assignment to test_df["x"] below "clears" the error above in 0.20.1 by giving each row a unique identifier.

test_df["x"] = test_df.apply(lambda row: row["A_A1"], axis=1)
pd.wide_to_long(test_df, ["A_", "B_"], i="x", j="colname")
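
Since the title asks for a uniqueness check, here is a minimal sketch of what such a guard could look like on the caller's side. The helper `ids_are_unique` and the `row_id` column are hypothetical names introduced for illustration, not part of pandas, and passing a list for `i` assumes a pandas release that accepts it.

```python
import pandas as pd

# Rebuild the frame from the report: 5 rows, 14 value columns, constant "x".
test_df = pd.DataFrame(
    {f"{stub}{n}": [1, 2, 3, 4, 5] for n in range(1, 8) for stub in ("A_A", "B_B")}
)
test_df["x"] = 1


def ids_are_unique(df, i):
    """Return True when the id column(s) `i` identify each row uniquely.

    Hypothetical helper, not a pandas function.
    """
    keys = [i] if isinstance(i, str) else list(i)
    return not df.duplicated(subset=keys).any()


# "x" is 1 in every row, so it cannot identify rows on its own.
print(ids_are_unique(test_df, "x"))  # False

# Hypothetical extra id column: number the rows so that (x, row_id) is unique.
test_df["row_id"] = range(len(test_df))
print(ids_are_unique(test_df, ["x", "row_id"]))  # True

long_df = pd.wide_to_long(test_df, ["A_A", "B_B"], i=["x", "row_id"], j="colname")
print(long_df.shape)  # 5 rows x 7 suffixes = 35 rows, 2 value columns
```

Adding a separate row number keeps the original "x" values intact, whereas overwriting "x" as in the workaround above discards them.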

Problem description

The changelog lists "performance improvements" for pd.wide_to_long, but this is not an improvement for me; for these corner cases I would rather have the old behavior. Are they not mainstream enough to support?

Expected Output

1)

         A_A 	B_B
x 	colname 		
1 	1 	1 	1
1 	1 	1 	2
1 	1 	1 	3
1 	1 	1 	4
1 	1 	1 	5
1 	1 	2 	1
1 	1 	2 	2
1 	1 	2 	3
1 	1 	2 	4
1 	1 	2 	5
1 	1 	3 	1
1 	1 	3 	2
1 	1 	3 	3
1 	1 	3 	4
1 	1 	3 	5
1 	1 	4 	1
1 	1 	4 	2
1 	1 	4 	3
1 	1 	4 	4
1 	1 	4 	5
1 	1 	5 	1
1 	1 	5 	2
1 	1 	5 	3
1 	1 	5 	4
1 	1 	5 	5
1 	2 	1 	1
1 	2 	1 	2
1 	2 	1 	3
1 	2 	1 	4
1 	2 	1 	5
... 	... 	... 	...
1 	6 	5 	1
1 	6 	5 	2
1 	6 	5 	3
1 	6 	5 	4
1 	6 	5 	5
1 	7 	1 	1
1 	7 	1 	2
1 	7 	1 	3
1 	7 	1 	4
1 	7 	1 	5
1 	7 	2 	1
1 	7 	2 	2
1 	7 	2 	3
1 	7 	2 	4
1 	7 	2 	5
1 	7 	3 	1
1 	7 	3 	2
1 	7 	3 	3
1 	7 	3 	4
1 	7 	3 	5
1 	7 	4 	1
1 	7 	4 	2
1 	7 	4 	3
1 	7 	4 	4
1 	7 	4 	5
1 	7 	5 	1
1 	7 	5 	2
1 	7 	5 	3
1 	7 	5 	4
1 	7 	5 	5

175 rows x 2 columns
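
The 175 rows of the first expected table pair every A_A value with every B_B value within each colname: 5 x 5 = 25 rows per suffix, times 7 suffixes. One way to account for that is a join on a non-unique key. The sketch below assumes the 0.19.2 implementation merged per-stub melted frames on ("x", "colname"); `melt_stub` here is a hypothetical stand-in written for this illustration, not the pandas internals.

```python
import pandas as pd

# Frame from the report: stubs A_A and B_B with suffixes 1..7, constant "x".
test_df = pd.DataFrame(
    {f"{stub}{n}": [1, 2, 3, 4, 5] for n in range(1, 8) for stub in ("A_A", "B_B")}
)
test_df["x"] = 1


def melt_stub(df, stub):
    """Melt the columns starting with `stub` into long form (illustrative only)."""
    names = [c for c in df.columns if c.startswith(stub)]
    out = pd.melt(df, id_vars=["x"], value_vars=names,
                  var_name="colname", value_name=stub)
    # Strip the stub so that "A_A3" becomes the integer suffix 3.
    out["colname"] = out["colname"].str[len(stub):].astype(int)
    return out


a_long = melt_stub(test_df, "A_A")  # 5 rows x 7 suffixes = 35 rows
b_long = melt_stub(test_df, "B_B")  # 35 rows

# Each (x, colname) key occurs 5 times on both sides, so the join produces
# 5 * 5 = 25 rows per suffix.
joined = a_long.merge(b_long, on=["x", "colname"], how="outer")
print(len(joined))  # 7 suffixes * 25 = 175
```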

2)

         A_ 	B_
x 	colname 		
1 	A1 	1.0 	NaN
2 	A1 	2.0 	NaN
3 	A1 	3.0 	NaN
4 	A1 	4.0 	NaN
5 	A1 	5.0 	NaN
1 	A2 	1.0 	NaN
2 	A2 	2.0 	NaN
3 	A2 	3.0 	NaN
4 	A2 	4.0 	NaN
5 	A2 	5.0 	NaN
1 	A3 	1.0 	NaN
2 	A3 	2.0 	NaN
3 	A3 	3.0 	NaN
4 	A3 	4.0 	NaN
5 	A3 	5.0 	NaN
1 	A4 	1.0 	NaN
2 	A4 	2.0 	NaN
3 	A4 	3.0 	NaN
4 	A4 	4.0 	NaN
5 	A4 	5.0 	NaN
1 	A5 	1.0 	NaN
2 	A5 	2.0 	NaN
3 	A5 	3.0 	NaN
4 	A5 	4.0 	NaN
5 	A5 	5.0 	NaN
1 	A6 	1.0 	NaN
2 	A6 	2.0 	NaN
3 	A6 	3.0 	NaN
4 	A6 	4.0 	NaN
5 	A6 	5.0 	NaN
... 	... 	... 	...
1 	B2 	NaN 	1.0
2 	B2 	NaN 	2.0
3 	B2 	NaN 	3.0
4 	B2 	NaN 	4.0
5 	B2 	NaN 	5.0
1 	B3 	NaN 	1.0
2 	B3 	NaN 	2.0
3 	B3 	NaN 	3.0
4 	B3 	NaN 	4.0
5 	B3 	NaN 	5.0
1 	B4 	NaN 	1.0
2 	B4 	NaN 	2.0
3 	B4 	NaN 	3.0
4 	B4 	NaN 	4.0
5 	B4 	NaN 	5.0
1 	B5 	NaN 	1.0
2 	B5 	NaN 	2.0
3 	B5 	NaN 	3.0
4 	B5 	NaN 	4.0
5 	B5 	NaN 	5.0
1 	B6 	NaN 	1.0
2 	B6 	NaN 	2.0
3 	B6 	NaN 	3.0
4 	B6 	NaN 	4.0
5 	B6 	NaN 	5.0
1 	B7 	NaN 	1.0
2 	B7 	NaN 	2.0
3 	B7 	NaN 	3.0
4 	B7 	NaN 	4.0
5 	B7 	NaN 	5.0

70 rows x 2 columns
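
The 70 rows follow from the stub choice: with "A_" and "B_" the suffixes are "A1".."A7" and "B1".."B7", which never coincide, so each of the 5 ids gets 14 rows with one value filled and the other NaN. The sketch below reproduces that shape under one assumption not taken from the report: a newer pandas in which wide_to_long takes a `suffix` regex, needed here because these suffixes are not purely numeric.

```python
import pandas as pd

test_df = pd.DataFrame(
    {f"{stub}{n}": [1, 2, 3, 4, 5] for n in range(1, 8) for stub in ("A_A", "B_B")}
)
test_df["x"] = test_df["A_A1"]  # unique ids 1..5, as in the workaround

# Assumption: this pandas version supports `suffix`; r"\w+" lets the
# non-numeric suffixes "A1", "B1", ... match.
long_df = pd.wide_to_long(
    test_df, ["A_", "B_"], i="x", j="colname", suffix=r"\w+"
)

print(len(long_df))                  # 5 ids * 14 suffixes = 70
print(long_df["A_"].notna().sum())   # 35: only the "A*" suffixes carry A_ values
```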

pd_wide_to_long_changes.zip

Output of pd.show_versions()

This output is from after upgrading.

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None