assert_frame_equal cannot handle differences in unicode data · Issue #20503 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas.util.testing import assert_frame_equal

assert_frame_equal(pd.DataFrame(['á']), pd.DataFrame(['á'])) # passes: returns None, no exception

assert_frame_equal(pd.DataFrame(['á']), pd.DataFrame(['é']))

UnicodeEncodeError Traceback (most recent call last)
in ()
----> 1 assert_frame_equal(pd.DataFrame(['á']), pd.DataFrame(['é']))

.env/lib/python2.7/site-packages/pandas/util/testing.pyc in assert_frame_equal(left, right, check_dtype, check_index_type, check_column_type, check_frame_type, check_less_precise, check_names, by_blocks, check_exact, check_datetimelike_compat, check_categorical, check_like, obj)
   1311 check_datetimelike_compat=check_datetimelike_compat,
   1312 check_categorical=check_categorical,
-> 1313 obj='DataFrame.iloc[:, {0}]'.format(i))
   1314
   1315

.env/lib/python2.7/site-packages/pandas/util/testing.pyc in assert_series_equal(left, right, check_dtype, check_index_type, check_series_type, check_less_precise, check_names, check_exact, check_datetimelike_compat, check_categorical, obj)
   1179 check_less_precise=check_less_precise,
   1180 check_dtype=check_dtype,
-> 1181 obj='{0}'.format(obj))
   1182
   1183 # metadata comparison

pandas/src/testing.pyx in pandas._testing.assert_almost_equal (pandas/src/testing.c:4156)()

pandas/src/testing.pyx in pandas._testing.assert_almost_equal (pandas/src/testing.c:3274)()

.env/lib/python2.7/site-packages/pandas/util/testing.pyc in raise_assert_detail(obj, message, left, right, diff)
   1011 {1}
   1012 [left]: {2}
-> 1013 [right]: {3}""".format(obj, message, left, right)
   1014
   1015 if diff is not None:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)

Problem description

assert_frame_equal detects the difference correctly, but it then raises a UnicodeEncodeError instead of an AssertionError while formatting the message that reports the differing values.
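The traceback points at the `.format(obj, message, left, right)` call in raise_assert_detail, so the likely cause is Python 2 implicitly encoding the unicode values to ASCII when they are formatted into a byte-string template. The sketch below is only an illustration of that mechanism, not the pandas implementation: `ascii_roundtrip_ok` and `format_detail` are hypothetical helpers, and the message layout is assumed from the template fragment visible in the traceback. It uses only the standard library and behaves the same on Python 2 and 3.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only; these helpers are hypothetical, not pandas API.


def ascii_roundtrip_ok(text):
    """Return True if `text` survives ASCII encoding.

    Python 2 applies this encoding implicitly when a unicode value is
    formatted into a byte-string template, which is where the reported
    UnicodeEncodeError appears to originate.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def format_detail(obj, message, left, right):
    """Build the diff message from a unicode template.

    With a unicode template no implicit ASCII encoding takes place, so
    non-ASCII values can be reported on both Python 2 and Python 3.
    The layout is an assumption based on the traceback.
    """
    template = u"{0} are different\n\n{1}\n[left]:  {2}\n[right]: {3}"
    return template.format(obj, message, left, right)


# The value from the report fails the ASCII step, a plain ASCII value does not.
plain_ok = ascii_roundtrip_ok(u"[a]")
accented_ok = ascii_roundtrip_ok(u"[\xe1]")

# Formatting through the unicode template keeps the accented characters.
detail = format_detail(
    u"DataFrame.iloc[:, 0]",
    u"DataFrame.iloc[:, 0] values are different (100.0 %)",
    u"[\xe1]",
    u"[\xe9]",
)
```

Under these assumptions, making the template unicode (or escaping the values before formatting) would let the comparison fail with the intended AssertionError rather than an encoding error.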

Expected Output

AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (100.0 %)
[left]: [á]
[right]: [é]

Output of pd.show_versions()

python 2.7.15

------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None