assert_frame_equal cannot handle differences in unicode data · Issue #20503 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
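import pandas as pd
from pandas.util.testing import assert_frame_equal

assert_frame_equal(pd.DataFrame(['á']), pd.DataFrame(['á']))  # passes silently (returns None)
assert_frame_equal(pd.DataFrame(['á']), pd.DataFrame(['é']))  # raises UnicodeEncodeError instead of AssertionError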
UnicodeEncodeError                        Traceback (most recent call last)
in ()
----> 1 assert_frame_equal(pd.DataFrame(['á']), pd.DataFrame(['é']))

.env/lib/python2.7/site-packages/pandas/util/testing.pyc in assert_frame_equal(left, right, check_dtype, check_index_type, check_column_type, check_frame_type, check_less_precise, check_names, by_blocks, check_exact, check_datetimelike_compat, check_categorical, check_like, obj)
   1311                 check_datetimelike_compat=check_datetimelike_compat,
   1312                 check_categorical=check_categorical,
-> 1313                 obj='DataFrame.iloc[:, {0}]'.format(i))
   1314
   1315

.env/lib/python2.7/site-packages/pandas/util/testing.pyc in assert_series_equal(left, right, check_dtype, check_index_type, check_series_type, check_less_precise, check_names, check_exact, check_datetimelike_compat, check_categorical, obj)
   1179                 check_less_precise=check_less_precise,
   1180                 check_dtype=check_dtype,
-> 1181                 obj='{0}'.format(obj))
   1182
   1183     # metadata comparison

pandas/src/testing.pyx in pandas._testing.assert_almost_equal (pandas/src/testing.c:4156)()

pandas/src/testing.pyx in pandas._testing.assert_almost_equal (pandas/src/testing.c:3274)()

.env/lib/python2.7/site-packages/pandas/util/testing.pyc in raise_assert_detail(obj, message, left, right, diff)
   1011 {1}
   1012 [left]:  {2}
-> 1013 [right]: {3}""".format(obj, message, left, right)
   1014
   1015     if diff is not None:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)
Problem description
The difference is caught correctly by assert_frame_equal,
but it raises a UnicodeEncodeError instead of an AssertionError when it tries to format the message describing the differences.
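The failure mode can be shown without pandas. On Python 2, calling `.format` on a byte-string template with a `unicode` argument implicitly encodes that argument with the default ASCII codec. The sketch below makes that encoding step explicit, so it behaves the same on Python 2 and Python 3. The helper name `ascii_encode_or_error` is hypothetical and is not part of pandas.

```python
# -*- coding: utf-8 -*-

def ascii_encode_or_error(text):
    """Return the ASCII bytes for `text`, or the UnicodeEncodeError it raises."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc


# u"[\xe1]" is u"[á]", i.e. the formatted [left] value from the report.
result = ascii_encode_or_error(u"[\xe1]")
print("{0} at position {1}".format(type(result).__name__, result.start))
```

The non-ASCII character sits at index 1 of the formatted value, which is consistent with the "position 1" in the traceback.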
Expected Output
AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (100.0 %)
[left]:  [á]
[right]: [é]
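One way a message like this could be built without going through the ASCII codec is to make the template itself a unicode string. This is only a sketch under that assumption, not the fix adopted in pandas. `format_detail` is a hypothetical helper, and its layout only approximates the expected output.

```python
# -*- coding: utf-8 -*-

def format_detail(obj, message, left, right):
    """Build an assertion message from a unicode template (hypothetical helper)."""
    # The u"" prefix keeps the template unicode on Python 2, so .format()
    # never needs to encode its arguments to ASCII.
    template = u"{0} are different\n\n{1}\n[left]:  {2}\n[right]: {3}"
    return template.format(obj, message, left, right)


msg = format_detail(
    u"DataFrame.iloc[:, 0]",
    u"DataFrame.iloc[:, 0] values are different (100.0 %)",
    u"[\xe1]",  # [á]
    u"[\xe9]",  # [é]
)
```

The resulting `msg` is a unicode string. On Python 2 it would still need an explicit encoding before being written to a byte stream.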
Output of pd.show_versions()
python 2.7.15
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None