BUG: DataFrame.to_string() - custom formatter not called for all values in column · Issue #45177 · pandas-dev/pandas
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
import datetime

print('==============================')
print('pd.__version__=', pd.__version__)

# Mixed-type column: str, int, None, float, NaN, datetime
mylist = ['abcd', 123, None, 456.78, float('nan'), 'wxyz', 1234567, ' ', datetime.datetime.now(), -321, None, 3.1415926536]
df = pd.DataFrame(dict(MyColumn=mylist))

print('==============================')
print(df)
print('==============================')

def myformatter(value):
    # Log every value the formatter actually receives
    print('type(value)=', type(value), ' value=', value)
    #return f'{value:<10}'
    # Left-justify, padded and truncated to 10 characters
    return '%-10.10s' % value

f = {'MyColumn': myformatter}
s = df.to_string(formatters=f, float_format=myformatter, na_rep='na_rep', index=False, justify='left')

print('==============================')
print(s)
print('==============================')
Issue Description
Per the pandas documentation for DataFrame.to_string, the formatters parameter is a "list, tuple, or dict of one-parameter functions ... Formatter functions to apply to columns’ elements by position or name."
Note: the wording is "apply to columns’ elements"; it does not say "apply to only some elements".
The "Reproducible example" code demonstrates that the formatter is not called for all elements in the column. See output below.
Specifically, it appears that custom formatters are not called for the following types: float, NoneType, and NaN. This is not a big deal for float, because the float_format parameter allows one to install a formatter function for floats as well. However, it prevents the user from formatting NoneType and NaN values with their own formatting function.
In the reproducible example given, the user has chosen to format the column as left-justified. However, because None and NaN values are not passed to the custom formatter, the user is unable to accomplish this task.
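To separate the formatter from pandas, here is a minimal sketch that calls an equivalent function directly on None and NaN. The name pad_left is hypothetical (it mirrors myformatter without the debug print). It only shows that the function itself would left-justify these values if it were ever handed them.

```python
def pad_left(value):
    # Same format spec as myformatter: left-justify, pad and truncate to 10 characters.
    return '%-10.10s' % value

# Values of the kinds that never reach the formatter in the report, plus two that do.
samples = [None, float('nan'), 'abcd', 123]
formatted = [pad_left(v) for v in samples]

for text in formatted:
    # repr() makes the trailing padding visible
    print(repr(text))
```

Every result is exactly 10 characters wide, so the missing left-justification in the to_string output comes from the values being skipped, not from the formatter.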
Here is the output from the above reproducible example:
==============================
pd.__version__= 1.3.5
==============================
MyColumn
0 abcd
1 123
2 None
3 456.78
4 NaN
5 wxyz
6 1234567
7
8 2022-01-03 14:37:24.075486
9 -321
10 None
11 3.141593
==============================
type(value)= <class 'str'> value= abcd
type(value)= <class 'int'> value= 123
type(value)= <class 'float'> value= 456.78
type(value)= <class 'str'> value= wxyz
type(value)= <class 'int'> value= 1234567
type(value)= <class 'str'> value=
type(value)= <class 'datetime.datetime'> value= 2022-01-03 14:37:24.075486
type(value)= <class 'int'> value= -321
type(value)= <class 'float'> value= 3.1415926536
==============================
MyColumn
abcd
123
None
456.78
na_rep
wxyz
1234567
2022-01-03
-321
None
3.14159265
==============================
Expected Behavior
Custom formatter functions should be called for all elements in the specified column.
Since installation of custom formatters is already split between two parameters (formatters and float_format), it may be reasonable to pass NoneType values to the formatters function and NaN values to the float_format formatter. Alternatively, there should be a way to pass every element to one single formatter function for the column, regardless of the element's type.
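Until something like that exists, one possible workaround (my assumption, not something the pandas documentation recommends for this case) is to format the column up front with Series.map and only then call to_string. The sketch below uses a shortened version of the example data and a hypothetical pad_left helper equivalent to myformatter. It relies on the column having object dtype, where Series.map with a plain function and the default na_action=None passes missing values to the function as well.

```python
import pandas as pd

def pad_left(value):
    # Hypothetical helper: left-justify, pad and truncate to 10 characters.
    return '%-10.10s' % value

# Shortened version of the column from the report (object dtype because of the mixed types).
df = pd.DataFrame({'MyColumn': ['abcd', 123, None, 456.78, float('nan'), 'wxyz']})

# Apply the formatter to every element, including None and NaN,
# before to_string gets a chance to special-case them.
formatted = df['MyColumn'].map(pad_left)

# The column now holds only pre-formatted strings, so no formatters are needed here.
print(formatted.to_frame().to_string(index=False, justify='left'))
```

This bypasses na_rep and float_format entirely, so None, NaN, and floats are rendered however the single function decides. That is a trade-off, not a fix for the behavior reported here.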
Installed Versions
INSTALLED VERSIONS
commit : 66e3805
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.60.1-microsoft-standard-WSL2
Version : #1 SMP Wed Aug 25 23:20:18 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.5
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 45.2.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 8.0.0.dev
pandas_datareader: 0.10.0
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None