BUG: DataFrame.to_string() - custom formatter not called for all values in column · Issue #45177 · pandas-dev/pandas

Pandas version checks

Reproducible Example

import pandas as pd
import datetime

print('==============================')
print('pd.__version__=',pd.__version__)

mylist = ['abcd',123,None,456.78,float('nan'),'wxyz',1234567,' ',datetime.datetime.now(),-321,None,3.1415926536]

df = pd.DataFrame(dict(MyColumn=mylist))

print('==============================')
print(df)
print('==============================')

def myformatter(value):
    print('type(value)=',type(value),' value=',value)
    #return f'{value:<10}'
    return '%-10.10s' % value

f = {'MyColumn': myformatter}

s = df.to_string(formatters=f,float_format=myformatter,na_rep='na_rep', index=False,justify='left')

print('==============================')
print(s)
print('==============================')

Issue Description

Per Pandas documentation for DataFrame.to_string, the formatters parameter is a

list, tuple, or dict of one-parameter functions ... Formatter functions to apply to columns’ elements by position or name.

Note: the wording is "apply to columns’ elements"; it does not say "apply to only some elements".

The "Reproducible example" code demonstrates that the formatter is not called for all elements in the column. See output below.

Specifically, it appears that custom formatters are not called for float values, None (NoneType), or NaN. This is not a big deal for floats, because the float_format parameter lets one install a formatter function for floats as well. However, it prevents the user from formatting None and NaN values with their own formatting function.

In the reproducible example, the user has chosen to left-justify the column; however, because None and NaN values are not passed to the custom formatter, the user is unable to accomplish this.

Here is the output from the above reproducible example:

==============================
pd.__version__= 1.3.5
==============================
                      MyColumn
0                         abcd
1                          123
2                         None
3                       456.78
4                          NaN
5                         wxyz
6                      1234567
7
8   2022-01-03 14:37:24.075486
9                         -321
10                        None
11                    3.141593
==============================
type(value)= <class 'str'>  value= abcd
type(value)= <class 'int'>  value= 123
type(value)= <class 'float'>  value= 456.78
type(value)= <class 'str'>  value= wxyz
type(value)= <class 'int'>  value= 1234567
type(value)= <class 'str'>  value=
type(value)= <class 'datetime.datetime'>  value= 2022-01-03 14:37:24.075486
type(value)= <class 'int'>  value= -321
type(value)= <class 'float'>  value= 3.1415926536
==============================
MyColumn
abcd
123
      None
456.78
    na_rep
wxyz
1234567

2022-01-03
-321
      None
3.14159265
==============================
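
The skipped values can also be checked programmatically rather than by reading the printed trace. The following is a minimal sketch, not part of the original report: `recording_formatter`, `seen`, and `skipped` are hypothetical names, and the shortened value list is an assumption made for brevity. On the pandas 1.3.5 build reported here, `skipped` would be expected to contain None and NaN; other versions may behave differently.

```python
import pandas as pd

values = ['abcd', 123, None, float('nan'), 'wxyz']
df = pd.DataFrame({'MyColumn': values})

seen = []  # every value that actually reaches the formatter


def recording_formatter(value):
    seen.append(value)
    return str(value)


df.to_string(formatters={'MyColumn': recording_formatter}, index=False)

# Compare by (type name, repr) because NaN != NaN under ==.
seen_keys = {(type(v).__name__, repr(v)) for v in seen}
skipped = [v for v in values
           if (type(v).__name__, repr(v)) not in seen_keys]

print('passed to formatter :', seen)
print('never passed        :', skipped)
```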

Expected Behavior

Custom formatter functions should be called for all elements in the specified column.

Since installation of custom formatters is already split between two parameters (formatters and float_format), it may be reasonable to pass NoneType to the formatters and NaN to the float_format formatter. Alternatively, there should be a way to pass every element to one single formatter function for the column, regardless of the element's type.
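
Until such a mechanism exists, one possible workaround is to format every element before calling to_string, so that pandas only ever receives strings. This is a hedged sketch rather than a fix for the underlying behavior: `format_cell` is a hypothetical helper, the fixed datetime replaces `datetime.datetime.now()` to keep the output repeatable, and mapping both None and NaN to the same `na_rep` text is an assumption about what the user wants.

```python
import datetime
import math

import pandas as pd


def format_cell(value, na_rep='na_rep', width=10):
    """Left-justify and truncate any value, including None and NaN."""
    is_nan = isinstance(value, float) and math.isnan(value)
    if value is None or is_nan:
        value = na_rep
    # '<' left-justifies; 'width.width' pads and truncates to `width` chars.
    return f'{str(value):<{width}.{width}}'


mylist = ['abcd', 123, None, 456.78, float('nan'), 'wxyz',
          datetime.datetime(2022, 1, 3, 14, 37, 24), -321]
df = pd.DataFrame(dict(MyColumn=mylist))

# Format each element ourselves instead of relying on `formatters`.
formatted = [format_cell(v) for v in df['MyColumn']]
s = df.assign(MyColumn=formatted).to_string(index=False, justify='left')
print(s)
```

Because every cell is already padded to the same width, the column lines up on the left regardless of how to_string justifies string values. The cost is that the formatting lives outside the to_string call, so the `formatters`, `float_format`, and `na_rep` arguments are no longer involved.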

Installed Versions

INSTALLED VERSIONS

commit : 66e3805
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.60.1-microsoft-standard-WSL2
Version : #1 SMP Wed Aug 25 23:20:18 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.5
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 45.2.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 8.0.0.dev
pandas_datareader: 0.10.0
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None