ENH: revamp null count suppression for large frames in df.info() · Pull Request #5974 · pandas-dev/pandas
#5550 deprecated options.display.max_info_rows, but df.info is still there
for the user to invoke, and the null count can be very slow on large frames.
Un-deprecate the option and revamp df.info
to do the right thing.
Now that @cpcloud added per-column dtypes, df.info will always show them
and just suppress the counts if needed. Previously, if max_info_rows was
exceeded, it didn't even print the column names.
In [4]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1000000 entries, C00410118 to C00431445
Data columns (total 18 columns):
cmte_id 1000000 non-null object
cand_id 1000000 non-null object
cand_nm 1000000 non-null object
contbr_nm 999975 non-null object
contbr_city 1000000 non-null object
contbr_st 999850 non-null object
contbr_zip 992087 non-null object
contbr_employer 994533 non-null object
contbr_occupation 1000000 non-null float64
contb_receipt_amt 1000000 non-null object
contb_receipt_dt 18038 non-null object
receipt_desc 383960 non-null object
memo_cd 391290 non-null object
memo_text 1000000 non-null object
form_tp 1000000 non-null int64
file_num 1000000 non-null object
tran_id 999998 non-null object
election_tp 0 non-null float64
dtypes: float64(2), int64(1), object(15)
In [5]: pd.options.display.max_info_rows
Out[5]: 1690785
In [6]: pd.options.display.max_info_rows=999999
In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1000000 entries, C00410118 to C00431445
Data columns (total 18 columns):
cmte_id object
cand_id object
cand_nm object
contbr_nm object
contbr_city object
contbr_st object
contbr_zip object
contbr_employer object
contbr_occupation float64
contb_receipt_amt object
contb_receipt_dt object
receipt_desc object
memo_cd object
memo_text object
form_tp int64
file_num object
tran_id object
election_tp float64
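The session above can be approximated on a small frame. The following is only a sketch, assuming a pandas version in which `display.max_info_rows` is an active option and `DataFrame.info` accepts a `buf` argument; the toy frame and the helper name `info_text` are made up for illustration and are not part of this PR.

```python
import io

import pandas as pd


def info_text(df):
    """Capture the text that df.info() would print (hypothetical helper)."""
    buf = io.StringIO()
    df.info(buf=buf)
    return buf.getvalue()


# Toy frame with 4 rows and a few missing values (illustrative data only).
df = pd.DataFrame({"a": [1.0, None, 3.0, 4.0], "b": ["w", "x", None, "z"]})

# Row count (4) is below the limit: the non-null counts are computed and shown.
with pd.option_context("display.max_info_rows", 10):
    with_counts = info_text(df)

# Row count (4) exceeds the limit: the counts are skipped, but the column
# names and dtypes are still listed.
with pd.option_context("display.max_info_rows", 3):
    without_counts = info_text(df)

print(with_counts)
print(without_counts)
```

Using `pd.option_context` keeps the changed limit scoped to the `with` block, so the global setting is left untouched afterwards.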