DISCUSSION: Add format parameter to .astype when converting to str dtype · Issue #17211 · pandas-dev/pandas

I propose adding a string formatting option to .astype when converting to str dtype. Converting to a string dtype essentially freezes a representation of your series, so I think it's reasonable to expect to be able to choose the string format. Plain .astype(str) is often too crude for this.

This option should take the form of a format parameter to .astype that accepts a format string and can only be used when converting to str dtype. This would lessen the reliance on .apply for converting non-strings to more complex strings and make such conversions more readable (IMO). It might also be faster, since it avoids .apply, which is slow (though I'm not too knowledgeable about such optimizations).

The current procedure for converting to a complex string goes like this:

```python
In [1]: ser = pd.Series([-1, 1.234])
In [2]: ser.apply("{:+.1f} $".format)
Out[2]:
0    -1.0 $
1    +1.2 $
dtype: object
```

I propose to make this possible:

```python
In [3]: ser.astype(str, format="{:+.1f} $")
Out[3]:
0    -1.0 $
1    +1.2 $
dtype: object
```

If the dtype parameter is not str, setting the format parameter should raise an exception. If format is not set, the current behaviour is used, so the proposed change is backward compatible.
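To make the proposed semantics concrete, here is a minimal sketch that emulates them with a standalone function. The name `astype_formatted` is hypothetical, and pandas' real `Series.astype` has no `format` parameter; the sketch simply encodes the rules stated above (format only valid with str, no format means today's behaviour) on top of `Series.map`.

```python
from typing import Optional

import pandas as pd


def astype_formatted(ser: pd.Series, dtype, format: Optional[str] = None) -> pd.Series:
    """Hypothetical emulation of the proposed ser.astype(dtype, format=...).

    Not a pandas API: Series.astype does not accept a `format` argument.
    """
    if format is None:
        # No format given: fall back to the existing astype behaviour.
        return ser.astype(dtype)
    if dtype not in (str, "str"):
        # Per the proposal, `format` is only valid when converting to str.
        raise ValueError("format can only be used when converting to str")
    # Apply the format string to each element; the result holds Python str objects.
    return ser.map(format.format)


ser = pd.Series([-1, 1.234])
print(astype_formatted(ser, str, format="{:+.1f} $"))
```

Because this sketch still formats element by element, it only shows the intended behaviour and says nothing about whether a built-in version could be faster.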

Also to consider:

Allowing a placeholder name

Should a placeholder name be available? Then you could do:

```python
In [4]: ser = pd.Series(pd.date_range('2017-03', periods=2, freq='M'))
In [5]: ser.astype(str, format="Y{value.dt.year}-Q{value.dt.quarter}")
Out[5]:
0    Y2017-Q1
1    Y2017-Q2
dtype: object
```

(Note that the example above assumes an implicit parameter on .astype with the default value "value", so adding a placeholder name is transparent. Note also that this particular behaviour is already available through ser.dt.strftime, but please look at the principle rather than the concrete example.)
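A named placeholder can be approximated today by passing each element to `str.format` as a keyword argument. The helper `format_with_placeholder` below is hypothetical, not a pandas function. One assumption differs from the proposal as written: here `value` is bound to each element (a `Timestamp`), so the format string uses `{value.year}` and `{value.quarter}` rather than `.dt.year`. The explicit dates match what the `date_range` call above produces.

```python
import pandas as pd


def format_with_placeholder(ser: pd.Series, fmt: str, name: str = "value") -> pd.Series:
    """Hypothetical sketch of the proposed named placeholder.

    Each element is passed to str.format under the keyword `name`, so the
    format string can use attribute access on the element itself.
    """
    return ser.map(lambda v: fmt.format(**{name: v}))


# Same month-end dates as pd.date_range('2017-03', periods=2, freq='M').
ser = pd.Series(pd.to_datetime(["2017-03-31", "2017-04-30"]))
# Series.map hands each datetime element to the lambda as a Timestamp.
print(format_with_placeholder(ser, "Y{value.year}-Q{value.quarter}"))
```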

A downside to allowing a placeholder name could be the potential for abuse (stuffing too much logic into the format string) and possibly losing the option to vectorize (though this is not my area of expertise).
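To illustrate the vectorization concern: a format string with arbitrary attribute lookups has to be evaluated once per element, whereas the same labels can sometimes be built from whole-column operations. The sketch below uses the existing `.dt` accessor and string concatenation. Whether it is actually faster than element-wise formatting depends on the dtype and pandas version, so treat it as an illustration rather than a benchmark.

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(["2017-03-31", "2017-04-30"]))

# .dt.year and .dt.quarter return integer Series for the whole column;
# they are converted to strings and concatenated column-wise instead of
# calling str.format once per element from user code.
labels = "Y" + ser.dt.year.astype(str) + "-Q" + ser.dt.quarter.astype(str)
print(labels)
```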

Adding a .format method

Adding a .str.format or .format method to DataFrame/Series could also be considered.

If .format is added to the .str namespace, it would only be usable on string DataFrames/Series (which I'd be quite OK with, as long as the format parameter is also available on .astype for other data types).

Alternatively, such a method could be available directly on all DataFrames/Series. Then you'd write ser.format('{:+.1f}') rather than ser.astype(str, format='{:+.1f}'). IMO, though, it would be inconsistent to have a method for converting to strings directly on pandas objects but not for converting to other types. Why have .format but not .to_numeric as a DataFrame/Series method?
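A method-style interface can be prototyped outside pandas with its accessor registration mechanism, `pd.api.extensions.register_series_accessor`. The accessor name `fmt` and its `format` method are hypothetical; the sketch avoids the name `str` so as not to shadow the built-in `.str` accessor. It only shows what the call site could look like, not how pandas would implement it.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("fmt")
class FormatAccessor:
    """Hypothetical `.fmt` accessor; pandas ships no such accessor."""

    def __init__(self, series: pd.Series) -> None:
        self._series = series

    def format(self, pattern: str) -> pd.Series:
        # Format every element of the wrapped Series with a str.format pattern.
        return self._series.map(pattern.format)


ser = pd.Series([-1, 1.234])
print(ser.fmt.format("{:+.1f}"))
```

Keeping the method in its own namespace, as `.str` does, is one way to offer ser.<namespace>.format(...) without adding a new method directly to Series.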

IMO, therefore, astype(str, format=...) combined with a .str.format method is better than adding a new .format method. So: