Issue 18738: String formatting (% and str.format) issues with Enum

Created on 2013-08-14 13:48 by ethan.furman, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (71)

msg195160 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 13:48

While .format() works fine with enum, %-formatting does not:

--> class AF(enum.IntEnum):
...     IPv4 = 1
...     IPv6 = 2
...
--> AF.IPv4
<AF.IPv4: 1>
--> '%s' % AF.IPv4
'AF.IPv4'
--> '%r' % AF.IPv4
'<AF.IPv4: 1>'
--> '%d' % AF.IPv4
'AF.IPv4'
--> '%i' % AF.IPv4
'AF.IPv4'
--> '%x' % AF.IPv4
'1'
--> '%o' % AF.IPv4
'1'

Hex and octal work, decimal and integer do not.

msg195162 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2013-08-14 13:55

.format() is surprised too.

>>> '{:}'.format(AF.IPv4)
'AF.IPv4'
>>> '{:10}'.format(AF.IPv4)
'         1'

msg195164 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 13:59

In Objects/unicodeobject.c when it gets into mainformatlong, an IntEnum is recognized as an integer (passes PyLong_Check) and goes into formatlong. There, in the cases of 'd', 'i', 'u' it has:

case 'u':
    /* Special-case boolean: we want 0/1 */
    if (PyBool_Check(val))
        result = PyNumber_ToBase(val, 10);
    else
        result = Py_TYPE(val)->tp_str(val);
    break;

So tp_str is invoked...

msg195168 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 14:12

On Wed, Aug 14, 2013 at 6:55 AM, Serhiy Storchaka <report@bugs.python.org> wrote:

Serhiy Storchaka added the comment:

.format() is surprised too.

>>> '{:}'.format(AF.IPv4)
'AF.IPv4'
>>> '{:10}'.format(AF.IPv4)
'         1'

Oh, this looks like a bug in str.format (and probably the %i behavior too). Python allows subclassing int and providing custom str/repr. The formatting tools should behave sensically in this situation.

msg195170 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 14:14

'{:}'.format(AF.IPv4) is incorrect, but '{:10}'.format(AF.IPv4) behaves as it should.

msg195171 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 14:16

Looks like Objects/unicodeobject.c needs the same enhancement that Modules/_json.c received: convert int/float subclasses into actual ints/floats before continuing.
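The conversion being proposed would live in C, but its effect can be illustrated at the Python level. The following is only a sketch: the helper name `as_plain_int` is made up for this example, and it shows the idea of stripping the subclass before formatting rather than the actual change to Objects/unicodeobject.c.

```python
import enum


class AF(enum.IntEnum):
    IPv4 = 1
    IPv6 = 2


def as_plain_int(value):
    """Hypothetical helper: return an exact int with the same numeric value."""
    # int() on an int subclass yields a plain int, so no custom
    # __str__/__repr__ from the subclass can leak into the output.
    return int(value)


plain = as_plain_int(AF.IPv4)
rendered = '%d' % plain
```

Because `plain` is an exact `int`, `'%d'` has only the numeric value to work with, whichever interpreter version runs it.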

msg195172 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 14:18

Ethan, str.format uses __format__. We don't seem to provide a custom one.

msg195173 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 14:22

Gotcha. I'm on it.

msg195179 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 15:24

The %-formatting needs to be handled by str, correct?

What is '{:}' supposed to mean?

msg195180 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 15:28

What is '{:}' supposed to mean?

It should be the same as '{}'. That is, an empty format string.

msg195184 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 16:11

Which is the same as '%s', yes? In that case, the current behavior of '{:}'.format(AF.IPv4) is correct, isn't it?

msg195189 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 17:50

I think:

'{:}'.format(AF.IPv4) 'AF.IPv4'

is correct, assuming str(AF.IPv4) is 'AF.IPv4'. I'm not sure what:

'{:10}'.format(AF.IPv4)

should produce. There's a special case for an empty format string calling str().

I think that specifying __format__() would be best, except then you need to decide what sort of format specification language you want to support, and deal with all of the implementation details. Or, maybe just have Enum's __format__ be:

def __format__(self, fmt):
    return format(str(self), fmt)

which makes the format specification language match str's.
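A minimal sketch of that suggestion, assuming a plain Enum with no mix-in type. The class names `StrFormatEnum` and `Color` are invented for illustration and are not part of the enum module.

```python
import enum


class StrFormatEnum(enum.Enum):
    """Hypothetical base class: always format members like their str()."""

    def __format__(self, fmt):
        # Hand the spec to str, so str's format language applies.
        return format(str(self), fmt)


class Color(StrFormatEnum):
    RED = 1
    GREEN = 2


# Width and alignment now apply to the member's string form.
padded = format(Color.RED, '>12')
plain = '{}'.format(Color.GREEN)
```

With this definition an int presentation type such as 'd' raises ValueError, because the spec is handed to str rather than to the member's value.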

%-formatting is a whole different thing, of course.

msg195190 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 18:25

Eric V. Smith added the comment:

I think that specifying format() would be best, except then you need to decide what sort of format specification language you want to support, and deal with all of the implementation details. Or, maybe just have Enum's format be:

 def __format__(self, fmt):
     return format(str(self), fmt)

which makes the format specification language match str's.

I disagree. A subclass shouldn't have to write code to provide the /same/ behavior as its superclass, just code for different behavior. In the cases of '%d' % int_subclass or '{:d}'.format(int_subclass), str should be smart enough to actually produce the numeric value, not rely on the subclass' repr.

msg195192 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 18:39

I think that specifying format() would be best, except then you need to decide what sort of format specification language you want to support, and deal with all of the implementation details. Or, maybe just have Enum's format be:

 def __format__(self, fmt):
     return format(str(self), fmt)

which makes the format specification language match str's.

I disagree. A subclass shouldn't have to write code to provide the /same/ behavior as its superclass, just code for different behavior. In the cases of '%d' % int_subclass or '{:d}'.format(int_subclass)

str should be smart enough to actually produce the numeric value, not rely on the subclass' repr.

I'm not sure which "str" you mean here: "str should be smart enough to actually produce the numeric value".

For the format version, what gets called is:

int_subclass.__format__('d'), which is int.__format__(int_subclass, 'd'), which produces '1', assuming int(int_subclass) is 1.

So, there's no "str" involved anywhere, except the one on which .format() is called ('{:d}'), and it doesn't know about the types of any arguments or what the format specifiers mean, so it can't make any decisions.

Which is why it's easier to think of this in terms of: format(int_subclass, 'd')

instead of: '{:d}'.format(int_subclass)

It's int_subclass, and only int_subclass, that gets to decide what the format specifier means. We can either let it fall back to int.__format__ (the default), or str.__format__ (which I suggest above), or it can do its own custom thing with the format specifier.
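The dispatch described here can be checked with two throwaway int subclasses. This is a sketch only; `Plain` and `Loud` are hypothetical names used for the demonstration.

```python
class Plain(int):
    """int subclass with no __format__ of its own."""


class Loud(int):
    """int subclass that post-processes int's own formatting."""

    def __format__(self, spec):
        return 'Loud<' + int.__format__(self, spec) + '>'


inherited = format(Plain(1), 'd')     # resolved to int.__format__
via_builtin = format(Loud(1), 'd')    # resolved to Loud.__format__
via_method = '{:d}'.format(Loud(1))   # same call, reached through str.format
```

Both `format(obj, spec)` and `'{:spec}'.format(obj)` end up in the argument's own `__format__`; the string on which `.format()` is called never interprets the specifier.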

msg195194 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 18:48

Eric V. Smith added the comment:

For the format version, what gets called is:

int_subclass.format('d'), which is int.format(int_subclass, 'd'), which produces '1', assuming int(int_subclass) is 1.

Ah, I didn't realize. Thanks.

So, there's no "str" involved anywhere, except the one on which .format() is called ('{:d}'), and it doesn't know about the types of any arguments or what the format specifiers mean, so it can't make any decisions.

As far as __format__ goes, I don't think there is a problem. It's behaving just like it should (which makes sense, since IntEnum is derived from int and is already using int's __format__ by default).

The problem, then, is just with %-formatting, which is squarely a str (aka Objects/unicodeobject.c) issue.

msg195196 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 18:55

On Wed, Aug 14, 2013 at 11:48 AM, Ethan Furman <report@bugs.python.org>wrote:

Ethan Furman added the comment:

Eric V. Smith added the comment:

For the format version, what gets called is:

int_subclass.format('d'), which is int.format(int_subclass, 'd'), which produces '1', assuming int(int_subclass) is 1.

Ah, I didn't realize. Thanks.

So, there's no "str" involved anywhere, except the one on which .format() is called ('{:d}'), and it doesn't know about the types of any arguments or what the format specifiers mean, so it can't make any decisions.

As far as format goes, I don't think there is a problem. It's behaving just like it should (which makes sense, since IntEnum is derived from int and is already using int's format by default).

I'm not sure I understand. The discrepancy between {:} and {:10} is clearly a problem.

msg195198 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 18:56

For format, I think the question is "should an IntEnum format like an int, with the wacky exception of a specifier of '', or should it always format like a str?"

I assumed we'd want it to look like the str() version of itself, always. But it's debatable.

I agree the %-formatting question is different, and I further think there's not much we can do there.

msg195199 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 19:00

On Wed, Aug 14, 2013 at 11:56 AM, Eric V. Smith <report@bugs.python.org>wrote:

Eric V. Smith added the comment:

For format, I think the question is "should an IntEnum format like an int, with the wacky exception of a specifier of '', or should it always format like a str?"

I assumed we'd want it to look like the str() version of itself, always. But it's debatable.

I agree the %-formatting question is different, and I further think there's not much we can do there.

How about always being a string, unless integer formatting "%i" / {d} is explicitly requested?

The alternative (always a string) is also fine, but the behavior should be consistent. Certainly field-width-justification ({:}) can't affect the formatting.

msg195201 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 19:07

Eric V. Smith added the comment:

I assumed we'd want it to look like the str() version of itself, always. But it's debatable.

An IntEnum's str and repr should be (and any format or % codes that are the equivalent) the Enum str and repr. The % and format codes that specifically call for a numeric representation should give that numeric representation (format is good here, % is not).

For format, I think the question is "should an IntEnum format like an int, with the wacky exception of a specifier of '', or should it always format like a str?"

I think for format we should treat IntEnums as ints unless the s or r codes are specifically used.

I agree the %-formatting question is different, and I further think there's not much we can do there.

We can have unicodeobject.c convert int (and float) subclasses to actual ints and floats before getting the numeric value (we just did this to _json.c so it could serialize IntEnums).

msg195202 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 19:12

On Wed, Aug 14, 2013 at 12:07 PM, Ethan Furman <report@bugs.python.org>wrote:

Ethan Furman added the comment:

Eric V. Smith added the comment:

I assumed we'd want it to look like the str() version of itself, always. But it's debatable.

An IntEnum's str and repr should be (and any format or % codes that are the equivalent) the Enum str and repr. The % and format codes that specifically call for a numeric representation should give that numeric representation (format is good here, % is not).

For format, I think the question is "should an IntEnum format like an int, with the wacky exception of a specifier of '', or should it always format like a str?"

I think for format we should treat IntEnums as ints unless the s or r codes are specifically used.

As I wrote above, I rather see it differently. The original intent of Enums was to have string representation in most cases. So this should be the default, since most of the time this is what the user wants. No one really passes explicit s/r codes. In the minority, specialized cases, where the user wants to force int-like formatting, then the number should be given and not the member name. This would also be consistent with non-decimal formatting options like %x and the .format equivalent.

msg195205 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 19:34

I think IntEnum should act like a str for format() purposes. After all, having a useful string representation is a prime reason it exists. If you want it to act like a str() sometimes, and an int() at others, you're going to have to parse the format specifier and figure out what to do. It might be as easy as:

def __format__(self, fmt):
    if len(fmt) >= 1 and fmt[-1] in 'oOxXdD':
        # treat like an int
        return format(self.value, fmt)
    else:
        # treat like a string
        return format(str(self), fmt)

But I haven't completely thought it through or tested it.

Or, couldn't we just say it's always str, and if you want to treat it like an int then use .value?

msg195206 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 19:42

Okay, I see your points. I can certainly agree with going with the str representation when no numeric code is specified. But, if a numeric code is specified (x, b, d, etc.) then the numeric value should be used.

Agreed?

msg195207 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 19:45

On 08/14/2013 11:55 AM, Eli Bendersky wrote:

I'm not sure I understand. The discrepancy between {:} and {:10} is clearly a problem.

Ah, you're right.

msg195209 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 20:04

Okay, I see your points. I can certainly agree with going with the str representation when no numeric code is specified. But, if a numeric code is specified (x, b, d, etc.) then the numeric value should be used.

Agreed?

Yes. I suggest to wait a day or two for others to react (night in Europe, etc). If this sounds good to everyone then it may make sense to split this issue to one for str.format and another for legacy % formatting, because the implementation is likely to be different.

msg195212 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 20:19

If IntEnum.__format__ is going to parse the format string, it's a little fragile. For example, say we modify int.__format__ to understand a "Z" presentation type. Who's going to remember to update IntEnum.__format__?

For reference, the existing integer formats are: "bdoxXn".
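To make the fragility concern concrete, here is a sketch of an IntEnum whose __format__ keys off that list. It is not the patch under discussion: `NameFirstIntEnum` and `INT_PRESENTATION_TYPES` are names invented for this example, and the label is built explicitly from the class and member names rather than from str(self), as an assumption to keep the output independent of how a given Python version defines IntEnum's str().

```python
import enum

# The int presentation types listed above; a new type added to
# int.__format__ would also have to be added here by hand.
INT_PRESENTATION_TYPES = 'bdoxXn'


class NameFirstIntEnum(enum.IntEnum):
    """Hypothetical base: format as the name unless an int code is given."""

    def __format__(self, fmt):
        if fmt and fmt[-1] in INT_PRESENTATION_TYPES:
            # Explicit int presentation type: format the numeric value.
            return format(self.value, fmt)
        # Anything else: treat the spec as a str spec for the label.
        label = '%s.%s' % (type(self).__name__, self.name)
        return format(label, fmt)


class AF(NameFirstIntEnum):
    IPv4 = 1
    IPv6 = 2


as_name = format(AF.IPv4, '')
as_padded_name = format(AF.IPv4, '10')
as_decimal = format(AF.IPv4, 'd')
as_hex = format(AF.IPv6, '04x')
```

The hard-coded constant is exactly the part that would silently go stale if int ever gained another presentation type.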

msg195213 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 20:32

What I'm hoping is to get agreement on what the behavior should be (unspecified format codes use str or repr, specified numeric codes use the value), and then persuade folks that int (or PyLong) is where that behavior should be kept (so int is subclass friendly) -- then IntEnum (and other int subclasses) don't have to worry about it.

msg195214 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 20:57

I don't think it's possible for int (PyLong) to handle a decision to format itself as a string. Personally, I'd like this:

>>> format(3, 's')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 's' for object of type 'int'

To continue to be an error.

This is exactly why the __format__ protocol was added: so a type could make a decision on how it should format itself. My only concern is the fragility of the proposed solution.

msg195215 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 21:13

--> class Test(enum.IntEnum):
...     one = 1
...     two = 2
...
--> '{}'.format(Test.one)
'Test.one'
--> '{:d}'.format(Test.one)
'1'
--> '{:}'.format(Test.one)
'Test.one'
--> '{:10}'.format(Test.one)
'         1'

Sometimes the str is used, and sometimes the value of the int is used. I suggest we tighten that up.

msg195217 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 21:22

The value of int is always used, except when the format string is empty. PEP 3101 explicitly requires this behavior. "For all built-in types, an empty format specification will produce the equivalent of str(value)." The "built-in type" here refers to int, since IntEnum is derived from int (at least I think it is: I haven't followed the metaclass and multiple inheritance completely).

If you want it to be different, you need to implement __format__ for IntEnum or a base class (or the metaclass? again, I haven't checked).

msg195218 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 21:26

So what you're saying is that '{:}' is empty, but '{:10}' is not?

msg195220 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 21:38

So what you're saying is that '{:}' is empty, but '{:10}' is not?

Yes, exactly. The part before the colon says which argument to .format() to use. The empty string there means "use the next one". The part after the colon is the format specifier. In the first example above, there's an empty string after the colon, and in the second example there's a "10" after the colon.

Which is why it's really easier to use: format(obj, '') and format(obj, '10') instead of .format examples. By using the built-in format, you only need to write the format specifier, not the ''.format() "which argument am I processing" stuff with the braces and colons.

msg195221 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-14 21:45

On Wed, Aug 14, 2013 at 2:38 PM, Eric V. Smith <report@bugs.python.org>wrote:

Eric V. Smith added the comment:

So what you're saying is that '{:}' is empty, but '{:10}' is not?

Yes, exactly. The part before the colon says which argument to .format() to use. The empty string there means "use the next one". The part after the colon is the format specifier. In the first example above, there's an empty string after the colon, and in the second example there's a "10" after the colon.

Which is why it's really easier to use: format(obj, '') and format(obj, '10') instead of .format examples. By using the built-in format, you only need to write the format specifier, not the ''.format() "which argument am I processing" stuff with the braces and colons.

Eric, I'd have to disagree with this part. Placing strictly formal interpretation of "empty" aside, it seems to me unacceptable that field-width affects the interpretation of the string. This appears more like bug in the .format implementation than the original intention. I suspect that at this point it may be useful to take this discussion to pydev to get more opinions.

msg195223 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 22:06

It's not whether a field width is specified that makes it "empty" or not, it's whether there's anything in the format specifier at all. I'm trying to simplify the conversation by using format() instead of str.format(), but I'm not succeeding!

Going back to str.format examples:

'{}'.format(Test.one)
    equivalent to format(Test.one, '')
    result is Test.one.__format__('')

'{:d}'.format(Test.one)
    equivalent to format(Test.one, 'd')
    result is Test.one.__format__('d')

'{:}'.format(Test.one)
    equivalent to format(Test.one, '')
    result is Test.one.__format__('')

'{:10}'.format(Test.one)
    equivalent to format(Test.one, '10')
    result is Test.one.__format__('10')

In all of these cases, since there is no Test.one.__format__, int.__format__ is called. int.__format__ contains logic (Python/formatter_unicode.c, line 1422) that says "if the format specifier is empty, return str(self), otherwise do the int formatting". This is in order to comply with the previously mentioned PEP requirement. That's the only place where there's any "treat this as a str instead of an int" logic.
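The empty-specifier rule can be observed without Enum at all. The sketch below uses a hypothetical int subclass, `Labelled`, whose only customisation is __str__, so every call still goes through int.__format__.

```python
class Labelled(int):
    """int subclass with a custom __str__ and no __format__ of its own."""

    def __str__(self):
        return 'Labelled(%d)' % int(self)


one = Labelled(1)

empty_spec = format(one, '')        # empty specifier: str(one) is used
brace_colon = '{:}'.format(one)     # also an empty specifier
width_only = format(one, '10')      # non-empty: int formatting, width 10
with_type = format(one, 'd')        # non-empty: int formatting
```

Only the two empty-specifier calls pick up the custom __str__; a bare width is already enough to switch to numeric formatting, which is the discrepancy being discussed.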

In order to avoid that logic, and cause more format specifiers to result in str-like behavior, we'll need to implement a __format__ somewhere (IntEnum, I guess) that makes the "int or str" decision.

msg195224 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 22:12

In order to avoid that logic, and cause more format specifiers to result in str-like behavior, we'll need to implement an format somewhere (IntEnum, I guess) that makes the "int or str" decision.

If this is the way __format__ is supposed to work, then I'll add __format__ to IntEnum with simple logic that says if no letter format code is present, use string formatting, otherwise use int formatting. That should be future proof.

msg195225 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-14 22:22

In order to avoid that logic, and cause more format specifiers to result in str-like behavior, we'll need to implement an format somewhere (IntEnum, I guess) that makes the "int or str" decision.

If this is the way format is supposed to work, then I'll add format to IntEnum with simple logic that says if not letter format code is present, use string formatting, otherwise use int formatting.

Yes, that's exactly how it's supposed to be used, and why __format__ exists at all.

That should be future proof.

Agreed.

It does mean that a few things that look like int format specifiers won't be, but I don't think it's a big loss.

For example, '+' only makes sense on an int, but with your proposal it would be a str format specifier:

>>> format(42, '+')
'+42'
>>> format('42', '+')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Sign not allowed in string format specifier

Again, I don't think any of these would be a big deal. But it does mean that there are places that could take an int that can't take an IntEnum.

msg195227 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-14 22:47

Drat. IntEnum is supposed to be a drop-in replacement for int. I guess I'll have to consider more than just the letter code to decide whether to go with int.__format__ or str.__format__.

msg195292 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-15 21:29

With this in the Enum class all members simply forward formatting to the mixed-in type, meaning that IntEnum behaves exactly like int for formatting purposes. Another way of saying that is that the name attribute of an IntEnum member will only be used when the !r or !s codes are used.

If we are all good with that, then the question remaining is how should pure enums handle formatting? Should they also go the value route, or should they go with the name?

msg195293 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-15 21:46

The whole point of IntEnum and replacing stdlib constants with it was friendly str & repr out of the box. This means that "just printing out" an enum member should have a nice string representation. And "just printing out" means:

print(member) "%s" % member "{}".format(member)

!s/!r are quite esoteric - IntEnum should behave in the nicest way possible out of the box.

Let's just rig IntEnum's __format__ to do the right thing and not worry about Enum itself. I hope that mixin-with-Enum cases are rare (and most are IntEnum anyway), and in such rare cases users are free to lift the implementation from IntEnum.

msg195296 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-15 22:20

On 8/15/2013 5:46 PM, Eli Bendersky wrote:

The whole point of IntEnum and replacing stdlib constants with it was friendly str & repr out of the box. This means that "just printing out" an enum member should have a nice string representation. And "just printing out" means:

print(member) "%s" % member "{}".format(member)

100% agreed.

!s/!r are quite esoteric - IntEnum should behave in the nicest way possible out of the box.

Not only that, they're not part of the format protocol, anyway.

Let's just rig IntEnum's format to do the right thing and not worry about Enum itself. I hope that mixin-with-Enum cases are rare (and most are IntEnum anyway), and in such rare cases users are free to lift the implementation from IntEnum.

Agreed.

And the next question is: what is "the right thing"? Does it always appear to be a str? Or sometimes str and sometimes int? And how far deviant from plain int can it be?

I think the answers should be:

I think we might want to consider the same thing for bool.__format__, but that's another issue. Maybe int could grow a __format_derived_type__ method that implements the above, and bool and IntEnum could set their __format__ to be that. Which probably points out that the original int.__format__ design is flawed (as Nick pointed out), but it's too late to change it.

Or, thinking even more outside the box, maybe int.__format__ could implement the above logic if it knows it's working on a derived class. That would change bool as well as all other int-derived types, but leave int itself alone. More breakage, but potentially more useful. But again, we should open another issue if we want to pursue this approach.

msg195298 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-15 22:37

On Thu, Aug 15, 2013 at 3:20 PM, Eric V. Smith <report@bugs.python.org>wrote:

Eric V. Smith added the comment:

On 8/15/2013 5:46 PM, Eli Bendersky wrote:

The whole point of IntEnum and replacing stdlib constants with it was friendly str & repr out of the box. This means that "just printing out" an enum member should have a nice string representation. And "just printing out" means:

print(member) "%s" % member "{}".format(member)

100% agreed.

!s/!r are quite esoteric - IntEnum should behave in the nicest way possible out of the box.

Not only that, they're not part of the format protocol, anyway.

Let's just rig IntEnum's format to do the right thing and not worry about Enum itself. I hope that mixin-with-Enum cases are rare (and most are IntEnum anyway), and in such rare cases users are free to lift the implementation from IntEnum.

Agreed.

And the next question is: what is "the right thing"? Does it always appear to be a str? Or sometimes str and sometimes int? And how far deviant from plain int can it be?

I think the answers should be:

Sounds good to me. One of IntEnum's raison d'êtres is to be an integer with a nice string representation. So it makes sense to make it show itself as a string, unless explicitly asked for an int. Float-conversion is dubious, but I agree that following int's lead here is harmless and least surprising.

Naturally, compatibility with % formatting is desired. '%s' is str, '%i' is int().

msg195299 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-15 22:56

Eli Bendersky added the comment: Naturally, compatibility with % formatting is desired. '%s' is str, '%i' is int().

Can we solve that one on this issue, or do we need to make another?

msg195300 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-15 23:00

I guess there are merits to keeping it all in the same place.

msg195302 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-15 23:09

Eric V. Smith added the comment:

I think the answers should be:

Hmmm. How about defining the characters that will be supported for string interpretation, and if there are any other characters in format spec then go int (or whatever the mix-in type is)? I'm thinking "<^>01234566789". Anything else ("+", all letter codes, etc.) gets the normal (host-type) treatment.

msg195303 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-15 23:59

On 8/15/2013 7:09 PM, Ethan Furman wrote:

Ethan Furman added the comment:

Eric V. Smith added the comment:

I think the answers should be:

  • Formats as int if the length of the format spec is >= 1 and it ends in one of "bdoxX" (the int presentation types).

Hmmm. How about defining the characters that will be supported for string interpretation, and if there are any other characters in format spec then go int (or whatever the mix-in type is)? I'm thinking "<^>01234566789". Anything else ("+", all letter codes, etc.) gets the normal (host-type) treatment.

Is the goal of this approach to implement __format__ in Enum instead of IntEnum?

But you can't do this in general, because in the place you implement __format__ you must understand the mix-in type's format strings. Consider if the mix-in type is datetime: its format strings don't end in the set of characters you list. So I think you have to implement __format__ on each derived-from-Enum type so you can make the best decisions there.

I think we want to have the most possible interpretations give a str output, and only use the base type if that's explicitly asked for. As Eli says, that's one of the main reasons IntEnum exists in the first place. Hence my approach for IntEnum.__format__.

msg195305 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-16 00:20

Eric V. Smith added the comment:

Ethan Furman added the comment:

Hmmm. How about defining the characters that will be supported for string interpretation, and if there are any other characters in format spec then go int (or whatever the mix-in type is)? I'm thinking "<^>01234566789". Anything else ("+", all letter codes, etc.) gets the normal (host-type) treatment.

Is the goal of this approach to implement format in Enum instead of IntEnum?

Yes.

But you can't do this in general, because in the place you implement format you must understand the mix-in type's format strings.

Which is why I suggest concentrating on what defines an "empty" format string. In this case "empty" means what can we put in the format spec and still get string treatment.

Consider if the mix-in type is datetime: its format strings don't end in the set of characters you list.

The characters I list are the justification chars and the digits that would be used to specify the field width. If those are the only characters given then treat the MixedEnum member as the member string.

msg195306 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-16 00:26

On 8/15/2013 8:20 PM, Ethan Furman wrote:

The characters I list are the justification chars and the digits that would be used to specify the field width. If those are the only characters given then treat the MixedEnum member as the member string.

But a datetime format string can end in "0", for example.

>>> format(datetime.datetime.now(), '%H:%M:%S.00')
'20:25:27.00'

I think your code would produce the equivalent of:

>>> format(str(datetime.datetime.now()), '%H:%M:%S.00')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid conversion specification

msg195310 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-16 04:24

Eric V. Smith added the comment:

But a datetime format string can end in "0", for example.

>>> format(datetime.datetime.now(), '%H:%M:%S.00')
'20:25:27.00'

Not a problem, because once the digits were removed there would still be % : H M S and ., so the datetime format would be called. str format would only be called when the result of removing < ^ > 0 1 2 3 4 5 6 7 8 9 was an empty string.
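The filtering rule described here can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: it is written against IntEnum for simplicity, `FilteringIntEnum` and `STR_SPEC_CHARS` are invented names, `format(self.value, spec)` stands in for "call the mix-in type's __format__", and the label is built explicitly so the result does not depend on a particular version's str() for IntEnum.

```python
import enum

# Characters that, on their own, keep the member in "string mode":
# the justification characters and the digits of a field width.
STR_SPEC_CHARS = frozenset('<^>0123456789')


class FilteringIntEnum(enum.IntEnum):
    """Hypothetical base: str formatting only for width/justification specs."""

    def __format__(self, spec):
        if set(spec) <= STR_SPEC_CHARS:
            # Nothing is left after removing the allowed characters
            # (this includes the empty spec): format the member's label.
            label = '%s.%s' % (type(self).__name__, self.name)
            return format(label, spec)
        # Any other character ('+', 'd', 'x', ...) goes to the value.
        return format(self.value, spec)


class AF(FilteringIntEnum):
    IPv4 = 1
    IPv6 = 2


as_name = format(AF.IPv4, '')
right_aligned = format(AF.IPv4, '>10')
signed = format(AF.IPv4, '+')
hexadecimal = format(AF.IPv6, 'x')
```

Under this rule a sign or a letter code falls through to the numeric value, so specifiers that work for a plain int keep working for the member.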

msg195311 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-16 04:31

On 8/16/2013 12:24 AM, Ethan Furman wrote:

Ethan Furman added the comment:

Eric V. Smith added the comment:

But a datetime format string can end in "0", for example.

>>> format(datetime.datetime.now(), '%H:%M:%S.00')
'20:25:27.00'

Not a problem, because once the digits were removed there would still be % : H M S and ., so the datetime format would be called. str format would only be called when the result of removing < ^ > 0 1 2 3 4 5 6 7 8 9 was an empty string.

Ah, I misread your earlier text. You'd want to remove 's' also, wouldn't you?

You might be able to guess whether this particular format spec is a str format spec or not, but it can't work in the general case, because not all types with format specifiers have been written yet. I can imagine a type with a format specification that's identical to str's (yet produces different output), and you'd always want to use its __format__ instead of str.__format__.

I feel strongly this is sufficiently magic that we shouldn't do it, and instead just implement the more robust algorithm in IntEnum.__format__. But I'm not going to argue it any more.

msg195313 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-16 07:26

Eric, please do not feel your time has been wasted. I greatly appreciate the knowledge you shared and I learned much.

I feel very strongly that, as much as possible, an Enum should Just Work. Requiring users to write their own __format__ any time they create a new mixed-in Enum in order to get sane default behaviour is just not cool.

And while the behaviour of switching from str.__format__ to the mix-in's __format__ can appear a bit magical, it is nothing compared to Enum as a whole.

You can review the attached patch to see what I mean about filtering the format spec to decide which __format__ method to call. Any code besides the basic width and justification codes will switch to the mix-in's __format__; so '+', 'b', '%Y', 's', and everything we haven't thought of yet will switch to the mix-in.
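The filtering idea described in this message can be sketched roughly as follows. This is not the code from the attached patch: the names PLAIN_CHARS, is_plain_spec and choose_formatter are hypothetical, and the sketch only reports which formatter would be picked under the rule "strip < ^ > and the digits; if nothing is left, use str".

```python
# Hypothetical sketch of the spec-filtering heuristic; NOT the attached patch.
PLAIN_CHARS = set('<^>0123456789')


def is_plain_spec(format_spec):
    """Return True if the spec holds only justification and width characters."""
    return all(ch in PLAIN_CHARS for ch in format_spec)


def choose_formatter(format_spec):
    """Name the formatter the heuristic would pick for this spec."""
    return 'str' if is_plain_spec(format_spec) else 'mixin'


for spec in ['', '10', '>10', '+', 'b', '%Y', 's', '%H:%M:%S.00']:
    print(repr(spec), '->', choose_formatter(spec))
```

Note that the rule looks only at the characters of the spec, never at the type being formatted, so any spec made up solely of digits and alignment characters is always routed to str.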

msg195314 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-16 08:01

This patch contains the previous patch, plus a fix and tests for %i, %d, and %u formatting, and tests only for %f formatting (nothing to fix there, but don't want it breaking in the future ;) .

msg195628 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-19 12:45

Oh, I don't feel my time has been wasted. Where else can I have a discussion of format?

With this patch, given this:

class UpperString(str):
    def __format__(self, fmt):
        return str.__format__(self, fmt).upper()

class UpperEnum(UpperString, Enum):
    pass

class S(UpperEnum):
    a = 'a'
    b = 'b'

this gives the (to me) surprising results of:

>>> format(S.a)
'S.a'
>>> format(S.a, '10')
'S.a       '
>>> format(S.a, '10s')
'A         '

I'd expect this to always use UpperString.__format__, since it understands all str format specs.

And before you say UpperString is contrived, I've used classes like it in the past: they're just like a string, but the __format__ method does something special after calling through to str.__format__.

Which is why I think __format__ has to go in the derived type (IntEnum, in the originally reported case): only it can decide whether to call str.__format__ or the mix-in class's __format__.

Now, whether or not Enum needs to support such classes with a specialized __format__, I can't say. I suppose it's not super-important. But it will be difficult to explain all of this.

Also, the patch gives this:

>>> class E(IntEnum):
...     one = 1
...     two = 2
...
>>> format(E.one)
'E.one'
>>> format(E.one, 's')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/eric/local/python/cpython/Lib/enum.py", line 463, in __format__
    return obj.__format__(val, format_spec)
ValueError: Unknown format code 's' for object of type 'int'

I can't format it using the 's' presentation type, despite the fact it looks like a string. I think you need to add 's' to your _remove_plain_format_chars.

And consider this valid (but arguably pathological) code:

>>> format(datetime.datetime.now(), '10')
'10'

Despite this being a valid datetime format spec, your code would consider it a str spec.
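A small check of this point, using a fixed timestamp (an arbitrary value chosen for reproducibility) rather than now(): datetime hands the spec to strftime, which copies non-directive characters through unchanged, while str reads the same characters as a field width.

```python
import datetime

# Arbitrary fixed timestamp so the results do not depend on the clock.
moment = datetime.datetime(2013, 8, 19, 12, 45)

# datetime treats the spec as a strftime pattern; '10' contains no
# directives, so it comes back as literal text.
assert format(moment, '10') == '10'
assert format(moment, '%H:%M') == '12:45'

# str reads the very same spec as a minimum field width of 10.
assert format('abc', '10') == 'abc'.ljust(10)

# A strftime pattern, on the other hand, is not a valid str spec.
try:
    format(str(moment), '%H:%M')
except ValueError as exc:
    print('str rejected the spec:', exc)
```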

tl;dr: I think __format__ belongs in the class that understands how the subclass handles format specs.

msg195632 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2013-08-19 14:13

For C-style formatting see also .

msg195736 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-21 00:43

Eli Bendersky added the comment:

The whole point of IntEnum and replacing stdlib constants with it was friendly str & repr out of the box.

Sure, friendly str and repr plus a nice way to work with them in code.

This means that "just printing out" an enum member should have a nice string representation.

And when are you going to print out an enum? Debugger and/or command line.

And "just printing out" means:

print(member)
"%s" % member
"{}".format(member)

Would you seriously use either of those last two in either the debugger or the command line?

msg195739 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-21 01:26

Eric V. Smith added the comment:

this gives the (to me) surprising results of:

>>> format(S.a)
'S.a'
>>> format(S.a, '10')
'S.a       '
>>> format(S.a, '10s')
'A         '

That is surprising: if a __format__ is defined in the Enum class chain then it should be used instead of the default. I'll fix that (I treat __new__, __repr__, __getnewargs__, and maybe one other the same way).

Also, the patch gives this:

>>> class E(IntEnum):
...     one = 1
...     two = 2
...
>>> format(E.one)
'E.one'
>>> format(E.one, 's')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/eric/local/python/cpython/Lib/enum.py", line 463, in __format__
    return obj.__format__(val, format_spec)
ValueError: Unknown format code 's' for object of type 'int'

I can't format it using the 's' presentation type, despite the fact it looks like a string. I think you need to add 's' to your _remove_plain_format_chars.

Well, they are Enums, not strings. And if I add 's' to the remove, then a str-derived enum would not pass through to the value correctly.

And consider this valid (but arguably pathological) code:

>>> format(datetime.datetime.now(), '10')
'10'

Despite this being a valid datetime format spec, your code would consider it a str spec.

Sounds like the way forward is to specify in the docs how the default Enum __format__ behaves (basically honors width and justification settings to yield the name, anything else passes through to the Enum member) along with advice matching that for __str__ and __repr__: if you want something different, write your own method. ;)

And I learned something else: the format mini-language is really in two parts; the first part is selecting the object to be printed ({} or {3} or {some_name} or {other.name} etc., etc.) and the second part is the format spec for the object selected. The kicker is that each object can specify what it knows about. So floats treat 'f' as float, but something else might treat 'f' as fixed.
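The two parts can be seen side by side in a minimal sketch. Temperature and Reading are hypothetical classes invented for illustration; here 'f' is taken to mean Fahrenheit, which is just one example of a type giving its own meaning to a spec character.

```python
class Temperature:
    """Hypothetical type whose format spec selects a scale: 'c' or 'f'."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        if spec == 'f':
            return '%.1f F' % (self.celsius * 9 / 5 + 32)
        if spec in ('', 'c'):
            return '%.1f C' % self.celsius
        raise ValueError('unknown format code %r' % spec)


class Reading:
    """Hypothetical container, used only to show attribute selection."""

    def __init__(self, name, temp):
        self.name = name
        self.temp = temp


reading = Reading('lab', Temperature(100))

# Part one ("0.temp", "0.name") selects the object; part two (after the
# colon) is handed to that object's own __format__.
print('{0.temp:f}'.format(reading))   # Temperature decides what 'f' means
print('{0.name:>5}'.format(reading))  # str decides what '>5' means
print('{:f}'.format(1.5))             # float decides what 'f' means
```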

msg195744 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-21 03:37

On 8/20/2013 9:26 PM, Ethan Furman wrote:

Sounds like the way forward is to specify in the docs how the default Enum format behaves (basically honors width and justification settings to yield the name, anything else passes through to the Enum member) along with advice matching that for str and repr: if you want something different, write your own method.

I definitely agree on the documentation part.

But I think that IntEnum should have its own __format__, because it wants something different.

And I still think that any interpretation of the format spec in Enum.__format__ is a mistake, because you don't know what the string means to the passed-to object. You're basically trying to guess: does this look like something that makes sense as a str.__format__ specifier and I should handle it directly, or does it make sense to the passed-to object? And you can't know for sure.

And I learned something else: the format mini-language is really in two parts; the first part is selecting the object to be printed ({} or {3} or {some_name} or {other.name} etc., etc.) and the second part is the format spec for the object selected.

This is why I've been trying to frame this discussion in terms of built-in format() or obj.__format__(), and get away from str.format().

The kicker is that each object can specify what it knows about. So float's treat 'f' as float, but something else might treat 'f' as fixed.

And some other object might consider 'f' as being part of a literal that's always output (like datetime).
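A quick illustration of the same character meaning different things to different types, using fixed dates chosen arbitrarily:

```python
import datetime

# float: 'f' is a presentation type (fixed-point, six decimals by default).
assert format(1.5, 'f') == '1.500000'

# datetime: 'f' is not a strftime directive, so it is emitted literally.
assert format(datetime.datetime(2013, 8, 21), 'f') == 'f'

# Literal text and directives can be mixed freely in a date spec.
assert format(datetime.date(2013, 8, 21), 'on %Y-%m-%d') == 'on 2013-08-21'
```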

msg195773 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2013-08-21 11:37

print(member)
"%s" % member
"{}".format(member)

Would you seriously use either of those last two in either the debugger or the command line?

Yes, of course. When you need debug output from function or loop inners.

for ...:
    ...
    print('stage1(%s) = [%s:%s] %s' % (i, start, stop, result))

msg195775 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2013-08-21 11:43

I propose splitting this issue into two: one for C-style formatting (I guess everyone agrees that '%i' % should return the decimal representation of its numerical value) and another for format(), which is more questionable.

msg195785 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-21 12:26

Serhiy Storchaka added the comment:

I'm proposing to split this issue on two issues. One for C-style formatting (I guess everyone agree that '%i' % should return decimal representation of its numerical value) and other for format() which is more questionable.

Not sure that's necessary. The code to fix the C-style %-formatting is already (I think) in the last patch I attached.

msg195789 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2013-08-21 13:34

Not sure that's necessary. The code to fix the C-style %-formatting is already (I think) in the last patch I attached.

Could you please extract this part of the patch and add tests? It can be reviewed and committed separately. See also , there is a more serious problem with current code.

msg195790 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-21 13:40

I agree splitting this makes sense, so as to not delay the %-formatting fix. While similar in principle, the fixes are unrelated. We may as well keep this issue the format part, since it's been most discussed here.

msg195791 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-21 13:43

On Wed, Aug 21, 2013 at 6:40 AM, Eric V. Smith <report@bugs.python.org>wrote:

Eric V. Smith added the comment:

I agree splitting this makes sense, so as to not delay the %-formatting fix. While similar in principle, the fixes are unrelated. We may as well keep this issue the format part, since it's been most discussed here.

+1

$REALLIFE took over in the past few days but I'll be back towards the weekend and want to review this part too.

msg195793 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-21 13:50

Serhiy, if you feel this is related to #18780 maybe they can be merged into a single issue (since the comment on #18780 says the fix here will work there too)?

msg195794 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2013-08-21 13:56

I rather think that two discussions about two almost unrelated series of patches in the same issue will be confusing.

msg196356 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-28 08:17

On 08/14/2013 09:27 PM, on PyDev, Nick Coghlan wrote:

For enums, I believe they should be formatted like their base types (so !s and !r will show the enum name, anything without coercion will show the value).

I agree. While one of the big reasons for an Enum type was the pretty str and repr, I don't see format in that area.

So, these are some of the ways we have to display an object:

str() calls obj.__str__()
repr() calls obj.__repr__()

"%s" calls obj.__str__()
"%r" calls obj.__repr__()
"%d" calls... not sure, but we see the int value

"{}".format() should (IMO) also display the value of the object

Using int as the case study, its presentation types are ['b', 'd', 'n', 'o', 'x', 'X']. Notice there is no 's' nor 'r' in there, as int expects to display a number, not arbitrary text.

So, for mixed-type Enumerations, I think any __format__ calls should simply be forwarded to the mixed-in type (unless, of course, a custom __format__ was specified in the new Enumeration).
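The forwarding idea can be sketched without the enum module at all. NamedInt below is a hypothetical stand-in for an int-based member, not the code in the attached patch: str and repr show the name, while every format spec is passed to int using the plain value.

```python
class NamedInt(int):
    """Hypothetical stand-in for an int-based enum member (not enum itself)."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __str__(self):
        return self._name

    def __repr__(self):
        return '<%s: %d>' % (self._name, int(self))

    def __format__(self, spec):
        # Forward every spec, including the empty one, to the mixed-in type.
        return int.__format__(int(self), spec)


ipv4 = NamedInt('AF.IPv4', 1)

print(str(ipv4))                   # the name
print(format(ipv4))                # the value
print('{:>5}'.format(ipv4))        # the value, right-justified
print('{!s:>10}'.format(ipv4))     # !s coerces first, so the name is padded

# int has no 's' presentation type, so this spec is rejected.
try:
    format(ipv4, 's')
except ValueError as exc:
    print('rejected:', exc)
```

With this arrangement the name is still reachable from a format string through the !s and !r conversions, which matches the behaviour Nick Coghlan describes in the quoted message.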

Patch attached.

msg196640 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-31 14:50

The idea looks reasonable. Posted a code review.

msg196676 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-31 21:42

Final (hopefully ;) patch attached. Thanks, Eli, for your comments and help.

msg196677 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-31 21:47

Thanks, Eric, for teaching me a bunch about format. :)

msg196679 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2013-08-31 22:16

Okay, the final final patch. ;)

msg196683 - (view)

Author: Eli Bendersky (eli.bendersky) * (Python committer)

Date: 2013-08-31 22:22

lgtm

msg196687 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2013-08-31 22:49

Looks good to me, too. Thanks for considering all of the feedback!

msg196697 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2013-09-01 02:18

New changeset 058cb219b3b5 by Ethan Furman in branch 'default': Close #18738: Route format calls to mixed-in type for mixed Enums (such as IntEnum). http://hg.python.org/cpython/rev/058cb219b3b5