[Python-Dev] PEP 461 updates

Ethan Furman ethan at stoneleaf.us
Thu Jan 16 01:13:55 CET 2014


Current copy of PEP, many modifications from all the feedback. Thank you to everyone.

I know it's been a long week (feels a lot longer!) while all this was hammered out, but I think we're getting close!

Abstract

This PEP proposes adding the % and {} formatting operations from str to bytes [1].

Overriding Principles

In order to avoid the problems of auto-conversion and value-generated exceptions, all object checking will be done via isinstance, not by values contained in a Unicode representation. In other words::

Proposed semantics for bytes formatting

%-interpolation

All the numeric formatting codes (such as %x, %o, %e, %f, and %g) will be supported and will work as they do for str, including padding, justification, and other related modifiers. The one exception is locale.

Example::

 >>> b'%4x' % 10
 b'   a'
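As a rough illustration of "will work as they do for str", the intended result can be modelled on an interpreter without this feature by formatting through str and then encoding. This is only a sketch: the helper name ``numeric_bytes_format`` is hypothetical, and it assumes the format string itself is pure ASCII.

```python
def numeric_bytes_format(fmt, value):
    """Hypothetical helper: model b'<fmt>' % value for numeric codes.

    Assumes `fmt` is an ASCII-only str such as '%4x'.
    """
    # Reuse the existing str machinery, then encode. Numeric codes only
    # ever produce ASCII characters (digits, sign, '.', 'e', and so on).
    return (fmt % value).encode('ascii')


print(numeric_bytes_format('%4x', 10))          # b'   a'
print(numeric_bytes_format('%-6d|', 42))        # b'42    |'
print(numeric_bytes_format('%08.3f', 3.14159))  # b'0003.142'
```

Padding, justification, and precision all carry over unchanged because the str formatter does the work; only the final type differs.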

%c will insert a single byte, either from an int in range(256), or from a bytes argument of length 1.

Example::

 >>> b'%c' % 48
 b'0'

 >>> b'%c' % b'a'
 b'a'
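The %c rule above can be sketched as a plain function. The name ``format_c`` is hypothetical, and the exception types raised for bad input are assumptions; the text above only states which arguments are accepted.

```python
def format_c(arg):
    """Hypothetical model of %c for bytes: always returns exactly one byte."""
    if isinstance(arg, int):
        if arg not in range(256):
            # Assumption: the exception type for out-of-range ints is
            # not specified in the text above.
            raise OverflowError('%c arg not in range(256)')
        return bytes([arg])
    if isinstance(arg, bytes) and len(arg) == 1:
        return arg
    # Assumption: anything else is rejected with TypeError.
    raise TypeError('%c requires an int in range(256) or bytes of length 1')


print(format_c(48))    # b'0'
print(format_c(b'a'))  # b'a'
```

Note that the check is made on the type and length of the argument, never on a decoded text value, in line with the overriding principle.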

%s is restricted in what it will accept::

Examples::

 >>> b'%s' % b'abc'
 b'abc'

 >>> b'%s' % 3.14
 Traceback (most recent call last):
 ...
 TypeError: 3.14 has no __bytes__ method

 >>> b'%s' % 'hello world!'
 Traceback (most recent call last):
 ...
 TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')
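The %s behaviour shown in the examples and the note can be modelled roughly as follows. This is a sketch, not the proposed implementation: ``format_s`` is a hypothetical name, passing bytes, bytearray, and memoryview straight through is an assumption, and the error text simply mirrors the tracebacks above.

```python
def format_s(value):
    """Hypothetical model of %s for bytes formatting."""
    # Assumption: bytes-like inputs are used as-is.
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    # Otherwise look for __bytes__ on the type, as special methods are
    # normally looked up.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is not None:
        return to_bytes(value)
    msg = '%r has no __bytes__ method' % (value,)
    if isinstance(value, str):
        msg += ', perhaps you need to encode it?'
    raise TypeError(msg)


class Header:
    """Toy class that opts in by defining __bytes__."""
    def __bytes__(self):
        return b'X-Test: 1'


print(format_s(b'abc'))                        # b'abc'
print(format_s(Header()))                      # b'X-Test: 1'
print(format_s('a string'.encode('latin-1')))  # b'a string'
```

A str argument is rejected because of its type alone, even when its contents happen to be pure ASCII; the caller has to pick an encoding explicitly.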

format

The format mini language codes, where they correspond with the %-interpolation codes, will be used as-is, with three exceptions::

Numeric Format Codes

To properly handle int and float subclasses, int(), operator.index(), and float() will be called on the objects intended for (d, i, u), (b, o, x, X), and (e, E, f, F, g, G), respectively.
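The mapping from format code to coercion function might look like the sketch below. The names ``coerce_numeric``, ``INT_CODES``, ``INDEX_CODES``, and ``FLOAT_CODES`` are hypothetical, and ``index()`` is taken here to mean ``operator.index()``, which is an assumption.

```python
import operator

# Code groups as listed in the text above.
INT_CODES = frozenset('diu')
INDEX_CODES = frozenset('boxX')
FLOAT_CODES = frozenset('eEfFgG')


def coerce_numeric(code, value):
    """Hypothetical helper: normalise `value` before numeric formatting."""
    if code in INT_CODES:
        return int(value)             # accepts anything int() accepts
    if code in INDEX_CODES:
        return operator.index(value)  # requires __index__, so floats fail
    if code in FLOAT_CODES:
        return float(value)
    raise ValueError('not a numeric format code: %r' % (code,))


print(coerce_numeric('d', 3.7))  # 3
print(coerce_numeric('x', 255))  # 255
print(coerce_numeric('f', 2))    # 2.0
```

Going through these functions means a subclass that overrides ``__str__`` or ``__repr__`` cannot inject arbitrary text into the byte stream; only the numeric value is used.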

Unsupported codes

%r (which calls repr()) and %a (which calls ascii()) are not supported.

!r and !a are not supported.

The n integer and float format code is not supported.

Open Questions

Currently non-numeric objects go through::

Do we want to add a format_bytes method in there?

Do we need to support all the numeric format codes? The floating point exponential formats seem less appropriate, for example.

Proposed variations

It was suggested to let %s accept numbers, but since numbers have their own format codes this idea was discarded.

It has been suggested to use %b for bytes instead of %s.

It has been proposed to automatically use .encode('ascii','strict') for str arguments to %s.

It has been proposed to have %s return the ascii-encoded repr when the value is a str (b'%s' % 'abc' --> b"'abc'").
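To compare the last two variations, here is what each would produce for a str argument, written with existing str methods. The function names ``variant_encode`` and ``variant_repr`` are hypothetical and only model the suggestions above.

```python
def variant_encode(value):
    """Model of the auto-encode variation: .encode('ascii', 'strict')."""
    return value.encode('ascii', 'strict')


def variant_repr(value):
    """Model of the ascii-encoded repr variation."""
    # ascii() escapes non-ASCII characters, so this encode cannot fail.
    return ascii(value).encode('ascii')


print(variant_encode('abc'))  # b'abc'
print(variant_repr('abc'))    # b"'abc'"

# With non-ASCII text the two variations diverge sharply:
print(variant_repr('caf\xe9'))  # b"'caf\\xe9'"
try:
    variant_encode('caf\xe9')
except UnicodeEncodeError as exc:
    print('auto-encode fails:', exc.reason)
```

The auto-encode variation succeeds or fails depending on the characters in the value, which is the kind of value-generated exception the overriding principle is meant to avoid. The repr variation never fails, but it silently adds quotes to the output.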

Footnotes

.. [1] string.Template is not under consideration.
.. [2] TypeError, ValueError, or UnicodeEncodeError?

-- Ethan


