[Python-Dev] PEP 461 Final?

Ethan Furman ethan at stoneleaf.us
Fri Jan 17 17:49:21 CET 2014


Here's the text for your reading pleasure. I'll commit the PEP after I add some markup.

Major change:

Coming soon:

================================================================================
PEP: 461
Title: Adding % formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan at stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution:

Abstract

This PEP proposes adding % formatting operations similar to Python 2's str type to bytes [1]_ [2]_.

Overriding Principles

In order to avoid the problems of auto-conversion and Unicode exceptions that could plague Py2 code, all object checking will be done by duck-typing, not by values contained in a Unicode representation [3]_.

Proposed semantics for bytes formatting

%-interpolation

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers.

Example::

 >>> b'%4x' % 10
 b'   a'

 >>> b'%#4x' % 10
 b' 0xa'

 >>> b'%04X' % 10
 b'000A'
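As a rough illustration of the padding and precision modifiers described above, the sketch below builds one ASCII record line from an integer and a float. It assumes an interpreter that implements this proposal (Python 3.5 or later); the helper name ``format_record`` and the record layout are hypothetical and not part of the PEP.

```python
def format_record(record_id, ratio):
    """Build an ASCII record line using numeric % codes on bytes.

    Hypothetical helper for illustration only.
    """
    # %06d zero-pads the integer to six digits; %8.3f right-justifies
    # the float in eight columns with three decimals, as it would for str.
    return b'ID=%06d RATIO=%8.3f\r\n' % (record_id, ratio)


line = format_record(42, 0.5)
print(line)
```

The result is a bytes object throughout, so no encode step is needed before writing it to a socket or a binary file.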

%c will insert a single byte, either from an int in range(256), or from a bytes argument of length 1, not from a str.

Example::

 >>> b'%c' % 48
 b'0'

 >>> b'%c' % b'a'
 b'a'
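The accepted and rejected inputs for %c can be probed with a small sketch like the one below. It assumes an interpreter that implements this proposal (Python 3.5 or later); ``try_format_c`` is a hypothetical helper, and the exact exception type raised for an out-of-range int is treated here as an implementation detail.

```python
def try_format_c(value):
    """Return b'%c' % value, or the name of the exception raised.

    Hypothetical helper for illustration only.
    """
    try:
        return b'%c' % value
    except (TypeError, OverflowError) as exc:
        return type(exc).__name__


results = {
    'int': try_format_c(65),          # int in range(256)
    'byte': try_format_c(b'A'),       # bytes of length 1
    'str': try_format_c('A'),         # str is rejected
    'too_big': try_format_c(256),     # outside range(256)
    'two_bytes': try_format_c(b'AB'), # bytes of length 2
}
print(results)
```

Only the first two calls produce a byte; the remaining three end in an exception rather than a silent conversion.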

%s is restricted in what it will accept.

Examples::

 >>> b'%s' % b'abc'
 b'abc'

 >>> b'%s' % 3.14
 Traceback (most recent call last):
 ...
 TypeError: 3.14 has no __bytes__ method

 >>> b'%s' % 'hello world!'
 Traceback (most recent call last):
 ...
 TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')
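The two routes for getting text into a %s slot, explicit encoding and a __bytes__ method, might look like the following sketch. It assumes an interpreter that implements this proposal (Python 3.5 or later); the ``Token`` class and the choice of encodings are hypothetical.

```python
class Token:
    """Hypothetical type that knows its own wire representation."""

    def __init__(self, text):
        self.text = text

    def __bytes__(self):
        # Explicit, author-chosen encoding; nothing is guessed.
        return self.text.encode('ascii')


name = 'caf\u00e9'

# Route 1: the caller encodes the str before interpolating it.
encoded = b'Hello, %s' % name.encode('utf-8')

# Route 2: the object supplies its own bytes through __bytes__.
from_dunder = b'token=%s' % Token('abc123')

# Passing the str directly is rejected instead of being auto-encoded.
try:
    b'Hello, %s' % name
    rejected = False
except TypeError:
    rejected = True

print(encoded, from_dunder, rejected)
```

In both accepted cases the encoding decision is made in user code, which is the point of refusing str arguments.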

Numeric Format Codes

To properly handle int and float subclasses, int(), index(), and float() will be called on the objects intended for (d, i, u), (b, o, x, X), and (e, E, f, F, g, G), respectively.
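A short sketch of what this means for subclasses that customize their text form: the numeric codes work from the numeric value, not from str() or repr(). It assumes an interpreter that implements this proposal (Python 3.5 or later); the ``Port`` and ``Celsius`` classes are hypothetical.

```python
class Port(int):
    """Hypothetical int subclass with a decorated str()."""

    def __str__(self):
        return 'port %d' % int(self)


class Celsius(float):
    """Hypothetical float subclass with a decorated str()."""

    def __str__(self):
        return '%.1f C' % float(self)


# The numeric codes use the underlying number, so the custom
# __str__ methods above never reach the output.
as_decimal = b'%d' % Port(8080)
as_hex = b'%x' % Port(255)
as_fixed = b'%.1f' % Celsius(21.5)

print(as_decimal, as_hex, as_fixed)
```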

Unsupported codes

%r (which calls repr()) and %a (which calls ascii()) are not supported.
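Under this draft, code that wants a repr-style field would have to produce it explicitly. One possible approach, sketched below, converts with ascii() and encodes the result before interpolating it with %s. ``debug_field`` is a hypothetical helper, and the sketch assumes an interpreter that implements this proposal (Python 3.5 or later).

```python
def debug_field(obj):
    """Return an ASCII-only bytes form of repr(obj).

    Hypothetical helper for illustration only.
    """
    # ascii() escapes any non-ASCII characters in the repr, so the
    # encode step cannot raise UnicodeEncodeError.
    return ascii(obj).encode('ascii')


msg = b'value=%s' % debug_field('na\u00efve')
print(msg)
```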

Proposed variations

It was suggested to let %s accept numbers, but since numbers have their own format codes this idea was discarded.

It has been suggested to use %b for bytes instead of %s.

It has been proposed to automatically use .encode('ascii','strict') for str arguments to %s.

It has been proposed to have %s return the ascii-encoded repr when the value is a str (b'%s' % 'abc' --> b"'abc'").

Originally this PEP also proposed adding format style formatting, but it was decided that format and its related machinery were all strictly text (aka str) based, and it was dropped.

Various new special methods were proposed, such as __ascii__, __format_bytes__, etc.; such methods are not needed at this time, but can be visited again later if real-world use shows deficiencies with this solution.

Footnotes

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] neither string.Template, format, nor str.format are under consideration.
.. [3] %c is not an exception as neither of its possible arguments is unicode.

Copyright

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


