
bytes.format() below. I'll leave it to you to decide if they warrant using, leaving as an open question, or rejecting.

On Tue, Jan 14, 2014 at 2:56 PM, Ethan Furman <ethan@stoneleaf.us> wrote:

Duh. Here's the text, as well. ;)

PEP: 461
Title: Adding % and {} formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-13
Resolution:

Abstract
========

This PEP proposes adding the % and {} formatting operations from str to bytes.

Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'
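
A few more cases of the padding, justification and precision modifiers, as a minimal sketch. It assumes an interpreter that implements bytes %-interpolation (Python 3.5 or later); the variable names are only illustrative.

```python
# Assumes bytes %-interpolation is available (Python 3.5 or later);
# on earlier 3.x interpreters these lines raise TypeError.
right_justified = b'%4x' % 10        # width 4, padded with spaces on the left
left_justified = b'%-4x' % 10        # '-' flag pads on the right instead
zero_padded = b'%04d' % 42           # '0' flag pads with zeros
signed_fixed = b'%+.2f' % 3.14159    # '+' forces a sign, '.2' sets the precision
octal = b'%o' % 8
exponent = b'%e' % 1500.0

for rendered in (right_justified, left_justified, zero_padded,
                 signed_fixed, octal, exponent):
    print(rendered)
```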

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

%s, because it is the most general, has the most convoluted resolution:

- input type is bytes?
    pass it straight through

- input type is numeric?
    use its __xxx__ [1] [2] method and ascii-encode it (strictly)

- input type is something else?
    use its __bytes__ method; if there isn't one, raise an exception [3]

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    b'3.14'

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?
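
The %s resolution order in this draft could be sketched roughly as below. ``resolve_percent_s`` is a hypothetical helper, not part of any API. It assumes str() for numeric values (the draft leaves str versus repr open) and TypeError for the failure case (the draft leaves the exception type open). It emulates the draft's wording and is not a statement about what any released interpreter does.

```python
import numbers


def resolve_percent_s(value):
    """Hypothetical helper mirroring the draft's %s resolution order."""
    # 1. bytes pass straight through.
    if isinstance(value, bytes):
        return value
    # 2. Numeric values: take the text form and ASCII-encode it strictly.
    #    Assumption: str() is used; the draft leaves str vs. repr open.
    if isinstance(value, numbers.Number):
        return str(value).encode('ascii', 'strict')
    # 3. Anything else must provide __bytes__ (looked up on the type).
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        # Assumption: TypeError; the draft leaves the exception type open.
        raise TypeError(
            '%r has no __bytes__ method, perhaps you need to encode it?'
            % (value,)
        )
    return to_bytes(value)


print(resolve_percent_s(b'abc'))
print(resolve_percent_s(3.14))
```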

.. note::

    Because the str type does not have a __bytes__ method, attempts to
    directly use 'a string' as a bytes interpolation value will raise an
    exception. To use 'string' values, they must be encoded or otherwise
    transformed into a bytes sequence::

        'a string'.encode('latin-1')
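
As a small illustration of the note, assuming an interpreter with bytes %-interpolation (Python 3.5 or later): passing a str directly fails, while encoding it first succeeds. The variable names are only illustrative.

```python
# Assumes Python 3.5 or later, where bytes %-interpolation is available.
text = 'a string'

try:
    b'%s' % text                  # str has no __bytes__, so this fails
except TypeError as exc:
    failure = exc
else:
    failure = None

# Encoding explicitly turns the str into bytes that %s accepts.
encoded = b'%s' % text.encode('latin-1')

print(failure)
print(encoded)
```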

format
------

The format mini language will be used as-is, with the behaviors as listed
for %-interpolation.

That's too vague; %-interpolation does not support format operators in the same way str.format() does. %-interpolation has specific code to support %d, etc., whereas str.format() supports {:d} not through special code but because e.g. int.__format__() accepts 'd'. So you can't say "bytes.format() supports {:d} just like %d works with string interpolation", since the mechanisms are fundamentally different.

This is why I have argued that if you specify it as "if a format spec is given, then the return value from calling __format__() will have str.encode('ascii', 'strict') called on it", you get support for the various number-specific format specs for free. It also means that if you pass in a string and just want its strict ASCII bytes, you can get them with {:s}.
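
A minimal sketch of that mechanism, to make the suggestion concrete. ``format_field`` is a hypothetical name, and the sketch assumes the call applied to the result of __format__() is str.encode('ascii', 'strict'), since str has no decode method in Python 3.

```python
def format_field(value, spec):
    """Hypothetical: render one replacement field that has a format spec."""
    # format() dispatches to type(value).__format__(value, spec), so every
    # spec the value's type understands is supported without special code.
    text = format(value, spec)
    # Strict ASCII encoding turns the resulting str into bytes and raises
    # UnicodeEncodeError for anything outside ASCII.
    return text.encode('ascii', 'strict')


print(format_field(42, 'd'))
print(format_field(255, 'x'))
print(format_field(3.14159, '.2f'))
print(format_field('abc', 's'))
```

With this shape, a str argument is only accepted when a spec such as ``s`` is given, and non-ASCII text fails at the encode step.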

I also think a 'b' conversion should be added to bytes.format(). This doesn't have the same issue as %b if you make {} implicitly mean {!b} in Python 3.5, since {} will then mean whatever is most accurate for bytes.format() in either version. It also allows for explicit support where you know you only want bytes, and allows {!s} to mean you only want a string (and thus throw an error otherwise).

And all of this means that much like %s only taking bytes, the only way for bytes.format() to accept a non-byte argument is for some format spec to be specified to trigger the .encode('ascii', 'strict') call.

-Brett

Open Questions
==============

For %s there has been some discussion of trying to use the buffer protocol
(Py_buffer) before trying __bytes__. This question should be answered before
the PEP is implemented.
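
One way the buffer-first ordering could look, as a sketch only. ``bytes_for_percent_s`` and ``Packet`` are hypothetical names, memoryview() stands in for the C-level Py_buffer lookup, and TypeError is an assumed choice of exception.

```python
def bytes_for_percent_s(value):
    """Hypothetical: try the buffer protocol first, then __bytes__."""
    try:
        # memoryview() succeeds only for objects exporting a buffer.
        with memoryview(value) as view:
            return view.tobytes()
    except TypeError:
        pass  # no buffer support; fall back to __bytes__
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        # Assumption: TypeError is the exception raised here.
        raise TypeError('%r supports neither the buffer protocol '
                        'nor __bytes__' % (value,))
    return to_bytes(value)


class Packet:
    """Illustrative type that offers __bytes__ but no buffer."""

    def __bytes__(self):
        return b'\x01\x02'


print(bytes_for_percent_s(bytearray(b'abc')))
print(bytes_for_percent_s(Packet()))
```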

Proposed variations
===================

It has been suggested to use %b for bytes instead of %s.

- Rejected as %b does not exist in Python 2.x %-interpolation, which is
  why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

- Rejected as this would lead to intermittent failures. Better to have the
  operation always fail so the trouble-spot can be correctly fixed.
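
The intermittent failure comes from the result depending on the data rather than on the code. A short sketch, where ``auto_encode`` is a hypothetical stand-in for the rejected behaviour:

```python
def auto_encode(value):
    """Hypothetical stand-in for automatic strict-ASCII encoding of str."""
    return value.encode('ascii', 'strict')


# The same call site works for one input...
ascii_result = auto_encode('abc')

# ...and fails for another, depending only on the data that arrives.
try:
    auto_encode('caf\u00e9')
except UnicodeEncodeError as exc:
    data_dependent_failure = exc
else:
    data_dependent_failure = None

print(ascii_result)
print(data_dependent_failure)
```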

It has been proposed to have %s return the ascii-encoded repr when the value
is a str (b'%s' % 'abc' --> b"'abc'").

- Rejected as this would lead to hard to debug failures far from the problem
  site. Better to have the operation always fail so the trouble-spot can be
  easily fixed.
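
To see why such failures would surface far from the problem site, here is a sketch of what the rejected variation would produce. It uses the ascii() builtin to stand in for "ascii-encoded repr"; the header line is an invented illustration.

```python
value = 'abc'

# What the rejected variation would insert for a str argument:
# the repr, quotes included, encoded as ASCII.
rendered = ascii(value).encode('ascii')

# No error is raised here; the stray quotes simply end up in the output
# and are noticed only when something downstream parses it.
header = b'X-Token: ' + rendered

print(rendered)
print(header)
```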

Footnotes
=========

.. [1] Not sure if this should be the numeric __str__ or the numeric __repr__,
       or if there's any difference
.. [2] Any proper numeric class would then have to provide an ascii
       representation of its value, either via __repr__ or __str__ (whichever
       we choose in [1]).
.. [3] TypeError, ValueError, or UnicodeEncodeError?

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev