[Python-Dev] PEP 461 - Adding % and {} formatting to bytes
Brett Cannon brett at python.org
Wed Jan 15 15:45:10 CET 2014
- Previous message: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
- Next message: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
bytes.format() below. I'll leave it to you to decide if they warrant using, leaving as an open question, or rejecting.
On Tue, Jan 14, 2014 at 2:56 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
Duh. Here's the text, as well. ;)
PEP: 461
Title: Adding % and {} formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan at stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-13
Resolution:


Abstract
========

This PEP proposes adding the % and {} formatting operations from str to bytes.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

%s, because it is the most general, has the most convoluted resolution:

  - input type is bytes?
    pass it straight through

  - input type is numeric?
    use its ``__xxx__`` [1]_ [2]_ method and ascii-encode it (strictly)

  - input type is something else?
    use its ``__bytes__`` method; if there isn't one, raise an exception [3]_

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    b'3.14'

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a ``__bytes__`` method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')

format
------

The format mini language will be used as-is, with the behaviors as listed
for %-interpolation.
That's too vague; %-interpolation does not support format codes in the same way str.format() does. %-interpolation has dedicated code to support %d, etc. But str.format() supports {:d} not through special code but because, e.g., int.__format__(5, 'd') works (float.__format__ does not accept 'd' at all). So you can't say "bytes.format() supports {:d} just like %d works with string interpolation", since the mechanisms are fundamentally different.
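The mechanism difference described above can already be observed with str: format() and str.format() hand the spec to the argument type's own __format__, while %-interpolation interprets codes such as %x itself. The Celsius class below is a made-up type used only for illustration; it is not part of the PEP or of any library.

```python
class Celsius:
    """Made-up type that defines its own format specs via __format__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Invented mini-language: 'F' renders Fahrenheit, '' renders Celsius.
        if spec == 'F':
            return '%.1fF' % (self.degrees * 9 / 5 + 32)
        if spec == '':
            return '%.1fC' % self.degrees
        raise ValueError('unsupported format spec %r' % spec)


# {:x} works because int.__format__ itself understands 'x' ...
print('{:x}'.format(255), int.__format__(255, 'x'))   # ff ff
# ... while %x works because %-interpolation has dedicated code for it.
print('%x' % 255)                                      # ff

# 'd' is an int presentation type; float.__format__ rejects it.
try:
    format(3.0, 'd')
except ValueError as exc:
    print('float:', exc)

# A user type can add new specs to str.format(), but not to %-interpolation.
print('{:F}'.format(Celsius(100)))                     # 212.0F
try:
    '%F' % Celsius(100)   # %F wants a real number, not a Celsius
except TypeError as exc:
    print('%-interpolation:', exc)
```

Because {} formatting delegates to the type, a bytes.format() that reuses __format__ would inherit every spec a type understands, whereas %-interpolation only ever supports the fixed set of codes it implements.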
This is why I have argued that if you specify it as "if there is a format spec specified, then the return value from calling __format__() will have str.encode('ascii', 'strict') called on it", you get the support for the various number-specific format specs for free. It also means that if you pass in a string whose strict ASCII bytes you want, you can get them with {:s}.
I also think a 'b' conversion should be added to bytes.format(). This doesn't have the same issue as %b if you make {} implicitly mean {!b} in Python 3.5, as {} will then mean whatever is most accurate for bytes.format() in either version. It also allows explicit support where you know you only want bytes, and allows {!s} to mean you only want a string (and thus raise an error otherwise).
And all of this means that, much like %s only taking bytes, the only way for bytes.format() to accept a non-bytes argument is for a format spec to be specified to trigger the .encode('ascii', 'strict') call.
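The rules proposed above (a format spec means "call __format__, then encode as strict ASCII"; a bare {} behaves like {!b} and accepts only bytes or objects with __bytes__) can be prototyped on top of the standard library's string.Formatter.parse(). This is a hypothetical sketch, not an agreed design: bytes_format is an invented name, the template is decoded as Latin-1 only so that arbitrary literal bytes survive parsing, only {}, {0} and {name} fields are handled, and {!s}, attribute/index lookups and nested specs are left out.

```python
import string

_PARSER = string.Formatter()


def _field_to_bytes(value):
    # No format spec: a bare {} behaves like {!b}, so only bytes or
    # objects with __bytes__ are accepted; a str raises TypeError here.
    if isinstance(value, bytes):
        return value
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        raise TypeError('%r has no __bytes__ method' % (value,))
    return to_bytes(value)


def bytes_format(template, *args, **kwargs):
    """Hypothetical bytes.format() following the rules proposed above."""
    # Latin-1 maps each byte 0-255 to exactly one code point, so decoding
    # and re-encoding the literal text round-trips arbitrary bytes.
    text = template.decode('latin-1')
    pieces = []
    next_auto = 0
    for literal, field, spec, conversion in _PARSER.parse(text):
        pieces.append(literal.encode('latin-1'))
        if field is None:  # trailing literal text with no replacement field
            continue
        if field == '':  # automatic numbering: {}
            value = args[next_auto]
            next_auto += 1
        elif field.isdigit():  # explicit index: {0}
            value = args[int(field)]
        else:  # simple keyword name: {name}
            value = kwargs[field]
        if conversion not in (None, 'b'):
            raise ValueError('!%s is not handled in this sketch' % conversion)
        if spec:
            # Any spec: let __format__ do the work, then encode strictly.
            pieces.append(format(value, spec).encode('ascii', 'strict'))
        else:
            pieces.append(_field_to_bytes(value))
    return b''.join(pieces)


print(bytes_format(b'{} {}', b'GET', b'/index'))            # b'GET /index'
print(bytes_format(b'{:d}|{:x}|{:.2f}', 42, 255, 3.14159))  # b'42|ff|3.14'
print(bytes_format(b'{name:s}', name='abc'))                # b'abc'
```

Under these rules {:d}, {:x} and {:.2f} come for free from int.__format__ and float.__format__, while a bare {} given a str fails, so a spec is the only route for non-bytes arguments.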
-Brett
Open Questions
==============

For %s there has been some discussion of trying to use the buffer protocol
(Py_buffer) before trying ``__bytes__``.  This question should be answered
before the PEP is implemented.


Proposed variations
===================

It has been suggested to use %b for bytes instead of %s.

  - Rejected as %b does not exist in Python 2.x %-interpolation, which is
    why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have %s return the ascii-encoded repr when the value
is a str (b'%s' % 'abc' --> b"'abc'").

  - Rejected as this would lead to hard-to-debug failures far from the problem
    site.  Better to have the operation always fail so the trouble-spot can be
    easily fixed.


Footnotes
=========

.. [1] Not sure if this should be the numeric str or the numeric repr, or if
       there's any difference
.. [2] Any proper numeric class would then have to provide an ascii
       representation of its value, either via repr or str (whichever we
       choose in [1]_).
.. [3] TypeError, ValueError, or UnicodeEncodeError?


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
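The draft's %s rules and the rejected auto-encoding variant can be compared in a short sketch. Several points are assumptions because the draft leaves them open: percent_s_bytes and auto_ascii_s are hypothetical names, numbers are converted with str() (footnote [1]), the missing-__bytes__ error is a TypeError (footnote [3]), and the buffer-protocol question is ignored.

```python
import numbers


def percent_s_bytes(value):
    """Hypothetical sketch of the draft's %s rules for bytes interpolation."""
    # Rule 1: bytes are passed straight through.
    if isinstance(value, bytes):
        return value
    # Rule 2: numbers become text, then are strictly ASCII-encoded.
    # Assumption: str() rather than repr(); footnote [1] leaves this open.
    if isinstance(value, numbers.Number):
        return str(value).encode('ascii', 'strict')
    # Rule 3: anything else must provide __bytes__ (str does not).
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        # Assumption: TypeError; footnote [3] leaves the exception type open.
        raise TypeError('%r has no __bytes__ method, '
                        'perhaps you need to encode it?' % (value,))
    return to_bytes(value)


def auto_ascii_s(value):
    """Hypothetical rejected variant: implicitly ASCII-encode str arguments."""
    if isinstance(value, str):
        return value.encode('ascii', 'strict')
    return percent_s_bytes(value)


class Token:
    """Example non-bytes type that opts in through __bytes__."""

    def __bytes__(self):
        return b'<token>'


for name in ['alice', 'Jos\u00e9']:
    # Draft rule: a str argument fails every time, whatever its contents.
    try:
        percent_s_bytes(name)
    except TypeError:
        print('draft rule rejects', ascii(name))
    # Rejected variant: the same call succeeds or fails depending on the data.
    try:
        print('variant gives', auto_ascii_s(name))
    except UnicodeEncodeError:
        print('variant fails only for', ascii(name))
```

With the draft rule the str argument fails on the first test run regardless of content; with the variant, ASCII-only test data passes and the failure appears only once non-ASCII text arrives, which is the intermittent failure the draft cites as its reason for rejection.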