[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Request for Pronouncement
Guido van Rossum guido at python.org
Thu Mar 27 21:44:10 CET 2014
Accepted. Congrats with marshalling yet another quite contentious discussion, and putting up with my last-minute block-headedness!
If you're going to commit another change, may I suggest to add, to the section stating that %r is not supported, that %a is usually a suitable replacement for %r?
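For illustration, here is a minimal sketch of that substitution. It assumes an interpreter that implements this PEP (3.5 or later), and the Point class is a hypothetical example type, not something from the PEP:

```python
class Point:
    """Hypothetical example type; not part of the PEP."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)


p = Point(1, 2)

text = "%r" % p    # str formatting uses repr() and yields a str
data = b"%a" % p   # bytes formatting uses the ASCII-only repr and yields bytes

# %a is described as repr() encoded to ASCII with backslash escapes.
assert data == repr(p).encode("ascii", "backslashreplace")
assert data == text.encode("ascii")

# Non-ASCII characters are escaped instead of raising an error.
escaped = b"%a" % "caf\u00e9"
assert escaped == b"'caf\\xe9'"
```

So wherever Python 2 code used %r to embed a readable representation in a byte stream, %a should usually give the same bytes as long as the repr is pure ASCII.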
On Thu, Mar 27, 2014 at 1:07 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
Requesting pronouncement on PEP 461. Full text below.
===============================================================================
PEP: 461
Title: Adding % formatting to bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan at stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25, 2014-03-27
Resolution:

Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's ``str``
type to ``bytes`` and ``bytearray`` [1] [2].


Rationale
=========

While interpolation is usually thought of as a string operation, there are
cases where interpolation on ``bytes`` or ``bytearrays`` makes sense, and the
work needed to make up for this missing functionality detracts from the
overall readability of the code.


Motivation
==========

With Python 3 and the split between ``str`` and ``bytes``, one small but
important area of programming became slightly more difficult, and much more
painful -- wire format protocols [3].

This area of programming is characterized by a mixture of binary data and
ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
writing new wire format code, and in porting Python 2 wire format code.

Common use-cases include ``dbf`` and ``FTP`` and ``HTTP``
communications, among many others.


Proposed semantics for ``bytes`` and ``bytearray`` formatting
=============================================================

%-interpolation
---------------

All the numeric formatting codes (``d``, ``i``, ``o``, ``u``, ``x``, ``X``,
``e``, ``E``, ``f``, ``F``, ``g``, ``G``, and any that are subsequently added
to Python 3) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers (currently ``#``,
``0``, ``-``, `` `` (space), and ``+`` (plus any added to Python 3)).  The
only non-numeric codes allowed are ``c``, ``b``, ``a``, and ``s`` (which is a
synonym for b).

For the numeric codes, the only difference between ``str`` and ``bytes`` (or
``bytearray``) interpolation is that the results from these codes will be
ASCII-encoded text, not unicode.  In other words, for any numeric formatting
code ``%x``::

    b"%x" % val

is equivalent to::

    ("%x" % val).encode("ascii")

Examples::

    >>> b'%4x' % 10
    b'   a'

    >>> b'%#4x' % 10
    b' 0xa'

    >>> b'%04X' % 10
    b'000A'

``%c``
will insert a single byte, either from an ``int`` in range(256), or from a
``bytes`` argument of length 1, not from a ``str``.

Examples::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

``%b``
will insert a series of bytes.  These bytes are collected in one of two
ways:

  - input type supports ``Py_buffer`` [4]?
    use it to collect the necessary bytes

  - input type is something else?
    use its ``__bytes__`` method [5] ; if there isn't one, raise a ``TypeError``

In particular, ``%b`` will not accept numbers nor ``str``.  ``str`` is
rejected as the string to bytes conversion requires an encoding, and we are
refusing to guess; numbers are rejected because:

  - what makes a number is fuzzy (float? Decimal? Fraction? some user type?)

  - allowing numbers would lead to ambiguity between numbers and textual
    representations of numbers (3.14 vs '3.14')

  - given the nature of wire formats, explicit is definitely better than
    implicit

``%s`` is included as a synonym for ``%b`` for the sole purpose of making 2/3
code bases easier to maintain.  Python 3 only code should use ``%b``.

Examples::

    >>> b'%b' % b'abc'
    b'abc'

    >>> b'%b' % 'some string'.encode('utf8')
    b'some string'

    >>> b'%b' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: b'%b' does not accept 'float'

    >>> b'%b' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: b'%b' does not accept 'str'

``%a``
will give the equivalent of
``repr(some_obj).encode('ascii', 'backslashreplace')`` on the interpolated
value.  Use cases include developing a new protocol and writing landmarks
into the stream; debugging data going into an existing protocol to see if
the problem is the protocol itself or bad data; a fall-back for a
serialization format; or any situation where defining ``__bytes__`` would
not be appropriate but a readable/informative representation is needed [6].

Examples::

    >>> b'%a' % 3.14
    b'3.14'

    >>> b'%a' % b'abc'
    b"b'abc'"

    >>> b'%a' % 'def'
    b"'def'"


Unsupported codes
-----------------

``%r``
(which calls ``__repr__`` and returns a ``str``) is not supported.


Compatibility with Python 2
===========================

As noted above, ``%s`` is being included solely to help ease migration from,
and/or have a single code base with, Python 2.  This is important as there
are modules both in the wild and behind closed doors that currently use the
Python 2 ``str`` type as a ``bytes`` container, and hence are using ``%s``
as a bytes interpolator.

However, ``%b`` should be used in new, Python 3 only code, so ``%s`` will
immediately be deprecated, but not removed until the next major Python
release.


Proposed variations
===================

It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%b``.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have ``%b`` return the ascii-encoded repr when the
value is a ``str`` (b'%b' % 'abc'  --> b"'abc'").

  - Rejected as this would lead to hard to debug failures far from the
    problem site.  Better to have the operation always fail so the
    trouble-spot can be easily fixed.

Originally this PEP also proposed adding format-style formatting, but it was
decided that format and its related machinery were all strictly text (aka
``str``) based, and it was dropped.

Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be visited again later if real-world use shows deficiencies with this
solution.

A competing PEP, ``PEP 460 Add binary interpolation and formatting``
[7], also exists.


Objections
==========

The objections raised against this PEP were mainly variations on two themes:

  - the ``bytes`` and ``bytearray`` types are for pure binary data, with no
    assumptions about encodings

  - offering %-interpolation that assumes an ASCII encoding will be an
    attractive nuisance and lead us back to the problems of the Python 2
    ``str``/``unicode`` text model

As was seen during the discussion, ``bytes`` and ``bytearray`` are also used
for mixed binary data and ASCII-compatible segments: file formats such as
``dbf`` and ``pdf``, network protocols such as ``ftp`` and ``email``, etc.

``bytes`` and ``bytearray`` already have several methods which assume an ASCII
compatible encoding.  ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
just a few.  %-interpolation, with its very restricted mini-language, will not
be any more of a nuisance than the already existing methods.

Some have objected to allowing the full range of numeric formatting codes with
the claim that decimal alone would be sufficient.  However, at least two
formats (dbf and pdf) make use of non-decimal numbers.


Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] neither string.Template, format, nor str.format are under consideration
.. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
.. [4] http://docs.python.org/3/c-api/buffer.html
       examples:  ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
.. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
.. [6] https://mail.python.org/pipermail/python-dev/2014-February/132750.html
.. [7] http://python.org/dev/peps/pep-0460/


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
--
--Guido van Rossum (python.org/~guido)