PEP 460 – Add binary interpolation and formatting
Author: Antoine Pitrou
Status: Withdrawn
Type: Standards Track
Created: 06-Jan-2014
Python-Version: 3.5
Table of Contents
- Abstract
- Rationale
- Binary formatting features
- Criticisms
- Other proposals
- Resolution
- References
- Copyright
Abstract
This PEP proposes to add minimal formatting operations to bytes and bytearray objects. The proposed additions are:
- bytes % ... and bytearray % ... for percent-formatting, similar in syntax to percent-formatting on str objects (accepting a single object, a tuple or a dict).
- bytes.format(...) and bytearray.format(...) for a formatting similar in syntax to str.format() (accepting positional as well as keyword arguments).
- bytes.format_map(...) and bytearray.format_map(...) for an API similar to str.format_map(...), with the same formatting syntax and semantics as bytes.format() and bytearray.format().
Rationale
In Python 2, str % args and str.format(args) allow the formatting and interpolation of bytestrings. This feature has commonly been used for the assembling of protocol messages when protocols are known to use a fixed encoding.
Python 3 generally mandates that text be stored and manipulated as unicode (i.e. str objects, not bytes). In some cases, though, it makes sense to manipulate bytes objects directly. Typical usage is binary network protocols, where you may want to interpolate and assemble several bytes objects (some of them literals, some of them computed) to produce complete protocol messages. For example, protocols such as HTTP or SIP have headers with ASCII names and opaque “textual” values using a varying and/or sometimes ill-defined encoding. Moreover, those headers can be followed by a binary body… which can be chunked and decorated with ASCII headers and trailers!
While there are reasonably efficient ways to accumulate binary data (such as using a bytearray object, the bytes.join method or even io.BytesIO), none of them leads to the kind of readable and intuitive code that is produced by a %-formatted or {}-formatted template and a formatting operation.
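To make the comparison concrete, here is a minimal sketch of the three accumulation techniques named above, each building the same message. The request line, header and variable names are illustrative assumptions, not taken from any real code base.

```python
import io

# Illustrative fragments; the names and values are assumptions.
method, path, host = b"GET", b"/index.html", b"example.org"

# 1. Accumulate into a mutable bytearray.
buf = bytearray()
buf += method
buf += b" "
buf += path
buf += b" HTTP/1.1\r\nHost: "
buf += host
buf += b"\r\n\r\n"
via_bytearray = bytes(buf)

# 2. Join a list of fragments.
via_join = b"".join(
    [method, b" ", path, b" HTTP/1.1\r\nHost: ", host, b"\r\n\r\n"]
)

# 3. Write the fragments to an in-memory binary stream.
stream = io.BytesIO()
for fragment in (method, b" ", path, b" HTTP/1.1\r\nHost: ", host, b"\r\n\r\n"):
    stream.write(fragment)
via_bytesio = stream.getvalue()
```

All three produce identical bytes, but in each case the shape of the final message is scattered across several statements or list items rather than visible in a single template.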
Binary formatting features
Supported features
In this proposal, percent-formatting for bytes and bytearray supports the following features:
- Looking up formatting arguments by position as well as by name (i.e., %s as well as %(name)s).
- %s will try to get a Py_buffer on the given value, and fall back on calling __bytes__. The resulting binary data is inserted at the given point in the string. This is expected to work with bytes, bytearray and memoryview objects (as well as a few others such as pathlib’s path objects).
- %c will accept an integer between 0 and 255, and insert a byte of the given value.
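This PEP was withdrawn, so the snippet below is not an implementation of it. It does run on Python 3.5 and later, where the percent operator on bytes (as later specified by PEP 461) handles %s and %c in a way that matches the behaviour listed above for these inputs. The Token class is a hypothetical example type.

```python
# Requires Python 3.5+; relies on bytes % ... as shipped via PEP 461.

# Positional lookup with %s; bytes and bytearray both expose a buffer.
header = b"%s: %s\r\n" % (b"Host", bytearray(b"example.org"))

# Lookup by name; note that the mapping keys are bytes, not str.
named = b"%(name)s=%(value)s" % {b"name": b"id", b"value": memoryview(b"42")}

# %c inserts the single byte with the given integer value.
newline = b"%c" % 10


class Token:
    """Hypothetical type that only provides __bytes__."""

    def __bytes__(self):
        return b"tok"


from_dunder = b"<%s>" % Token()

# str arguments are rejected instead of being implicitly encoded.
try:
    b"%s" % "text"
except TypeError:
    str_rejected = True
else:
    str_rejected = False
```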
Braces-formatting for bytes and bytearray supports the following features:
- All the kinds of argument lookup supported by str.format() (explicit positional lookup, auto-incremented positional lookup, keyword lookup, attribute lookup, etc.)
- Insertion of binary data when no modifier or layout is specified (e.g. {}, {0}, {name}). This has the same semantics as %s for percent-formatting (see above).
- The c modifier will accept an integer between 0 and 255, and insert a byte of the given value (same as %c above).
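Since bytes.format() was never added to Python, the braces-formatting rules can only be illustrated with a stand-in. The sketch below is a hypothetical helper, bformat(), that approximates the behaviour described above by reusing the field parser from string.Formatter. The function names and the error messages are assumptions; this is not the API the PEP would have shipped.

```python
import string


def _to_bytes(value):
    """Approximate the proposed rule: buffer protocol first, then __bytes__."""
    if isinstance(value, str):
        raise TypeError("str arguments are not accepted")
    try:
        return bytes(memoryview(value))
    except TypeError:
        pass
    to_bytes = getattr(type(value), "__bytes__", None)
    if to_bytes is None:
        raise TypeError("cannot insert %r as binary data" % type(value).__name__)
    return to_bytes(value)


def bformat(template, *args, **kwargs):
    """Hypothetical stand-in for the proposed bytes.format()."""
    formatter = string.Formatter()
    out = bytearray()
    auto_index = 0
    # latin-1 maps every byte to one code point, so the round trip is lossless.
    text = template.decode("latin-1")
    for literal, field, spec, conversion in formatter.parse(text):
        out += literal.encode("latin-1")
        if field is None:
            continue  # trailing literal text, no replacement field
        if conversion is not None:
            raise ValueError("conversions such as !r are not supported")
        if field == "":
            field = str(auto_index)  # auto-incremented positional lookup
            auto_index += 1
        value, _ = formatter.get_field(field, args, kwargs)
        if spec == "":
            out += _to_bytes(value)  # plain insertion of binary data
        elif spec == "c":
            out += bytes([value])  # one byte; the value must be in 0..255
        else:
            raise ValueError("unsupported format spec %r" % spec)
    return bytes(out)
```

For example, bformat(b"{} {name}{:c}", b"GET", 10, name=bytearray(b"/index")) returns b"GET /index\n". Unlike str.format(), this sketch does not reject templates that mix automatic and explicit numbering.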
Unsupported features
All other features present in formatting of str objects (either through the percent operator or the str.format() method) are unsupported. Those features imply treating the recipient of the operator or method as text, which goes counter to the text / bytes separation (for example, accepting %d as a format code would imply that the bytes object really is an ASCII-compatible text string).
Amongst those unsupported features are not only most type-specific format codes, but also the various layout specifiers such as padding or alignment. Besides, str objects are not acceptable as arguments to the formatting operations, even when using e.g. the %s format code.
__format__ isn’t called.
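Under these restrictions, a number has to cross the text / bytes boundary explicitly before it can be inserted. A minimal sketch of what that looks like, using plain concatenation so that it runs on any Python 3 release; the header name and payload are illustrative assumptions.

```python
# Illustrative payload; the content is an assumption.
body = b"\x00\x01binary payload"

# Render the integer as text, then encode it with a named encoding.
length_field = str(len(body)).encode("ascii")

message = b"Content-Length: " + length_field + b"\r\n\r\n" + body
```

The encoding step is what a %d format code would have done implicitly, which is the assumption about ASCII compatibility that the proposal wants to avoid.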
Criticisms
- The development cost and maintenance cost.
- In 3.3 encoding to ASCII or latin-1 is as fast as memcpy (but it still creates a separate object).
- Developers will have to work around the lack of binary formatting anyway, if they want to support Python 3.4 and earlier.
- bytes.join() is consistently faster than format to join bytes strings (XXX is it?).
- Formatting functions could be implemented in a third party module, rather than added to builtin types.
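The second criticism alludes to a workaround that is available on every Python 3 release: format in the str domain and encode the result. A minimal sketch follows, assuming latin-1 is used so that arbitrary byte values survive the round trip; build_request_line is a hypothetical helper name.

```python
def build_request_line(method, target):
    """Hypothetical helper: format as str, then encode back to bytes."""
    template = "{} {} HTTP/1.1\r\n"
    # latin-1 decodes every byte value, so opaque data passes through intact.
    text = template.format(method.decode("latin-1"), target.decode("latin-1"))
    # Encoding creates a new bytes object, the extra copy noted in the list.
    return text.encode("latin-1")


request_line = build_request_line(b"GET", b"/caf\xe9")
```

The cost is two decodes and one encode per message, and the code no longer makes it obvious that the values are binary rather than text.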
Other proposals
A new datatype
It was proposed to create a new datatype specialized for “network programming”. The authors of this PEP believe this is counter-productive. Python 3 already has several major types dedicated to manipulation of binary data: bytes, bytearray, memoryview, io.BytesIO.
Adding yet another type would make things more confusing for users, and interoperability between libraries more painful (also potentially sub-optimal, due to the necessary conversions).
Moreover, not one type would be needed, but two: one immutable type (to allow for hashing), and one mutable type (as efficient accumulation is often necessary when working with network messages).
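The existing bytes and bytearray pair already shows this split between a hashable immutable type and a mutable accumulator. A short illustration, with made-up values:

```python
# bytes is immutable and hashable, so it can serve as a dict key.
frozen = b"session-id"
cache = {frozen: "ok"}

# bytearray is mutable and supports efficient in-place accumulation.
buffer = bytearray()
buffer += b"chunk-1"
buffer += b"chunk-2"

# The price of mutability is that bytearray cannot be hashed.
try:
    hash(buffer)
except TypeError:
    bytearray_hashable = False
else:
    bytearray_hashable = True
```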
Resolution
This PEP is made obsolete by the acceptance of PEP 461, which introduces a more extended formatting language for bytes objects in conjunction with the modulo operator.
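As a brief illustration of that more extended language, the following runs on Python 3.5 and later and uses numeric format codes and padding, both of which this PEP deliberately left out. The values are illustrative.

```python
# Requires Python 3.5+ (bytes % ... as specified by PEP 461).
status_line = b"HTTP/1.1 %d %s\r\n" % (200, b"OK")  # numeric code
padded = b"%05d" % 42                                # zero padding
hex_byte = b"%02x" % 255                             # hexadecimal output
```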
References
- Issue #3982: support .format for bytes
- Mercurial project
- Twisted project
- Documentation of Python 2 formatting (str % args)
- Documentation of Python 2 formatting (str.format)
Copyright
This document has been placed in the public domain.