[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3
Victor Stinner victor.stinner at gmail.com
Wed Mar 26 11:10:14 CET 2014
2014-03-25 23:37 GMT+01:00 Ethan Furman <ethan at stoneleaf.us>:
%a will call ascii() on the interpolated value.
I'm not sure that I understood correctly: is the "%a" format supported? The result of ascii() is a Unicode string. Does it mean that (b"%a" % obj) should give the same result as ascii(obj).encode('ascii', 'strict')?
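For reference, here is a minimal check of that equivalence. It assumes Python 3.5 or later (the release that shipped bytes %-formatting), and the sample values are arbitrary choices for illustration:

```python
# Assumes Python 3.5+, where bytes and bytearray support %-formatting (PEP 461).
samples = [123, "text", b"bytes", "euro:\u20ac", chr(0x10FFFF)]

for obj in samples:
    formatted = b"%a" % obj                          # bytes formatting with %a
    expected = ascii(obj).encode("ascii", "strict")  # ascii() result, strictly encoded
    assert formatted == expected, (obj, formatted, expected)
    print(formatted)
```

The strict encode cannot fail here, because ascii() only ever returns ASCII characters.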
Would it be possible to add a table or list summarizing the supported format characters? I found:
- single byte: %c
- numbers (int and float): %d, %u, %i, %o, %x, %X, %f, %g, "etc." (can you please complete "etc."?)
- bytes and objects with a __bytes__ method: %s
- ascii(): %a
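One way to fill in the "etc." is simply to try each conversion. The sketch below assumes Python 3.5+; the values are arbitrary, and it also includes %b, which Python 3.5 accepts as a synonym of %s for bytes formatting:

```python
# Assumes Python 3.5+. The sample values are arbitrary, chosen only for illustration.
conversions = [
    (b"%c", 65),          # single byte from an int in range(256)
    (b"%c", b"A"),        # single byte from a length-1 bytes object
    (b"%d", 42),
    (b"%i", 42),
    (b"%u", 42),
    (b"%o", 8),
    (b"%x", 255),
    (b"%X", 255),
    (b"%e", 1.5),
    (b"%f", 1.5),
    (b"%g", 1.5),
    (b"%s", b"raw"),      # bytes-like object, or an object with __bytes__
    (b"%b", b"raw"),      # synonym of %s in Python 3.5+
    (b"%a", "caf\xe9"),   # ascii() of the value, encoded to bytes
]

for fmt, value in conversions:
    print(fmt.decode("ascii").ljust(4), "->", fmt % value)
```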
I suppose that the implementation of %a can avoid a conversion from ASCII (PyUnicode_DecodeASCII() in the following code) followed by a conversion back to ASCII (in bytes % args):
PyObject *
PyObject_ASCII(PyObject *v)
{
    PyObject *repr, *ascii, *res;

    repr = PyObject_Repr(v);
    if (repr == NULL)
        return NULL;

    if (PyUnicode_IS_ASCII(repr))
        return repr;

    /* repr is guaranteed to be a PyUnicode object by PyObject_Repr */
    ascii = _PyUnicode_AsASCIIString(repr, "backslashreplace");
    Py_DECREF(repr);
    if (ascii == NULL)
        return NULL;

    res = PyUnicode_DecodeASCII(   /* <==== HERE */
        PyBytes_AS_STRING(ascii),
        PyBytes_GET_SIZE(ascii),
        NULL);

    Py_DECREF(ascii);
    return res;
}
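At the Python level, that shortcut amounts to encoding repr() directly with the "backslashreplace" error handler, so the escaped ASCII bytes never pass through an intermediate str. A minimal sketch; the helper name ascii_bytes is hypothetical, not a CPython API:

```python
def ascii_bytes(obj) -> bytes:
    # Hypothetical helper: same escaping as PyObject_ASCII(), but returns bytes
    # in one step instead of encoding, decoding to str, and encoding again.
    # 'backslashreplace' turns each non-ASCII character into \xnn, \unnnn or
    # \Unnnnnnnn while encoding repr(obj) to ASCII.
    return repr(obj).encode("ascii", "backslashreplace")


for value in [123, "text", "euro:\u20ac", chr(0x10FFFF)]:
    # Same bytes as the two-step route through ascii().
    assert ascii_bytes(value) == ascii(value).encode("ascii")
    print(ascii_bytes(value))
```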
This is intended as a debugging aid, rather than something that should be used in production.
I don't understand the purpose of this sentence. Does it mean that %a must not be used? IMO this sentence can be removed.
Non-ASCII values will be encoded to either \xnn or \unnnn representation.
Unicode is larger than that! Code points above U+FFFF use a third form, \Unnnnnnnn: print(ascii(chr(0x10ffff))) => '\U0010ffff'
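The three escape widths can be seen with a few arbitrarily chosen characters. The bytes %a checks assume Python 3.5+:

```python
# ascii() uses \xnn, \unnnn or \Unnnnnnnn depending on the size of the code point.
samples = {
    "\xe9": "'\\xe9'",               # U+00E9 fits in \xnn
    "\u20ac": "'\\u20ac'",           # U+20AC needs \unnnn
    "\U0001F600": "'\\U0001f600'",   # U+1F600 is beyond the BMP: \Unnnnnnnn
    chr(0x10FFFF): "'\\U0010ffff'",  # highest code point
}

for ch, expected in samples.items():
    assert ascii(ch) == expected
    # %a in bytes formatting produces the same escapes, as bytes.
    assert b"%a" % ch == expected.encode("ascii")
    print(ascii(ch))
```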
Use cases include developing a new protocol and writing landmarks into the stream; debugging data going into an existing protocol to see if the problem is the protocol itself or bad data; a fall-back for a serialization format; or even a rudimentary serialization format when defining __bytes__ would not be appropriate [8].
I understand the debug use case. I'm not convinced by the serialization idea :-)
.. note::

   If a str is passed into %a, it will be surrounded by quotes.
And:
- bytes gets a "b" prefix and is surrounded by quotes as well (b'...')
- the quote ' is escaped as \' if the string contains both ' and " quotes
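These quoting rules can be checked directly. The sketch assumes Python 3.5+ and uses arbitrary sample strings; since %a goes through repr(), the quote choice follows the usual repr() rules:

```python
# Assumes Python 3.5+. %a reuses repr(), so quoting follows the repr() rules.
cases = [
    ("text", b"'text'"),                # str: surrounded by single quotes
    ("it's", b'"it\'s"'),               # contains ' only: double quotes instead
    ('say "hi"', b"'say \"hi\"'"),      # contains " only: single quotes, no escaping
    ("it's \"x\"", b"'it\\'s \"x\"'"),  # contains both: single quotes, ' escaped as \'
    (b"data", b"b'data'"),              # bytes: b prefix plus quotes
]

for value, expected in cases:
    result = b"%a" % value
    assert result == expected, (value, result)
    print(result)
```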
Can you also please add examples for %a?
>>> b"%a" % 123
b'123'
>>> b"%a" % b"bytes"
b"b'bytes'"
>>> b"%a" % "text"   # hum, it's not easy to see the surrounding quotes in these examples
b"'text'"
The following, more complex examples may not be needed:
>>> b"%a" % "euro:€"
b"'euro:\\u20ac'"
>>> b"%a" % """quotes >'"<"""
b'\'quotes >\\\'"<\''
Proposed variations
===================
It would be fair to also mention a completely different PEP: Antoine's PEP 460!
Victor