[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3

Victor Stinner victor.stinner at gmail.com
Wed Mar 26 11:10:14 CET 2014


2014-03-25 23:37 GMT+01:00 Ethan Furman <ethan at stoneleaf.us>:

> %a will call ascii() on the interpolated value.

I'm not sure that I understood correctly: is the "%a" format supported? The result of ascii() is a Unicode string. Does it mean that (b"%a" % obj) should give the same result as ascii(obj).encode('ascii', 'strict')?
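That reading can be checked directly. This is a minimal sketch, assuming an interpreter where PEP 461 is implemented (Python 3.5 or later); the sample values are arbitrary and not taken from the PEP:

```python
# Assumes Python 3.5+, where % formatting on bytes (PEP 461) is available.
samples = [123, "text", "euro:\u20ac", b"bytes", None, 1.5]

results = []
for obj in samples:
    # Wrap in a 1-tuple so that a tuple value would not be unpacked as arguments.
    via_format = b"%a" % (obj,)
    via_ascii = ascii(obj).encode("ascii", "strict")
    assert via_format == via_ascii, (obj, via_format, via_ascii)
    results.append(via_format)

print(results)
```

If the two spellings ever disagreed, the assertion inside the loop would report the offending value.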

Would it be possible to add a table or list to summarize supported format characters? I found:

- single byte: %c
- numbers: %d, %u, %i, %o, %x, %X (integers), %f, %g (floats), "etc." (can you please complete "etc."?)
- bytes and __bytes__ method: %s
- ascii(): %a
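As a rough illustration of the list above, the following sketch exercises one value per format character. It assumes Python 3.5 or later, where PEP 461 was eventually implemented; the key names in the dictionary are made up for this example, and it is not a complete table of what the PEP supports:

```python
# Assumes Python 3.5+; one illustrative value per format character.
examples = {
    "c_int": b"%c" % 65,       # single byte from an integer
    "c_bytes": b"%c" % b"A",   # single byte from a length-1 bytes object
    "d": b"%d" % 42,
    "i": b"%i" % 42,
    "o": b"%o" % 8,
    "x": b"%x" % 255,
    "X": b"%X" % 255,
    "f": b"%f" % 1.5,
    "g": b"%g" % 1.5,
    "s": b"%s" % b"abc",       # bytes-like object (or an object with __bytes__)
    "a": b"%a" % "abc",        # ascii() of the value, so a str gains quotes
}

for name, value in examples.items():
    print(name, value)

# %s does not accept str: there is no implicit encoding.
try:
    b"%s" % "text"
except TypeError:
    str_rejected = True
else:
    str_rejected = False
```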

I guess that the implementation of %a can avoid a conversion from ASCII ("PyUnicode_DecodeASCII" in the following code) and then a conversion to ASCII again (in bytes%args):

    PyObject *
    PyObject_ASCII(PyObject *v)
    {
        PyObject *repr, *ascii, *res;

        repr = PyObject_Repr(v);
        if (repr == NULL)
            return NULL;

        if (PyUnicode_IS_ASCII(repr))
            return repr;

        /* repr is guaranteed to be a PyUnicode object by PyObject_Repr */
        ascii = _PyUnicode_AsASCIIString(repr, "backslashreplace");
        Py_DECREF(repr);
        if (ascii == NULL)
            return NULL;

        res = PyUnicode_DecodeASCII(   /* <==== HERE */
            PyBytes_AS_STRING(ascii),
            PyBytes_GET_SIZE(ascii),
            NULL);

        Py_DECREF(ascii);
        return res;
    }
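The same idea can be modelled in Python. This is only a sketch of the logic, not the CPython implementation: ascii_via_str and ascii_as_bytes are hypothetical helper names, and the fast path for an already-ASCII repr is left out for brevity.

```python
def ascii_via_str(obj):
    """Model of PyObject_ASCII: repr -> ASCII bytes -> back to str."""
    encoded = repr(obj).encode("ascii", "backslashreplace")
    return encoded.decode("ascii")   # the round trip that %a could skip


def ascii_as_bytes(obj):
    """Hypothetical shortcut for %a: stop once the ASCII bytes exist."""
    return repr(obj).encode("ascii", "backslashreplace")


values = [123, "euro:\u20ac", b"bytes", "\U0010ffff"]
for value in values:
    assert ascii_via_str(value) == ascii(value)
    assert ascii_as_bytes(value) == ascii(value).encode("ascii")
```

Both helpers produce the same characters; the second simply never builds the intermediate str object.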

> This is intended as a debugging aid, rather than something that should be used in production.

I don't understand the purpose of this sentence. Does it mean that %a must not be used? IMO this sentence can be removed.

> Non-ASCII values will be encoded to either \xnn or \unnnn representation.

Unicode is larger than that! print(ascii(chr(0x10ffff))) => '\U0010ffff'
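For reference, a short sketch showing all three escape widths that ascii() can produce; the code points are arbitrary picks from each range:

```python
# One code point from each range: Latin-1, the rest of the BMP, and beyond the BMP.
code_points = (0xE9, 0x20AC, 0x10FFFF)
escapes = [ascii(chr(cp)) for cp in code_points]

for cp, text in zip(code_points, escapes):
    print(hex(cp), text)
```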

> Use cases include developing a new protocol and writing landmarks into the stream; debugging data going into an existing protocol to see if the problem is the protocol itself or bad data; a fall-back for a serialization format; or even a rudimentary serialization format when defining __bytes__ would not be appropriate [8].

I understand the debug use case. I'm not convinced by the serialization idea :-)
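A sketch of what the debugging use case might look like, assuming Python 3.5 or later; the stream contents, the marker lines, and the record dictionary are all invented for illustration:

```python
import io

# Hypothetical binary stream with a debug landmark written between payload markers.
stream = io.BytesIO()
record = {"id": 7, "name": "caf\u00e9"}

stream.write(b"payload-start\n")
stream.write(b"# DEBUG record=%a\n" % (record,))   # always pure ASCII
stream.write(b"payload-end\n")

dump = stream.getvalue()
print(dump.decode("ascii"))
```

Because %a goes through ascii(), the landmark line stays ASCII-only whatever the value contains.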

> .. note::
>
>     If a str is passed into %a, it will be surrounded by quotes.

And:

- bytes get a "b" prefix and are surrounded by quotes as well (b'...')
- the quote ' is escaped as \' if the string contains both quotes ' and "

Can you also please add examples for %a?

>>> b"%a" % 123
b'123'
>>> b"%a" % b"bytes"
b"b'bytes'"
>>> b"%a" % "text"  # hum, it's not easy to see the surrounding quotes in these examples
b"'text'"

The following, more complex examples may not be needed:

>>> b"%a" % "euro:€"
b"'euro:\\u20ac'"
>>> b"%a" % """quotes >'"<"""
b'\'quotes >\\\'"<\''

> Proposed variations
> ===================

It would also be fair to mention an entirely different PEP, Antoine's PEP 460!

Victor
