Issue 19653: Generalize usage of _PyUnicodeWriter for repr(obj): add _PyObject_ReprWriter()

The _PyUnicodeWriter API avoids creating temporary Unicode strings and performs very well when building Unicode strings with the PEP 393 representation (compact Unicode strings).

The attached patch adds a _PyObject_ReprWriter() function to avoid creating temporary Unicode strings when calling repr(obj) on containers like tuple, list or dict.

I did something similar for str%args and str.format(args).

To avoid a chain of exact-type checks like the following, we might add something to PyTypeObject, maybe a new tp_repr_writer field.

if (PyLong_CheckExact(v)) {
    return _PyLong_FormatWriter(writer, v, 10, 0);
}
if (PyUnicode_CheckExact(v)) {
    return _PyUnicode_ReprWriter(writer, v);
}
if (PyList_CheckExact(v)) {
    return _PyList_ReprWriter(writer, v);
}
if (PyTuple_CheckExact(v)) {
    return _PyTuple_ReprWriter(writer, v);
}
if (PyDict_CheckExact(v)) {
    return _PyDict_ReprWriter(writer, v);
}
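To make the tp_repr_writer idea concrete, here is a minimal standalone sketch of how a per-type slot could replace the chain of checks above. It does not use CPython's headers: Writer, Type, Object, the repr_writer slot and every function name are hypothetical stand-ins invented for illustration, and the fixed-size buffer only keeps the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are CPython's real types. */
typedef struct {
    char buf[128];
    size_t len;
} Writer;

typedef struct Object Object;

typedef struct {
    const char *name;
    /* Hypothetical slot: write repr(self) into w; 0 on success, -1 on error. */
    int (*repr_writer)(Writer *w, const Object *self);
} Type;

struct Object {
    const Type *type;
    long ival;          /* payload used by the int-like type */
    const char *sval;   /* payload used by the str-like type */
};

/* Append a NUL-terminated string, failing if the fixed buffer is full. */
static int writer_append(Writer *w, const char *s)
{
    size_t n = strlen(s);
    if (w->len + n >= sizeof(w->buf))
        return -1;
    memcpy(w->buf + w->len, s, n + 1);
    w->len += n;
    return 0;
}

static int int_repr_writer(Writer *w, const Object *self)
{
    char digits[32];
    snprintf(digits, sizeof(digits), "%ld", self->ival);
    return writer_append(w, digits);
}

/* Simplified: wraps the text in quotes without escaping anything. */
static int str_repr_writer(Writer *w, const Object *self)
{
    if (writer_append(w, "'") < 0) return -1;
    if (writer_append(w, self->sval) < 0) return -1;
    return writer_append(w, "'");
}

static const Type IntType = {"int", int_repr_writer};
static const Type StrType = {"str", str_repr_writer};
static const Type OtherType = {"other", NULL};   /* no slot defined */

/* One indirect call through the slot replaces the exact-type checks. */
static int object_repr_writer(Writer *w, const Object *v)
{
    assert(w != NULL && v != NULL);
    if (v->type->repr_writer != NULL)
        return v->type->repr_writer(w, v);
    /* Generic fallback for types that do not fill in the slot. */
    if (writer_append(w, "<") < 0) return -1;
    if (writer_append(w, v->type->name) < 0) return -1;
    return writer_append(w, " object>");
}
```

With a zero-initialized Writer, calling object_repr_writer() on an int-like object, then on a str-like object, appends both reprs to the same buffer. Types without the slot fall through to the generic path, which in a real implementation would presumably call the ordinary repr and copy the result.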

For example, repr(list(range(10))) ('[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]') should only allocate one buffer of 37 bytes and then shrink it to 30 bytes.
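The allocate-once-then-shrink behaviour can be illustrated with a small standalone sketch. This is not the real _PyUnicodeWriter: the Writer struct and the writer_prepare(), writer_write(), writer_finish() and repr_long_list() functions are hypothetical, and the preallocation size is passed in by the caller as a hint because the message does not say how the 37 is derived.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer; it mirrors the idea only, not the real API. */
typedef struct {
    char *buf;
    size_t len;     /* characters written so far */
    size_t size;    /* usable bytes, not counting the trailing NUL */
    int allocs;     /* number of growing allocations, for illustration */
} Writer;

/* Make sure 'extra' more bytes fit; grow the buffer only if needed. */
static int writer_prepare(Writer *w, size_t extra)
{
    if (w->buf != NULL && w->len + extra <= w->size)
        return 0;
    size_t newsize = w->len + extra;
    char *p = realloc(w->buf, newsize + 1);
    if (p == NULL)
        return -1;
    w->buf = p;
    w->size = newsize;
    w->allocs++;
    return 0;
}

static int writer_write(Writer *w, const char *s, size_t n)
{
    if (writer_prepare(w, n) < 0)
        return -1;
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    w->buf[w->len] = '\0';
    return 0;
}

/* Shrink the buffer to the final length and hand it to the caller. */
static char *writer_finish(Writer *w)
{
    assert(w->buf != NULL);
    char *p = realloc(w->buf, w->len + 1);
    if (p == NULL)
        p = w->buf;   /* shrinking failed: keep the larger block */
    w->buf = NULL;
    return p;
}

/* Build "[a, b, ...]" directly in one buffer preallocated to 'hint' bytes.
   Only a small stack buffer is used for the digits of each item. */
static char *repr_long_list(const long *items, size_t n, size_t hint,
                            size_t *out_len, int *out_allocs)
{
    Writer w = {NULL, 0, 0, 0};
    char digits[32];

    if (writer_prepare(&w, hint) < 0)
        return NULL;
    if (writer_write(&w, "[", 1) < 0)
        goto error;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && writer_write(&w, ", ", 2) < 0)
            goto error;
        int k = snprintf(digits, sizeof(digits), "%ld", items[i]);
        if (writer_write(&w, digits, (size_t)k) < 0)
            goto error;
    }
    if (writer_write(&w, "]", 1) < 0)
        goto error;
    *out_len = w.len;
    *out_allocs = w.allocs;
    return writer_finish(&w);

error:
    free(w.buf);
    return NULL;
}
```

Called with the ten integers 0 through 9 and a hint of 37, this sketch performs a single growing allocation, writes 30 characters, and then shrinks the block in writer_finish(). A hint that is too small still works, at the cost of extra reallocations.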

I guess that benchmarks are required to justify such changes.

As far as I'm aware, the performance of repr() for any object is not much of a concern. repr() is mostly for debugging and interactive use, so it's already fast/efficient enough for the target consumers: us. :) Making repr() easier to write or maintain is worth it as long as the benefit outweighs the cost of the churn.