Add PyBytesWriter public C API

February 18, 2025, 3:33pm 1

Hi,

I propose adding a public PyBytesWriter C API to create bytes objects. It replaces PyBytes_FromStringAndSize(NULL, size), which creates an inconsistent bytes object (its bytes are not initialized), and _PyBytes_Resize(), which treats an immutable bytes object as mutable. The proposed API only produces a bytes object once writing is complete, when PyBytesWriter_Finish() is called.

The PyBytesWriter_Extend() function overallocates the buffer to reduce the cost of multiple reallocations.

The API gives a void* pointer (which can be cast to char*) so the main code works directly on regular fast pointer operations, rather than going through an abstraction.

API:

typedef struct PyBytesWriter PyBytesWriter;

PyAPI_FUNC(void*) PyBytesWriter_Create(
    PyBytesWriter **writer,
    Py_ssize_t alloc);
PyAPI_FUNC(void) PyBytesWriter_Discard(
    PyBytesWriter *writer);
PyAPI_FUNC(PyObject*) PyBytesWriter_Finish(
    PyBytesWriter *writer,
    void *buf);

PyAPI_FUNC(Py_ssize_t) PyBytesWriter_GetRemaining(
    PyBytesWriter *writer,
    void *buf);
PyAPI_FUNC(void*) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    void *buf,
    Py_ssize_t extend);
PyAPI_FUNC(void*) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    void *buf,
    const void *bytes,
    Py_ssize_t size);
PyAPI_FUNC(void*) PyBytesWriter_Format(
    PyBytesWriter *writer,
    void *buf,
    const char *format,
    ...);

Simple example creating the string b"abc":

PyObject* create_abc(void)
{
    PyBytesWriter *writer;
    char *str = PyBytesWriter_Create(&writer, 3);
    if (writer == NULL) return NULL;

    memcpy(str, "abc", 3);
    str += 3;

    return PyBytesWriter_Finish(writer, str);
}

What do you think of the proposed public C API?

Victor

MRAB (Matthew Barnett) February 18, 2025, 3:50pm 2

How about something more like:

/* Creates a new writer. */
PyBytesWriter* PyBytesWriter_Create(Py_ssize_t alloc);

/* Appends more bytes to a writer. */
int PyBytesWriter_WriteBytes(PyBytesWriter* writer, void* buf, Py_ssize_t size);

/* Makes a bytes object, discards the writer, and returns the bytes object. */
PyObject* PyBytesWriter_Finish(PyBytesWriter* writer);

steve.dower (Steve Dower) February 18, 2025, 3:52pm 3

I think we need good reasons for any divergence from the PyUnicodeWriter API (see “Unicode Objects and Codecs” in the Python 3.14.0a5 documentation), apart from using uint8_t in place of character types, of course; that’s an obvious difference.

pitrou (Antoine Pitrou) February 18, 2025, 5:52pm 4

Why do many of these API functions take a void *buf parameter? I don’t see an explanation in the proposed API docs.

storchaka (Serhiy Storchaka) February 18, 2025, 5:53pm 5

Performance. It’s all performance. In typical performance-sensitive code you pre-allocate the buffer, then copy one or a few characters/bytes at a time, very fast, with minimal overhead.

while ...:
    if not can_copy():
        resize_and_widen()
    copy()

In the case of bytes, you only need to check whether there is enough room in the buffer (by comparing two pointers). Copying is just a byte assignment or a memcpy(), followed by incrementing the pointer. Often the size of the resulting bytes object is known in advance, so no check is needed.
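
A minimal sketch of that pattern in plain C, using malloc()/realloc() rather than any CPython API. The function name double_backslashes() and its doubling growth policy are assumptions made for illustration only; the point is that the hot loop touches nothing but local pointers, and the room check is simple pointer arithmetic:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a CPython API: copies `src` into a newly
   malloc()ed buffer, doubling every backslash. Returns NULL on allocation
   failure; on success stores the output length in *out_len. */
char *double_backslashes(const char *src, size_t len, size_t *out_len)
{
    size_t cap = len + 2;              /* initial guess, always >= 2 */
    char *start = malloc(cap);
    if (start == NULL) {
        return NULL;
    }
    char *dst = start;                 /* write cursor */
    char *end = start + cap;           /* one past the last usable byte */

    for (size_t i = 0; i < len; i++) {
        /* Room check: compare the cursor with the end pointer. */
        if (end - dst < 2) {
            size_t pos = (size_t)(dst - start);
            size_t new_cap = cap * 2;
            char *moved = realloc(start, new_cap);
            if (moved == NULL) {
                free(start);
                return NULL;
            }
            /* realloc() may move the block: rebuild every pointer. */
            start = moved;
            dst = start + pos;
            end = start + new_cap;
            cap = new_cap;
        }
        if (src[i] == '\\') {
            *dst++ = '\\';             /* extra byte for the escape */
        }
        *dst++ = src[i];               /* plain byte assignment */
    }
    *out_len = (size_t)(dst - start);
    return start;
}
```

The caller owns the returned buffer and releases it with free(). When the final size is known up front, the whole room check can be dropped, which is the case described above.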

In the case of characters, you also need to check whether the kind of the Unicode object is wide enough for the copied characters. Copying depends on the kinds of the buffer and of the input data; it is not just memcpy(). So all operations are more expensive, and the overhead added by calling a C API function is relatively smaller.

For bytes, we lose most of the advantages of PyBytesWriter if we are required to call PyBytesWriter_WriteBytes() instead of *dst++ = c or memcpy(dst, src, n); dst += n. It would be faster to use PyBytes_FromStringAndSize() and _PyBytes_Resize() (we can finally add a safer public variant of it). Much of the code that now uses _PyBytesWriter was highly optimized before. _PyBytesWriter is used only because it provides equivalent performance. If we replace it with a slower variant, we will need to return to the old code.

pitrou (Antoine Pitrou) February 18, 2025, 7:34pm 6

Er, sorry? I’m asking for an explanation of those APIs. I still don’t understand what the void* buf is for.

Edit: I realized rather late that you were probably answering @steve.dower's question, not mine? Sorry :)

encukou (Petr Viktorin) February 19, 2025, 9:00am 7

As the caller you’re expected to manage both a data pointer and a writer pointer:

char *buf;
PyBytesWriter *writer;

buf = PyBytesWriter_Create(&writer, 6);
/* error handling omitted */

Then, for each function (except _Discard), you pass both of those in, and (except for _Finish) you assign the result back to the data pointer:

buf = PyBytesWriter_WriteBytes(writer, buf, "abc", 3);

You can also fill the buffer manually, if there’s enough space reserved. That’s what you do for performance:

memcpy(buf, "xyz", 3);
buf += 3;

Which seems a bit silly, and I find it hard to explain for the docs.
It should be possible for Python to define a public struct with a public member:

typedef struct { void *buf; _PyBytesWriter_InternalObject *_writer; } PyBytesWriter;

so the usage would be easier to explain:

PyBytesWriter writer;
int result = PyBytesWriter_Init(&writer, 6);

PyBytesWriter_WriteBytes(&writer, "abc", 3);

memcpy(writer.buf, "xyz", 3);
writer.buf += 3;

(We tend to avoid structs in the API because it’s hard to change them later, but making every function take two related arguments has exactly the same issue.)

ruang (James Roy (RUANG)) February 19, 2025, 12:13pm 8

It seems that our questions are the same (maybe). I think that if void * is used as a form of loose polymorphism, it would help to clarify which conversions void * is meant to permit. If it will only ever be a single byte buffer, I think returning a char * is more expressive.

pitrou (Antoine Pitrou) February 19, 2025, 12:46pm 9

Why would I? The PyBytesWriter is supposed to handle its buffer by itself, why would I need to pass that parameter again and again?

It would actually be easier if you used the same API calls in all your examples ;) In one case you’re calling PyBytesWriter_WriteBytes(writer, buf, "abc", 3), in the other PyBytesWriter_WriteBytes(&writer, "abc", 3) (without buf). Which one is correct?

vstinner (Victor Stinner) February 19, 2025, 1:13pm 10

PyBytesWriter_Extend(), PyBytesWriter_WriteBytes() and PyBytesWriter_Format() extend the buffer: realloc() is called, which can move the buffer in memory, so these functions return a new buffer pointer. To compute that new pointer, the function needs the current position inside the buffer, which is why you pass the current pointer as an argument.

Simplified code:

void*
PyBytesWriter_Extend(PyBytesWriter *writer, void *buf, Py_ssize_t extend)
{
    ...
    Py_ssize_t pos = (char*)buf - byteswriter_start(writer);
    ...
    char *start = byteswriter_alloc(writer, alloc_size, overallocate);
    ...
    return start + pos;
}

It’s possible to store buf or the buffer position inside the writer, but I tried that in the past, and it made the whole API way slower. Updating existing code to use the writer API made the code slower in this case, instead of making it faster. That’s why buf is not part of the writer object/structure, but treated separately.
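
To make the pointer-in, pointer-out flow concrete, here is a toy model in plain C. It is only a sketch under stated assumptions: ToyWriter, toy_create(), toy_extend() and toy_discard() are hypothetical names, the struct layout is invented, and the overallocation factor is arbitrary. It is not the proposed implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the opaque writer; all fields are hypothetical. */
typedef struct {
    char *start;        /* beginning of the heap buffer */
    size_t size;        /* bytes the caller has asked for so far */
    size_t allocated;   /* bytes actually allocated */
} ToyWriter;

/* Allocates the buffer and returns a pointer to its start (NULL on failure). */
char *toy_create(ToyWriter *w, size_t size)
{
    w->allocated = size ? size : 1;
    w->size = size;
    w->start = malloc(w->allocated);
    return w->start;
}

/* Grows the buffer by `extend` bytes. `buf` is the caller's cursor; the
   return value is the same position inside the (possibly moved) buffer. */
char *toy_extend(ToyWriter *w, char *buf, size_t extend)
{
    size_t pos = (size_t)(buf - w->start);   /* offset before realloc() */
    size_t needed = w->size + extend;
    if (needed > w->allocated) {
        size_t new_alloc = needed + needed / 4;   /* overallocate a bit */
        char *moved = realloc(w->start, new_alloc);
        if (moved == NULL) {
            return NULL;    /* the old buffer is still owned by the writer */
        }
        w->start = moved;
        w->allocated = new_alloc;
    }
    w->size = needed;
    return w->start + pos;
}

/* Releases the buffer. */
void toy_discard(ToyWriter *w)
{
    free(w->start);
    w->start = NULL;
}
```

Between calls the caller keeps only a local char * cursor, and the writer never needs to be told how far the cursor advanced until the next call that might reallocate.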

vstinner (Victor Stinner) February 19, 2025, 1:15pm 11

I chose void* to allow the API to be used easily with char*, unsigned char*, Py_UCS1* or anything else. The cast from void* is implicit and is convenient to use.
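
A small illustration of that convenience in plain C, with a hypothetical toy_alloc() standing in for any function that returns void*. Py_UCS1 is represented here by uint8_t, since it is an unsigned 8-bit type in CPython. Note that this implicit conversion is a C feature; C++ code would need an explicit cast.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an API function returning void*. */
void *toy_alloc(size_t size)
{
    return calloc(size, 1);
}

/* Assigns the void* result to three different pointer types without
   any cast, writes one byte through each, and returns 1 on success. */
int write_with_typed_pointers(void)
{
    char *c = toy_alloc(1);
    unsigned char *u = toy_alloc(1);
    uint8_t *b = toy_alloc(1);
    int ok = 0;

    if (c != NULL && u != NULL && b != NULL) {
        *c = 'a';
        *u = 0xff;
        *b = 0x80;
        ok = (*c == 'a' && *u == 0xff && *b == 0x80);
    }
    free(c);
    free(u);
    free(b);
    return ok;
}
```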

pitrou (Antoine Pitrou) February 19, 2025, 1:26pm 12

Really? That seems to make little sense. Did you try with several different compilers?

encukou (Petr Viktorin) February 19, 2025, 2:12pm 13

That’s essentially the question I asked in my comment. I first tried to explain Victor’s proposal, then I proposed a slightly different API.

vstinner (Victor Stinner) February 20, 2025, 4:43pm 14

I wrote an article about my very first attempt, which put the pointer inside the writer structure: “Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce strings in CPython” on my blog; see the “_PyBytesWriter API: first try, big fail” section. See also my question to the gcc-help mailing list, “Missed optimization when using a structure” (no one replied). In short, it was slower than not using this API.

The “void* in, void* out” design is part of the existing private _PyBytesWriter API, which Python has been using for more than 10 years. It helped to make many functions a little bit faster (up to 50x faster in the worst cases) in the past.

I didn’t just rename the private API to make it public; I wrote a new API and a new implementation for the public C API (based on the private API design).

pitrou (Antoine Pitrou) February 20, 2025, 5:05pm 15

Thanks for the pointers. I have two observations:

  1. Your experiments took place in 2013-2016, AFAICT. GCC might have become much better since then; have you tried again?
  2. The blog post says:

    For the writer.str++ instruction, the new pointer value is written immediately in the structure.

     Right, so why not simply store the pointer in a local variable during the loop and then update the structure at the end?

vstinner (Victor Stinner) February 21, 2025, 4:45pm 16

I just retried the code that I posted on gcc-help. Using a structure, GCC 14.2 still produces an inefficient (badly optimized) loop. Using a pointer (char*), GCC 14.2 emits a call to the efficient memset().

Isn’t it basically the proposed API? I’m not sure that I understand.

The proposed API gives you a pointer that you keep in a local variable, which produces the most efficient machine code with GCC. You have to pass this pointer back to the PyBytesWriter functions so they can compute the position in the buffer, and then the local variable is updated to point into the new buffer (since it can move in memory).

See the more complex example using PyBytesWriter_Extend() function. I changed the indentation to put “hot code” at the first level of indentation:

static PyObject *
byteswriter_center_example(Py_ssize_t spaces, char *str, Py_ssize_t str_size)
{
        PyBytesWriter *writer;
        char *buf = PyBytesWriter_Create(&writer, spaces * 2);
        if (buf == NULL) {
            goto error;
        }

    memset(buf, ' ', spaces);
    buf += spaces;
 
        buf = PyBytesWriter_Extend(writer, buf, str_size);
        if (buf == NULL) {
            goto error;
        }
 
    memcpy(buf, str, str_size);
    buf += str_size;
 
    memset(buf, ' ', spaces);
    buf += spaces;
 
        return PyBytesWriter_Finish(writer, buf);

error:
        PyBytesWriter_Discard(writer);
        return NULL;
}

Between PyBytesWriter API calls, the code only uses the local variable buf.

pitrou (Antoine Pitrou) February 21, 2025, 10:13pm 17

What I’m suggesting is to take this loop from your example:

        for(i=0; i<n; i++)
            *writer.str++ = '?';

and turn it into this:

        char* out = writer.str;
        for(i=0; i<n; i++)
            *out++ = '?';
        writer.str = out;
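
For reference, the two loop shapes can be written as complete functions. This is only a sketch: ToyWriter is a hypothetical non-opaque struct with a public str cursor, as in the gcc-help example, not the proposed PyBytesWriter. Both functions produce the same bytes; the sketch makes no claim about the machine code a given compiler emits for either one. One plausible reason for a difference is that a store through a char * may alias the str member itself, so the first form can force the compiler to update the struct on every iteration.

```c
#include <stddef.h>

/* Hypothetical writer with a public cursor; not the proposed opaque type. */
typedef struct {
    char *str;
} ToyWriter;

/* Advances the cursor stored in the struct on every iteration. */
void fill_through_struct(ToyWriter *writer, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *writer->str++ = '?';
    }
}

/* Copies the cursor to a local, loops on the local, then stores it back. */
void fill_through_local(ToyWriter *writer, size_t n)
{
    char *out = writer->str;
    for (size_t i = 0; i < n; i++) {
        *out++ = '?';
    }
    writer->str = out;
}
```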

vstinner (Victor Stinner) February 21, 2025, 10:44pm 18

In the proposed API, writer is an opaque structure. You cannot read or write a (hypothetical) writer.str member.

pitrou (Antoine Pitrou) February 22, 2025, 8:57am 19

The example was from your own e-mail to the GCC list, @vstinner .

vstinner (Victor Stinner) March 10, 2025, 4:37pm 20

So what do you think of the proposed API? I understand that it’s not convenient to have to pass both writer and buf to each function, but it’s a trade-off for best performance. Using buf (e.g. char *buf) is convenient for writing directly into the buffer and for tracking the position inside the writer buffer.