Message 145403 - Python tracker (original) (raw)

As mentioned in issue 6715, the buffer growth strategy used by _io.FileIO has quadratic running time if each reallocation is O(n). The code in question is new_buffersize() from Modules/_io/fileio.c:

if (currentsize > SMALLCHUNK) {
    /* Keep doubling until we reach BIGCHUNK;
       then keep adding BIGCHUNK. */
    if (currentsize <= BIGCHUNK)
        return currentsize + currentsize;
    else
        return currentsize + BIGCHUNK;
}
return currentsize + SMALLCHUNK;

Is there a reason for this? One possible improvement would be to instead use the same formula as list.resize() in Objects/listobject.c:

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

which has amortized O(n) running time behaviour.

Your thoughts?