[Python-Dev] The untuned tunable parameter ARENA_SIZE

Larry Hastings larry at hastings.org
Fri Jun 2 16:05:21 EDT 2017

On 06/02/2017 02:38 AM, Antoine Pitrou wrote:

> I hope those are not the actual numbers you're intending to use ;-) I still think that allocating more than 1 or 2MB at once would be foolish. Remember this is data that's going to be carved up into (tens of) thousands of small objects. Large objects eschew the small object allocator (not to mention that third-party libraries like Numpy may be using different allocation routines when they allocate very large data).

Honest, I'm well aware of what obmalloc does and how it works. I bet I've spent more time crawling around in it in the last year than anybody else on the planet. Mainly because it works so well for CPython, nobody else needed to bother!

I'm also aware, for example, that if your process grows to consume gigabytes of memory, you're going to have tens of thousands of allocated arenas. The idea that on systems with gigabytes of memory--90%+? of current systems running CPython--we should allocate memory forever in 256 KB chunks is faintly ridiculous. I agree that we should start small and ramp up slowly, so Python continues to run well on small computers and doesn't allocate tons of memory for small programs. But I also think we should ramp up at some point, for programs that use tens or hundreds of megabytes.
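To put a rough number on "tens of thousands", here is a back-of-the-envelope sketch. It assumes, unrealistically, that the whole heap is served by obmalloc in fixed 256 KB arenas; large objects bypass obmalloc, so real counts would be lower. The helper name `arenas_needed` is made up for illustration.

```python
ARENA_SIZE = 256 * 1024   # arena size discussed in this thread
GIB = 1024 ** 3


def arenas_needed(heap_bytes, chunk_bytes=ARENA_SIZE):
    """Fixed-size chunks needed to cover heap_bytes (ceiling division)."""
    return -(-heap_bytes // chunk_bytes)


# Assumption: every byte of the heap comes from obmalloc arenas.
for gib in (1, 4, 16):
    print(f"{gib} GiB -> {arenas_needed(gib * GIB)} arenas of 256 KB")
```

Under that assumption, a 4 GiB heap works out to 16384 arenas, while the same heap carved into 32 MB chunks would need only 128.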

Also note that if we don't touch the allocated memory, smart modern OSes won't actually commit any resources to it. All that happens when your process allocates 1 GB is that the OS changes some integers around. It doesn't actually commit any memory to your process until you attempt to write to that memory, at which point it gets mapped in one page at a time (4 KB? 8 KB? something in that neighborhood, and a power of 2). So if we allocate 32 MB and only touch the first 1 MB, the other 31 MB doesn't consume any real resources. I was planning on making the multi-arena code only touch memory when it actually needs to, similarly to the way obmalloc lazily consumes memory inside an allocated pool (see the nextoffset field in pool_header), to take advantage of this ubiquitous behavior.
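A minimal sketch of the "reserve a lot, touch a little" pattern, using Python's `mmap` module to create an anonymous mapping rather than anything from obmalloc itself. The helper `touch_prefix` is hypothetical. The code does not measure resident memory; whether the untouched pages really cost nothing depends on the OS and its overcommit settings.

```python
import mmap

RESERVE = 32 * 1024 * 1024   # size of the reserved region: 32 MB
TOUCH = 1 * 1024 * 1024      # portion actually written to: 1 MB


def touch_prefix(mapping, nbytes, page_size=mmap.PAGESIZE):
    """Write one byte per page in the first nbytes; return pages touched."""
    touched = 0
    for offset in range(0, nbytes, page_size):
        mapping[offset] = 1   # a write is what makes the OS back the page
        touched += 1
    return touched


# Anonymous mapping: address space is reserved, pages are zero-filled on demand.
region = mmap.mmap(-1, RESERVE)
pages = touch_prefix(region, TOUCH)
print(f"reserved {RESERVE // mmap.PAGESIZE} pages, wrote to {pages}")
```

On a typical system with lazy page commitment, only the pages written to should end up backed by physical memory.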

If I write this multi-arena code, which I might, I was thinking I'd try this approach:

  * leave arenas themselves at 256 KB
  * start with a 1 MB multi-arena size
  * every time I allocate a new multi-arena, multiply the size of the next multi-arena by 1.5 (rounding up to a multiple of 256 KB each time)
  * every time I free a multi-arena, divide the size of the next multi-arena by 2 (rounding up to a multiple of 256 KB each time)
  * if allocation of a multi-arena fails, use a binary search algorithm to allocate the largest multi-arena possible (rounding up to a multiple of 256 KB at each step)
  * cap the size of multi-arenas at, let's say, 32 MB

So multi-arenas would be 1 MB, 1.5 MB, 2.25 MB, 3.5 MB (round up!), etc.
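The sizing policy above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from obmalloc. All names here are hypothetical, and `try_alloc` stands in for whatever the real allocation attempt would be. The binary search also assumes that if a given size can be allocated, any smaller size can too.

```python
ARENA = 256 * 1024        # arenas stay at 256 KB
START = 1024 * 1024       # first multi-arena: 1 MB
CAP = 32 * 1024 * 1024    # proposed upper bound: 32 MB


def round_up_to_arena(nbytes):
    """Round nbytes up to the next multiple of ARENA."""
    return -(-nbytes // ARENA) * ARENA


def size_after_alloc(size):
    """Size of the next multi-arena after allocating one (grow by 1.5x)."""
    grown = (size * 3 + 1) // 2   # ceil(size * 1.5) in integer math
    return min(CAP, round_up_to_arena(grown))


def size_after_free(size):
    """Size of the next multi-arena after freeing one (halve)."""
    return round_up_to_arena((size + 1) // 2)


def largest_allocatable(try_alloc, failed_size):
    """Binary-search, in whole arenas, for the largest size try_alloc accepts.

    try_alloc(nbytes) -> bool is a hypothetical probe. Returns 0 if even a
    single arena cannot be allocated.
    """
    lo, hi = 0, failed_size // ARENA   # lo arenas known OK, hi known to fail
    while hi - lo > 1:
        mid = (lo + hi + 1) // 2       # midpoint, rounded up to a whole arena
        if try_alloc(mid * ARENA):
            lo = mid
        else:
            hi = mid
    return lo * ARENA


def growth_schedule(count):
    """First `count` multi-arena sizes in bytes, assuming nothing is freed."""
    sizes, size = [], START
    for _ in range(count):
        sizes.append(size)
        size = size_after_alloc(size)
    return sizes


print([s / 2**20 for s in growth_schedule(6)])
# -> [1.0, 1.5, 2.25, 3.5, 5.25, 8.0]
```

Working in whole arenas keeps every size a multiple of 256 KB without a separate rounding step in the search. A real implementation would keep the memory from the last successful attempt rather than just probing.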

Fun fact: Python allocates 16 arenas at the start of the program, just to initialize obmalloc. That consumes 4 MB of memory. With the above multi-arena approach, that'd allocate the first three multi-arenas, pre-allocating 19 arenas, leaving 3 unused. It's mildly tempting to make the first multi-arena be 4 MB, just so this is exactly right-sized, but... naah.
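The arena count is easy to check by hand: 1 MB, 1.5 MB and 2.25 MB are 4, 6 and 9 arenas of 256 KB. A small sketch of that arithmetic, counting in whole arenas (the function name is hypothetical, and the 32 MB cap is ignored because it is never reached here):

```python
def multi_arenas_to_cover(arenas_wanted, first=4):
    """Count multi-arenas needed to supply arenas_wanted arenas.

    Sizes are in arenas: start at `first` (4 arenas = 1 MB) and grow by
    1.5x, rounded up to a whole arena. Returns (multi_arenas, arenas_total).
    """
    count, total, size = 0, 0, first
    while total < arenas_wanted:
        total += size
        count += 1
        size = -(-size * 3 // 2)   # ceil(size * 1.5)
    return count, total


count, total = multi_arenas_to_cover(16)
print(count, total, total - 16)   # -> 3 19 3
```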

//arry/
