[Python-Dev] Advice sought on memory allocation latency reduction C1X standard proposal

Niall Douglas s_sourceforge at nedprod.com
Wed Sep 22 22:12:59 CEST 2010


Dear Python Devs,

I am hoping to gain feedback on an ISO C1X/C++ standard library proposal that I intend to submit. It consists of a rationale (http://mallocv2.wordpress.com/) which shows that RAM capacity is growing exponentially faster than RAM access speed. The consequences are profound: computer software, which has always been written on the assumption that RAM capacity is scarce, will need to be retargeted to assume that RAM access speed is scarce instead.

The C1X proposal (http://mallocv2.wordpress.com/the-c-proposal-text/) enables four things of interest to Python: (i) aligned block resizing, (ii) speculative in-place block resizing, (iii) batch block allocation, and (iv) the ability to reserve address space, thus avoiding the need to overallocate array storage.

Aligned block resizing is especially useful to numpy. Given an array of aligned SSE vector quantities, one cannot currently resize that block and guarantee that its alignment is preserved. With the new non-relocating realloc() and the ability to pass an alignment to realloc(), one can avoid memory copying, which reduces memory bandwidth utilisation and therefore overall memory access latency.
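
To make the copying cost concrete, here is a minimal sketch of what portable code has to do today to grow an aligned block. It is not the proposed API: aligned_grow() is a hypothetical helper name, and it assumes a C11 library that provides aligned_alloc(). Every resize allocates a fresh aligned block, copies the old contents, and frees the original, which is exactly the memory traffic an aligned, non-relocating realloc() would try to avoid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not part of the proposal).
 * Grows an aligned block the way portable code must today:
 * allocate a new aligned block, copy, free the old one.
 * Size overflow checks are omitted to keep the sketch short. */
void *aligned_grow(void *old, size_t old_size, size_t new_size, size_t alignment)
{
    /* aligned_alloc() expects a power-of-two alignment. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* C11 requires the size to be a multiple of the alignment. */
    size_t rounded = (new_size + alignment - 1) / alignment * alignment;

    void *fresh = aligned_alloc(alignment, rounded);
    if (fresh == NULL)
        return NULL;                 /* the old block is left untouched */

    if (old != NULL) {
        /* This copy is the cost an in-place aligned resize would save. */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return fresh;
}
```

Calling aligned_grow(NULL, 0, n, 16) behaves like a plain aligned allocation; calling it again with a larger size always moves the data, even when free space happens to follow the old block.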

The ability to reserve address space and speculative in-place block resizing can be combined to allow Python to reserve an arbitrary amount of address space after the storage for an array object. Should the array later be extended, the speculative in-place resize can attempt to expand the storage into that reserved space without relocating its contents. This again translates into much less memory copying and lower memory consumption, and once again reduces overall memory access latency.

Lastly, the batch allocation mechanism allows a sequence of allocations to be performed at once. I don't know of any attempts to have Python make use of similar functionality in Linux's system allocator; however, Perl saw an 18% reduction in startup time (http://groups.google.com/group/perl-compiler/msg/31bca5297764002b).

I am not familiar with Python's implementation beyond working extensively with Boost.Python, so I was hoping that this list could advise me on what I might be forgetting, what problems this design could pose for Python, and any other general concerns or thoughts. I thank the list in advance for its time and consideration.

Niall Douglas

-- Technology & Consulting Services - ned Productions Limited. http://www.nedproductions.biz/. VAT reg: IE 9708311Q. Company no: 472909.


