[Python-Dev] PEP 393 review
"Martin v. Löwis" martin at v.loewis.de
Mon Aug 29 22:32:01 CEST 2011
tl;dr: PEP 393 reduces the memory usage for strings of a very small Django app from 7.4MB to 4.4MB; all other objects take about 1.9MB.
On 26.08.2011 16:55, Guido van Rossum wrote:
> It would be nice if someone wrote a test to roughly verify these numbers, e.g. by allocating lots of strings of a certain size and measuring the process size before and after (being careful to adjust for the list or other data structure required to keep those objects alive).
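A rough test along those lines might look like the following sketch, assuming Linux (where ru_maxrss is reported in kilobytes; the peak RSS is good enough here because allocation only ever grows it):

    import resource

    def rss_kb():
        # Peak resident set size in kB on Linux; adequate for this
        # test, since allocating strings only ever grows it.
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    N = 1000000

    before = rss_kb()
    strings = ["prefix-%d" % i for i in range(N)]  # distinct ~10-char strings
    after = rss_kb()

    # Rough bytes per string, subtracting the list's 8 bytes per entry.
    print((after - before) * 1024 / N - 8)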
I have now written a Django application to measure the effect of PEP 393, using the debug mode (to find all strings), and sys.getsizeof:
https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py
The results for 3.3 and the pep-393 branch are attached.
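This is not the actual view code behind the URL above, but the core idea is a walk over all live objects. Since str objects themselves are not tracked by the cycle collector, one has to look at the referents of tracked containers; a minimal sketch:

    import gc
    import sys
    from collections import Counter

    def live_strings():
        # str objects are untracked by the gc; find them via the
        # referents of every tracked container, deduplicating by identity.
        seen = set()
        for container in gc.get_objects():
            for obj in gc.get_referents(container):
                if type(obj) is str and id(obj) not in seen:
                    seen.add(id(obj))
                    yield obj

    strings = list(live_strings())
    print("strings:", len(strings))
    print("bytes:  ", sum(sys.getsizeof(s) for s in strings))
    print("chars:  ", sum(len(s) for s in strings))
    print("by length:", Counter(len(s) for s in strings).most_common(10))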
The Django app is small in every respect: trivial ORM, very few objects (just for the sake of exercising the ORM at all), no templating, short strings. The memory snapshot is taken in the middle of a request.
The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE.
The tally of strings by length confirms that both tests have indeed comparable sets of objects (not surprising since it is identical Django source code and the identical application). Most strings in this benchmark are shorter than 16 characters, and a few have several thousand characters. The tally of byte lengths shows that it's the really long memory blocks that are gone with the PEP.
Digging into the internal representation, it's possible to estimate the "unaccounted" bytes. For PEP 393:
bytes - 80*strings - (chars+strings) = 190053
This is the total of the wchar_t and UTF-8 representations for objects that have them, plus any two-byte and four-byte strings accounted for incorrectly in the above formula. Unfortunately, for "default",
bytes - 56*strings - 4*(chars+strings) = 0
as unicode__sizeof__ doesn't account for the (separate) PyBytes object that may carry the default encoding. So in practice, the 3.3 number should be somewhat larger.
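In code, the two checks amount to the following (bytes_, strings, and chars being the totals from the tally; the constants 80 and 56 are the 64-bit base struct sizes assumed above):

    def unaccounted_pep393(bytes_, strings, chars):
        # 80-byte base struct, one byte per character plus a
        # terminating NUL (compact one-byte strings).
        return bytes_ - 80 * strings - (chars + strings)

    def unaccounted_default(bytes_, strings, chars):
        # 56-byte PyUnicodeObject, four bytes per Py_UNICODE
        # (wide build) plus a four-byte terminator.
        return bytes_ - 56 * strings - 4 * (chars + strings)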
In both cases, the app didn't account for internal fragmentation; this would be possible by rounding up each string size to the next multiple of 8 (given that it's all allocated through the object allocator).
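Accounting for it would be a small change to the per-string size; a sketch:

    import sys

    def allocated_size(s):
        # Round up to the object allocator's 8-byte granularity.
        return (sys.getsizeof(s) + 7) & ~7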
It should be possible to squeeze a little bit out of the 190kB, by finding objects for which the wchar_t or UTF-8 representations are created unnecessarily.
Regards,
Martin

-------------- next part --------------
Name: 3k.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110829/6c37b94c/attachment.txt>
-------------- next part --------------
Name: 393.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110829/6c37b94c/attachment-0001.txt>