[Python-Dev] PEP 393: Flexible String Representation

Dj Gilcrease digitalxero at gmail.com
Wed Jan 26 02:50:30 CET 2011


On Tue, Jan 25, 2011 at 5:43 PM, M.-A. Lemburg <mal at egenix.com> wrote:

I also don't see how this could save a lot of memory. As an example, take a French text with, say, 10 million code points. This would end up appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB), one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending on how many accents are used). That's a saving of -10MB compared to today's implementation :-)

If I am reading the PEP right, which I may not be as I am no expert on Unicode, the new implementation would actually give a 10MB saving, since the wchar field is optional, so only the str (Latin-1) and utf8 fields would need to be stored. How it decides not to store one field or another would need to be clarified in the PEP if I am right.
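
The arithmetic behind the two figures can be checked with a short sketch. It uses the sizes quoted above (20MB UCS2, 10MB Latin-1, roughly 15MB UTF-8). The baseline for "today's implementation" is an assumption inferred from the -10MB figure: a UCS2 buffer plus a cached UTF-8 copy. The helper name encoded_sizes is made up for illustration and is not part of the PEP.

```python
MB = 1_000_000
CODE_POINTS = 10 * MB            # the 10 million code point French text

ucs2 = 2 * CODE_POINTS           # 2 bytes per code point -> 20MB
latin1 = 1 * CODE_POINTS         # 1 byte per code point  -> 10MB
utf8 = 15 * MB                   # estimate from the thread, depends on accents

# Assumed baseline (inferred, not stated in the thread):
# UCS2 data plus a cached UTF-8 copy.
baseline = ucs2 + utf8

all_three = ucs2 + latin1 + utf8     # wchar, str and utf8 all stored
without_wchar = latin1 + utf8        # optional wchar field left out

saving_all_three = baseline - all_three          # negative: costs 10MB more
saving_without_wchar = baseline - without_wchar  # positive: saves 10MB


def encoded_sizes(text):
    """Return the byte size of text in each of the three encodings."""
    return {
        "ucs2": len(text.encode("utf-16-le")),  # same size as UCS2 for BMP text
        "latin1": len(text.encode("latin-1")),
        "utf8": len(text.encode("utf-8")),
    }


# Small sample: 100 code points, 10 of them accented.
sample_sizes = encoded_sizes("é" * 10 + "a" * 90)

print(saving_all_three // MB, saving_without_wchar // MB)
print(sample_sizes)
```

Under these assumptions the two readings differ only in whether the wchar copy is kept, which is why the result swings from a 10MB loss to a 10MB saving.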


