[Python-Dev] Optimize Unicode strings in Python 3.3

Victor Stinner victor.stinner at gmail.com
Fri May 4 01:45:15 CEST 2012


Hi,

Several people are working on improving the performance of Unicode strings in Python 3.3. This version is very different from Python 3.2 because of PEP 393, and it is still unclear to me what the best way to create a new Unicode string is.
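
To illustrate why PEP 393 changes how a new string has to be created, here is a small sketch that can be run on CPython 3.3 or later. It only relies on sys.getsizeof(); the variable names are mine, and the exact sizes depend on the build, so none are shown. The point is that the storage width per character depends on the largest code point in the string, which is why the creator of a string wants to know (or guess) that maximum up front.

```python
import sys

# Three strings of the same length, differing only in their largest code point.
ascii_text = "a" * 1000             # largest code point below 128
bmp_text = "\u20ac" * 1000          # EURO SIGN, below 65536
astral_text = "\U0001F40D" * 1000   # SNAKE, outside the Basic Multilingual Plane

for label, text in [("ASCII", ascii_text),
                    ("BMP", bmp_text),
                    ("astral", astral_text)]:
    # With PEP 393 the reported size grows with the width needed per character.
    print(label, len(text), sys.getsizeof(text))
```

All three strings have the same len(), but the memory they use differs, so a buffer sized for ASCII has to be widened or rebuilt if a wider character shows up later.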

There are different approaches:

The optimistic approach uses realloc() to resize the string. It is faster than the PyAccu approach (at least for short ASCII strings), maybe because it avoids creating temporary short strings. realloc() appears to be efficient on Linux and Windows (at least Windows 7).
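
As a rough Python-level analogy only (the real work happens in C inside CPython, and this is not that code), the two strategies could be sketched as follows. The function names are hypothetical. The accumulator version collects the pieces in a list and joins once; the optimistic version bets that the output is ASCII, grows a single bytearray buffer, and falls back to the accumulator if the bet is lost.

```python
def join_with_accumulator(pieces):
    """Collect every piece in a list, then join once at the end."""
    acc = []
    for piece in pieces:
        acc.append(piece)
    return "".join(acc)


def join_optimistic(pieces):
    """Bet that the result is pure ASCII and grow one buffer.

    `pieces` must be a sequence (not a one-shot iterator), because the
    fallback path walks it a second time.
    """
    buf = bytearray()
    for piece in pieces:
        try:
            buf += piece.encode("ascii")   # extend the single buffer
        except UnicodeEncodeError:
            # The bet is lost: a non-ASCII character appeared.
            return join_with_accumulator(pieces)
    return buf.decode("ascii")


print(join_optimistic(["spam", " ", "eggs"]))
print(join_optimistic(["price: ", "10 ", "\u20ac"]))
```

This sketch only shows the control flow. It will not reproduce the timings discussed above, since encode() itself creates a temporary object for each piece, which is exactly what the C version avoids.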

Various notes:

I would be interested to hear of any other tricks for optimizing Unicode strings in Python, or to know if you are interested in working on this topic.

There are open issues related to optimizing Unicode:

#11313: Speed up default encode()/decode()
#12807: Optimization/refactoring for {bytearray, bytes, unicode}.strip()
#14419: Faster ascii decoding
#14624: Faster utf-16 decoder
#14625: Faster utf-32 decoder
#14654: More fast utf-8 decoding
#14716: Use unicode_writer API for str.format()

Victor


