[Python-Dev] The future of the wchar_t cache

Steve Dower steve.dower at python.org
Sat Oct 20 11:58:58 EDT 2018


On 20Oct2018 0901, Stefan Behnel wrote:

> I'd be happy to get rid of it. But regarding the use under Windows, I wonder if there's interest in keeping it as a special Windows-only feature, e.g. to speed up the data exchange with the Win32 APIs. I guess it would have to provide a visible (performance?) advantage to justify such special casing over the code removal.

I think these cases would be just as well served by maintaining the original UCS-2 representation even if the maximum character fits into UCS-1, and only doing the conversion when Python copies the string into a new location.
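For readers who want to see the narrowing in question, here is a minimal sketch. It assumes CPython 3.3 or later with PEP 393 compact strings, where sys.getsizeof() reflects the per-code-point storage width. The helper name bytes_per_char is made up for this example and is not part of any API.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate how many bytes CPython stores per code point for `ch`.

    Hypothetical helper: it compares the sizes of two strings that differ
    only in length, so the fixed object header cancels out. This relies on
    CPython's PEP 393 layout and is not guaranteed by the language.
    """
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    return (sys.getsizeof(ch * 101) - sys.getsizeof(ch)) // 100


# A file name as the OS might hand it over: two bytes per UTF-16 code unit.
wide = "report.txt".encode("utf-16-le")

# Decoding scans the data and picks the narrowest kind that fits,
# so an all-Latin-1 name no longer uses two bytes per character.
name = wide.decode("utf-16-le")
widths = {bytes_per_char(c) for c in name}
```

On such a build, widths should contain only the one-byte kind, so passing name back to a wide-character API requires widening it again. That round trip is what keeping the original representation would avoid.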

I don't have numbers, but my instinct says the most impacted operations would be retrieving collections of strings from the OS (avoiding a scan/conversion for each one), comparisons against these collections (in-memory handling for hash/comparison of mismatched KIND), and passing some of these strings back to the OS (conversion back into UCS-2). This is basically a glob/fnmatch/stat sequence, and is the main real scenario I can think of where Python's overhead might become noticeable.
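As a rough illustration of that glob/fnmatch/stat sequence, the sketch below lists a directory, filters the names against a pattern, and passes the matches back to the OS. The function name stat_matching and the choice of returning file sizes are assumptions made for the example.

```python
import fnmatch
import os


def stat_matching(directory: str, pattern: str) -> dict:
    """Return {name: size in bytes} for entries whose name matches `pattern`.

    Hypothetical helper showing the three steps discussed above.
    """
    sizes = {}
    # 1. Retrieve a collection of strings from the OS.
    with os.scandir(directory) as entries:
        for entry in entries:
            # 2. Compare each name against a pattern held in memory.
            if fnmatch.fnmatch(entry.name, pattern):
                # 3. Pass the matching path back to the OS.
                sizes[entry.name] = os.stat(entry.path).st_size
    return sizes
```

On Windows, steps 1 and 3 cross the wide-character API boundary once per name, which is where a per-string conversion cost would accumulate.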

Another option that might be useful is some way to allow the UCS-1/4 <-> UCS-2 conversion to occur outside the GIL. Most of the time when we need to convert we're about to release the GIL (or have just reacquired it), so even without the cache we could probably win back some of the lost performance through parallelism. (That said, these are often tied up in conditions and generated code, so it may not be as easy to do this as retaining the original format.)

Some sort of tracing to see how often the cache is reused after being generated would be interesting, as well as how often the cache is being generated for a string that was originally in UCS-2 but we changed it to UCS-1.
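To make those two counters concrete, here is a toy model in pure Python. It is only a sketch of what such tracing might record and does not reflect how CPython implements the cache: the class WideCacheTracer and all of its fields are hypothetical, and the cache is keyed by string value rather than by object.

```python
from dataclasses import dataclass, field


@dataclass
class WideCacheTracer:
    """Toy model of a per-string wide-character cache with usage counters."""

    generated: int = 0             # cache misses: a conversion was performed
    reused: int = 0                # cache hits: a conversion was avoided
    regenerated_narrowed: int = 0  # misses for strings that arrived as UTF-16
                                   # but were narrowed to one byte per character
    _cache: dict = field(default_factory=dict)
    _narrowed: set = field(default_factory=set)

    def from_wide(self, data: bytes) -> str:
        """Decode UTF-16 data from the OS, noting whether it gets narrowed."""
        s = data.decode("utf-16-le")
        if s and max(map(ord, s)) < 0x100:
            self._narrowed.add(s)
        return s

    def as_wide(self, s: str) -> bytes:
        """Return the UTF-16 form of `s`, counting cache hits and misses."""
        cached = self._cache.get(s)
        if cached is not None:
            self.reused += 1
            return cached
        self.generated += 1
        if s in self._narrowed:
            self.regenerated_narrowed += 1
        wide = s.encode("utf-16-le")
        self._cache[s] = wide
        return wide
```

A high ratio of reused to generated would argue for keeping a cache, while a high regenerated_narrowed count would suggest that retaining the original representation could avoid most of the conversions in the first place.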

Cheers, Steve


