[Python-Dev] The future of the wchar_t cache

Sat Oct 20 11:58:58 EDT 2018

On 20Oct2018 0901, Stefan Behnel wrote:
> I'd be happy to get rid of it. But regarding the use under Windows, I
> wonder if there's interest in keeping it as a special Windows-only feature,
> e.g. to speed up the data exchange with the Win32 APIs. I guess it would
> have to provide a visible (performance?) advantage to justify such special
> casing over the code removal.

I think these cases would be just as well served by maintaining the
original UCS-2 representation even if the maximum character fits into
UCS-1, and only doing the conversion when Python copies the string into a
new location.
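
For illustration only, here is a minimal sketch of the narrowing that happens today, assuming CPython 3.3 or later (PEP 393). The helper `bytes_per_char` and the sample file names are hypothetical. The UTF-16-LE bytes stand in for what a wide-character Win32 API might hand back.

```python
import sys

def bytes_per_char(text):
    """Estimate how many bytes CPython uses per character of `text`."""
    # Doubling the string keeps the same maximum character, so the size
    # difference is just the storage for len(text) extra characters.
    return (sys.getsizeof(text * 2) - sys.getsizeof(text)) // len(text)

# Stand-ins for data returned by a wide-character OS API (assumption).
wide_ascii = "readme.txt".encode("utf-16-le")
wide_bmp = "\u0142\u00f3d\u017a.txt".encode("utf-16-le")

narrowed = wide_ascii.decode("utf-16-le")   # every character is below U+0100
kept_wide = wide_bmp.decode("utf-16-le")    # has characters at or above U+0100

print(bytes_per_char(narrowed))   # 1 on CPython 3.3+
print(bytes_per_char(kept_wide))  # 2 on CPython 3.3+
```

The first string arrived as two bytes per character and is stored as one, which is the conversion the suggestion above would defer.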

I don't have numbers, but my instinct says the most impacted operations 
would be retrieving collections of strings from the OS (avoiding a 
scan/conversion for each one), comparisons against these collections 
(in-memory handling for hash/comparison of mismatched KIND), and passing 
some of these strings back to the OS (conversion back into UCS-2). This 
is basically a glob/fnmatch/stat sequence, and is the main real scenario 
I can think of where Python's overhead might become noticeable.
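
A rough sketch of that sequence using only the standard library follows. The function name `sizes_matching` and the sample files are made up for the example, and it measures nothing. It only marks where strings cross the OS boundary in each direction.

```python
import fnmatch
import os
import tempfile

def sizes_matching(directory, pattern):
    """Return {name: size} for entries in `directory` matching `pattern`."""
    names = os.listdir(directory)             # strings arrive from the OS
    matched = fnmatch.filter(names, pattern)  # matching happens inside Python
    # Each matched name is passed back to the OS for the stat call.
    return {
        name: os.stat(os.path.join(directory, name)).st_size
        for name in matched
    }

with tempfile.TemporaryDirectory() as tmp:
    for name, payload in [("a.txt", b"12345"), ("b.txt", b""), ("c.log", b"x")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(payload)
    result = sizes_matching(tmp, "*.txt")
    print(result)
```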

Another option that might be useful is some way to allow the UCS-1/4 <-> 
UCS-2 conversion to occur outside the GIL. Most of the time when we need 
to convert we're about to release the GIL (or have just recovered it), 
so even without the cache we could probably recover some of the
performance loss through parallelism. (That said, these conversions are
often tied up in conditions and generated code, so it may not be as easy
to do this as retaining the original format.)

Some sort of tracing to see how often the cache is reused after being
generated would be interesting, as well as how often the cache is being
generated for a string that was originally in UCS-2 but that we converted
to UCS-1.
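
As a toy model of the counters such tracing might collect, the sketch below uses a hypothetical pure-Python class, `WideCacheModel`. It is not CPython's implementation, and real tracing would have to live in the C code that fills the cache.

```python
from collections import Counter

class WideCacheModel:
    """Toy model of a lazily generated wide-character cache (hypothetical)."""

    stats = Counter()  # counters shared by all instances

    def __init__(self, text, originally_wide=False):
        self.text = text
        self.originally_wide = originally_wide  # arrived from the OS as UTF-16
        self._wide = None                       # stand-in for the cache

    def as_wide(self):
        if self._wide is None:
            self._wide = self.text.encode("utf-16-le")
            WideCacheModel.stats["generated"] += 1
            # Round trip: arrived wide, was stored narrow, is widened again.
            if self.originally_wide and max(self.text, default="\0") < "\u0100":
                WideCacheModel.stats["regenerated_after_narrowing"] += 1
        else:
            WideCacheModel.stats["reused"] += 1
        return self._wide

name = WideCacheModel("readme.txt", originally_wide=True)
name.as_wide()  # first use generates the cache
name.as_wide()  # second use reuses it
print(dict(WideCacheModel.stats))
```

Comparing "reused" with "generated" would show whether the cache pays for itself, and "regenerated_after_narrowing" would show how often keeping the original format would have avoided the work.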

Cheers,
Steve