Py 3.3, unicode / upper()

Thu Dec 20 00:51:48 EST 2012

On Thu, 20 Dec 2012 00:32:42 -0500, Terry Reedy wrote:

> In the unicode case, Jim discovered that find was several times slower
> in 3.3 than 3.2 and claimed that that was a reason to not use 3.3. I ran
> the complete stringbench.py and discovered that find (and consequently
> find and replace) are the only operations with such a slowdown. I also
> discovered that another at least as common operation, encoding strings
> that only contain ascii characters to ascii bytes for transmission, is
> several times as fast in 3.3. So I reported that unless one is only
> finding substrings in long strings, there is no reason to not upgrade to
> 3.3.

Yes, and if you remember, Jim (jfm) based his complaints on very possibly 
the worst edge-case for the new Unicode implementation:

- generate a large string of characters
- replace every character in that string with another character

From memory:

s = "a"*100000
s = s.replace("a", "b")

or equivalent. Hardly representative of normal string processing, and 
likely to be the worst-performing operation on new Unicode strings. And 
yet even so, many people reported either a mild slowdown or, in a few
cases, a small speedup.
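
For anyone who wants to measure this on their own build, here is a rough
sketch using timeit. The helper names (replace_all, time_replace) and the
repeat counts are my own choices for illustration, not anything taken from
stringbench.py, and the figures will vary by platform and Python version.

```python
import timeit


def replace_all(n=100000):
    # Build a long single-character string, then replace every character.
    # This is the edge case described above, not a typical workload.
    s = "a" * n
    return s.replace("a", "b")


def time_replace(n=100000, repeat=5, number=100):
    # Return the best-of-`repeat` average time, in seconds, for one call.
    timer = timeit.Timer(lambda: replace_all(n))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print("replace: %.1f usec per call" % (time_replace() * 1e6))
```

Run the same script under 3.2 and 3.3 and compare the two figures. Taking
the minimum of several repeats cuts down on noise from other processes.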

-- 
Steven