Py 3.3, unicode / upper()

Thu Dec 20 00:32:42 EST 2012

On 12/19/2012 10:12 PM, Westley Martínez wrote:
> On Wed, Dec 19, 2012 at 09:54:20PM -0500, Terry Reedy wrote:
>> On 12/19/2012 9:03 PM, Chris Angelico wrote:
>>> On Thu, Dec 20, 2012 at 5:27 AM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
>>>>  From what I've been able to discern, [jmf's] actual complaint about PEP
>>>> 393 stems from misguided moral concerns.  With PEP-393, strings that
>>>> can be fully represented in Latin-1 can be stored in half the space
>>>> (ignoring fixed overhead) compared to strings containing at least one
>>>> non-Latin-1 character.  jmf thinks this optimization is unfair to
>>>> non-English users and immoral; he wants Latin-1 strings to be treated
>>>> exactly like non-Latin-1 strings (I don't think he actually cares
>>>> about non-BMP strings at all; if narrow-build Unicode is good enough
>>>> for him, then it must be good enough for everybody).
>>>
>>> Not entirely; most of his complaints are based on performance (speed
>>> and/or memory) of 3.3 compared to a narrow build of 3.2, using silly
>>> edge cases to prove how much worse 3.3 is, while utterly ignoring the
>>> fact that, in those self-same edge cases, 3.2 is buggy.
>>
>> And the fact that stringbench.py is overall about as fast with 3.3
>> as with 3.2 *on the same Windows 7 machine* (which uses narrow build
>> in 3.2), and that unicode operations are not far from bytes
>> operations when the same thing can be done with both.
>>
>> --
>> Terry Jan Reedy
>
> Really, why should we be so obsessed with speed anyways?  Isn't
> improving the language and fixing bugs far more important?

Conservatively, there are probably at least 10 enhancement patches
and 30 bug fix patches for every performance patch. Performance patches
are considered enhancements, so they only go into new versions, where
they go through the extended alpha, beta, and release candidate testing
and evaluation process.

In the unicode case, Jim discovered that find was several times slower
in 3.3 than 3.2 and claimed that that was a reason to not use 3.3. I ran
the complete stringbench.py and discovered that find (and consequently
find and replace) are the only operations with such a slowdown. I also
discovered that another operation that is at least as common, encoding
strings that contain only ASCII characters to ASCII bytes for
transmission, is several times as fast in 3.3. So I reported that unless
one is only finding substrings in long strings, there is no reason to
not upgrade to 3.3.
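
For anyone who wants to check the two operations above on their own
machine, here is a minimal sketch using the standard timeit module. It
is not stringbench.py and does not reproduce its test cases; the
function name bench, the string length, and the repeat counts are
illustrative assumptions. Run the same file under each interpreter and
compare the numbers.

```python
import timeit


def bench(n=100000, number=50, repeat=3):
    """Time str.find and ASCII encoding on one ASCII-only string.

    Hypothetical helper for illustration; not part of stringbench.py.
    """
    text = "a" * n   # string containing only ASCII characters
    needle = "b"     # absent from text, so find() scans all of it
    assert text.find(needle) == -1

    # Take the best of several runs to reduce noise from other processes.
    find_t = min(timeit.repeat(lambda: text.find(needle),
                               number=number, repeat=repeat))
    encode_t = min(timeit.repeat(lambda: text.encode("ascii"),
                                 number=number, repeat=repeat))
    return {"find": find_t, "encode": encode_t}


if __name__ == "__main__":
    for name, seconds in sorted(bench().items()):
        print("%-6s %.6f s" % (name, seconds))
```

The absolute times mean little by themselves; only the ratio between
two interpreters on the same machine is informative, and a
single-character needle is just one of many find cases.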

-- 
Terry Jan Reedy