Performance of int/long in Python 3

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Apr 3 11:02:37 EDT 2013


On Wed, 03 Apr 2013 09:43:06 -0400, Roy Smith wrote:

[...]
>> n = max(map(ord, s))
>> 4 if n > 0xffff else 2 if n > 0xff else 1
> 
> This has to inspect the entire string, no?

Correct. A more efficient implementation would be:

def char_size(s):
    size = 1
    for n in map(ord, s):
        if n > 0xFFFF:
            return 4  # nothing is wider, so stop here
        if n > 0xFF:
            size = 2  # keep scanning: a later character may still need 4
    return size
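
A minimal sketch, not from the original thread, of how one might check
that an early-exit scan gives the same answer as the max()-based
one-liner quoted earlier. The function names and sample strings are made
up for illustration. Note that only a code point above 0xFFFF settles
the answer, so that is the only place the loop can safely stop early.

```python
def size_by_full_scan(s):
    # The quoted one-liner: always inspects every character.
    n = max(map(ord, s), default=0)  # default=0 covers the empty string
    return 4 if n > 0xFFFF else 2 if n > 0xFF else 1


def size_by_early_exit(s):
    # Hypothetical early-exit variant: remember that a 2-byte character
    # was seen, but keep going in case a 4-byte one turns up later.
    size = 1
    for n in map(ord, s):
        if n > 0xFFFF:
            return 4
        if n > 0xFF:
            size = 2
    return size


samples = ["", "abc", "caf\xe9", "\u03b1\u03b2", "\u03b1\U0001F600"]
for s in samples:
    assert size_by_full_scan(s) == size_by_early_exit(s)
```

The last sample is the interesting one: it has a character above 0xFF
followed by one above 0xFFFF, so a scan that stopped at the first
character would under-report the width.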



> I posted (essentially) this a few days ago:
> 
>     if all(ord(c) <= 0xffff for c in s):
>         return "it's all bmp"
>     else:
>         return "it's got astral crap in it"


It's not "astral crap". People use it, and they'll use it more in the 
future. Just because you don't use it doesn't give you leave to make 
disparaging remarks about it.

Honestly, it's really painful to see how history repeats itself:

"Bah humbug, why do we need to support the SMP astral crap? The Unicode 
BMP is more than enough for everybody."

"Bah humbug, why do we need to support Unicode crap? Latin1 is more than 
enough for everybody."

"Bah humbug, why do we need to support Latin1 crap? ASCII is more than 
enough for everybody."

"Bah humbug, why do we need to support ASCII crap? Uppercase A-Z is more 
than enough for everybody."

Seriously. Go back far enough, to the telegraph days, and you have 
people arguing that there was no need for upper and lower case letters.



> I'm reasonably sure all() is smart enough to stop at the first False
> value.

Yes, all() and any() are guaranteed to be short-circuit functions. They 
will stop as soon as they see a False or a True value respectively.
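
A small sketch, assuming nothing beyond the built-ins, that makes the
short-circuiting visible. The helper noisy() is hypothetical; it just
records each value it hands out so you can see how far iteration got.

```python
def noisy(values, seen):
    # Yield each value in turn, recording it in `seen` as it is consumed.
    for v in values:
        seen.append(v)
        yield v


seen_all = []
result_all = all(noisy([True, False, True, True], seen_all))

seen_any = []
result_any = any(noisy([False, True, False, False], seen_any))

print(result_all, seen_all)  # False [True, False]
print(result_any, seen_any)  # True [False, True]
```

In both cases the last two values are never pulled from the generator.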



-- 
Steven


