Performance of int/long in Python 3

Ian Kelly ian.g.kelly at gmail.com
Wed Apr 3 12:38:20 EDT 2013


On Wed, Apr 3, 2013 at 9:02 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Wed, 03 Apr 2013 09:43:06 -0400, Roy Smith wrote:
>
> [...]
>>> n = max(map(ord, s))
>>> 4 if n > 0xffff else 2 if n > 0xff else 1
>>
>> This has to inspect the entire string, no?
>
> Correct. A more efficient implementation would be:
>
> def char_size(s):
>     for n in map(ord, s):
>         if n > 0xFFFF: return 4
>         if n > 0xFF: return 2
>     return 1

That implementation is incorrect: it returns 2 at the first non-Latin-1 BMP
character, even if characters beyond the BMP (above 0xFFFF) appear later in
the string.  Only the return of 4 can safely short-circuit; a result of 2 or
1 is known only after the whole string has been scanned.
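
One way to fix it is to remember the widest size seen so far and bail out
early only on 4.  This is a minimal sketch, reusing the char_size name from
the quoted code; the "size" variable is my own addition:

```python
def char_size(s):
    """Return the widest per-character size needed for s: 1, 2, or 4."""
    size = 1
    for n in map(ord, s):
        if n > 0xFFFF:
            # Nothing wider than 4 exists, so stopping here is safe.
            return 4
        if n > 0xFF:
            # Remember the 2, but keep scanning for a wider character.
            size = 2
    return size
```

In the worst case (no character above 0xFFFF) this still has to walk the
entire string, just like the max(map(ord, s)) version.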


