Performance of int/long in Python 3

Chris Angelico rosuav at gmail.com
Wed Apr 3 17:55:43 EDT 2013


On Thu, Apr 4, 2013 at 4:43 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Wed, 03 Apr 2013 10:38:20 -0600, Ian Kelly wrote:
>
>> On Wed, Apr 3, 2013 at 9:02 AM, Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> wrote:
>>> On Wed, 03 Apr 2013 09:43:06 -0400, Roy Smith wrote:
>>>
>>> [...]
>>>>> n = max(map(ord, s))
>>>>> 4 if n > 0xffff else 2 if n > 0xff else 1
>>>>
>>>> This has to inspect the entire string, no?
>>>
>>> Correct. A more efficient implementation would be:
>>>
>>> def char_size(s):
>>>     for n in map(ord, s):
>>>         if n > 0xFFFF: return 4
>>>         if n > 0xFF: return 2
>>>     return 1
>>
>> That's an incorrect implementation, as it would return 2 at the first
>> non-Latin-1 BMP character, even if there were SMP characters later in
>> the string.  It's only safe to short-circuit return 4, not 2 or 1.
>
>
> Doh!
>
> I mean, well done sir, you have successfully passed my little test!

Try this:

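def str_width(s):
    width = 1
    for ch in map(ord, s):
        # A character beyond the BMP settles it: nothing is wider than 4.
        if ch > 0xFFFF: return 4
        # Remember that 2 bytes are needed, but keep scanning for wider ones.
        if ch > 0xFF: width = 2
    return width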
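
A quick sanity check, as a sketch only: it compares the single-pass
approach against the max()-based expression quoted earlier in the thread.
The name str_width_reference and the sample strings are made up for
illustration, and default=0 is an assumption added so that the empty
string does not raise (it needs Python 3.4 or later).

```python
def str_width(s):
    """Single pass; only the 4-byte case may return early."""
    width = 1
    for ch in map(ord, s):
        if ch > 0xFFFF:
            return 4
        if ch > 0xFF:
            width = 2
    return width


def str_width_reference(s):
    """Hypothetical reference: always inspects the whole string."""
    # default=0 is an assumption so that "" is treated as width 1.
    n = max(map(ord, s), default=0)
    return 4 if n > 0xFFFF else 2 if n > 0xFF else 1


# Illustrative samples: ASCII, Latin-1, BMP, and BMP followed by SMP.
samples = ["", "abc", "caf\u00e9", "\u0394x", "\u0394x\U0001F600"]
for s in samples:
    assert str_width(s) == str_width_reference(s)
```

The last sample is the case Ian pointed out: a BMP character comes first
and an SMP character follows, so returning 2 early would be wrong.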

ChrisA


