Performance of int/long in Python 3

Mon Apr 1 01:57:51 EDT 2013

On Mon, Apr 1, 2013 at 4:33 PM, rusi <rustompmody at gmail.com> wrote:
> So I really wonder: Is python losing more by supporting SMP with
> performance hit on BMP?

If your strings fit entirely within the BMP, then you should see no
penalty compared to previous versions of Python. If they happen to fit
inside ASCII, then there may well be significant improvements. Either
way, what you gain is the ability to work with *any* string, whatever
it contains, without worrying about it. You can count characters
without caring which plane they come from. Imagine if a tuple of
integers behaved differently if some of those integers flipped to
being long ints:

x = (1, 2, 4, 8, 1<<30, 1<<300, 1<<10)

That tuple has seven elements. Wouldn't you be surprised if len(x)
returned 8? I certainly would be. And that's what a narrow build of
Python does with Unicode: one character outside the BMP counts as two.
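
To make that concrete, here's a rough sketch for Python 3.3+. It
can't actually run a narrow build, so it approximates one by counting
UTF-16 code units, on the assumption that this is what a narrow
build's len() reported. The names tuple_len, wide_len and narrow_len
are just mine for illustration.

```python
# A tuple's length does not depend on how large each element is.
x = (1, 2, 4, 8, 1 << 30, 1 << 300, 1 << 10)
tuple_len = len(x)

# On Python 3.3+ a string's length counts code points, whatever plane
# each one lives in.
s = "a\U0001D11Eb"
wide_len = len(s)

# Approximation of a narrow build (an assumption, not a real narrow
# interpreter): count UTF-16 code units, two bytes each.
narrow_len = len(s.encode("utf-16-le")) // 2

print(tuple_len, wide_len, narrow_len)
```

The gap between wide_len and narrow_len is the same surprise as a
seven-element tuple claiming to have eight.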

Unicode strings are approximately comparable to tuples of integers. In
fact, they can be interchanged fairly readily:

string = "Treble clef: \U0001D11E"
array = tuple(map(ord, string))
assert len(array) == 14
out_string = ''.join(map(chr, array))
assert out_string == string

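This doesn't work in Python 2.6 on Windows, partly because of
surrogates, but also because chr() there only handles byte values
(0-255), not Unicode code points. There's a solution to the second
(unichr() in Python 2), but not really to the first: on a narrow
build the treble clef comes back from ord() as two surrogates, so the
tuple has 15 entries instead of 14. The tuple of ords should match the
way the characters are laid out to a human.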
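
For the curious, here's a small sketch of what the tuple of ords looks
like when astral characters are split into surrogates. The helper
narrow_ords() is hypothetical (my name, not anything in the stdlib);
it just applies the standard UTF-16 surrogate arithmetic to emulate
what iterating a string on a narrow build would have given you.

```python
def narrow_ords(s):
    """Emulate a narrow build: return the UTF-16 code units of s."""
    units = []
    for cp in map(ord, s):
        if cp > 0xFFFF:
            # Outside the BMP: split into a high and a low surrogate.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return tuple(units)

string = "Treble clef: \U0001D11E"
wide = tuple(map(ord, string))   # one entry per character
narrow = narrow_ords(string)     # the clef becomes two entries

print(len(wide), len(narrow))
print([hex(u) for u in narrow[-2:]])
```

The last two entries of narrow are the surrogate pair standing in for
the one character a human sees.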

ChrisA