Performance of int/long in Python 3
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Apr 1 04:14:16 EDT 2013
On Sun, 31 Mar 2013 22:33:45 -0700, rusi wrote:
> On Mar 31, 5:55 pm, Mark Lawrence <breamore... at yahoo.co.uk> wrote:
>
> <snipped jmf's broken-record whine>
>
>> I'm feeling very sorry for this horse, it's been flogged so often it's
>> down to bare bones.
>
> While I am now joining the camp of those fed up with jmf's whining, I do
> wonder if we are shooting the messenger…
No. The trouble is that the messenger is shouting that the Unicode world
is ending on December 21st 2012, and hasn't noticed that was over three
months ago and the world didn't end.
[...]
>> OK, that leads to the next question. Is there anyway I can (in Python
>> 2.7) detect when a string is not entirely in the BMP? If I could find
>> all the non-BMP characters, I could replace them with U+FFFD
>> (REPLACEMENT CHARACTER) and life would be good (enough).
Of course you can do this, but you should not. If your input data
includes character C, you should deal with character C rather than just
throw it away. That would be rude, and in Python 3.3 it is unnecessary.
Although, since the person you are quoting is stuck in Python 2.7, it may
be less bad than having to deal with potentially broken Unicode strings.
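For what it's worth, the replacement the original poster asks about can be done with a regular expression. This is only a sketch (the function name is mine, not from the thread): on a narrow Python 2.7 build, non-BMP characters are stored as surrogate pairs, so a real 2.7 version would need to match pairs of surrogates instead of the single-code-point range shown here, which is how it looks on a wide build or Python 3.

```python
import re

# Matches any code point outside the Basic Multilingual Plane.
# On a narrow Python 2.7 build this range would instead have to be
# expressed as a high-surrogate/low-surrogate pair.
_NON_BMP = re.compile(u'[\U00010000-\U0010FFFF]')

def replace_astral(text, replacement=u'\uFFFD'):
    """Replace every non-BMP character with U+FFFD REPLACEMENT CHARACTER."""
    return _NON_BMP.sub(replacement, text)
```

Whether you *should* do this is another matter, as noted above.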
> Steven's:
>> But it means that if you're one of the 99.9% of users who mostly use
>> characters in the BMP, …
Yes. "Mostly" does not mean exclusively, and given (say) a billion
computer users, that leaves about a million users who have significant
need for non-BMP characters.
If you don't agree with my estimate, feel free to invent your own :-)
> And from http://www.tlg.uci.edu/~opoudjis/unicode/unicode_astral.html
>> The informal name for the supplementary planes of Unicode is "astral
>> planes", since (especially in the late '90s) their use seemed to be as
>> remote as the theosophical "great beyond". …
That was nearly two decades ago. Two decades ago, the idea that the
entire computing world could standardize on a single character set,
instead of having to deal with dozens of different "code pages", seemed
as likely as people landing on the Moon seemed in 1940.
Today, the entire computing world has standardized on such a system;
"code pages" (encodings) are mostly only needed for legacy data and
shitty applications. Yet most implementations still don't support the
entire Unicode range. A couple of programming languages, including Pike
and Python, support Unicode fully and correctly. Pike has never had the
same high profile as Python, but now that Python can support the entire
Unicode range without broken surrogate support, maybe users of other
languages will start to demand the same.
> So I really wonder: Is python losing more by supporting SMP with
> performance hit on BMP?
No.
As many people have demonstrated, both with code snippets and whole-
program benchmarks, Python 3.3 is *as fast* or *faster* than Python 3.2
narrow builds. In practice, Python 3.3 saves enough memory by using
sensible string implementations that real world software is faster in
Python 3.3 than in 3.2.
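The memory savings come from PEP 393's flexible string representation: on CPython 3.3+ a string's per-character storage depends on its widest code point (1 byte for Latin-1, 2 for BMP, 4 for non-BMP), where a 3.2 narrow build always paid 2 bytes per code unit. A quick sketch to observe this:

```python
import sys

# Three 1000-character strings whose widest code points sit in
# different ranges, so PEP 393 stores them at different widths.
ascii_s  = 'x' * 1000            # Latin-1 range: 1 byte/char
bmp_s    = '\u0394' * 1000       # GREEK CAPITAL LETTER DELTA: 2 bytes/char
astral_s = '\U0001F600' * 1000   # non-BMP emoji: 4 bytes/char

for s in (ascii_s, bmp_s, astral_s):
    # Same length, increasing memory footprint.
    print(len(s), sys.getsizeof(s))
```

The exact byte counts include a fixed object header, but the 1/2/4-byte-per-character scaling is plainly visible.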
> The problem as I see it is that a choice that is sufficiently skew is no
> more a straightforward choice. An example will illustrate:
>
> I can choose to drive or not -- a choice. Statistics tell me that on
> average there are 3 fatalities every day; I am very concerned that I
> could get killed so I choose not to drive. Which neglects that there are
> a couple of million safe-drives at the same time as the '3 fatalities'
Clear as mud. What does this have to do with supporting Unicode?
--
Steven