Performance of int/long in Python 3

Roy Smith roy at panix.com
Wed Apr 3 20:46:18 EDT 2013


In article 
<aa3b500f-bebf-4d77-9855-3d90b07eaa6c at y7g2000pbu.googlegroups.com>,
 rusi <rustompmody at gmail.com> wrote:

> On Apr 3, 6:43 pm, Roy Smith <r... at panix.com> wrote:
> > This has to inspect the entire string, no?  I posted (essentially) this
> > a few days ago:
> >
> >     if all(ord(c) <= 0xffff for c in s):
> >         return "it's all bmp"
> >     else:
> >         return "it's got astral crap in it"
> 
> Astral crap? CRAP?
> Verily sir I am offended!
> [...]
> You are American!

This is true.

But, to be fair, in the roughly 200 million records (I don't have the 
exact number here) in our recent big data import job, I found exactly 
FOUR strings with astral characters.  Those boiled down to two versions 
of each of two different song titles.

One had a Unicode Character 'BALLOON' (U+1F388).  The other had some 
heart symbol (sorry, I don't remember the exact code point).  These 
hardly seem a matter of national pride.
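
For anybody who wants to run the same kind of hunt over their own data, 
here is a minimal sketch.  It is not the code from our import job; the 
helper name astral_chars and the sample titles are made up for 
illustration, and it assumes Python 3.3 or later (or a wide build), 
where each astral character is a single code point rather than a 
surrogate pair.

```python
import unicodedata

def astral_chars(s):
    """Return the distinct characters of s that lie outside the BMP."""
    # Anything above U+FFFF is in a supplementary ("astral") plane.
    return sorted({c for c in s if ord(c) > 0xFFFF})

# Hypothetical sample records, not real data.
titles = ["99 Red Balloons", "Up \U0001F388", "Plain title"]

for t in titles:
    for c in astral_chars(t):
        # ascii() keeps the output printable on any terminal encoding.
        name = unicodedata.name(c, "<unnamed>")
        print("U+%04X %s in %s" % (ord(c), name, ascii(t)))
```

Unlike the all() test quoted above, this always walks the whole string, 
since it wants every offender rather than a yes/no answer.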

And, if you don't believe there is astral crap, how do you explain 
U+1F4A9?


