Performance of int/long in Python 3

Mon Apr 1 09:11:50 EDT 2013

On Apr 1, 5:15 pm, Roy Smith <r... at panix.com> wrote:
> In article <515941d8$0$29967$c3e8da3$54964... at news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.pyt... at pearwood.info> wrote:
>
> > [...]
> > >> OK, that leads to the next question.  Is there any way I can (in Python
> > >> 2.7) detect when a string is not entirely in the BMP?  If I could find
> > >> all the non-BMP characters, I could replace them with U+FFFD
> > >> (REPLACEMENT CHARACTER) and life would be good (enough).
>
> > Of course you can do this, but you should not. If your input data
> > includes character C, you should deal with character C and not just throw
> > it away unnecessarily. That would be rude, and in Python 3.3 it should be
> > unnecessary.
>
> The import job isn't done yet, but so far we've processed 116 million
> records and had to clean up four of them.  I can live with that.
> Sometimes practicality trumps correctness.

That works out to roughly 0.000003% (4 out of 116 million). Of course,
I assume it is US-only data. Still, it's good to know how skewed the
distribution is.
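
For reference, the cleanup Roy describes (find the non-BMP characters and swap in U+FFFD) might look something like the sketch below. It assumes Python 3.3 or later, where each element of a str is a full code point. On a narrow build of Python 2.7, a non-BMP character shows up as a surrogate pair, so the per-character test would not catch it and you would have to match the pairs instead. The function names are made up for illustration; they are not from Roy's import job.

```python
# Minimal sketch, assuming Python 3.3+ (one str element per code point).
# non_bmp_chars and replace_non_bmp are hypothetical helper names.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def non_bmp_chars(text):
    """Return the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def replace_non_bmp(text, replacement=REPLACEMENT):
    """Return text with every non-BMP character replaced."""
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


if __name__ == "__main__":
    sample = "caf\u00e9 \U0001F600 ok"  # one BMP accent, one non-BMP emoji
    print(len(non_bmp_chars(sample)))
    print(ascii(replace_non_bmp(sample)))
```

Counting with non_bmp_chars before replacing anything is a cheap way to see how many records would actually be affected.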