Performance of int/long in Python 3

Chris Angelico rosuav at gmail.com
Mon Apr 1 13:20:30 EDT 2013


On Tue, Apr 2, 2013 at 4:07 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Mon, 01 Apr 2013 08:15:53 -0400, Roy Smith wrote:
>> It turns out, the problem is that the version of MySQL we're using
>
> Well there you go. Why don't you use a real database?
>
> http://www.postgresql.org/docs/9.2/static/multibyte.html
>
> :-)
>
> Postgresql has supported non-broken UTF-8 since at least version 8.1.

Not only that, but I *rely* on PostgreSQL to test-or-reject stuff that
comes from untrustworthy languages, like PHP. If it's malformed in any
way, it won't get past the database.
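
Here's a minimal Python 3 sketch of the kind of well-formedness check
involved (the byte strings are just made-up examples of what a buggy
upstream might hand you; PostgreSQL applies the same sort of validation
server-side before anything lands in a text column):

good = "\u03bc".encode("utf-8")     # b'\xce\xbc' - well-formed
bad = b"\xce"                       # truncated two-byte sequence

for raw in (good, bad):
    try:
        text = raw.decode("utf-8")  # strict by default in Python 3
        print("accepted:", ascii(text))
    except UnicodeDecodeError as exc:
        print("rejected:", exc)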

>> doesn't support non-BMP characters.  Newer versions do (but you have to
>> declare the column to use the utf8mb4 character set).  I could upgrade
>> to a newer MySQL version, but it's just not worth it.
>
> My brain just broke. So-called "UTF-8" in MySQL only includes up to a
> maximum of three-byte characters. There has *never* been a time where
> UTF-8 excluded four-byte characters. What were the developers thinking,
> arbitrarily cutting out support for 50% of UTF-8?

Steven, you punctuated that wrongly.

What, were the developers *thinking*? Arbitrarily etc?

It really is brain-breaking. I could understand a naive UTF-8 codec
being too permissive (allowing over-long encodings, or code points
above the allocated range - e.g. FA 80 80 80 80, which would
notionally represent U+2000000), but why should it arbitrarily stop
short? There must have been some internal limitation; perhaps
collation was only defined within the BMP.
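
Python 3's own codec is a handy way to poke at the distinction. A
quick sketch (nothing MySQL-specific here, just what a strict UTF-8
codec will and won't accept):

# A character outside the BMP needs four bytes in UTF-8 - exactly the
# case a three-byte-only "utf8" can't store.
print("\U0001F600".encode("utf-8"))        # b'\xf0\x9f\x98\x80'

# Anything inside the BMP fits in at most three bytes.
print("\uFFFD".encode("utf-8"))            # b'\xef\xbf\xbd'

# The notional five-byte sequence FA 80 80 80 80 (U+2000000) and an
# over-long encoding of "/" are both rejected - modern UTF-8 stops at
# four bytes and U+10FFFF, and over-long forms are forbidden.
for raw in (b"\xfa\x80\x80\x80\x80", b"\xc0\xaf"):
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("rejected:", exc)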

ChrisA


