Micro Python -- a lean and efficient implementation of Python 3

Chris Angelico rosuav at gmail.com
Tue Jun 3 23:52:54 EDT 2014


On Wed, Jun 4, 2014 at 1:37 PM, Rustom Mody <rustompmody at gmail.com> wrote:
> 2. My casual/cursory reading of the contents of the SMP-planes
> suggests that the stuff there is things like
> - Egyptian hieroglyphics
> - mahjong characters
> - ancient Greek musical symbols
> - alchemical symbols etc etc.
>
> IOW from pov of a universally acceptable character set this is mostly
> rubbish
>
> And so a pure BMP-supporting implementation may be a reasonable
> compromise. [As long as no surrogate-pairs are there]

Not if you're working on the internet. There are several critical
groups of characters that aren't in the BMP, such as:

1) A lot of Chinese and Japanese characters (the rarer ones, which
   live in the CJK extension blocks outside the BMP)
2) Heaps of emoticons and fancy letters
3) Mathematical symbols
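
To make that concrete, here's a quick sketch that picks one sample
character from each group and checks that it sits above U+FFFF. The
three sample code points and the is_astral() helper are my own picks
for illustration, not anything canonical.

```python
import unicodedata

# Sample characters chosen for illustration only (an assumption, one per group).
samples = [
    "\U00020000",  # first code point of CJK Unified Ideographs Extension B
    "\U0001F600",  # GRINNING FACE, from the Emoticons block
    "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
]

def is_astral(ch):
    """Return True if the code point lies outside the BMP (above U+FFFF)."""
    return ord(ch) > 0xFFFF

for ch in samples:
    # The second argument to unicodedata.name() is a fallback for unnamed code points.
    name = unicodedata.name(ch, "<unnamed>")
    print("U+%05X %s astral=%s" % (ord(ch), name, is_astral(ch)))
```

Every one of those is a single character to Python 3.3+, and none of
them fits in 16 bits.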

You can't ignore those. You might be able to say "Well, my program
will run slower if you throw these at it", but if you're going down
that route, you probably want the full FSR (the Flexible String
Representation from PEP 393) and the advantages it confers on ASCII
and Latin-1 strings. Binding your program to BMP-only
is nearly as dangerous as binding it to ASCII-only; potentially worse,
because you can run an awful lot of artificial tests without
remembering to stick in some astral characters.
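
Here's a rough sketch of the sort of check I mean, assuming Python 3.3
or later. The utf16_units() helper and the test string are made up for
the example; the point is just that one astral character is enough to
make the code point count and the 16-bit code unit count disagree.

```python
def utf16_units(s):
    """Count the 16-bit code units needed to encode s as UTF-16."""
    # utf-16-le writes no byte order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-le")) // 2

text = "abc\U0001F600"  # three ASCII letters plus one astral character

print(len(text))          # code points, as Python 3.3+ counts them
print(utf16_units(text))  # code units, including the surrogate pair
```

If your test strings never include something like that last
character, the two numbers always agree and the bug stays hidden.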

It's not rubbish. It's important stuff that you need to deal with.

ChrisA


