Micro Python -- a lean and efficient implementation of Python 3

wxjmfauth at gmail.com wxjmfauth at gmail.com
Tue Jun 10 05:13:25 EDT 2014


On Tuesday, 10 June 2014 09:32:34 UTC+2, wxjm... at gmail.com wrote:
> On Wednesday, 4 June 2014 13:53:19 UTC+2, Robin Becker wrote:
> > On 04/06/2014 12:01, Tim Chase wrote:
> > > On 2014-06-04 00:58, Paul Rubin wrote:
> > >> Steven D'Aprano <steve at pearwood.info> writes:
> > >>>> Maybe there's a use-case for a microcontroller that works in
> > >>>> ISO-8859-5 natively, thus using only eight bits per character,
> > >>> That won't even make the Russians happy, since in Russia there
> > >>> are multiple incompatible legacy encodings.
> > >>
> > >> I've never understood why not use UTF-8 for everything.
> > >
> > > If you use UTF-8 for everything, then you end up in a world where
> > > string-indexing (see ChrisA's other side thread on this topic) is no
> > > longer an O(1) operation, but an O(N) operation.  Some of us slice
> > > strings for a living. ;-)  I understand that using UTF-32 would allow
> > > us to maintain O(1) indexing at the cost of every string occupying 4
> > > bytes per character.  The FSR (again, as I understand it) allows
> > > strings that fit in one-byte-per-character to use that, scaling up to
> > > use wider characters internally as they're actually needed/used.
> > >
> > ........
> > I believe that we should distinguish between glyph/character indexing and string
> > indexing. Even in Unicode it may be hard to decide where a visual glyph starts
> > and ends. I assume most people would like to assign one glyph to one Unicode
> > code point, but that's not always possible with composed glyphs.
> >
> >  >>> for a in (u'\xc5',u'A\u030a'):
> > ... 	for o in (u'\xf6',u'o\u0308'):
> > ... 		u=a+u'ngstr'+o+u'm'
> > ... 		print("%s %s" % (repr(u),u))
> > ...
> > u'\xc5ngstr\xf6m' Ångström
> > u'\xc5ngstro\u0308m' Ångström
> > u'A\u030angstr\xf6m' Ångström
> > u'A\u030angstro\u0308m' Ångström
> >  >>> u'\xc5ngstr\xf6m'==u'\xc5ngstro\u0308m'
> > False
> >
> > so even Unicode doesn't always allow for O(1) glyph indexing. I know this is
> > artificial, but this is the same situation that UTF-8 faces; only the frequency of
> > occurrence is different. A very large amount of computing is still
> > Western-centric, so searching a byte string for Latin characters is still efficient;
> > searching for an n with a tilde on top might not be so easy.
> > -- 
> > Robin Becker
>
> =========
>
> Python has succeeded in becoming an anti-Unicode product!
>
> jmf

-----

And deeply buggy!
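
For readers following the quoted Ångström session: the comparison there returns False because the two strings contain different code point sequences, even though they render identically. A minimal Python 3 sketch of one common way to handle this, normalizing both sides to NFC before comparing, is given here. The helper name canonical_equal is made up for illustration and does not come from the thread.

```python
import unicodedata

# The fully precomposed and fully decomposed spellings from the quoted session.
composed = "\xc5ngstr\xf6m"            # Å and ö stored as single code points
decomposed = "A\u030angstro\u0308m"    # base letters followed by combining marks


def canonical_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)                 # False: code points differ
print(len(composed), len(decomposed))         # 8 10
print(canonical_equal(composed, decomposed))  # True
```

Normalization only addresses equality here. Indexing by visible glyph still requires grouping combining marks with their base characters, which this sketch does not attempt.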


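The O(N) indexing point in the quoted text can be made concrete with a short Python 3 sketch. It assumes valid UTF-8 input and uses a made-up helper name, utf8_char_at. Because code points occupy one to four bytes, finding the N-th one means scanning from the start and counting lead bytes. This is an illustration only, not how any particular Python implementation stores strings.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Hypothetical helper: return the code point at `index` in UTF-8 bytes.

    Assumes `data` is valid UTF-8. Runs in O(N) because it must scan
    from the beginning to count code point boundaries.
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    seen = -1      # index of the most recently started code point
    start = None   # byte offset where the requested code point begins
    for pos, byte in enumerate(data):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # starts a new code point.
        if (byte & 0xC0) != 0x80:
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    # The requested code point was the last one in the buffer.
    return data[start:].decode("utf-8")


text = "\xc5ngstr\xf6m"          # Ångström, precomposed
data = text.encode("utf-8")
print(len(text), len(data))       # 8 code points, 10 bytes
print(utf8_char_at(data, 6) == text[6])
```

A fixed-width representation can instead compute the byte offset as index times width, which is the O(1) behaviour described in the quoted message.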
