Flexible string representation, unicode, typography, ...

Mon Aug 27 15:16:27 EDT 2012

Le dimanche 26 août 2012 22:45:09 UTC+2, Dan Sommers a écrit :
> On 2012-08-26 at 20:13:21 +0000,
> 
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> 
> 
> 
> > I note that not all 32-bit ints are valid code points. I suppose I can
> 
> > see sense in having rune be a 32-bit integer value limited to those
> 
> > valid code points. (But, dammit, why not call it a code point?) But if
> 
> > rune is merely an alias for int32, why not just call it int32?
> 
> 
> 
> Having a "code point" type is a good idea.  If nothing else, human code
> 
> readers can tell that you're doing something with characters rather than
> 
> something with integers.  If your language provides any sort of type
> 
> safety, then you get that, too.
> 
> 
> 
> Calling your code points int32 is a bad idea for the same reason that it
> 
> turned out to be a bad idea to call all my old ASCII characters int8.
> 
> Or all my pointers int<n> (or unsigned int<n>), for n in 16, 20, 24, 32,
> 
> 36, 48, or 64 (or I'm sure other values of n that I never had the pain
> 
> or pleasure of using).
> 

And this is precisely the concept of rune, a real int which
is a name for Unicode code point.

Go "has" the integers int32 and int64. A rune ensure
the usage of int32. "Text libs" use runes. Go has only
bytes and runes.

If you do not like the word "perfection", this mechanism
has at least an ideal simplicity (with probably a lot
of positive consequences).

rune -> int32 -> utf32 -> unicode code points.

- Why int32 and not uint32? No idea, I tried to find an
answer without asking.
- I find the name "rune" elegant. "char" would have been
too confusing.

End. This is supposed to be a Python forum.
jmf