[Python-Dev] Pre-PEP: Python Character Model
Neil Hodgson
nhodgson@bigpond.net.au
Thu, 8 Feb 2001 08:37:18 +1100
Andrew Kuchling:
> Any idea if this next version of Ruby is available in its current
> state, or if it's vaporware? It might be worth looking at what
> exactly it implements, but I wonder if this is just Matz's idea and he
> hasn't yet tried implementing it.
AFAIK, 1.7 is still vaporware, although my impression was that Matz was
implementing it when he mentioned it in mid-December. Some code may be
available from CVS, but I haven't been following that closely.
> I'd worry that implementing a regex engine for multiple encodings
> would be impossible or, if possible, it would be quite slow because
> you'd need to abstract every single character retrieval into a
> function call that decodes a single character for a given encoding.
<speculation> I'd guess at some sort of type promotion system with
caching to avoid extra conversions. Say you want to search a Shift-JIS
string for a KOI8 string (unlikely, but they do share many characters). The
infrastructure checks the character sets representable in the two encodings,
chooses a super-type that can include all possibilities in the expression,
then promotes both arguments by re-encoding and performs the operation. The
super-type would likely be Unicode-based, although given Matz's desire for
larger-than-Unicode character sets, it may be something else. </speculation>
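
To make the guess above concrete, here is a minimal sketch in Python rather
than Ruby. It assumes the super-type is always Unicode (Python's str), so the
"choose a super-type" step is collapsed into a plain decode. The names
promote() and find() are hypothetical and say nothing about what Matz is
actually implementing; lru_cache stands in for the conversion cache.

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def promote(data: bytes, encoding: str) -> str:
    """Promote an encoded byte string to the assumed super-type (str).

    Results are cached, so searching the same string repeatedly does
    not pay for the conversion again.
    """
    return data.decode(encoding)


def find(haystack: bytes, hay_enc: str, needle: bytes, needle_enc: str) -> int:
    """Return the character index of needle in haystack, or -1.

    Both arguments are promoted to the common super-type first, so the
    comparison happens between characters rather than raw bytes.
    """
    return promote(haystack, hay_enc).find(promote(needle, needle_enc))


# Shift-JIS text containing Cyrillic, searched with a KOI8-R needle.
text = "東京とМосква".encode("shift_jis")
word = "Москва".encode("koi8_r")
index = find(text, "shift_jis", word, "koi8_r")
```

The index is a character offset in the promoted string, not a byte offset in
the original Shift-JIS data; mapping back to bytes would need extra
bookkeeping that this sketch leaves out.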
Neil