[Python-Dev] Pre-PEP: Python Character Model

Neil Hodgson nhodgson@bigpond.net.au
Thu, 8 Feb 2001 08:37:18 +1100


Andrew Kuchling:

> Any idea if this next version of Ruby is available in its current
> state, or if it's vaporware?  It might be worth looking at what
> exactly it implements, but I wonder if this is just Matz's idea and he
> hasn't yet tried implementing it.

   AFAIK, 1.7 is still vaporware, although my impression was that Matz was
already implementing it when he mentioned it in mid-December. Some code may
be available from CVS, but I haven't been following that closely.

> I'd worry that implementing a regex engine for multiple encodings
> would be impossible or, if possible, it would be quite slow because
> you'd need to abstract every single character retrieval into a
> function call that decodes a single character for a given encoding.

   <speculation> I'd guess at some sort of type promotion system with
caching to avoid extra conversions. Say you want to search a Shift-JIS
string for a KOI8 string (unlikely but they do share many characters). The
infrastructure checks the character sets representable in the encodings and
chooses a super-type that can include all possibilities in the expression,
then promotes both arguments by reencoding and performs the operation. The
super-type would likely be Unicode-based, although given Matz's desire for
larger-than-Unicode character sets, it may be something else. </speculation>
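
   To make the guess concrete, here is a minimal sketch in Python of how
promotion with caching might look. It is not Ruby's design: it assumes the
super-type is always Unicode (a Python str), and the names promote and
find_promoted are made up for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def promote(data: bytes, encoding: str) -> str:
    """Reencode a byte string into the assumed super-type (Unicode str).

    The cache means a string that is searched repeatedly is only
    converted once per (data, encoding) pair.
    """
    return data.decode(encoding)


def find_promoted(haystack: bytes, haystack_enc: str,
                  needle: bytes, needle_enc: str) -> int:
    """Search one encoded string for another that uses a different encoding.

    Both arguments are promoted to the common super-type first, so the
    result is a character index (not a byte offset), or -1 if not found.
    """
    return promote(haystack, haystack_enc).find(promote(needle, needle_enc))


# Shift-JIS and KOI8-R both contain the Cyrillic letters, so a KOI8-R
# needle can be located inside a Shift-JIS haystack after promotion.
haystack = "日本語 мир".encode("shift_jis")
needle = "мир".encode("koi8_r")
index = find_promoted(haystack, "shift_jis", needle, "koi8_r")
```

   A real system would presumably pick the super-type per expression rather
than always using Unicode, and would skip promotion entirely when both
operands already share an encoding.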

   Neil