[Python-Dev] [I18n-sig] Unicode strings: an alternative

Tom Emerson tree@basistech.com
Wed, 3 May 2000 18:05:39 -0400 (EDT)


Skip Montanaro writes:
 > Note that currently the len() method doesn't call strlen() at all.  It just
 > returns the ob_size field.  Presumably, with Just's proposal len() would
 > simply return ob_size/width.  If you used a variable width encoding, Just's
 > plan wouldn't work.  (I don't know anything about string encodings - is
 > UTF-8 variable width?)

Yes. As defined, UTF-8 uses anywhere from 1 to 6 bytes per character,
though in practice for Unicode (the 16-bit range) it's 1 to 3.
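
A quick way to see the variable width, as a rough sketch only: it assumes
a present-day Python 3 interpreter (which postdates this message), and the
sample characters are arbitrary illustrative picks. The last one lies
outside the 16-bit range and takes 4 bytes.

```python
# Byte length of individual characters once encoded as UTF-8.
# Sample characters are illustrative assumptions, not from the thread.
samples = ["A", "\u00e9", "\u20ac", "\U00010348"]
widths = [len(ch.encode("utf-8")) for ch in samples]
print(widths)

# Character count and byte count diverge as soon as one character
# needs more than a single byte.
text = "na\u00efve"
print(len(text), len(text.encode("utf-8")))
```

So a length computed as ob_size/width only holds when every character
occupies the same number of bytes.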

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"