Question on Strings

MRAB google at mrabarnett.plus.com
Fri Feb 6 08:24:53 EST 2009


John Machin wrote:
 > On Feb 6, 9:24 pm, Chris Rebert <c... at rebertia.com> wrote:
 >> On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan
 >> <soft_sm... at yahoo.com> wrote:
 >>
 >>> Hi,
 >>> Excuse me if this is a repeat question!
 >>> I just wanted to know how are strings represented in python?
 >>> I need to know in terms of:
 >>> a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?
 >
 > Neither.
 >
 >> IIRC, Depends on what the build settings were when CPython was
 >> compiled. UTF-16 is the default.
 >
 > Unicode strings are held as arrays of 16-bit numbers or 32-bit numbers
 > [of which only 21 are used]. If you must use an acronym, use UCS-2 or
 > UCS-4.
 >
 > The UTF-n siblings are *external* representations.
 > 2.x: a_unicode_object.encode('UTF-16') -> an_str_object
 > 3.x: an_str_object.encode('UTF-16') -> a_bytes_object
 >
 > By the way, has anyone come up with a name for the shifting effect
 > observed above on str, and also with repr, range, and the iter*
 > family? If not, I suggest that the language's association with the
 > best of English humour be widened so that it be dubbed the "Mad
 > Hatter's Tea Party" effect.
 >
Bitwise shifts and rotates are collectively referred to as skew
operations. I therefore suggest the term "skewing". :-)
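
For anyone wanting to see the internal/external distinction John describes,
here is a minimal sketch. It assumes Python 3.3 or later, where len() counts
code points regardless of how the interpreter was built. The sample string and
the variable names are purely illustrative and do not come from the thread.

```python
import sys

# 'A', EURO SIGN (U+20AC) and MUSICAL SYMBOL G CLEF (U+1D11E).
text = "A\u20ac\U0001d11e"

# Encoding turns the string into an *external* byte representation.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")
utf32_le = text.encode("utf-32-le")

# Decoding goes the other way: bytes -> str.
round_trip = utf16_le.decode("utf-16-le")

print(len(text))       # code points in the string
print(len(utf16_le))   # bytes; the G clef needs a surrogate pair in UTF-16
print(len(utf32_le))   # bytes; 4 per code point
print(utf16_le.hex(), utf16_be.hex())  # same code units, opposite byte order

# The highest code point is 0x10FFFF, which fits in 21 bits.
print(hex(sys.maxunicode), sys.maxunicode.bit_length())
```

Note that LE/BE only means anything for the encoded bytes; the string object
itself has no byte order that a program can observe.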


