Question on Strings

John Machin sjmachin at lexicon.net
Fri Feb 6 07:10:13 EST 2009


On Feb 6, 9:24 pm, Chris Rebert <c... at rebertia.com> wrote:
> On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan
> <soft_sm... at yahoo.com> wrote:
>
> > Hi,
>
> > Excuse me if this is a repeat question!
>
> > I just wanted to know how are strings represented in python?
>
> > I need to know in terms of:
>
> > a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?

Neither.

>
> IIRC, Depends on what the build settings were when CPython was
> compiled. UTF-16 is the default.

Unicode strings are held as arrays of 16-bit numbers or 32-bit numbers
[of which only 21 bits are used], depending on how CPython was built.
If you must use an acronym, use UCS-2 or UCS-4.
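
A quick way to check the "21 bits" figure, as a rough sketch rather than anything official: the highest Unicode code point is U+10FFFF, and you can ask Python how many bits that needs. The names MAX_CODE_POINT, bits_needed, fits_in_16 and fits_in_32 are mine, purely for illustration. Note that sys.maxunicode reported 0xFFFF on 16-bit builds of older interpreters; on Python 3.3 and later it is always 0x10FFFF.

```python
import sys

# Highest code point defined by Unicode (illustrative constant name).
MAX_CODE_POINT = 0x10FFFF

# Number of bits needed to hold the largest code point.
bits_needed = MAX_CODE_POINT.bit_length()
print(bits_needed)  # 21

# A 16-bit unit cannot hold every code point; a 32-bit unit can.
fits_in_16 = MAX_CODE_POINT <= 0xFFFF
fits_in_32 = MAX_CODE_POINT <= 0xFFFFFFFF
print(fits_in_16, fits_in_32)  # False True

# What this interpreter reports as its largest code point.
print(hex(sys.maxunicode))
```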

The UTF-n siblings are *external* representations.
2.x: a_unicode_object.encode('UTF-16') -> an_str_object
3.x: an_str_object.encode('UTF-16') -> a_bytes_object
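
To make the internal/external distinction concrete, here is a small 3.x sketch (the variable names are only for illustration). Going from text to an external UTF-n form is encode(); coming back is decode(). The plain 'utf-16' codec prepends a 2-byte byte order mark, while the '-le' and '-be' variants do not.

```python
# A 4-character text string; the last character is U+00E9.
text = "caf\u00e9"

# encode(): text -> bytes in a chosen external representation.
utf16_le = text.encode("utf-16-le")   # 2 bytes per character here, no BOM
utf32_le = text.encode("utf-32-le")   # 4 bytes per character, no BOM
utf16_bom = text.encode("utf-16")     # native byte order, BOM prepended

print(type(utf16_le).__name__)                    # bytes
print(len(text), len(utf16_le), len(utf32_le))    # 4 8 16
print(len(utf16_bom))                             # 10 (2-byte BOM + 8)

# decode(): bytes -> text, reversing the step above.
round_trip = utf16_le.decode("utf-16-le")
print(round_trip == text)                         # True
```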

By the way, has anyone come up with a name for the shifting effect
observed above on str, and also with repr, range, and the iter*
family? If not, I suggest that the language's association with the
best of English humour be widened so that it be dubbed the "Mad
Hatter's Tea Party" effect.


