Question on Strings

Tino Wildenhain tino at wildenhain.de
Fri Feb 6 06:11:38 EST 2009


Hi,

Kalyankumar Ramaseshan wrote:
> Hi,
> 
> Excuse me if this is a repeat question!
> 
> I just wanted to know how are strings represented in python?

It depends on whether you mean Python 2.x or Python 3.x - the model
changed between them.

Python 2.x knows str and unicode - the former is a sequence
of single-byte characters, while unicode uses either 16 or 32 bits
per character, depending on the options the interpreter was
configured with at build time.

In Python 3.x, str replaces unicode, and what used to be
str is now bytes (iirc).
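
As a rough illustration of the Python 3 model (a sketch only, assuming
a Python 3 interpreter; the variable names are made up for this
example):

```python
# Python 3: str holds Unicode text, bytes holds raw 8-bit data.
text = "Gr\u00fc\u00dfe"       # str: a sequence of Unicode code points
raw = text.encode("utf-8")     # bytes: a sequence of integers 0..255

print(type(text).__name__, len(text))  # length counts code points
print(type(raw).__name__, len(raw))    # length counts bytes
```

The two lengths differ because the two non-ASCII characters each take
two bytes in UTF-8.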

> I need to know in terms of:
> 
> a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?

It uses an internal fixed-width representation for unicode (16 or 32
bits per character, depending on the build), not a UTF encoding.

> b) They are converted to utf-8 format when it is needed for e.g. when storing the string to disk or sending it through a socket (tcp/ip)? 

Nope. You need to do this explicitly. The default encoding for
implicit conversions in Python 2.x is ASCII.
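
To see why ASCII is a limiting default, here is a small sketch
(written for Python 3 so it runs on a current interpreter; the
variable names are only for illustration) showing that the ASCII
codec rejects non-ASCII text while an explicit UTF-8 encode works:

```python
# The ASCII codec only covers code points 0..127, so encoding
# non-ASCII text with it raises an error instead of guessing.
text = "na\u00efve"  # contains U+00EF, which is outside ASCII

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = text.encode("utf-8")  # explicit encoding succeeds
print(ascii_ok, utf8_bytes)
```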

In Python 2.x you would use unicodestr.encode('utf-8') to get a
UTF-8 encoded byte string, and simplestr.decode('utf-8') to convert
a UTF-8 encoded string back to internal unicode.
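
A minimal round-trip sketch of that encode/decode pair (shown for
Python 3, where the text type is str and the encoded type is bytes;
the helper names to_wire and from_wire are hypothetical, not part of
any library):

```python
def to_wire(text, encoding="utf-8"):
    """Encode text to bytes, e.g. before writing to a file or socket."""
    return text.encode(encoding)


def from_wire(data, encoding="utf-8"):
    """Decode received bytes back into text."""
    return data.decode(encoding)


original = "\u20ac 10"          # euro sign, space, "10"
payload = to_wire(original)     # bytes suitable for disk or network
restored = from_wire(payload)   # back to text

print(len(original), len(payload), restored == original)
```

In Python 2.x the same two methods exist on unicode and str objects
respectively, so the idea carries over unchanged.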

There are many encodings available to select from.

> Any help in this regard is appreciated.

Please also see Python's documentation, which is very
good, and just try it out in the interactive interpreter.

Regards
Tino