Binary strings, unicode and encodings
Laurent Therond
google at axiomatize.com
Thu Jan 15 18:29:21 EST 2004
I used the interpreter on my system:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
OK
>>> from cStringIO import StringIO
>>> b = StringIO()
>>> b.write('%d:%s' % (len('string'), 'string'))
>>> print b.getvalue()
6:string
OK
>>> c = StringIO()
>>> c.write('%d:%s' % (len('stringé'), 'stringé'))
>>> print c.getvalue()
7:stringé
OK
Did StringIO just recognize Extended ASCII?
Did StringIO just recognize ISO 8859-1?
é belongs to both Extended ASCII and ISO 8859-1, though not necessarily at the same byte value.
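A quick way to see what the 7 in the output is counting is to compare the byte length of the same text under a few encodings. This is only a sketch, written in syntax that should run on Python 2.6+ and 3.3+ rather than the 2.3 interpreter used above; the names text and lengths are illustrative and not part of the original session.

```python
# Illustrative check, not part of the original session.
text = u'string\xe9'   # 'stringé' as a unicode string

# Byte length of the same text in several encodings
lengths = {}
for codec in ('cp437', 'iso-8859-1', 'utf-8'):
    lengths[codec] = len(text.encode(codec))

# Single-byte encodings give 7 bytes; UTF-8 needs two bytes for é, giving 8
print(lengths)
```

A length of 7 is therefore consistent with any single-byte encoding that contains é, so the count alone cannot tell Extended ASCII from ISO 8859-1.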
>>> print c.getvalue().decode('US-ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 8: ordinal not in range(128)
>>> print c.getvalue().decode('ISO-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python23\lib\encodings\cp437.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x82' in position 8: character maps to <undefined>
>>>
OK
It must have been Extended ASCII, then.
I must do other tests.
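One further test along these lines, as a hedged sketch: it assumes the console encoding is cp437, as the cp437.py frame in the traceback suggests, and it hard-codes the byte 0x82 reported in the error messages. It is written to run on Python 2.6+ and 3.3+ rather than 2.3, and the names raw and can_encode are illustrative only.

```python
# Illustrative check, not part of the original session.
# Bytes assumed to be in the buffer: '7:string' followed by 0x82.
raw = b'7:string\x82'

# cp437 maps byte 0x82 to é (U+00E9)
as_cp437 = raw.decode('cp437')

# ISO 8859-1 maps byte 0x82 to the control character U+0082, not to é
as_latin1 = raw.decode('iso-8859-1')

# In ISO 8859-1, é is byte 0xE9
e_acute_latin1 = u'\xe9'.encode('iso-8859-1')


def can_encode(text, codec):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(as_cp437, 'cp437'))   # True
print(can_encode(as_latin1, 'cp437'))  # False
```

Under these assumptions, the ISO-8859-1 decode itself succeeds, and the error comes from print trying to encode U+0082 back to cp437 for the console, which matches the UnicodeEncodeError shown above.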