Binary strings, unicode and encodings

Laurent Therond google at axiomatize.com
Thu Jan 15 18:29:21 EST 2004


I used the interpreter on my system:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

OK

>>> from cStringIO import StringIO
>>> b = StringIO()
>>> b.write('%d:%s' % (len('string'), 'string'))
>>> print b.getvalue()
6:string

OK

>>> c = StringIO()
>>> c.write('%d:%s' % (len('stringé'), 'stringé'))
>>> print c.getvalue()
7:stringé

OK
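
A minimal sketch of why the prefix came out as 7, written in Python 3
syntax rather than the Python 2.3 of the session above. It assumes the
console handed the interpreter one byte for é, which is what the length
of 7 suggests. The helper name length_prefixed is hypothetical and used
only for illustration:

```python
# u'\xe9' is the code point for 'é' (U+00E9).
text = u'string\xe9'

def length_prefixed(s, encoding):
    """Encode s, then prefix it with its length in bytes and a colon."""
    data = s.encode(encoding)
    return str(len(data)).encode('ascii') + b':' + data

# Single-byte encodings give one byte for 'é', so the prefix is 7.
cp437_form = length_prefixed(text, 'cp437')        # b'7:string\x82'
latin1_form = length_prefixed(text, 'iso-8859-1')  # b'7:string\xe9'

# UTF-8 needs two bytes for 'é', so the prefix becomes 8.
utf8_form = length_prefixed(text, 'utf-8')         # b'8:string\xc3\xa9'
```

The count is a byte count, so it depends on which encoding produced the
bytes, not only on how many characters were typed.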

Did StringIO just recognize Extended ASCII?
Did StringIO just recognize ISO 8859-1?

é belongs to Extended ASCII AND ISO 8859-1.

>>> print c.getvalue().decode('US-ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 8: ordinal not in range(128)

>>> print c.getvalue().decode('ISO-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python23\lib\encodings\cp437.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x82' in position 8: character maps to <undefined>
>>>

OK

It must have been Extended ASCII, then (code page 437, judging by the
cp437.py frame in the traceback).
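
A small sketch of what the two tracebacks seem to show, again in Python 3
syntax. It assumes the console code page is cp437, as the cp437.py frame
in the second traceback suggests, and takes the byte 0x82 from the error
messages:

```python
# The bytes held by the buffer, with 0x82 standing in for 'é'.
raw = b'7:string\x82'

# Code page 437 maps byte 0x82 to 'é' (U+00E9).
as_cp437 = raw.decode('cp437')        # u'7:string\xe9'

# ISO 8859-1 maps byte 0x82 to the control character U+0082 instead,
# so the decode itself succeeds but does not yield 'é'.
as_latin1 = raw.decode('iso-8859-1')  # u'7:string\x82'

# Printing then has to encode for the console; U+0082 has no cp437
# mapping, which matches the UnicodeEncodeError seen above.
try:
    as_latin1.encode('cp437')
    reencode_failed = False
except UnicodeEncodeError:
    reencode_failed = True

# In ISO 8859-1, 'é' is byte 0xE9, not 0x82.
e_in_latin1 = u'\xe9'.encode('iso-8859-1')  # b'\xe9'
```

So the ISO-8859-1 decode did not fail; the error came from print encoding
the result back to the console code page.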

I must do other tests.


