is this a unicode/string bug?

olsongt at verizon.net olsongt at verizon.net
Fri Dec 9 16:34:24 EST 2005


I was going to submit to sourceforge, but my unicode skills are weak.
I was trying to strip characters from a string that contained values
outside of ASCII.  I thought I could just encode as 'ascii' in 'replace'
mode but it threw an error.  Strangely enough, if I decode via the
ascii codec and then encode via the ascii codec, I get what I want.
That being said, this may be operating correctly.

>>> print 'aaa\xae'
aaa®
>>> 'aaa\xae'.encode('ascii','replace') #should return 'aaa?'
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 3: ordinal not in range(128)
>>> 'aaa\xae'.decode('ascii','replace') #but this doesn't throw an error?
u'aaa\ufffd'
>>> 'aaa\xae'.decode('ascii','replace').encode('ascii','replace') #this does what I wanted
'aaa?'
>>>
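
For reference, here is the decode-then-encode workaround from the last
command wrapped up as a small helper.  This is only a sketch: the name
strip_non_ascii is made up for illustration, and it is written for
Python 3, where byte strings and text are separate types, rather than
for the Python 2 session above.

```python
def strip_non_ascii(raw):
    """Return an ASCII-only copy of a byte string.

    Hypothetical helper, not part of any library. Every byte outside
    the ASCII range ends up as a '?' in the result.
    """
    # Step 1: decode leniently. Bytes that are not valid ASCII become
    # the Unicode replacement character U+FFFD instead of raising.
    text = raw.decode('ascii', 'replace')
    # Step 2: encode leniently. U+FFFD has no ASCII form, so the
    # 'replace' error handler substitutes '?' for it.
    return text.encode('ascii', 'replace')


if __name__ == '__main__':
    print(strip_non_ascii(b'aaa\xae'))
```

Doing the two steps explicitly means the 'replace' error handler is
applied on both the decode and the encode side, which appears to be the
difference from the one-step call that raised UnicodeDecodeError.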



