usage of <string>.encode('utf-8','xmlcharrefreplace')?

J Peyret jpeyret at gmail.com
Tue Feb 19 00:36:17 EST 2008


Well, as usual I am confused by unicode encoding errors.

I have a string with problematic characters in it that I'd like to
put into a PostgreSQL table.
The insert fails with a PostgreSQL error, so I am trying to fix things
with <string>.encode:

>>> s = 'he Company\xef\xbf\xbds ticker'
>>> print s
he Company�s ticker
>>>

Trying for an encode:

>>> print s.encode('utf-8')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

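OK, an error is pretty much what I expected, since I assumed this is
not valid utf-8 (although \xef\xbf\xbd is in fact the UTF-8 encoding of
U+FFFD, the replacement character, so the bytes themselves are
well-formed).
But I should be able to fix this with the errors parameter of the
encode method.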

>>> error_replace = 'xmlcharrefreplace'

>>> print s.encode('utf-8',error_replace)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

Same exact error I got without the errors parameter.

Did I mistype the error handler name?  Nope.

>>> import codecs
>>> codecs.lookup_error(error_replace)
<built-in function xmlcharrefreplace_errors>

Same results with 'ignore' as an error handler.

>>> print s.encode('utf-8','ignore')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

And with a bogus error handler:

>>> print s.encode('utf-8','bogus')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

This all looks unusually complicated for Python.
Am I missing something incredibly obvious?
How does one use the errors parameter on strings' encode method?
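
For reference, a minimal sketch of where the errors parameter does seem
to take effect, assuming the bytes are meant to be UTF-8: decode the
byte string to unicode first, then encode to a narrower codec such as
ASCII. The names raw, text, ascii_xml and ascii_ignored are only
illustrative. The b'' prefix assumes Python 2.6 or later (the snippet
also runs on Python 3); on 2.5 the prefix would have to be dropped.

```python
# Byte string from the example above (assumed to hold UTF-8 data).
raw = b'he Company\xef\xbf\xbds ticker'

# Step 1: bytes -> unicode. This is the step that needs a codec name.
text = raw.decode('utf-8')

# Step 2: unicode -> bytes. The errors handler applies here, and only
# when the target codec cannot represent a character.
ascii_xml = text.encode('ascii', 'xmlcharrefreplace')
ascii_ignored = text.encode('ascii', 'ignore')

# Encoding back to UTF-8 never needs a handler for this text.
utf8_again = text.encode('utf-8')

print(ascii_xml)
print(ascii_ignored)
```

With 'utf-8' as the target, every character can be represented, so the
handler is not expected to be consulted at all.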

Also, why are the exceptions above complaining about the 'ascii' codec
if I am asking for 'utf-8' conversion?
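
A hedged sketch of what appears to be going on, as far as I can tell:
in Python 2, calling encode on a byte string (str) first decodes it to
unicode with the default codec, which is ASCII, and only then encodes
to the requested codec. The errors argument does not seem to reach that
implicit decode. The helper encode_like_py2_str below is hypothetical
and written only to make the two steps visible; it uses b'' literals
and "except ... as", so it assumes Python 2.6 or later (it also runs on
Python 3).

```python
raw = b'he Company\xef\xbf\xbds ticker'

def encode_like_py2_str(data, encoding, errors='strict'):
    """Hypothetical helper mimicking str.encode on a Python 2 byte string."""
    # Implicit step: decode with the default ASCII codec, always strict.
    # The caller's errors handler is not used here.
    text = data.decode('ascii')
    # Explicit step: the requested codec and handler apply only now.
    return text.encode(encoding, errors)

try:
    encode_like_py2_str(raw, 'utf-8', 'xmlcharrefreplace')
    message = None
except UnicodeDecodeError as exc:
    # The failure comes from the ASCII decode, not from the utf-8 encode.
    message = str(exc)

print(message)
```

That would explain both the 'ascii' in the message and why every
handler name, valid or not, gives the same traceback.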

Version and environment below.  Should I try to update my python from
somewhere?

./$ python
Python 2.5.1 (r251:54863, Oct  5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2

Cheers


