[Baypiggies] Handling unwanted Unicode \u2019 characters in XML

Terry Carroll carroll at tjc.com
Wed Jul 2 02:30:38 CEST 2008


Sorry, meant to send this to the list....


On Tue, 1 Jul 2008, Stephen McInerney wrote:

> Check that URL again: string.translate() IS deprecated, but
> string.maketrans() is not. unicode.translate() is not deprecated.

But can you actually set up the translate table for a Unicode character?


>>> import string
>>> trantab = string.maketrans(u"u\2019", u"'")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\x81' in position 1: ordinal not in range(128)
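
A side note on the session above: the literal is written u"u\2019" rather than u"\u2019", so Python reads \201 as an octal escape, which is why the error names u'\x81' at position 1. Even with the escape corrected, string.maketrans() builds a 256-entry table for byte strings, so it has no slot for U+2019. A minimal sketch of the alternative, assuming the text is already a unicode object: unicode.translate() accepts a plain dict keyed by code point, so no maketrans() call is needed. The names table, text, and cleaned are only illustrative.

```python
# Sketch: a translate table for unicode is just a dict.
# Keys are integer code points; values may be unicode strings,
# integer code points, or None (which deletes the character).
table = {
    0x2019: u"'",  # RIGHT SINGLE QUOTATION MARK -> ASCII apostrophe
    0x2018: u"'",  # LEFT SINGLE QUOTATION MARK  -> ASCII apostrophe
}

text = u"it\u2019s a \u2018test\u2019"  # made-up sample input
cleaned = text.translate(table)
print(cleaned)
```

Characters that are not in the dict pass through unchanged, so the table only has to list the ones to replace.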


I also note that the docs for the translate() string method suggest:

    Note, a more flexible approach is to create a custom character mapping 
    codec using the codecs module (see encodings.cp1251 for an example). 

But reading the codecs docs raised more questions for me than they 
answered; it certainly isn't as straightforward as the ascii translation 
was.
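
As a rough sketch of what that codecs note seems to be getting at: a character mapping codec is driven by a table from code points to output bytes, and codecs.charmap_encode() applies such a table directly. The example below assumes ASCII output plus one extra entry for U+2019. The names encoding_map and text are made up for illustration, and a real codec would also need the wrapper classes that encodings.cp1251 defines.

```python
import codecs

# Assumed mapping: identity for the 128 ASCII code points,
# plus one extra entry sending U+2019 to an ASCII apostrophe.
encoding_map = dict((i, i) for i in range(128))
encoding_map[0x2019] = ord("'")

text = u"it\u2019s"  # made-up sample input

# charmap_encode returns (encoded byte string, number of characters consumed).
encoded, consumed = codecs.charmap_encode(text, "strict", encoding_map)
print(encoded)
```

With "strict" error handling, any character missing from the map raises UnicodeEncodeError, so for a one-off cleanup the dict passed to translate() is probably the simpler route.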



