Unicode Hell

Fri Nov 7 04:20:39 EST 2003

Stuart Forsyth wrote:

> The replace string in this case is actually the contents of a file.  I
> have simplified it for the purposes of the example.  The file I'm doing
> the replace on is a web archive (.mht) file.  Within that file are a
> number of different replace fields, e.g. #name#, #organisation#, etc.
> Everything was working fine until the replace function tried to replace
> the #name# replace field with a posting variable that had a tilde in it.
> The script then moaned about it being non-ascii and crashed.  The exact
> error is:
>
> Error Type:
> Python ActiveX Scripting Engine (0x80020009)
> Traceback (most recent call last):
>   File "<Script Block >", line 80, in ?
>     FileContents = FileContents.replace('Repl_learner',str(Request("learner")))
>   File "C:\Python23\lib\site-packages\win32com\client\dynamic.py", line 169, in __str__
>     return str(self.__call__())
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-9: ordinal not in range(128)

if you're replacing parts of a Unicode string with the contents of a non-
Unicode string, Python assumes that the second string contains only
plain ASCII.
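
a rough sketch of that assumption, using made-up sample data (the name and the byte value are not from the original post). the ASCII decode is written out explicitly here to mimic the conversion that happens behind the scenes when the two string types are mixed:

```python
# hypothetical sample value: "Jos" followed by byte 0xE9,
# which is e-acute in iso-8859-1 and not valid ASCII
replacement = b"Jos\xe9"

# spell out the ASCII assumption explicitly
try:
    replacement.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # any byte above 0x7F breaks the assumption
    ascii_ok = False
```

so the conversion only succeeds when the second string really contains nothing but plain ASCII.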

if it doesn't, you have to tell Python what encoding you're using in the
second string; there's no way Python can figure that out by itself.  here's
how to do that:

    myunicodestring.replace(tag, replacestring.decode("iso-8859-1"))
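
a slightly fuller sketch of the same idea. the helper name `fill_field`, the template text, and the sample value are all hypothetical, and iso-8859-1 is only a guess at what the incoming data uses:

```python
def fill_field(text, tag, raw_value, encoding="iso-8859-1"):
    """Replace tag in a Unicode text with a byte string decoded via encoding."""
    # decode first, so both sides of the replace are Unicode
    return text.replace(tag, raw_value.decode(encoding))

# hypothetical template and form value (0xE9 is e-acute in iso-8859-1)
template = u"Dear #name#, welcome to #organisation#."
result = fill_field(template, u"#name#", b"Jos\xe9")
```

the encoding has to match whatever actually produced the bytes; if the data arrives in some other encoding, pass that name instead of relying on the default.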

also see item 5 on this page:

    http://effbot.org/zone/unicode-objects.htm

</F>