Trouble fixing a broken ASCII string - "replace" mode in codec not working.
Robert Kern
robert.kern at gmail.com
Tue Feb 6 15:05:45 EST 2007
John Nagle wrote:
> I'm trying to clean up a bad ASCII string, one read from a
> web page that is supposedly in the ASCII character set but has some
> characters above 127. And I get this:
>
> File "D:\projects\sitetruth\InfoSitePage.py", line 285, in httpfetch
> sitetext = sitetext.encode('ascii','replace') # force to clean ASCII
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 29151:
> ordinal not in range(128)
>
> Why is that exception being raised when the codec was told 'replace'?
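The .encode('ascii') method takes unicode strings to str strings. Since you
called it on a str string, Python first tried to convert it to a unicode string
using the default codec ('ascii') with strict error handling, just as if you
had written
    unicode(sitetext).encode('ascii', 'replace')
The 'replace' argument applies only to the encode step, so the implicit decode
still raises UnicodeDecodeError on byte 0x92.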
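To see that implicit step in isolation, here is a minimal sketch. The sample
bytes are made up for illustration (they are not the actual page), and a bytes
literal stands in for the Python 2 str so that the same lines also run on
Python 3, where the implicit decode no longer exists and has to be spelled out.

```python
# Hypothetical sample: ASCII text with one stray 0x92 byte, like the one
# reported in the traceback.
raw = b"It\x92s broken"

# This is the step Python 2 performs implicitly before str.encode():
# a strict decode with the default 'ascii' codec. The 'replace' argument
# given to encode() never reaches this step.
try:
    raw.decode('ascii')
    error_name = None
    bad_position = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    bad_position = exc.start  # index of the offending byte

print(error_name, bad_position)
```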
I think you want something like this:
    sitetext = sitetext.decode('ascii', 'replace').encode('ascii', 'replace')
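As a rough, self-contained illustration of that round trip: force_ascii is a
hypothetical helper name and the sample bytes are invented, not taken from the
page in question. The decode turns each byte above 127 into U+FFFD, and the
encode then turns each U+FFFD into '?'.

```python
def force_ascii(data):
    """Return the byte string `data` with every non-ASCII byte replaced by '?'."""
    # Step 1: decode with 'replace', so bad bytes become U+FFFD instead of raising.
    text = data.decode('ascii', 'replace')
    # Step 2: encode with 'replace', so U+FFFD (not representable in ASCII)
    # becomes '?'.
    return text.encode('ascii', 'replace')

raw = b"It\x92s broken"   # hypothetical sample with one stray 0x92 byte
clean = force_ascii(raw)
print(clean)
```

Note that the round trip is lossy: whatever character the 0x92 byte was meant
to be is gone, and only a '?' placeholder remains.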
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco