string.replace non-ascii characters

Mon Feb 12 00:38:29 EST 2007

Steven Bethard <steven.bethard at gmail.com> on Sun, 11 Feb 2007 22:23:59
-0700 didst step forth and proclaim thus:

> Samuel Karl Peterson wrote:
> > Greetings Pythonistas.  I have recently discovered a strange anomaly
> > with string.replace.  It seemingly randomly does not deal with
> > characters of ordinal value > 127.  I ran into this problem while
> > downloading auction web pages from ebay and trying to replace the
> > "\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
> > urllib2.  Yet today, all is fine, no problems whatsoever.  Sadly, I
> > did not save the exact error message, but I believe it was a
> > ValueError thrown on string.replace and the message was something to
> > the effect "character value not within range(128)".
> 
> Was it something like this?
> 
>  >>> u'\xa0'.replace('\xa0', '')
> Traceback (most recent call last):
>    File "<interactive input>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position
> 0: ordinal not in range(128)
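
For context, the traceback above is what Python 2 produces when a unicode
string is combined with a byte string: the byte string is first decoded
with the default 'ascii' codec, and 0xa0 is outside that range.  A minimal
sketch of that decode step done explicitly (written so it should run on
Python 2.6+ and Python 3; the variable names are made up for illustration):

```python
# Python 2 implicitly decodes a byte-string argument as ASCII when the
# other operand is unicode; doing that decode by hand fails the same way.
raw = b"\xa0"  # the nbsp byte in iso-8859-1

try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    # 0xa0 is >= 128, so the ascii codec refuses it
    error_message = str(exc)

print(error_message)
```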

Yeah, that looks like exactly what was happening, thank you.  I wonder
why I had a unicode string though.  I thought urllib2 always spat out
a plain string.  Oh well.

u'\xa0'.encode('latin-1').replace('\xa0', " ")

Hooray.
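
An alternative to encoding back to latin-1 would be to decode the bytes
once and then stay in unicode, replacing with unicode literals only.  A
rough sketch, assuming the page really is iso-8859-1; replace_nbsp and its
encoding parameter are hypothetical names, not anything urllib2 provides:

```python
def replace_nbsp(page_bytes, encoding="latin-1"):
    """Decode raw page bytes and swap non-breaking spaces for plain spaces.

    Hypothetical helper; the encoding default is an assumption about the page.
    """
    text = page_bytes.decode(encoding)
    # Both arguments are unicode, so no implicit ASCII decode is triggered.
    return text.replace(u"\xa0", u" ")


raw = b"Price:\xa0$10"  # stand-in for bytes read from the network
cleaned = replace_nbsp(raw)
print(repr(cleaned))
```

Keeping both sides of the replace the same type is what avoids the
UnicodeDecodeError, whichever type ends up being used.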
-- 
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown