string.replace non-ascii characters
Samuel Karl Peterson
skpeterson at nospam.please.ucdavis.edu
Mon Feb 12 00:38:29 EST 2007
Steven Bethard <steven.bethard at gmail.com> on Sun, 11 Feb 2007 22:23:59
-0700 didst step forth and proclaim thus:
> Samuel Karl Peterson wrote:
> > Greetings Pythonistas. I have recently discovered a strange anomaly
> > with string.replace. It seemingly randomly does not deal with
> > characters of ordinal value > 127. I ran into this problem while
> > downloading auction web pages from ebay and trying to replace the
> > "\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
> > urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
> > did not save the exact error message, but I believe it was a
> > ValueError thrown on string.replace and the message was something to
> > the effect "character value not within range(128)".
>
> Was it something like this?
>
> >>> u'\xa0'.replace('\xa0', '')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
Yeah that looks like exactly what was happening, thank you. I wonder
why I had a unicode string though. I thought urllib2 always spat out
a plain string. Oh well.
u'\xa0'.encode('latin-1').replace('\xa0', " ")
Hooray.
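
For the archive, a minimal sketch of what goes wrong and of the
decode-first alternative. The sample bytes and the variable names are
made up for illustration, and the page is assumed to be ISO-8859-1 as
described above. It is written so that it should run on Python 3 as
well as 2.x:

```python
# Hypothetical sample of the raw bytes a download might return;
# 0xa0 is the non-breaking space in ISO-8859-1.
raw = b"Price:\xa0100"

# Mixing a unicode string with a byte string makes Python 2 decode the
# byte string as ASCII behind the scenes. Doing that step explicitly
# reproduces the same error on any version.
try:
    b"\xa0".decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decode once with the real encoding, then stay in unicode throughout.
text = raw.decode("latin-1")
cleaned = text.replace(u"\xa0", u" ")

print(error)
print(cleaned)
```

Decoding once at the boundary and then working only with unicode
strings avoids the implicit ASCII conversion entirely. Encoding back
to latin-1, as in the one-liner above, also works when a byte string
is what is needed.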
--
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown