replace text in unicode string

John Machin sjmachin at lexicon.net
Sat May 14 09:34:29 EDT 2005


On 14 May 2005 02:23:55 -0700, "Dan Bishop" <danb_83 at yahoo.com> wrote:

>Svennglenn wrote:
>> I'm having problems replacing text in a
>> unicode string.
>> Here's the code:
>>
>> # -*- coding: cp1252 -*-
>>
>> titel = unicode("ä", "iso-8859-1")

To the OP:
This is not causing the later problem, but it's evidence of the wrong
mindset for a start. You have just lied to the interpreter. You said
that your script was encoded using cp1252, but then you tried to pass
off a string constant as iso-8859-1! They are not exactly the same
repertoire: cp1252 assigns printable characters (the euro sign, curly
quotes and so on) to bytes in the range 0x80-0x9F, where iso-8859-1
has control codes. You need to make up your mind what character
repertoire your application should be confined to, and then apply
that restriction rigorously.
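
If you want to see where the two encodings part company, here is a
rough sketch. The helper name differing_bytes is made up for this
post; it only relies on the standard codecs and treats a byte that a
codec refuses to decode as "different".

```python
def differing_bytes(enc_a="cp1252", enc_b="iso-8859-1"):
    """Return the byte values that the two codecs decode differently.

    Hypothetical helper, for illustration only.
    """
    diffs = []
    for value in range(256):
        raw = bytes(bytearray([value]))  # one-byte string on Python 2 and 3
        decoded = []
        for enc in (enc_a, enc_b):
            try:
                decoded.append(raw.decode(enc))
            except UnicodeDecodeError:
                # The codec leaves this byte undefined.
                decoded.append(None)
        if decoded[0] != decoded[1]:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print([hex(v) for v in diffs])

# The byte for a-umlaut (0xE4) is outside the differing range, which is
# why the OP's mismatch happened to be harmless for this character.
print(0xE4 in diffs)
```

So the mismatch does no damage for "ä", but it would for anything in
the 0x80-0x9F block.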

To create the string without that mismatch, all you need to do is this:

titel = u"ä"

... and didn't I (and/or somebody else) tell you this only a few days
ago?


>> print titel
>> print type(titel)
>>
>> titel.replace("ä", "a")
>>
>> When i run this program I get this error:
>>
>>     titel.replace("ä", "a")
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0:
>> ordinal not in range(128)
>>
>> How can i replace text in the Unicode string?
>
>titel = titel.replace(u"ä", "a")

To Dan:
Fortuitously this works, because "a" is pure ASCII and the interpreter
can decode it implicitly. If the OP wanted to change it to (say) an
umlauted-u, passing a plain "ü" would have thrown another
UnicodeDecodeError.

Everybody please get into the habit of using u"blah blah" for every
string constant when you're working with Unicode.
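
A small sketch of why one case works and the other doesn't. The
helper promote_like_python2 is hypothetical; it spells out, as an
explicit call, the ASCII decode that the interpreter applies behind
your back when a byte string meets a unicode string. Written that way
it behaves the same on any Python version that accepts b"..."
literals.

```python
def promote_like_python2(byte_string):
    """Hypothetical helper: decode with the ASCII codec, mimicking the
    implicit str -> unicode promotion described above."""
    return byte_string.decode("ascii")


# Pure ASCII replacement text: the promotion succeeds.
ok = promote_like_python2(b"a")

# The cp1252 byte for umlauted-u (0xFC) is not ASCII: the promotion fails.
error = None
try:
    promote_like_python2(b"\xfc")
except UnicodeDecodeError as exc:
    error = exc

print(repr(ok))
print(error)
```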

Like this:

titel = titel.replace(u"ä", u"a")
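
A minimal sketch of the all-unicode version, with the non-ASCII
characters written as escapes so that it does not depend on the
encoding of the source file (u"\xe4" is "ä", u"\xfc" is "ü"). The
names original and replaced are mine, just for illustration. Note
that replace() returns a new string rather than changing the old one,
so the result has to be bound to a name, which the OP's bare
titel.replace(...) did not do.

```python
titel = u"\xe4"   # same character as u"ä"
original = titel

# Both arguments are unicode, so no implicit decoding is needed,
# even when the replacement text is not ASCII.
replaced = titel.replace(u"\xe4", u"\xfc")

# Calling replace() without keeping the result changes nothing.
titel.replace(u"\xe4", u"a")

print(repr(original))
print(repr(replaced))
```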



