XML and UnicodeError

Pinke Panke dev at null.oo
Tue Oct 5 11:55:08 EDT 2004


Hello Just,

> Are you perhaps using string literals containing non-ascii chars,

Yes.

> yet don't use the 'u' prefix? u"\xff" as opposed to "\xff".

No.

E.g. I convert umlauts to HTML entities or change symbols to ASCII
strings for file names. Instead of using the \x notation I typed the
character itself. In my script no character is above chr(255).
An example:
import re
def foo(name):
    name = re.sub(r'®', '_registered_', name)
    # ... and many more substitutions
    return name

I suppose I should use u'' instead of r''?

It is also possible to compile an RE object with the U flag:
matchreg = re.compile(u'®', re.U)
name = matchreg.sub('_registered_', name)
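As far as I understand it, the U flag only changes what character classes such as \w, \d and \s match. A pattern made of a single literal character should match the same way with or without it. Here is a small check under that assumption. The sample text is made up, and the registered sign is written as \xae so the snippet does not depend on the file encoding:

```python
import re

# Hypothetical sample text; u'\xae' is the registered sign.
text = u'Acme\xae Tools'

plain = re.compile(u'\xae')          # literal pattern, no flag
flagged = re.compile(u'\xae', re.U)  # same pattern with the U flag

result_plain = plain.sub(u'_registered_', text)
result_flagged = flagged.sub(u'_registered_', text)

# The pattern contains no \w, \d, \s or \b, so the flag has nothing to change.
print(result_plain == result_flagged)  # prints True
```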

But maybe that is not necessary. In my tests, using the u prefix or the U
flag made no difference. The only crucial things were:
1. using unicode(),
2. using a coding declaration as described in [1],
3. storing the Python script as UTF-8.

For me using unicode() is ok.
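
Put together, the three points look roughly like the sketch below. It is not my real script: the function name to_filename and the sample input are invented for illustration, and the file is assumed to be saved as UTF-8. The try/except is a further assumption so that the same lines also run on an interpreter without a unicode() built-in, where str() takes the same arguments.

```python
# -*- coding: utf-8 -*-
# The line above is the coding declaration from PEP 263 (point 2);
# this file itself is assumed to be stored as UTF-8 (point 3).
import re

try:
    unicode          # present on Python 2
except NameError:
    unicode = str    # assumption: str(bytes, encoding) does the same decoding

def to_filename(raw_name):
    """Hypothetical helper: decode UTF-8 bytes, then substitute symbols."""
    name = unicode(raw_name, 'utf-8')           # point 1: using unicode()
    name = re.sub(u'®', u'_registered_', name)  # character typed directly
    # ... and many more substitutions
    return name

# b'\xc2\xae' is the UTF-8 encoding of the registered sign.
print(to_filename(b'Acme\xc2\xae'))  # prints Acme_registered_
```

Decoding once at the start means every later re.sub() compares unicode with unicode, which is presumably why the u prefix on the pattern then stops mattering for characters like these.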

[1] http://python.org/peps/pep-0263.html


Martin


