XML and UnicodeError

Just just at xs4all.nl
Tue Oct 5 06:52:43 EDT 2004


In article <Xns957977C1D83A7devnulloo at 130.133.1.19>,
 Pinke Panke <dev at null.oo> wrote:

> >   2. Convert any other text to Unicode as soon as possible.
> 
> Ok, i.e.
> 
> headline = structure[0] # is unicode
> pagetext = structure[1] # is unicode
> fill = "bar".encode('utf-8') # lets make it unicode

That's not making it unicode; encode() goes the other way, from unicode 
to a byte string. You mean

  fill = unicode("bar", "utf-8")

(Or "bar".decode("utf-8"), which does the same; I prefer using the 
unicode builtin.)
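
A minimal sketch of the two directions, in case it helps. The variable 
names are made up for illustration, and the b"" and u"" prefixes are only 
there so the snippet behaves the same on newer Pythons, where plain string 
literals are already unicode:

```python
raw = b"bar"                      # a byte string, e.g. read from a file
fill = raw.decode("utf-8")        # bytes -> unicode
encoded = fill.encode("utf-8")    # unicode -> bytes (the opposite direction)

headline = u"Caf\xe9 "            # hypothetical value standing in for structure[0]
foo = headline + fill             # unicode + unicode stays unicode
```

decode() is what gets you unicode; encode() is what you call at the very 
end, when you need bytes again.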

> foo = headline + fill + pagetext # foo is unicode, too
> 
> ?
> 
> >   3. Manipulate only Unicode values - don't mix them up with
> >      plain strings.
> 
> It makes sense, but I need some string concatenations. E.g. I set 
> default values in the python script and try to concatenate them with 
> XML values.
> 
> But now, I would think the safest way is to transfer all plain strings 
> in the python script into a second XML file and use them, because 
> after reading in they would be in Unicode. Right?

Yes, but there's no need to. Are you perhaps using string literals 
containing non-ascii chars without the 'u' prefix? That is, "\xff" 
where you should have u"\xff".
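
A small sketch of the difference, under the assumption that the byte is 
meant as Latin-1 (that choice is mine, purely for illustration). A plain 
"\xff" is a raw byte with no encoding attached. Python 2 tries the ascii 
codec when you mix it with unicode, which is where the UnicodeError comes 
from; newer Pythons refuse to mix bytes and text at all. Decoding 
explicitly works everywhere:

```python
text = u"\xff"    # one unicode character, U+00FF
data = b"\xff"    # one raw byte; same as a plain "\xff" literal in Python 2

error = None
try:
    data.decode("ascii")             # what Python 2 attempts implicitly
except UnicodeDecodeError as exc:
    error = exc                      # 0xff is not valid ascii

decoded = data.decode("latin-1")     # explicit decode, assuming Latin-1
```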

> Or saving the python script in utf-8 would make the difference?

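Depends: it only matters if the script contains non-ascii literals, and 
then you also need a coding declaration (PEP 263) at the top so Python 
knows how to decode your u"..." literals.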

> >   4. Serialise to your chosen encoding only when preparing
> >      output.
> 
> Every string concatenation in my script is preparing output.

Do _all_ manipulations using unicode, and convert to utf-8 as late as 
possible, i.e. when you're passing the result to code that expects 
non-unicode data. That's basically what he was saying.

Just


