utf8 and ftplib

Fredrik Lundh fredrik at pythonware.com
Mon Jun 20 13:36:23 EDT 2005


Richard Lewis wrote:

> On Mon, 20 Jun 2005 14:27:17 +0200, "Fredrik Lundh"
> <fredrik at pythonware.com> said:
> >
> > well, you're messing it up all by yourself.  getting rid of all
> > the codecs and unicode2charrefs nonsense will fix this:
> >
> Thanks for being so patient and understanding.
>
> OK, I've taken it all out. The only thinking about encoding I had to do
> in the actual code I'm working on was to use:
> file.write(document.toxml(encoding="utf-8"))
>
> instead of just
> file.write(document.toxml())
>
> because otherwise I got errors on copyright symbol characters.

sounds like a bug in minidom...

> My code now works without generating any errors but Konqueror's KHTML
> and Embedded Advanced Text Viewer and IE5 on the Mac still show
> capital-A-with-a-tilde in all the files that have been
> generated/altered. Whereas my text editor and Mozilla show them
> correctly.
>
> The "unicode2charrefs() nonsense" was an attempt to make it output with
> character references rather than literal characters for all characters
> with codes greater than 128. Is there a way of doing this?

character references refer to code points in the Unicode code
space, so you cannot just convert the bytes you get after
encoding to UTF-8 (a non-ascii character turns into two or more
bytes, none of which is its code point).  however, if you're only
using characters from the ISO Latin 1 set (which is a strict
subset of Unicode, with byte values equal to the code points),
you could encode to "iso-8859-1" and run unicode2charrefs on the
result.
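
A minimal sketch of the point above, assuming a current Python 3
interpreter rather than the 2005-era one used in the thread. The
helper name to_charrefs is hypothetical; it stands in for the
poster's unicode2charrefs and relies on the standard
"xmlcharrefreplace" error handler instead of converting bytes by
hand:

```python
def to_charrefs(text):
    """Encode text as ASCII bytes, replacing every non-ASCII
    character with a numeric character reference (hypothetical
    helper, not the poster's original unicode2charrefs)."""
    return text.encode("ascii", "xmlcharrefreplace")


copyright_sign = "\u00a9"  # code point 169

# UTF-8 spreads the character over two bytes, so converting the
# bytes one by one would not give the right reference.
utf8_bytes = copyright_sign.encode("utf-8")

# ISO Latin 1 uses a single byte whose value equals the code point.
latin1_bytes = copyright_sign.encode("iso-8859-1")

# The character reference carries the code point itself.
reference = to_charrefs(copyright_sign)

print(list(utf8_bytes), list(latin1_bytes), reference)
```

The error handler works from the Unicode string, so it is not
limited to Latin 1 characters.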

but someone should really fix minidom so it does the right thing.

(fwiw, if you use my ElementTree kit, you can simply do

    tree.write(file, encoding="us-ascii")

and the toolkit will then use charrefs for any character that's
not plain ascii.  you can get ElementTree from here:

    http://effbot.org/zone/element-index.htm

)
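
A short sketch of the ElementTree approach, assuming Python 3 and
the xml.etree.ElementTree module (the standard library version of
the ElementTree kit). The element name and sample text are made up
for illustration, and an in-memory buffer stands in for a real
output file:

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny tree containing a non-ASCII character (the
# copyright sign, code point 169).
root = ET.Element("p")
root.text = "\u00a9 2005"
tree = ET.ElementTree(root)

# write() needs a file name or file object as its first argument;
# a BytesIO buffer is used here so the result can be inspected.
buffer = io.BytesIO()
tree.write(buffer, encoding="us-ascii")
output = buffer.getvalue()

print(output)
```

The serialized bytes should be pure ASCII, with the copyright sign
written as a numeric character reference.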

</F>





