utf8 and ftplib
Fredrik Lundh
fredrik at pythonware.com
Mon Jun 20 13:36:23 EDT 2005
Richard Lewis wrote:
> On Mon, 20 Jun 2005 14:27:17 +0200, "Fredrik Lundh"
> <fredrik at pythonware.com> said:
> >
> > well, you're messing it up all by yourself. getting rid of all the
> > codecs and unicode2charrefs nonsense will fix this:
> >
> Thanks for being so patient and understanding.
>
> OK, I've taken it all out. The only thinking about encoding I had to do
> in the actual code I'm working on was to use:
> file.write(document.toxml(encoding="utf-8"))
>
> instead of just
> file.write(document.toxml())
>
> because otherwise I got errors on copyright symbol characters.
sounds like a bug in minidom...
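
For reference, a minimal sketch of the toxml() difference discussed above, as it behaves in a current Python 3 (not the 2005-era interpreter used in this thread, where the details differed). The sample document and the in-memory buffer are assumptions made up for illustration; the buffer stands in for a file opened in binary mode.

```python
import io
from xml.dom import minidom

# Hypothetical sample document containing a copyright sign (U+00A9).
document = minidom.parseString("<p>\u00a9 2005 Example</p>")

# Without an encoding, toxml() returns text (str).
as_text = document.toxml()

# With an encoding, toxml() returns bytes that are ready to be written
# to a file opened in binary mode.
as_bytes = document.toxml(encoding="utf-8")

# io.BytesIO stands in here for open(path, "wb").
out = io.BytesIO()
out.write(as_bytes)
```

Passing the encoding explicitly keeps the encoding step in one visible place, instead of relying on whatever the file object does with text.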
> My code now works without generating any errors but Konqueror's KHTML
> and Embedded Advanced Text Viewer and IE5 on the Mac still show
> capital-A-with-a-tilde in all the files that have been
> generated/altered. Whereas my text editor and Mozilla show them
> correctly.
>
> The "unicode2charrefs() nonsense" was an attempt to make it output with
> character references rather than literal characters for all characters
> with codes greater than 128. Is there a way of doing this?
character references refer to code points in the Unicode code
space, so you can't just convert the bytes you get after converting
to UTF-8. however, if you're only using characters from the ISO
Latin 1 set (which is a strict subset of Unicode), you could encode
to "iso-8859-1" and run unicode2charrefs on the result.
but someone should really fix minidom so it does the right thing.
(fwiw, if you use my ElementTree kit, you can simply do
    tree.write(file, encoding="us-ascii")
and the toolkit will then use charrefs for any character that's
not plain ascii. you can get ElementTree from here:
http://effbot.org/zone/element-index.htm
)
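
A minimal sketch of that approach, assuming the xml.etree.ElementTree module that ships with current Python 3 rather than the standalone 2005 kit. The element name, the sample text, and the in-memory buffer are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-element document with two non-ASCII characters.
root = ET.Element("p")
root.text = "\u00a9 2005 caf\u00e9"
tree = ET.ElementTree(root)

# io.BytesIO stands in for a file opened in binary mode.
buffer = io.BytesIO()
tree.write(buffer, encoding="us-ascii")
output = buffer.getvalue()

# Decoding as ASCII succeeds only if no raw non-ASCII byte was written.
ascii_text = output.decode("ascii")
```

Because the result is pure ASCII, it displays the same way regardless of which encoding a browser guesses for the page.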
</F>