utf8 and ftplib

Fredrik Lundh fredrik at pythonware.com
Sat Jun 18 06:36:54 EDT 2005


Richard Lewis wrote:

> OK, I've fiddled around a bit more but I still haven't managed to get it
> to work. I get the fact that it's not the FTP operation that's causing the
> problem so it must be either the xml.minidom.parse() function (and
> whatever sort of file I give that) or the way that I write my results to
> output files after I've done my DOM processing. I'll post some more
> detailed code:
>
> def open_file(file_name):
>    ftp = ftplib.FTP(self.host)
>    ftp.login(self.login, self.passwd)
>
>    content_file = file(file_name, 'w+b')
>    ftp.retrbinary("RETR " + self.path, content_file.write)
>    ftp.quit()
>    content_file.close()
>
>    ## Case 1:
>    #self.document = parse(file_name)
>
>    ## Case 2:
>    #self.document = parse(codecs.open(file_name, 'r+b', "utf-8"))
>
>    # Case 3:
>    content_file = codecs.open(file_name, 'r', "utf-8")
>    self.document = parse(codecs.EncodedFile(content_file, "utf-8",
>    "utf-8"))
>    content_file.close()
>
> In Case 1 I get the incorrectly encoded characters.

case 1 is the only one where you use the XML parser as it is designed to
be used (on the stream level, XML is defined in terms of encoded text,
not Unicode characters.  the parser will decode things for you)
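A minimal sketch of case 1, written for modern Python 3 (the thread predates it; the sample document is made up). The downloaded bytes are saved untouched and handed to the parser, which reads the encoding declaration and returns decoded characters. The temporary file stands in for the file that retrbinary() writes.

```python
import os
import tempfile
from xml.dom.minidom import parse

# Stand-in for the bytes ftplib's retrbinary() would deliver: an XML
# document encoded as UTF-8 (the sample content is made up).
raw = '<?xml version="1.0" encoding="utf-8"?><title>caf\u00e9</title>'.encode("utf-8")

# Save the bytes untouched, in binary mode, as the FTP download does.
fd, file_name = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as content_file:
    content_file.write(raw)

# Case 1: give the parser the encoded file itself; it reads the
# encoding declaration and decodes the bytes into Unicode characters.
document = parse(file_name)
os.remove(file_name)

text = document.documentElement.firstChild.data
print(repr(text))  # 'café'  (4 characters, 5 bytes in UTF-8)
```

No codecs wrapper is involved: the parser does the decoding, so wrapping the file in a decoder (cases 2 and 3) only adds a second, conflicting layer.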

given that the XML tree returned by the parser contains *decoded* Unicode
characters (in Unicode string objects), what makes you so sure that
you're getting "incorrectly encoded characters" from the parser?
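If the results look wrong, the damage most likely happens later, when the characters are turned back into bytes for the output files, or when those bytes are read with the wrong encoding. A hedged Python 3 sketch (sample document made up): toxml(encoding=...) encodes the tree once, on the way out, while decoding UTF-8 bytes as Latin-1 shows what "incorrectly encoded characters" typically look like.

```python
from xml.dom.minidom import parseString

# Made-up sample: UTF-8 encoded XML, as it might arrive over FTP.
raw = '<?xml version="1.0" encoding="utf-8"?><title>caf\u00e9</title>'.encode("utf-8")
document = parseString(raw)
text = document.documentElement.firstChild.data  # already decoded: 'café'

# Encode exactly once, when writing, and name the encoding explicitly.
# out_bytes is what belongs in an output file opened in binary mode.
out_bytes = document.toxml(encoding="utf-8")

# A common mistake: UTF-8 bytes read back as Latin-1 produce mojibake,
# even though the parsed characters themselves were correct.
garbled = text.encode("utf-8").decode("latin-1")
print(repr(garbled))  # 'cafÃ©'
```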

</F>

(I wonder why this is so hard for so many people?  hardly any programmer has
any problem telling the difference between, say, a 32-bit binary floating point
value on disk, a floating point object, and the string representation of a float.
but replace the float with a Unicode character, and anglocentric programmers
immediately resort to poking-with-a-stick-in-the-dark programming. I'll figure
it out, some day...) 
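The float analogy, spelled out as a small Python 3 sketch (the values are arbitrary): a float has an encoded form on disk, an object in memory, and a printable representation, and a Unicode character has exactly the same three layers.

```python
import struct

# Float: bytes on disk, the object in memory, and its text representation.
float_bytes = struct.pack("<f", 1.5)         # 32-bit IEEE 754, little-endian
value = struct.unpack("<f", float_bytes)[0]  # a float object
shown = repr(value)                          # '1.5'

# Unicode character: the same three layers.
char_bytes = "\u00e9".encode("utf-8")        # b'\xc3\xa9' on disk
char = char_bytes.decode("utf-8")            # a str holding one character
char_shown = ascii(char)                     # the ASCII-safe repr: '\xe9'

print(len(float_bytes), value, shown)
print(char_bytes, len(char), char_shown)
```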