utf8 and ftplib
John Machin
sjmachin at lexicon.net
Thu Jun 16 17:49:22 EDT 2005
Richard Lewis wrote:
> Hi there,
>
> I'm having a problem with unicode files and ftplib (using Python 2.3.5).
>
> I've got this code:
>
> xml_source = codecs.open("foo.xml", 'w+b', "utf8")
> #xml_source = file("foo.xml", 'w+b')
>
> ftp.retrbinary("RETR foo.xml", xml_source.write)
> #ftp.retrlines("RETR foo.xml", xml_source.write)
>
> It opens a new local file using utf8 encoding and then reads from a file
> on an FTP server (also utf8 encoded) into that local file. It comes up
> with an error, however, on calling the xml_source.write callback (I
> think) saying that:
>
> "File "myscript.py", line 75, in get_content
> ftp.retrbinary("RETR foo.xml", xml_source.write)
> File "/usr/lib/python2.3/ftplib.py", line 384, in retrbinary
> callback(data)
> File "/usr/lib/python2.3/codecs.py", line 400, in write
> return self.writer.write(data)
> File "/usr/lib/python2.3/codecs.py", line 178, in write
> data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 76:
> ordinal not in range(128)"
>
> I've tried using both the commented lines of code in the above example
> (i.e. using file() instead of codecs.open() and retrlines() instead of
> retrbinary()). retrlines() makes no difference, but if I use file()
> instead of codecs.open() I can open the file, but the extended
> characters from the source file (e.g. foreign characters, copyright
> symbol, etc.) all appear with an extra character in front of them
> (because of the two char width in utf8?).
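
Saying "appear with an extra character in front of them" is close to
useless for diagnostic purposes -- print repr(sample_string) would be
more informative.

In any case, the file with the "foreign" [attitude?] characters may well
be what you want: probably an exact copy of the utf8 bytes, which only
looks wrong when viewed with something that doesn't decode it as utf8.
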
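
To illustrate why repr() is the more useful diagnostic, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.3). The sample text is invented for illustration and is not taken from the original file. It shows that a copyright sign encoded as utf8 is two bytes, the first being 0xc2 (the same byte named in the traceback above), and that reading those bytes as Latin-1 is one likely way to end up with an "extra character" in front.

```python
# Python 3 sketch; `sample` is a made-up string, not data from the thread.
sample = "\u00a9 2005"                  # copyright sign followed by a year

# Encoding to utf8 turns the single copyright character into two bytes.
utf8_bytes = sample.encode("utf-8")
print(repr(utf8_bytes))                 # unambiguous view of the raw bytes

# Decoding the same bytes as Latin-1 (one byte per character) yields
# an extra leading character, U+00C2, in front of the copyright sign.
misread = utf8_bytes.decode("latin-1")
print(ascii(misread))                   # escaped, so it prints on any terminal

# Decoding as utf8 recovers the original text, so the bytes were fine.
assert utf8_bytes.decode("utf-8") == sample
```

If the downloaded file behaves like `utf8_bytes` here, the download was correct and only the viewer's choice of encoding needs changing.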
>
> Is the xml_source.write callback causing the problem here? Or is it
> something else? Is there any way that I can correctly retrieve a utf8
> encoded file from an FTP server?

To get an exact copy of a file via FTP -- doesn't matter whether it's
encoded in utf8 or ESCII or whatever -- use the following combination:

    xml_source = file("foo.xml", 'w+b')
    ftp.retrbinary("RETR foo.xml", xml_source.write)

If you were using a command-line FTP client, you would use the "binary"
command before doing a "get" or "mget".
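
The binary-mode advice above can be sketched as a small helper, written in Python 3 syntax rather than the Python 2.3 of the thread. The function name `fetch_exact_copy` and the host in the usage comment are hypothetical. The only ftplib call relied on is `retrbinary(cmd, callback)`, the same one used in the original code.

```python
def fetch_exact_copy(ftp, remote_name, local_name):
    """Download remote_name byte-for-byte into local_name.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP object
    (or anything with a compatible retrbinary method).
    """
    # Open in plain binary mode: no codec is involved, so each chunk
    # received from the server is written to disk unchanged.
    with open(local_name, "wb") as local_file:
        ftp.retrbinary("RETR " + remote_name, local_file.write)

# Hypothetical usage (host name is a placeholder):
#
#   from ftplib import FTP
#   ftp = FTP("ftp.example.com")
#   ftp.login()
#   fetch_exact_copy(ftp, "foo.xml", "foo.xml")
#   ftp.quit()
```

Decoding is then a separate step: once the bytes are on disk, the file can be opened with an explicit utf8 encoding when its text is actually needed.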
HTH,
John