utf8 and ftplib

John Machin sjmachin at lexicon.net
Thu Jun 16 17:49:22 EDT 2005


Richard Lewis wrote:
> Hi there,
> 
> I'm having a problem with unicode files and ftplib (using Python 2.3.5).
> 
> I've got this code:
> 
> xml_source = codecs.open("foo.xml", 'w+b', "utf8")
> #xml_source = file("foo.xml", 'w+b')
> 
> ftp.retrbinary("RETR foo.xml", xml_source.write)
> #ftp.retrlines("RETR foo.xml", xml_source.write)
> 
> It opens a new local file using utf8 encoding and then reads from a file
> on an FTP server (also utf8 encoded) into that local file. It comes up
> with an error, however, on calling the xml_source.write callback (I
> think) saying that:
> 
> "File "myscript.py", line 75, in get_content
>   ftp.retrbinary("RETR foo.xml", xml_source.write)
> File "/usr/lib/python2.3/ftplib.py", line 384, in retrbinary
>   callback(data)
> File "/usr/lib/python2.3/codecs.py", line 400, in write
>   return self.writer.write(data)
> File "/usr/lib/python2.3/codecs.py", line 178, in write
>   data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 76:
> ordinal not in range(128)"
> 
> I've tried using both the commented lines of code in the above example
> (i.e. using file() instead of codecs.open() and retrlines() instead of
> retrbinary()). retrlines() makes no difference, but if I use file()
> instead of codecs.open() I can open the file, but the extended
> characters from the source file (e.g. foreign characters, copyright
> symbol, etc.) all appear with an extra character in front of them
> (because of the two char width in utf8?).

Saying "appear with an extra character in front of them" is close to 
useless for diagnostic purposes -- print repr(sample_string) would be 
more informative.

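In any case, the file with the "foreign" [attitude?] characters may well 
be what you want: a correct byte-for-byte copy. Each non-ASCII character 
takes two or more bytes in utf8, and a viewer that assumes a single-byte 
encoding will show the first of those bytes as an "extra character".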

> 
> Is the xml_source.write callback causing the problem here? Or is it
> something else? Is there any way that I can correctly retrieve a utf8
> encoded file from an FTP server?

To get an exact copy of a file via FTP -- doesn't matter whether it's 
encoded in utf8 or ASCII or whatever -- use the following combination:

xml_source = file("foo.xml", 'w+b')
ftp.retrbinary("RETR foo.xml", xml_source.write)

If you were using a command-line FTP client, you would use the "binary" 
command before doing a "get" or "mget".
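
For readers on a current interpreter, the same combination might look like 
the sketch below. It assumes Python 3, where open(..., "wb") takes the place 
of file(..., 'w+b'); the function name fetch_exact_copy and the host and 
login in the usage comment are hypothetical.

```python
from ftplib import FTP


def fetch_exact_copy(ftp: FTP, remote_name: str, local_path: str) -> None:
    """Copy remote_name into local_path without altering any bytes."""
    # "wb" opens a plain binary file, so no codec sits between the
    # FTP data and the disk and nothing gets decoded or re-encoded.
    with open(local_path, "wb") as local_file:
        # retrbinary transfers in binary (image) mode and passes each
        # received chunk of bytes to the callback.
        ftp.retrbinary("RETR " + remote_name, local_file.write)


# Usage (hypothetical server and credentials):
#   ftp = FTP("ftp.example.com")
#   ftp.login("user", "password")
#   fetch_exact_copy(ftp, "foo.xml", "foo.xml")
#   ftp.quit()
```

The encoding only matters later, when the local copy is read as text, for 
example with open("foo.xml", encoding="utf-8").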

HTH,
John


