python3; ftplib: TypeError: Can't convert 'bytes' object to str implicitly

Thu May 15 02:36:35 EDT 2014

Antoon Pardon <antoon.pardon at rece.vub.ac.be> writes:

> op 14-05-14 18:24, Akira Li schreef:
>> Antoon Pardon <antoon.pardon at rece.vub.ac.be> writes:
>>
>>> This is the code I run (python 3.3)
>>>
>>> host = ...
>>> user = ...
>>> passwd = ...
>>>
>>> from ftplib import FTP
>>>
>>> ftp = FTP(host, user, passwd)
>>> ftp.mkd(b'NewDir')
>>> ftp.rmd(b'NewDir')
>>>
>>> This is the traceback
>>>
>>> Traceback (most recent call last):
>>>   File "ftp-problem", line 9, in <module>
>>>     ftp.mkd(b'NewDir')
>>>   File "/usr/lib/python3.3/ftplib.py", line 612, in mkd
>>>     resp = self.voidcmd('MKD ' + dirname)
>>> TypeError: Can't convert 'bytes' object to str implicitly
>>>
>>> The problem is that I do something like this in a backup program.
>>> I don't know the locales that other people use. So I manipulate
>>> all file and directory names as bytes.
>>>
>>> Am I doing something wrong?
>>
>> The error message shows that ftplib expects a string here, not bytes.
>> You could use `ftp.mkd(some_bytes.decode(ftp.encoding))` as a
>> workaround.
>
> Sure but what I like to know: Can this be considered a failing of
> ftplib. Since python3 generally allows paths to be strings as
> well as bytes can't we expect the same of ftplib?
>
> Especially as I assume that path will be converted to bytes anyway
> in order to send it over the network.

Bytes are supported for filenames because POSIX systems provide a
bytes-based interface; e.g., on my system any byte except / and NUL can
be used in a filename. You can get away with passing opaque bytes
filenames for some time.
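
A minimal sketch of what "opaque bytes" means in practice. The name
`raw` is a made-up example, and the utf-8 codec is spelled out here
only so the result does not depend on the locale; on POSIX,
os.fsdecode() applies the same surrogateescape handler using the
filesystem encoding.

```python
# A filename that is legal on POSIX but is not valid UTF-8
# (hypothetical example: "café.txt" stored as Latin-1 bytes).
raw = b"caf\xe9.txt"

# Strict decoding fails ...
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... while surrogateescape carries the undecodable byte through as a
# lone surrogate, so the original bytes can be recovered exactly.
name = raw.decode("utf-8", "surrogateescape")
roundtrip = name.encode("utf-8", "surrogateescape")
```

The str in `name` is only meaningful to code that encodes it back with
the same error handler; it is not a portable filename.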

RFC 959 expects ASCII filenames. RFC 2640 recommends UTF-8 (if the
response to the FEAT command lists it). RFC 3659: pathnames can be sent
as UTF-8 *and* "raw" (plus handling of CR LF, CR NUL, IAC, and other
telnet control codes). Using UTF-8 might have security implications,
and some firewalls might interfere with the OPTS command and the FEAT
response. Popular clients such as FileZilla may break on non-UTF-8
filenames.
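
A rough sketch of the FEAT check, assuming the reply has the usual
multi-line shape (a "211-" header, one feature per indented line, then
a "211 End" line). The helper name advertises_utf8 and the sample reply
are made up for illustration.

```python
def advertises_utf8(feat_response):
    """Return True if a FEAT reply lists the UTF8 feature."""
    # Skip the "211-..." header line; feature lines are indented.
    for line in feat_response.splitlines()[1:]:
        if line.strip().upper() == "UTF8":
            return True
    return False


# Hypothetical reply text, not captured from a real server.
sample = "211-Features:\n MDTM\n UTF8\n211 End"
has_utf8 = advertises_utf8(sample)

# With a live connection the reply could come from ftp.sendcmd("FEAT"),
# which raises ftplib.error_perm if the server rejects the command.
```

Even when UTF8 is listed, that only says what the server claims to
support; it does not guarantee that other clients interpret the stored
names the same way.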

Compared with a local filesystem, it is less likely that all ftp
clients use the same character encoding, and more likely that an ftp
server performs some unexpected character encoding conversion, even
though doing so is not standards-compliant.

You could still post to the python-ideas mailing list to suggest the
enhancement (support bytes where filenames are expected) for your
backup application use case:

- you might not be able to avoid undecodable filenames -- the current
  implementation raises UnicodeEncodeError if you pass ftplib a Unicode
  string created with os.fsdecode(undecodable_bytes)

- non-Python ftp clients should be able to access the content -- no
  Python error handlers such as surrogateescape or backslashreplace are
  allowed

- set ftp.encoding to utf-8 and pass non-utf-8 filenames as bytes -- to
  avoid '\U0001F604'.encode('utf-8').decode(ftp.encoding)
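
The first and third points above can be sketched as follows. The
assumptions are that ftplib encodes the command with ftp.encoding and
the default strict error handler, and that ftp.encoding is latin-1 in
the second half (the default in Python 3.3). The variable names are
made up for illustration.

```python
# Point 1: a surrogate-escaped name cannot be encoded strictly.
undecodable = b"caf\xe9.txt"                       # not valid UTF-8
name = undecodable.decode("utf-8", "surrogateescape")
try:
    name.encode("utf-8")                           # strict, no handler
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Point 3: decoding UTF-8 bytes with a latin-1 ftp.encoding sends the
# right bytes over the wire, but the intermediate str is mojibake.
emoji_bytes = "\U0001F604".encode("utf-8")
mojibake = emoji_bytes.decode("latin-1")           # 4 unrelated chars
wire = mojibake.encode("latin-1")                  # same bytes again
```

With latin-1 the decode/encode pair is byte-exact for any input; with
utf-8 as ftp.encoding the same trick fails for non-utf-8 names, which
is why accepting bytes directly would help.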

--
akira