Totally confused by the str/bytes/unicode differences introduced in Python 3.x

Giampaolo Rodola' gnewsg at gmail.com
Fri Jan 16 19:47:52 EST 2009


Hi,
I'm sure the message I'm going to write will seem quite dumb to most
people but I really don't understand the str/bytes/unicode
differences introduced in Python 3.0, so please be patient.
What I'm trying to do is port pyftpdlib to Python 3.x.
I don't want to support Unicode. I don't want pyftpdlib for py 3k to
do anything new or different.
I just want it to behave exactly the same as in the 2.x version and
I'd like to know if that's possible with Python 3.x.

Now, the basic difference is that socket.recv() returns a bytes object
instead of a string object, and that's the main thing that confuses
me.
My question is: is there a way to convert that bytes object into
exactly *the same thing* returned by socket.recv() in Python 2.x (a
string)?

I know I can do:

data = socket.recv(1024)
data = data.decode(encoding)

...to convert bytes into a string but that's not exactly the same
thing.
In Python 2.x I didn't have to care about the encoding. What
socket.recv() returned was just a string. That was all.
Now doing something like b''.decode(encoding) puts me in serious
trouble, since that can raise an exception if client and server
use different encodings.
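
To show what I mean, here is a small sketch of the failure. The file
name and the try_decode helper are made up for this example (they are
not part of pyftpdlib); I'm assuming a client that sends latin-1 bytes
to a server that expects UTF-8.

```python
# Hypothetical file name, encoded the way a latin-1 client would send it.
raw = "caf\u00e9.txt".encode("latin-1")   # b'caf\xe9.txt'


def try_decode(data, encoding):
    """Return the decoded string, or None if data is not valid in encoding.

    Illustrative helper only, not a pyftpdlib function.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# The server guesses UTF-8: the lone 0xe9 byte is not valid UTF-8,
# so decoding raises UnicodeDecodeError and the helper returns None.
as_utf8 = try_decode(raw, "utf-8")

# The server guesses latin-1: the same bytes decode without error.
as_latin1 = try_decode(raw, "latin-1")
```

So the very same bytes are fine or fatal depending on which encoding
the server happens to pick.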

As far as I understand, the basic difference is that a Python 2.x
based FTP server could handle a 3.x based FTP client using "latin1",
"utf-8" or any other encoding, while with Python 3.x I'm forced to
tell my server which encoding to use, and I don't know how to deal
with that.


--- Giampaolo
http://code.google.com/p/pyftpdlib


