Binary strings, unicode and encodings

Laurent Therond google at axiomatize.com
Thu Jan 15 14:38:39 EST 2004


Maybe you have a minute to clarify the following matter...

Consider:

---

from cStringIO import StringIO

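def bencode_rec(x, b):
    # Write the bencoded form of x into the file-like object b.
    t = type(x)

    if t is str:
        # Strings are written as <length>:<contents>.
        b.write('%d:%s' % (len(x), x))
    else:
        # Only str is handled in this excerpt.
        assert 0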

def bencode(x):
    b = StringIO()

    bencode_rec(x, b)

    return b.getvalue()

---

Now, if I write bencode('failure reason') into a socket, what will I get
on the other side of the connection?

a) A sequence of bytes where each byte represents an ASCII character

b) A sequence of bytes that is the UTF-8 encoding of the string's
Unicode characters

c) It depends on the system locale, or on the default encoding that the
site module sets using sys.setdefaultencoding(name)
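
For reference while weighing these options: a Python 2 str is already a
sequence of bytes, so a socket would receive exactly the bytes held in the
string, and the default encoding only matters when a unicode object is
implicitly converted to str. The sketch below restates this in Python 3,
where bytes and text are separate types. The name bencode_bytes is
hypothetical and not part of the code above; it is only meant to show that
pure-ASCII text yields the same bytes under ASCII and UTF-8.

```python
from io import BytesIO

def bencode_bytes(x):
    """Hypothetical bytes-only variant: length prefix, colon, payload."""
    if not isinstance(x, bytes):
        raise TypeError("expected bytes, got %s" % type(x).__name__)
    b = BytesIO()
    # The prefix counts bytes, not characters.
    b.write(str(len(x)).encode("ascii") + b":")
    b.write(x)
    return b.getvalue()

# The same ASCII-only text, encoded two ways before bencoding.
ascii_wire = bencode_bytes("failure reason".encode("ascii"))
utf8_wire = bencode_bytes("failure reason".encode("utf-8"))

print(ascii_wire)
print(ascii_wire == utf8_wire)
```

For text limited to ASCII characters the two encodings coincide, so cases
a) and b) cannot be told apart on the wire.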

---

So, if a Python client in China connects to a Python server in Europe,
must they be careful to specify a common encoding on both sides of the
connection?
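
To make the concern concrete, here is a minimal sketch in Python 3 of what
happens once the text is no longer pure ASCII. The helpers bencode_text and
bdecode_text are hypothetical names, and the sample message (two Chinese
characters meaning "failure") and the choice of UTF-8 versus GBK are
assumptions picked for illustration only.

```python
def bencode_text(text, encoding):
    """Encode text with an explicit encoding, then add the byte-length prefix."""
    payload = text.encode(encoding)
    return str(len(payload)).encode("ascii") + b":" + payload

def bdecode_text(wire, encoding):
    """Split off the length prefix and decode the payload with the given encoding."""
    length, _, rest = wire.partition(b":")
    return rest[:int(length)].decode(encoding)

message = "\u5931\u8d25"  # sample non-ASCII text

wire_utf8 = bencode_text(message, "utf-8")
wire_gbk = bencode_text(message, "gbk")

# The same text produces different bytes and different length prefixes.
print(wire_utf8)
print(wire_gbk)

# Decoding only recovers the text when both sides use the same encoding.
print(bdecode_text(wire_utf8, "utf-8") == message)
print(bdecode_text(wire_gbk, "gbk") == message)
```

In this sketch the receiver recovers the original text only when it decodes
with the encoding the sender used, which is why an encoding agreed on by
both ends (or fixed by the protocol) matters.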

Regards,

L.


