Binary strings, unicode and encodings

Peter Hansen peter at engcorp.com
Thu Jan 15 14:59:56 EST 2004


Laurent Therond wrote:
> 
> Consider:
> ---
> from cStringIO import StringIO
> 
> def bencode_rec(x, b):
>     t = type(x)
>     if t is str:
>         b.write('%d:%s' % (len(x), x))
>     else:
>         assert 0

The above is confusing.  Why not just do

def bencode_rec(x, b):
    assert type(x) is str
    b.write(.....)

Why the if/else etc?


> def bencode(x):
>     b = StringIO()
>     bencode_rec(x, b)
>     return b.getvalue()
> 
> ---
> Now, if I write bencode('failure reason') into a socket, what will I get
> on the other side of the connection?

This is Python.  Why not try it and see?  I wrote a quick test at
the interactive prompt and concluded that StringIO converts its
input to byte strings, so if your input is Unicode it has to be
encodable in the default encoding or you'll get the usual exception.
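
A rough sketch of that kind of quick test follows.  It does not use
StringIO itself; it assumes the implicit conversion behaves like an
explicit encode to ASCII, and the helper name to_ascii_bytes is made
up for illustration.

```python
# Illustrative only: an explicit .encode("ascii") stands in for the
# implicit Unicode-to-byte-string conversion discussed above.
def to_ascii_bytes(text):
    """Encode text as ASCII, raising if a character does not fit."""
    return text.encode("ascii")

# Pure ASCII input encodes cleanly.
ok = to_ascii_bytes(u"failure reason")

# U+00E9 (e with acute accent) has no ASCII encoding, so this raises.
try:
    to_ascii_bytes(u"caf\xe9")
    failed = False
except UnicodeEncodeError:
    failed = True
```

Here ok holds the plain byte string, and failed ends up True because
the accented input cannot be encoded.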

> a) A sequence of bytes where each byte represents an ASCII character

Yes, provided your input is exclusively ASCII (7-bit) data.

> b) A sequence of bytes where each byte represents the UTF-8 encoding of a
> Unicode character

Yes, if UTF-8 is your default encoding and you're using Unicode input.
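
To make the difference between (a) and (b) concrete, here is a small
sketch using explicit encodes.  The sample string u"caf\xe9" is only
an illustration, not something from the original question.

```python
ascii_text = u"failure reason"
accented = u"caf\xe9"  # hypothetical non-ASCII input

# For 7-bit text, ASCII and UTF-8 produce identical bytes.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# For non-ASCII text, UTF-8 may use several bytes per character,
# so the byte count can exceed the character count.
utf8_bytes = accented.encode("utf-8")
char_count = len(accented)
byte_count = len(utf8_bytes)
```

With this input char_count is 4 while byte_count is 5, since the
accented character takes two bytes in UTF-8.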

> c) It depends on the system locale/it depends on what the site module
> specifies using setdefaultencoding(name)

Yes, as it always does when you pass Unicode to something that converts
it to byte strings, as StringIO appears to do.
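
One way to avoid depending on the default encoding at all is to
encode explicitly before writing.  The following is only a sketch
under that assumption: bencode_text and its encoding parameter are
hypothetical names, not part of the code quoted above, and it uses
io.BytesIO rather than cStringIO.

```python
from io import BytesIO

def bencode_text(text, encoding="utf-8"):
    """Bencode a text string as <byte length>:<encoded bytes>."""
    data = text.encode(encoding)  # explicit, not the default encoding
    out = BytesIO()
    # The length prefix counts encoded bytes, not characters.
    out.write(str(len(data)).encode("ascii") + b":" + data)
    return out.getvalue()

plain = bencode_text(u"failure reason")
accented_utf8 = bencode_text(u"caf\xe9")
accented_latin1 = bencode_text(u"caf\xe9", encoding="latin-1")
```

The receiver then gets the same bytes regardless of the sender's
locale or site configuration, provided both ends agree on the
encoding.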

-Peter


