Binary strings, unicode and encodings
Peter Hansen
peter at engcorp.com
Thu Jan 15 14:59:56 EST 2004
Laurent Therond wrote:
>
> Consider:
> ---
> from cStringIO import StringIO
>
> def bencode_rec(x, b):
>     t = type(x)
>     if t is str:
>         b.write('%d:%s' % (len(x), x))
>     else:
>         assert 0

The above is confusing.  Why not just do

def bencode_rec(x, b):
    assert type(x) is str
    b.write(.....)

Why the if/else etc?

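For reference, here is a minimal sketch of how the simplified version might look in full. It assumes a modern Python 3 interpreter, so io.BytesIO and bytes stand in for cStringIO and the Python 2 str of the original; the write call is copied from the quoted code above rather than from anything I tested in 2004-era Python.

```python
from io import BytesIO

def bencode_rec(x, b):
    # Only byte strings are handled, as in the quoted original.
    assert type(x) is bytes
    # Decimal length prefix, a colon, then the raw bytes.
    b.write(b'%d:%s' % (len(x), x))

def bencode(x):
    b = BytesIO()
    bencode_rec(x, b)
    return b.getvalue()

encoded = bencode(b'failure reason')
print(encoded)
```

The assert does the same job as the if/else: anything that is not a byte string fails immediately.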
> def bencode(x):
>     b = StringIO()
>     bencode_rec(x, b)
>     return b.getvalue()
>
> ---
> Now, if I write bencode('failure reason') into a socket, what will I get
> on the other side of the connection?
This is Python.  Why not try it and see?  I wrote a quick test at
the interactive prompt and concluded that StringIO converts its input
to byte strings, so if your input is Unicode it has to be encodable
with the default encoding, or you'll get the usual exception.
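The "has to be encodable" condition can be tried out along these lines. This is only a sketch for a modern Python 3 interpreter, where the conversion is an explicit .encode() call instead of something StringIO does implicitly; to_ascii_bytes is a hypothetical helper name, not part of any library.

```python
def to_ascii_bytes(text):
    """Hypothetical helper: encode text as ASCII, or return None on failure."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError:
        # Raised when the text holds characters outside 7-bit ASCII.
        return None

ok = to_ascii_bytes('failure reason')      # pure ASCII, encodes fine
bad = to_ascii_bytes('r\u00e9sum\u00e9')   # contains e-acute, not ASCII
print(ok, bad)
```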
> a) A sequence of bytes where each byte represents an ASCII character
Yes, provided your input is exclusively ASCII (7-bit) data.
> b) A sequence of bytes where each byte represents the UTF-8 encoding of a
> Unicode character
Yes, if UTF-8 is your default encoding and you're using Unicode input.
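One detail worth checking for yourself: with UTF-8, a character outside ASCII becomes more than one byte, so "each byte represents a character" only holds for ASCII input. A small sketch, assuming a modern Python 3 interpreter and a sample string of my own choosing:

```python
ascii_text = 'failure reason'
accented = 'caf\u00e9'  # sample input, four characters, the last is e-acute

# For pure ASCII input the two encodings give identical bytes.
same_bytes = ascii_text.encode('ascii') == ascii_text.encode('utf-8')

# For non-ASCII input UTF-8 uses more than one byte per character.
utf8 = accented.encode('utf-8')
print(same_bytes, len(accented), len(utf8), utf8)
```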
> c) It depends on the system locale/it depends on what the site module
> specifies using setdefaultencoding(name)
Yes, as it always does when you start with Unicode and convert to byte
strings, as StringIO appears to do.
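If you don't want the bytes on the wire to depend on the default encoding, one option is to encode explicitly before writing. The sketch below assumes a modern Python 3 interpreter and assumes the length prefix is meant to count bytes rather than characters; bencode_text and its encoding parameter are hypothetical names of mine, not part of the original code.

```python
def bencode_text(text, encoding='utf-8'):
    """Hypothetical helper: encode text explicitly, then length-prefix the bytes."""
    data = text.encode(encoding)
    # The prefix is the byte count, which can differ from len(text).
    return b'%d:%s' % (len(data), data)

as_utf8 = bencode_text('caf\u00e9')
as_latin1 = bencode_text('caf\u00e9', encoding='latin-1')
print(as_utf8, as_latin1)
```

The same text gives different bytes and a different length prefix under each encoding, so both ends of the socket need to agree on which one is in use.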
-Peter