Binary strings, unicode and encodings

Laurent Therond google at axiomatize.com
Thu Jan 15 17:19:33 EST 2004


Peter Hansen <peter at engcorp.com> wrote in message news:<4006F13C.7D432B98 at engcorp.com>...
> The above is confusing.  Why not just do
> 
> def bencode_rec(x, b):
>     assert type(x) is str
>     b.write(.....)
> 
> Why the if/else etc?

That's a code extract. The real code was more complicated.

> This is Python.  Why not try it and see?  I wrote a quick test at
> the interactive prompt and concluded that StringIO converts to 
> strings, so if your input is Unicode it has to be encodeable or 
> you'll get the usual exception.

Good point. Sorry, I don't have those reflexes yet; I am new to
Python.

So, your test revealed that StringIO converts to byte strings.
Does that mean bencode_rec will fail if the input string contains
characters that cannot be encoded in ASCII?

Yet, if your locale specifies UTF-8 as the default encoding, it should
not fail, right?

Hence, I conclude your test was made on a system that uses ASCII/ISO
8859-1 as its default encoding. Is that right?
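
To make the question concrete, here is a minimal sketch of the two cases.
It uses explicit encode calls rather than the implicit conversion Peter
observed in StringIO, so it only approximates that behaviour. The helper
name try_encode and the sample string are made up for illustration; they
are not part of the original code.

```python
import sys

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the text cannot be encoded."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# Hypothetical input containing one non-ASCII character (U+00E9).
sample = u"caf\u00e9"

ascii_result = try_encode(sample, "ascii")  # None: U+00E9 is outside ASCII
utf8_result = try_encode(sample, "utf-8")   # bytes: U+00E9 becomes two bytes

# Shows which default encoding this interpreter reports.
print(sys.getdefaultencoding())
print(ascii_result, utf8_result)
```

If this is right, the failure depends on whichever encoding ends up being
applied, not on the input alone.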

> > a) A sequence of bytes where each byte represents an ASCII character
> 
> Yes, provided your input is exclusively ASCII (7-bit) data.

OK.

> > b) A sequence of bytes where each byte represents the UTF-8 encoding of a
> > Unicode character
> 
> Yes, if UTF-8 is your default encoding and you're using Unicode input.

OK.
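
A small check on (b), offered as a sketch: in UTF-8 only the ASCII range
maps to a single byte per character, and other characters take two to
four bytes. The sample characters below are chosen purely for
illustration.

```python
# 'a', e with acute accent (U+00E9), euro sign (U+20AC)
samples = [u"a", u"\u00e9", u"\u20ac"]

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

print(utf8_lengths)  # [1, 2, 3]
```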

> > c) It depends on the system locale/it depends on what the site module
> > specifies using setdefaultencoding(name)
> 
> Yes, as it always does if you are using Unicode but converting to byte strings
> as it appears StringIO does.

Umm... not sure here. I think StringIO must behave differently
depending on your locale and on how you assigned the string.
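
One way to take the default encoding out of the picture, sketched under
stated assumptions: encode explicitly before writing, so the buffer only
ever receives bytes. bencode_str is a hypothetical helper, not the real
bencode_rec, and the choice of UTF-8 as the default argument is an
assumption. The "<length>:<bytes>" layout is the usual bencoded string
form.

```python
import io

def bencode_str(x, b, encoding="utf-8"):
    """Write text x to the byte buffer b as '<byte length>:<bytes>'."""
    data = x.encode(encoding)  # explicit, so no default encoding is involved
    b.write(str(len(data)).encode("ascii"))  # length counts bytes, not characters
    b.write(b":")
    b.write(data)

buf = io.BytesIO()
bencode_str(u"caf\u00e9", buf)  # hypothetical input, 4 characters, 5 bytes
result = buf.getvalue()
print(repr(result))
```

With the encoding passed explicitly, the output should be the same on any
system, whatever its locale.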

Thanks for your help!

L.


