Binary strings, unicode and encodings

Peter Hansen peter at engcorp.com
Fri Jan 16 09:21:46 EST 2004


Laurent Therond wrote:
> 
> So, your test revealed that StringIO converts to byte strings.
> Does that mean:
>     - If the input string contains characters that cannot be encoded
> in ASCII, bencode_rec will fail?

Yes, if your default encoding is ASCII.
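A minimal sketch of that failure, written for Python 3 because the conversion there has to be spelled out. Python 2 applied the default encoding implicitly whenever a unicode object was turned into a byte string, and that implicit step is the one that raises. The helper name `to_bytes` is hypothetical and only stands in for that implicit conversion.

```python
def to_bytes(text, encoding="ascii"):
    """Hypothetical helper: encode text the way an ASCII default encoding would."""
    return text.encode(encoding)

print(to_bytes("spam"))  # pure ASCII encodes fine: b'spam'

try:
    to_bytes("caf\u00e9")  # U+00E9 has no ASCII code point
except UnicodeEncodeError as exc:
    print("failed:", exc.reason)  # failed: ordinal not in range(128)

# With a codec that covers the character, the same text encodes cleanly.
print(to_bytes("caf\u00e9", "utf-8"))  # b'caf\xc3\xa9'
```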

> Yet, if your locale specifies UTF-8 as the default encoding, it should
> not fail, right?

True, provided you are actually creating UTF-8 strings.  Of course, just
sticking in a character that has the 8th bit set doesn't make the string
UTF-8.
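A short Python 3 sketch of the 8th-bit point: the byte 0xE9 is a valid 'é' in ISO 8859-1 (Latin-1), but on its own it is not valid UTF-8, where the same character takes two bytes.

```python
# One byte with the 8th bit set: 0xE9 is 'é' in ISO 8859-1 (Latin-1).
raw = b"caf\xe9"

print(raw.decode("latin-1") == "caf\u00e9")  # True

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE9 starts a three-byte UTF-8 sequence, but the data ends before
    # any continuation bytes arrive, so the decode fails.
    print("not UTF-8:", exc.reason)

# The same text, genuinely encoded as UTF-8, uses two bytes for the accent.
print("caf\u00e9".encode("utf-8"))  # b'caf\xc3\xa9'
```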

> Hence, I conclude your test was made on a system that uses ASCII/ISO
> 8859-1 as its default encoding. Is that right?

Correct: this was Windows 98, where sys.getdefaultencoding() returns 'ascii'.
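To check the setting on a given machine, something like the following works. It is a sketch run under Python 3, where sys.getdefaultencoding() is fixed at 'utf-8' and sys.setdefaultencoding no longer exists; under the Python 2 interpreter discussed here the first line would print 'ascii' unless site.py changed it. The locale's preferred encoding is printed alongside to show that it is a separate value.

```python
import locale
import sys

# The codec used for implicit text <-> bytes conversion.
# Python 2 on Windows 98 reported 'ascii'; Python 3 always reports 'utf-8'.
print(sys.getdefaultencoding())

# The encoding the OS locale prefers; a separate setting that the
# default encoding above does not follow automatically.
print(locale.getpreferredencoding(False))
```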

> > > c) It depends on the system locale/it depends on what the site module
> > > specifies using setdefaultencoding(name)
> >
> > Yes, as it always does if you are using Unicode but converting to byte strings
> > as it appears StringIO does.
> 
> Umm...not sure here...I think StringIO must behave differently
> depending on your locale and depending on how you assigned the string.

It's always possible that StringIO takes the locale into account in some
special way, but I suspect it does not.  As for "how you assigned the
string", I'm not sure I understand what that might mean.  How many ways do
you know to assign a string in Python?
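A Python 3 sketch touching both points. It does not reproduce the Python 2 StringIO module under discussion (that module is gone in Python 3); it uses io.StringIO and io.BytesIO, which keep text and bytes apart, so no implicit default encoding or locale is involved. It also shows that different ways of writing the same string literal produce equal strings.

```python
import io

# Three spellings of the same text: how it is written in the source
# does not change the resulting str value.
a = "caf\u00e9"
b = "caf\xe9"
c = "caf" + chr(0xE9)
print(a == b == c)  # True

# io.StringIO stores text; nothing is encoded, so no locale is consulted.
text_buf = io.StringIO()
text_buf.write(a)
print(text_buf.getvalue() == a)  # True

# io.BytesIO stores bytes and rejects str, so the encoding is chosen explicitly.
byte_buf = io.BytesIO()
try:
    byte_buf.write(a)
except TypeError:
    print("BytesIO needs bytes")
byte_buf.write(a.encode("utf-8"))
print(byte_buf.getvalue())  # b'caf\xc3\xa9'
```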

-Peter
