''.join() with encoded strings

Fredrik Lundh fredrik at pythonware.com
Mon Feb 27 13:08:07 EST 2006


"Sandra-24" wrote:

> I'd love to know why calling ''.join() on a list of encoded strings
> automatically results in converting to the default encoding. First of
> all, it's undocumented, so if I didn't have non-ascii characters in my
> utf-8 data, I'd never have known until one day I did, and then the code
> would break. Secondly, you can't override (for valid reasons) the
> default encoding, so that's not a way around it. So ''.join becomes
> pretty useless when dealing with the real (non-ascii) world.

if all strings in a sequence are encoded strings (byte buffers), join does
the right thing.

if all strings in a sequence are Unicode strings, join does the right thing.

if all strings are ascii strings (byte strings, Unicode strings, or a mix
of the two), join does the right thing.

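the only way to mess up is to mix byte buffers containing non-ascii encoded
data with decoded (Unicode) strings: join then tries to decode the byte
buffers using the default (ascii) encoding, and fails.  the solution is
simple: make sure to *decode* all data you're using, *before* using it.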
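
a minimal sketch of that advice, assuming the incoming data is UTF-8 (the
to_text helper and the sample values are made up for illustration, they are
not part of any library).  it is written to run on Python 2.6+ and 3.3+;
note that Python 3 refuses the bytes/text mix outright with a TypeError
instead of attempting an implicit decode:

```python
def to_text(item, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass Unicode strings through."""
    if isinstance(item, bytes):
        return item.decode(encoding)
    return item


# A homogeneous sequence of encoded strings joins with no conversion at all.
encoded = [u"caf\u00e9".encode("utf-8"), b" au lait"]
joined_bytes = b"".join(encoded)  # result is still a UTF-8 byte string

# Mixed input (assumed to be UTF-8 here): decode everything first, then join.
mixed = [u"caf\u00e9".encode("utf-8"), u" au ", b"lait"]
joined_text = u"".join(to_text(p) for p in mixed)  # result is a Unicode string
```

decoding at the point where data enters the program means every later join
only ever sees Unicode strings.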

</F>





