encoding problem

Fri Dec 19 16:28:19 EST 2008

On Fri, 19 Dec 2008 08:20:07 -0700, Joe Strout wrote:

> Marc 'BlackJack' Rintsch wrote:
> 
>>> The question is why the Python interpreter uses the default encoding
>>> instead of "utf-8", which I explicitly declared in the source.
>> 
>> Because the declaration is only for decoding unicode literals in that
>> very source file.
> 
> And because strings in Python, unlike in (say) REALbasic, do not know
> their encoding -- they're just a string of bytes.  If they were a string
> of bytes PLUS an encoding, then every string would know what it is, and
> things like conversion to another encoding, or concatenation of two
> strings that may differ in encoding, could be handled automatically.
> 
> I consider this one of the great shortcomings of Python, but it's mostly
> just a temporary inconvenience -- the world is moving to Unicode, and
> with Python 3, we won't have to worry about it so much.

I don't see the shortcoming in Python <3.0.  If you want real strings
with characters instead of just a bunch of bytes, simply use `unicode`
objects instead of `str`.
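
A minimal sketch of that difference, assuming UTF-8 as the byte encoding.
The sample word and the variable names are only illustrations.  The
literal uses escapes, so it does not depend on the source file's coding
declaration, and the `u` prefix lets it run on Python 2 as well as on
Python 3.3 and later:

```python
# A text string: a sequence of characters (`unicode` in Python 2).
text = u"Gr\u00fc\u00dfe"

# Encoding yields a byte string (`str` in Python 2); the encoding name
# is chosen here explicitly, the bytes themselves do not record it.
data = text.encode("utf-8")

print(len(text))   # counts characters
print(len(data))   # counts bytes; the two non-ASCII characters take 2 bytes each

# Decoding with the same encoding gives the original text back.
roundtrip = data.decode("utf-8")
print(roundtrip == text)
```

The lengths differ because `len()` counts characters for the text object
and bytes for the encoded one.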

And does REALbasic really use byte strings plus an encoding!?  Sounds
strange.  When concatenating, which encoding "wins"?

Ciao,
	Marc 'BlackJack' Rintsch