encoding problem

Joe Strout joe at strout.net
Fri Dec 19 17:20:08 EST 2008


Marc 'BlackJack' Rintsch wrote:

>> And because strings in Python, unlike in (say) REALbasic, do not know
>> their encoding -- they're just a string of bytes.  If they were a string
>> of bytes PLUS an encoding, then every string would know what it is, and
>> things like conversion to another encoding, or concatenation of two
>> strings that may differ in encoding, could be handled automatically.
>>
>> I consider this one of the great shortcomings of Python, but it's mostly
>> just a temporary inconvenience -- the world is moving to Unicode, and
>> with Python 3, we won't have to worry about it so much.
> 
> I don't see the shortcoming in Python <3.0.  If you want real strings 
> with characters instead of just a bunch of bytes simply use `unicode` 
> objects instead of `str`.

Fair enough -- that certainly is the best policy.  But when you have to 
work with some other encoding (sometimes necessary when interfacing with 
other software), it's still a bit of a PITA.
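
For what it's worth, the usual way to keep that pain contained is to decode 
at the boundary, work in text, and encode again on the way out.  A minimal 
sketch, written in Python 3 syntax (where `str` plays the role that 
`unicode` plays in Python <3.0); the byte values and the choice of 
encodings are made-up examples, not anything from a real program:

```python
# Sketch: decode at the boundary, work in text, encode on the way out.
# The inputs below are invented; substitute whatever the other software sends.
latin1_bytes = b"caf\xe9"        # "café" as Latin-1 bytes from one program
utf8_bytes = b" na\xc3\xafve"    # " naïve" as UTF-8 bytes from another

# Each decode needs the encoding spelled out, because the bytes don't know it.
text = latin1_bytes.decode("latin-1") + utf8_bytes.decode("utf-8")

# Encode only when handing the result to software that expects specific bytes.
outgoing = text.encode("utf-8")
```

The explicit encoding name on every decode() call is exactly the 
bookkeeping that a string-plus-encoding type would carry around for you.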

> And does REALbasic really use byte strings plus an encoding!?

You betcha!  Works like a dream.

> Sounds strange.  When concatenating which encoding "wins"?

The one that is a superset of the other; if neither is, both are 
converted to UTF-8 (which is the "standard" encoding in RB, though it 
works comfily with any other too).
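
To make that rule concrete, here is a rough Python sketch of it.  This is 
only an illustration of the behavior described above, not how RB 
implements it: the `TaggedString` class, the `SUPERSETS` table, and the 
idea of comparing encodings by character repertoire are all assumptions 
made up for the example.

```python
from dataclasses import dataclass

# Assumed table: each key can represent every character its listed values can.
# Encoding names must be spelled exactly as they appear here.
SUPERSETS = {
    "utf-8": {"ascii", "latin-1", "cp1252"},
    "latin-1": {"ascii"},
    "cp1252": {"ascii"},
}


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical byte string that remembers its own encoding."""
    data: bytes
    encoding: str

    def _as(self, target):
        # Re-encode the bytes only if the encodings actually differ.
        if self.encoding == target:
            return self.data
        return self.data.decode(self.encoding).encode(target)

    def __add__(self, other):
        if (self.encoding == other.encoding
                or other.encoding in SUPERSETS.get(self.encoding, set())):
            target = self.encoding      # left side is the superset, so it wins
        elif self.encoding in SUPERSETS.get(other.encoding, set()):
            target = other.encoding     # right side is the superset, so it wins
        else:
            target = "utf-8"            # neither wins: convert both to UTF-8
        return TaggedString(self._as(target) + other._as(target), target)


ascii_part = TaggedString(b"abc", "ascii")
latin_part = TaggedString(b"\xe9", "latin-1")   # e-acute
cp_part = TaggedString(b"\x80", "cp1252")       # euro sign

superset_wins = ascii_part + latin_part   # Latin-1 covers ASCII
fallback = latin_part + cp_part           # neither covers the other
```

In this sketch the ASCII + Latin-1 result stays Latin-1, while Latin-1 + 
cp1252 falls back to UTF-8 because neither table entry lists the other.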

Cheers,
- Joe



