Unicode problems, yet again

Sun Apr 24 07:31:20 EDT 2005

On Sun, 24 Apr 2005 11:26:20 +0200, Ivan Voras <ivoras at _-_fer.hr>
wrote:

>Jp Calderone wrote:
>
>>  You don't have a string fetched from a database, in iso-8859-2, alas.  
>> That is the root of the problem you're having.  What you have is a 
>> unicode string.
>
>Yes, you're right :) I actually did have iso-8859-2 data, but, as I 
>found out late last night, the data got converted to unicode along the way.

Just a thought: I noticed from the traceback that you are running this
on a Windows box. Profound apologies in advance if this question is an
insult to your intelligence, but you do know that Windows code page
1250 (Latin 2) -- which I guess is the code page that you would be
using -- is *NOT* the same as iso-8859-2, don't you?
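
To make the difference concrete, here is a minimal sketch. It assumes
Python 3 syntax rather than the interpreter from the traceback, and the
sample word is made up for illustration. It shows the same letter landing
on a different byte in each encoding, and what happens when the wrong
codec is used to decode.

```python
# Illustrative only: the sample text is hypothetical, not from the original post.
text = "Šibenik"

# The same character S-with-caron maps to a different byte in each encoding.
as_latin2 = text.encode("iso-8859-2")   # first byte is 0xA9
as_cp1250 = text.encode("cp1250")       # first byte is 0x8A

print(hex(as_latin2[0]), hex(as_cp1250[0]))

# Decoding iso-8859-2 bytes as cp1250 raises no error; it silently
# yields a different character (the copyright sign at 0xA9).
wrong = as_latin2.decode("cp1250")
print(repr(wrong))
```

The failure mode is quiet: nothing raises, the text just comes out wrong,
so it pays to check which of the two encodings the data is really in.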

>>> (Does anyone else feel that python's unicode handling is, well... 
>>> suboptimal at least?)
>> 
>>  Hmm.  Not really.  The only problem I've found with it is the misguided 
>> attempt to "do the right thing" by implicitly encoding unicode strings, 
>> and this isn't so much of a problem once you figure things out, because 
>> you can always do things explicitly and avoid invoking the implicit 
>> behavior.
>
>I'm learning that, the hard way :)
>
>One thing that I always wanted to do (but probably can't be done?) is to 
>set the default/implicit encoding to the one I'm using... I often have 
>to deal with 8-bit encodings and rarely with unicode. Can it be done 
>per-program?

It's a bit difficult to understand what you are trying to do, but I'd
suggest that you forget about setting the default encoding; if you
need to deal with Unicode, then set up the encoding explicitly on a
per-file or per-socket basis. The default ASCII encoding then serves as
a trap that raises an error when (sorry to rub it in) you don't know
what type of data you have.
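
As a rough sketch of what "explicitly, per file" might look like: this
assumes Python 3, where the built-in open() takes an encoding argument
(on the 2.x series, codecs.open plays a similar role). The file name and
the sample text are invented for the example.

```python
import os
import tempfile

# Hypothetical sample data and file name, for illustration only.
text = "Čakovec, Đakovo, Šibenik"
path = os.path.join(tempfile.mkdtemp(), "towns.txt")

# Name the encoding at the boundary: text goes in, iso-8859-2 bytes hit the disk.
with open(path, "w", encoding="iso-8859-2") as f:
    f.write(text)

# Read it back with the same explicit encoding; inside the program
# everything is a text (unicode) string.
with open(path, "r", encoding="iso-8859-2") as f:
    round_tripped = f.read()

# The raw bytes are not ASCII, so an ASCII decode fails loudly.
# That is the "trap" doing its job.
with open(path, "rb") as f:
    raw = f.read()

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(round_tripped == text, ascii_ok)
```

The point of the design is that the encoding is stated once, where the
data enters or leaves the program, instead of being left to a global
default.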

HTH,
John