Unicode problems, yet again
John Machin
sjmachin at lexicon.net
Sun Apr 24 07:31:20 EDT 2005
On Sun, 24 Apr 2005 11:26:20 +0200, Ivan Voras <ivoras at _-_fer.hr>
wrote:
>Jp Calderone wrote:
>
>> You don't have a string fetched from a database, in iso-8859-2, alas.
>> That is the root of the problem you're having. What you have is a
>> unicode string.
>
>Yes, you're right :) I actually did have iso-8859-2 data, but, as I
>found out late last night, the data got converted to unicode along the way.
Just a thought: I noticed from the traceback that you are running this
on a Windows box. Profound apologies in advance if this question is an
insult to your intelligence, but you do know that Windows code page
1250 (Latin 2) -- which I guess is the code page that you would be
using -- is *NOT* the same as iso-8859-2, don't you?
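To make the difference concrete, here is a minimal sketch that compares the two codecs byte by byte. It assumes a current Python 3 interpreter (not the Python 2 of this thread), and the helper name differing_bytes is made up for illustration.

```python
# Compare how cp1250 and iso-8859-2 interpret the same high bytes.
# Assumption: Python 3, where str is Unicode and bytes is raw data.

def differing_bytes():
    """Return (byte value, iso-8859-2 char, cp1250 char) for every
    high byte that the two codecs decode differently."""
    diffs = []
    for value in range(0x80, 0x100):
        raw = bytes([value])
        iso_char = raw.decode("iso-8859-2")
        # A few byte values are undefined in cp1250; "replace" maps
        # them to U+FFFD instead of raising UnicodeDecodeError.
        cp_char = raw.decode("cp1250", errors="replace")
        if iso_char != cp_char:
            diffs.append((value, iso_char, cp_char))
    return diffs

s_caron = "\u0160"  # LATIN CAPITAL LETTER S WITH CARON
c_caron = "\u010d"  # LATIN SMALL LETTER C WITH CARON

# S-caron lives at a different byte value in each encoding ...
cp_bytes = s_caron.encode("cp1250")
iso_bytes = s_caron.encode("iso-8859-2")

# ... while c-caron happens to share one, which is why the mismatch
# can go unnoticed until a particular letter shows up in the data.
same_for_c = c_caron.encode("cp1250") == c_caron.encode("iso-8859-2")

# Decoding cp1250 bytes as iso-8859-2 raises no error; it silently
# yields a C1 control character instead of the intended letter.
misread = cp_bytes.decode("iso-8859-2")
```

Printing differing_bytes() lists every affected byte value, which is a quick way to check whether the characters in your data are among them.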
>>> (Does anyone else feel that python's unicode handling is, well...
>>> suboptimal at least?)
>>
>> Hmm. Not really. The only problem I've found with it is the misguided
>> attempt to "do the right thing" by implicitly encoding unicode strings,
>> and this isn't so much of a problem once you figure things out, because
>> you can always do things explicitly and avoid invoking the implicit
>> behavior.
>
>I'm learning that, the hard way :)
>
>One thing that I always wanted to do (but probably can't be done?) is to
>set the default/implicit encoding to the one I'm using... I often have
>to deal with 8-bit encodings and rarely with unicode. Can it be done
>per-program?
It's a bit difficult to understand what you are trying to do, but I'd
suggest that you forget about setting the default encoding; if you
need to deal with Unicode, then set up the encoding explicitly on a
per-file or per-socket basis. The default ASCII encoding then acts as
a trap that fails loudly when (sorry to rub it in) you don't know what
type of data you have.
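As a rough sketch of what "explicitly on a per-file basis" can look like: the example below assumes Python 3, where the built-in open() takes an encoding argument (in the Python 2 of this thread, codecs.open(path, mode, encoding) plays the same role). The helper names, the file name, and the sample text are made up for illustration.

```python
import os
import tempfile

# Assumption: the data on disk is iso-8859-2. State it once, explicitly,
# instead of relying on any interpreter-wide default.
ENCODING = "iso-8859-2"

def write_text(path, text, encoding=ENCODING):
    """Encode Unicode text to bytes at the file boundary."""
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)

def read_text(path, encoding=ENCODING):
    """Decode bytes to Unicode text at the file boundary."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()

# Hypothetical sample: two Croatian town names with non-ASCII letters.
sample = "\u0160ibenik, \u010cakovec"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "towns.txt")
    write_text(path, sample)
    with open(path, "rb") as handle:
        raw = handle.read()        # the 8-bit data actually stored
    round_trip = read_text(path)   # Unicode again inside the program

# The ASCII codec as a trap: encoding non-ASCII text fails loudly
# instead of guessing.
try:
    sample.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Inside the program everything stays Unicode; the 8-bit encoding appears only where data enters or leaves, so a wrong guess shows up at one well-defined place.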
HTH,
John