codecs latin1 unicode standard output file

Mon Dec 15 15:23:46 EST 2003

On Mon, 15 Dec 2003 12:38:50 +0100, "Fredrik Lundh" <fredrik at pythonware.com> wrote:

>Marko Faldix wrote:
>
>> I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
>> command line (cmd). Put these lines of code in a file called klotentest1.py:
                                                    ^^^^[1]
>>
>> # -*- coding: iso-8859-1 -*-
                 ^^^^^^^^^^[2]
>>
>> print unicode("My umlauts are ä, ö, ü", "latin-1")
>> print "My umlauts are ä, ö, ü"
         ^^^^^^^^^^^^^^^^^^^^^^^^[3]
>>
[...]
>> Calling this on command line:
>>
>> klotentest1.py
>>
>> Indeed, result of first print is as desired, result of second print delivers
>> strange letters but no error.
>
>your console device doesn't use iso-8859-1; it probably uses cp850.
>if you print an 8-bit string to the console, Python assumes that you
>know what you're doing...
I think the OP is suggesting that, given [1] and [2], the plain string literal [3] should
implicitly carry the encoding information from [2] and be converted for output just as
the result of unicode(...) is.

(I know that's not the way it works now, and I know it's not an easy problem ;-)
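
To make the "strange letters" concrete, here is a minimal sketch of what happens to the bytes of [3] on a console that uses cp850, as Fredrik guesses it does. It is written in Python 3 syntax rather than the 2.3 under discussion, and the variable names are mine; the umlauts are spelled as escapes so the snippet does not depend on its own source encoding.

```python
# Sketch only: Python 3 syntax, not the Python 2.3 used in this thread.
# "\xe4", "\xf6", "\xfc" are a-umlaut, o-umlaut, u-umlaut.
text = "My umlauts are \xe4, \xf6, \xfc"

# The bytes an iso-8859-1 source file holds for the string literal [3].
raw = text.encode("latin-1")

# A cp850 console maps those same byte values to different characters.
seen_on_console = raw.decode("cp850")

# ascii() keeps this print safe whatever the console encoding is.
print(ascii(seen_on_console))  # 'My umlauts are \xf5, \xf7, \xb3'
```

The three escaped characters in the result are õ, ÷ and ³, the same substitution that shows up in the source line of the traceback quoted further down.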
>
>> Now I call this on command line:
>>
>> klotentest1.py > klotentest1.txt
>>
>> This fails:
>> Traceback (most recent call last):
>> File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line
>> 3, in ?
>>     print unicode("My umlauts are õ, ÷, ³", "latin-1")
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
>> 15: ordinal not in range(128)
>>
>> In my point of view python shouldn't act in different ways whether result is
>> piped to file or not.
>
>when you print to a console with a known encoding, Python 2.3 auto-
>magically converts Unicode strings to 8-bit strings using the console
>encoding.
>
>files don't have an encoding, which is why the second case fails.
I think the OP's view is that source files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so [3] should in some way be an unambiguous character sequence,
not just a byte sequence. (I have to get back to a previous thread with Martin, where
I owe a reply; this same issue is key there.) To repeat myself: I realize that's not the
way it works now, and that it's a hard problem ;-)
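
For the redirect case, the following sketch reproduces the failure and shows one possible way around it: wrap the output stream in an explicit encoder so it no longer matters whether a console encoding is known. Again this is Python 3 syntax, not 2.3; the BytesIO object is only a stand-in for the redirected stdout, and picking latin-1 as the target encoding is my assumption, not something the OP asked for.

```python
# Sketch only: Python 3 syntax; names and the latin-1 choice are assumptions.
import codecs
import io

text = "My umlauts are \xe4, \xf6, \xfc"

# With no known encoding the fallback is ASCII, which rejects the first umlaut.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failure = (exc.encoding, exc.start, exc.reason)

print(failure)  # ('ascii', 15, 'ordinal not in range(128)')

# Stand-in for a redirected stdout: a plain byte stream with no encoding.
redirected = io.BytesIO()

# codecs.getwriter returns a StreamWriter class that encodes on write.
writer = codecs.getwriter("latin-1")(redirected)
writer.write(text + "\n")

written = redirected.getvalue()
print(written)  # b'My umlauts are \xe4, \xf6, \xfc\n'
```

Position 15 is the index of the first umlaut, matching the traceback above. The explicit writer makes console and file output behave the same way, at the cost of the script choosing the encoding itself.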

Regards,
Bengt Richter