Non-ascii characters in an MSDOS window

Bengt Richter bokr at oz.net
Fri Oct 18 11:40:41 EDT 2002


On Fri, 18 Oct 2002 05:30:42 GMT, "Bartolomé Sintes Marco" <BartolomeSintes at ono.com> wrote:

>Thanks for your help. I have tried your program. I have copied it to a
>sample.py file,
>but when I run it (Ctrl+F5) I get the following error message
>
I'm sorry, I should have made it more forgiving about encoding assumptions,
which apparently were wrong (or else there is something else wrong).
Using the on() method of my code put 'strict' conversion in place for all the
output going to stdout. I should have put 'replace' in the calls (see below)
so that at least it would substitute '?' when it couldn't convert.
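
To make the 'strict' versus 'replace' difference concrete, here is a minimal
sketch. It is written in Python 3 syntax (an assumption on my part; this thread
uses Python 2.2, where you would call .decode() on a plain string first), and
the a-tilde is just a convenient character that cp437 does not contain:

```python
# "\xe1" is a-acute (present in cp437); "\xe3" is a-tilde (absent from cp437).
text = "\xe1 \xe3"

# The default error mode is 'strict': an unencodable character raises.
try:
    text.encode("cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With 'replace', the unencodable character becomes '?' and nothing raises.
replaced = text.encode("cp437", "replace")
print(strict_failed, replaced)
```

With 'replace' in place the output is degraded but the program keeps running.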

I was running python 2.2 on windows NT4, and I think the console is using
code page 437. But I suspect that you are having a problem with code page 1252
(windows characters) encoding somewhere.

Would you try not executing my sample.py, but instead going to a console window
from the directory where it is, and then typing as follows:

 >>> import sample
 >>> print sample.sa
 Spanish accents: ß T f = ·

The above may have gotten converted when I pasted into my newsreader, or on
your side,  so let's not trust it. Try printing the repr() value, so we'll see the hex:

 >>> print repr(sample.sa)
 'Spanish accents: \xe1 \xe9 \xed \xf3 \xfa'

Do you get those values? They are the latin-1 values for  á é í ó ú
(acute-accented a e i o u, in case they didn't show here--^^^^^^^^^).

 >>> '\xe1 \xe9 \xed \xf3 \xfa'.decode('latin-1')
 u'\xe1 \xe9 \xed \xf3 \xfa'

Note that there's a "u'" in front, but the values didn't change.
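
The reason the values don't change is that latin-1 maps every byte to the
Unicode code point with the same number. A small sketch of that check (Python 3
syntax, which is an assumption; the variable names are only illustrative):

```python
raw = b"\xe1 \xe9 \xed \xf3 \xfa"

# Decoding latin-1 turns each byte into the character with the same number.
decoded = raw.decode("latin-1")
code_points = [ord(ch) for ch in decoded]
byte_values = list(raw)

print(code_points == byte_values)
```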

 >>> '\xe1 \xe9 \xed \xf3 \xfa'.decode('latin-1').encode('cp437')
 '\xa0 \x82 \xa1 \xa2 \xa3'

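Does the above work for you? The codes are the cp437 character codes for
the same accented characters in the console's terminal font. Notice the \x82
(e-acute in cp437): in the windows character set (cp1252) that byte is a
printable character, a low quotation mark, but in latin-1 it is only a
control code.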
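
As an illustration of why the choice of code page matters, this sketch decodes
the single byte \x82 under the three encodings mentioned in this thread
(Python 3 syntax, which is an assumption; the names are only illustrative):

```python
byte = b"\x82"

as_cp437 = byte.decode("cp437")      # e-acute, U+00E9
as_cp1252 = byte.decode("cp1252")    # single low-9 quotation mark, U+201A
as_latin1 = byte.decode("latin-1")   # C1 control code, U+0082

for name, ch in (("cp437", as_cp437), ("cp1252", as_cp1252), ("latin-1", as_latin1)):
    print(name, hex(ord(ch)))
```

The same byte means three different things, so a wrong guess about the
encoding on either side produces wrong characters or a conversion error.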

 >>> print '\xe1 \xe9 \xed \xf3 \xfa'.decode('latin-1','replace').encode('cp437','replace')
 á é í ó ú

What does this do on your system?

 >>> print '\xe1 \xe9 \xed \xf3 \xfa'.decode('cp1252','replace').encode('cp437','replace')
 á é í ó ú
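
For these five bytes the latin-1 and cp1252 decodings should agree, because the
two encodings only differ in the range \x80 to \x9f. A sketch of that
comparison (Python 3 syntax, which is an assumption; the euro sign is just one
example of a byte in the differing range):

```python
sample = b"\xe1 \xe9 \xed \xf3 \xfa"
same_for_sample = sample.decode("latin-1") == sample.decode("cp1252")

# Every byte from \xa0 through \xff decodes identically in both encodings.
high = bytes(range(0xA0, 0x100))
same_for_high = high.decode("latin-1") == high.decode("cp1252")

# In the \x80-\x9f range they disagree: cp1252 has the euro sign at \x80,
# while latin-1 has a control code there.
from_latin1 = b"\x80".decode("latin-1")
from_cp1252 = b"\x80".decode("cp1252")

print(same_for_sample, same_for_high, from_latin1 == from_cp1252)
```

So if the two print statements above behave differently, the data probably
contains bytes in that \x80 to \x9f range.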


>Traceback (most recent call last):
>  File "C:\Mis documentos\Barto\02-03 Abastos\Python\Acentos\sample.py",
>line 20, in ?
>    print " On: Spanish accents: á é í ó ú"
>  File "C:\Mis documentos\Barto\02-03 Abastos\Python\Acentos\sample.py",
>line 14, in write
>    self.so.write(s.decode('latin-1').encode('cp437'))
>  File "C:\PYTHON22\lib\encodings\cp437.py", line 18, in encode
>    return codecs.charmap_encode(input,errors,encoding_map)
>UnicodeError: charmap encoding error: character maps to <undefined>
>
>and Python seems broken because after this error, I can not even save the
>file. Am I doing
>something wrong?
>
>Best regards,
>Barto
>
>"Bengt Richter" <bokr at oz.net> escribió en el mensaje
>news:aonrjm$2ag$0 at 216.39.172.122...
>> You could try something like this version of your sample.py:
>> (it assumes your strings are Latin-1 encoded)
>> --
>> # sample.py
>> sa =  "Spanish accents: á é í ó ú"  # so we can see sample.sa interactively
>> class L1to437:
>>     import sys
>>     def __init__(self):
>>         self.so = L1to437.sys.stdout
>>     def on(self):
>>         L1to437.sys.stdout = self
>>     def off(self):
>>         L1to437.sys.stdout = self.so
>>     def write(self, s):
>>         self.so.write(s.decode('latin-1').encode('cp437'))
           self.so.write(s.decode('latin-1','replace').encode('cp437','replace'))

This change should make it print question marks whenever it gets a character it doesn't
know what to do with. But the trick will be to replace the two encoding names with whatever
applies on your system. Maybe 'cp1252' is one of them for you?
Martin will probably have answered by now ;-)
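
For anyone trying the same idea on a current Python, here is a rough sketch of
a wrapper with the target encoding and error mode as parameters instead of
hard-coded. It uses Python 3 syntax, which is an assumption (the sample.py in
this thread is Python 2.2), and the class name RecodingWriter and its
parameters are hypothetical. An in-memory buffer stands in for the console:

```python
import io


class RecodingWriter:
    """File-like object that encodes text before writing it to a byte stream."""

    def __init__(self, raw, target="cp437", errors="replace"):
        self.raw = raw          # any object with a write(bytes) method
        self.target = target    # encoding the console expects (assumed cp437)
        self.errors = errors    # 'replace' substitutes '?' instead of raising

    def write(self, text):
        self.raw.write(text.encode(self.target, self.errors))
        return len(text)

    def flush(self):
        self.raw.flush()


buf = io.BytesIO()              # stand-in for the console's byte stream
out = RecodingWriter(buf)

# Escapes are used so the example does not depend on the source file encoding.
# The last character, a-tilde, is not in cp437 and should come out as '?'.
print("Spanish accents: \xe1 \xe9 \xed \xf3 \xfa \xe3", file=out)
result = buf.getvalue()
print(repr(result))
```

Passing a different target (for example 'cp850') is then a one-argument
change rather than an edit to the write() method.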

>> so437 = L1to437()
>>
>> if __name__ == '__main__':
>>     print "Before: Spanish accents: á é í ó ú"
>>     so437.on()
>>     print "    On: Spanish accents: á é í ó ú"
>>     so437.off()
>>     print "   Off: Spanish accents: á é í ó ú"
>>     end = raw_input()
>> --
>>
>> Running it:
>> [19:20] C:\pywk\junk>sample.py
>> Before: Spanish accents: ß T f = ·
>>     On: Spanish accents: á é í ó ú
>>    Off: Spanish accents: ß T f = ·
>>
>> Or you can import it, and use sample.so437.on() to turn on coding of
>> output for the DOS window:
>>
>>  >>> import sample
>>  >>> print sample.sa
>>  Spanish accents: ß T f = ·
>>  >>> sample.so437.on()
>>  >>> print sample.sa
>>  Spanish accents: á é í ó ú
>>  >>> sample.so437.off()
>>  >>> print sample.sa
>>  Spanish accents: ß T f = ·
The characters you see above may not be the same as what I am looking at,
due to what happens when I copy and paste from the console window to this newsreader
(Free Agent) and what happens on your end, but the middle one has all the accents
correct for me.

>>
>> But Martin is the expert on that stuff. There may be a clean way to do it.
>>

Regards,
Bengt Richter
