How to emit UTF-8 from console mode?

Mark Tolonen M8R-yfto6h at mailinator.com
Wed Oct 1 22:46:37 EDT 2008


"Siegfried Heintze" <siegfried at heintze.com> wrote in message 
news:vLCdnUSj27MaCX7VnZ2dnUVZ_uGdnZ2d at comcast.com...
>
>>Make sure you are using the Lucida Console font for the cmd.exe window and
>>type the commands:
>>
>>chcp 1251
>>python -c "print ''.join(unichr(i) for i in range(0x410,0x431))"
>>
>>Output:
>>
>>АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯа
>>
> Wowa! I was not aware of that chcp command! Thanks! How could I do that 
> "chcp 1251" programmatically?
>
> The code was a little confusing because those two apostrophes look like a 
> double quote!
>
> But what are we doing here? Can you convince me that we are emitting 
> UTF-8? I need UTF-8 because I need to experiment with some OS function 
> calls that give me UTF-16 and I need to emit UTF-16 or UTF-8.
>
> I think part of the problem is that Lucida Console is not as capable as 
> "Arial Unicode MS" or the fonts used by urxvt-X.

In this case, it is not emitting UTF-8.  It is emitting the windows-1251 
encoding, which matches the console code page set by "chcp 1251".  As 
another poster mentioned, the Windows console reports an error when Python 
tries to write UTF-8 while the code page is 65001 (UTF-8).  But you can 
write the output to a file explicitly in UTF-8 or UTF-16 and view the file 
with Notepad.  I've used this method for processing Chinese.

>>> import os,codecs
>>> data = u''.join(unichr(i) for i in range(0x410,0x431))
>>> codecs.open('out.txt','wt','utf-8').write(data)
>>> os.startfile('out.txt')
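
To see that the console example and the file example really produce 
different bytes, the same string can be encoded each way and compared.  
This is only a sketch, and it assumes Python 3 (where chr() replaces the 
unichr() used above); the helper name encode_all is made up for 
illustration.

```python
# Python 3 sketch; the session above is Python 2, where unichr() is used
# instead of chr().  encode_all is a hypothetical helper name.
text = "".join(chr(i) for i in range(0x410, 0x431))  # same 33 characters


def encode_all(s):
    """Return {encoding name: encoded bytes} for the encodings discussed."""
    names = ("windows-1251", "utf-8", "utf-16-le")
    return {name: s.encode(name) for name in names}


encoded = encode_all(text)
for name, raw in encoded.items():
    # Every character here has the same width within one encoding, so the
    # first character occupies len(raw) // len(text) bytes.
    width = len(raw) // len(text)
    print(name, len(raw), raw[:width].hex())
```

For these Cyrillic characters windows-1251 uses one byte per character, 
while UTF-8 and UTF-16 both use two, so a byte count or a hex dump of the 
output is enough to tell which encoding was actually emitted.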

P.S.

One way to set the code page programmatically is to use ctypes, but this 
will only work in a Windows console:

>>> import ctypes
>>> k=ctypes.WinDLL('kernel32')
>>> k.SetConsoleOutputCP(1251)
1
>>> print u''.join(unichr(i) for i in range(0x410,0x430)).encode('windows-1251')
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
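
If the code page change should not outlive the script, the same ctypes 
calls can be wrapped so that the previous code page is restored 
afterwards.  The following is a rough sketch, assuming Python 3; the name 
console_output_cp is invented for this example, and on anything other than 
Windows it simply does nothing.

```python
import contextlib
import sys


@contextlib.contextmanager
def console_output_cp(code_page):
    """Temporarily switch the Windows console output code page.

    Yields True if the code page was changed, otherwise False (for example
    on a non-Windows platform, or when no console is attached).
    console_output_cp is a hypothetical helper, not a library function.
    """
    if sys.platform != "win32":
        yield False
        return
    import ctypes  # imported here because WinDLL exists only on Windows
    kernel32 = ctypes.WinDLL("kernel32")
    previous = kernel32.GetConsoleOutputCP()  # 0 means the call failed
    changed = bool(previous) and bool(kernel32.SetConsoleOutputCP(code_page))
    try:
        yield changed
    finally:
        if changed:
            # Put the console back the way it was found.
            kernel32.SetConsoleOutputCP(previous)


if __name__ == "__main__":
    with console_output_cp(1251) as changed:
        print("code page changed:", changed)
```

Inside the with block, output encoded as windows-1251 should display the 
same way as in the session above, provided the console font has the glyphs.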

--Mark



