How to emit UTF-8 from console mode?
Mark Tolonen
M8R-yfto6h at mailinator.com
Wed Oct 1 22:46:37 EDT 2008
"Siegfried Heintze" <siegfried at heintze.com> wrote in message
news:vLCdnUSj27MaCX7VnZ2dnUVZ_uGdnZ2d at comcast.com...
>
>>Make sure you are using the Lucida Console font for the cmd.exe window and
>>type the commands:
>>
>>chcp 1251
>>python -c "print ''.join(unichr(i) for i in range(0x410,0x431))"
>>
>>Output:
>>
>>АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯа
>>
> Wowa! I was not aware of that chcp command! Thanks! How could I do that
> "chcp 1251" programmatically?
>
> The code was a little confusing because those two apostrophes look like a
> double quote!
>
> But what are we doing here? Can you convince me that we are emitting
> UTF-8? I need UTF-8 because I need to experiment with some OS function
> calls that give me UTF-16 and I need to emit UTF-16 or UTF-8.
>
> I think part of the problem is that Lucida Console is not as capable as
> "Arial Unicode MS" or the fonts used by urxvt-X.

In this case, it is not emitting UTF-8; it is emitting the windows-1251
encoding. As another poster mentioned, the Windows console raises an error
when you try to write UTF-8 with the code page set to 65001 (UTF-8). But you
can write output to a file explicitly in UTF-8 or UTF-16 and view the file
with Notepad. I've used this method for processing Chinese.

>>> import os,codecs
>>> data = u''.join(unichr(i) for i in range(0x410,0x431))
>>> codecs.open('out.txt','wt','utf-8').write(data)
>>> os.startfile('out.txt')
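
To check that the file really holds UTF-8 rather than windows-1251, one option is to compare the raw bytes. The following is only a sketch, written in Python 3 syntax (where chr replaces unichr); the temporary file name is a made-up example and not part of the session above.

```python
import io
import os
import tempfile

# Same 33 code points as the chcp example (U+0410 through U+0430).
data = ''.join(chr(i) for i in range(0x410, 0x431))

# The same text produces different bytes under each encoding.
as_cp1251 = data.encode('windows-1251')  # 1 byte per character
as_utf8 = data.encode('utf-8')           # 2 bytes per character in this range
as_utf16 = data.encode('utf-16-le')      # 2 bytes per character, no BOM

# Write the text as UTF-8, then read the raw bytes back to confirm.
# The file name here is hypothetical.
path = os.path.join(tempfile.gettempdir(), 'out_utf8_check.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(data)
with io.open(path, 'rb') as f:
    raw = f.read()

print(len(as_cp1251), len(as_utf8), len(as_utf16))
print(raw == as_utf8)
```

If the last line prints True, the bytes on disk are exactly the UTF-8 encoding of the text, which differs from what the console received under code page 1251.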
P.S.
One way to set the code page programmatically is to use ctypes, but this
will only work in a Windows console:
>>> import ctypes
>>> k=ctypes.WinDLL('kernel32')
>>> k.SetConsoleOutputCP(1251)
1
>>> print u''.join(unichr(i) for i in range(0x410,0x430)).encode('windows-1251')
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
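
In a script, the same call could be wrapped so that it does nothing outside Windows. This is a rough sketch in Python 3 syntax: the helper names set_console_output_cp and get_console_output_cp are invented for illustration, while SetConsoleOutputCP and GetConsoleOutputCP are the kernel32 functions themselves.

```python
import ctypes
import os

def get_console_output_cp():
    """Return the current console output code page, or None off Windows."""
    if os.name != 'nt':
        return None
    return ctypes.WinDLL('kernel32').GetConsoleOutputCP()

def set_console_output_cp(code_page):
    """Try to set the console output code page; return True on success.

    Returns False on non-Windows systems, or when the call fails
    (for example, when the process has no console attached).
    """
    if os.name != 'nt':
        return False
    kernel32 = ctypes.WinDLL('kernel32')
    return bool(kernel32.SetConsoleOutputCP(code_page))

previous = get_console_output_cp()
if set_console_output_cp(1251):
    print('console output code page changed from', previous, 'to 1251')
else:
    print('code page not changed (no Windows console available)')
```

Remembering the previous code page makes it possible to restore it with a second call once the output has been written.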
--Mark