[Tutor] print the whole unicode list
Mark Tolonen
metolone+gmane at gmail.com
Thu Aug 28 15:52:22 CEST 2008
"Yang" <winglion1 at 163.com> wrote in message
news:48B6119B.14D89C.08581 at m5-82.163.com...
> Hello,
> I am trying to print out the whole Unicode character list in Windows, from
> 0 to 65535.
> I use Windows XP with a Simplified Chinese locale. The ASCII characters from
> 0-127 and the CJK characters from
> 0x4E00-0x9FA4 can be printed out, but other Unicode characters cause this
> error:
> "UnicodeEncodeError: 'gbk' codec can't encode character u'\u9fa6' in
> position 0"
>
> my code is here:
> for i in range(0, 65536):
>     uchar = unicode("\u%04X" % i, "unicode-escape")
>     print "%x :" % i, uchar
>
> How can I print the whole Unicode list? Or can it be done?
Your console encoding is 'gbk', which can't represent all Unicode
characters. The following code writes the characters from U+0020 to U+FFFF
(skipping the surrogate range) to a file using UTF-8, an encoding that
supports every Unicode character. That file can then be viewed in a program
that understands UTF-8, such as Notepad. Even so, which characters you
actually see depends on the font: fonts generally do not cover every
Unicode character.
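import codecs
# codecs.open always opens the underlying file in binary mode, so 'w' is enough
f = codecs.open('unicode.txt', 'w', encoding='utf-8')
for i in xrange(32, 0x10000):     # start at 32 to skip the C0 control chars
    if i < 0xD800 or i > 0xDFFF:  # skip U+D800-U+DFFF, the surrogate code points
        f.write(u'%04X: %s\t' % (i, unichr(i)))
f.close()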
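For readers on Python 3, where `unichr` and `xrange` are gone and the built-in `open` accepts an `encoding` argument directly, a minimal sketch of the same approach might look like this. The helper names `can_encode` and `write_unicode_table`, and the one-entry-per-line layout (instead of tab separators), are illustrative assumptions, not part of the original reply. `can_encode` shows the console problem directly: U+9FA6, the character from the error message, fails under 'gbk' but encodes fine in UTF-8, while the lone surrogates skipped above cannot be encoded even in UTF-8.

```python
def can_encode(ch, encoding):
    """Return True if the single character ch can be encoded in encoding."""
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def write_unicode_table(path, start=32, stop=0x10000):
    """Write one 'XXXX: <char>' line per code point in [start, stop) to a
    UTF-8 file and return the number of lines written.

    The default start of 32 skips the C0 control characters. The surrogate
    range U+D800..U+DFFF is skipped because a lone surrogate is not a
    character on its own and Python's UTF-8 codec refuses to encode it.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for i in range(start, stop):
            if 0xD800 <= i <= 0xDFFF:
                continue
            f.write("%04X: %s\n" % (i, chr(i)))
            written += 1
    return written


if __name__ == "__main__":
    print(can_encode("\u9fa6", "gbk"))    # False: the character from the error
    print(can_encode("\u9fa6", "utf-8"))  # True
    # An escape sequence can stand in for characters the console can't show:
    print("\u9fa6".encode("gbk", errors="backslashreplace"))  # b'\\u9fa6'
```

Calling `write_unicode_table("unicode.txt")` should produce a file with 63,456 lines (65,536 code points minus the 32 control characters and the 2,048 surrogates), which can then be opened in Notepad as described above.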
-Mark