[Tutor] print the whole unicode list

Mark Tolonen metolone+gmane at gmail.com
Thu Aug 28 15:52:22 CEST 2008


"Yang" <winglion1 at 163.com> wrote in message 
news:48B6119B.14D89C.08581 at m5-82.163.com...
> Hello,
>    I am trying to print the whole Unicode character list, from 0-65535, in
> a Windows console. I use Windows XP with the Simplified Chinese locale. The
> ASCII characters from 0-127 and the CJK characters from 0x4E00-0x9FA4 print
> fine. Other Unicode characters cause this error:
> "UnicodeEncodeError: 'gbk' codec can't encode character u'\u9fa6' in 
> position 0"
>
> my code is here:
> for i in range(0,65536 ):
>            uchar=unicode("\u%04X"%i,"unicode-escape")
>            print "%x :"%i,uchar
>
> How can I get a whole Unicode list? Or can it be done?

Your console encoding is 'gbk', which can't represent every Unicode 
character, so print raises UnicodeEncodeError for any character outside that 
encoding.  The following code instead writes all the characters to a file 
using an encoding that supports all of Unicode (UTF-8); that file can then be 
viewed in a program that supports the encoding (Notepad, for this example).  
Still, which characters you actually see will depend on the font used.  Fonts 
generally do not cover every Unicode character.

import codecs

# Python 2 code (xrange and unichr do not exist in Python 3).
f = codecs.open('unicode.txt', 'w', encoding='utf-8')
for i in xrange(32, 0x10000):       # skip control chars below U+0020
    if i < 0xD800 or i > 0xDFFF:    # skip the surrogate range U+D800-U+DFFF
        f.write(u'%04X: %s\t' % (i, unichr(i)))
f.close()
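
If you want to see which characters a given encoding can and cannot 
represent before printing them, something like the following may help.  It is 
only a rough sketch, and it assumes Python 3 (chr and range rather than 
unichr and xrange).  The helper names can_encode and count_encodable are made 
up for this example; they are not part of any library.

```python
# Python 3 sketch. can_encode and count_encodable are hypothetical helpers
# written for this example, not standard library functions.

def can_encode(code_point, encoding):
    """Return True if the character at code_point is representable in encoding."""
    try:
        chr(code_point).encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def count_encodable(encoding, start=32, stop=0x10000):
    """Count code points in [start, stop), surrogates excluded, that encoding accepts."""
    return sum(
        1
        for cp in range(start, stop)
        if not 0xD800 <= cp <= 0xDFFF and can_encode(cp, encoding)
    )


if __name__ == "__main__":
    print(can_encode(0x4E2D, "gbk"))    # a common CJK ideograph
    print(can_encode(0x9FA6, "gbk"))    # the character from the error message
    print(count_encodable("gbk"), "of", count_encodable("utf-8"))
```

Passing the console's own encoding (sys.stdout.encoding, when it is set) to 
can_encode would let a loop skip the characters the console cannot show 
instead of stopping at the first error.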


-Mark 



