[Tutor] Available characters

Steven D'Aprano steve at pearwood.info
Thu May 23 03:06:48 CEST 2013


On 23/05/13 04:14, Citizen Kant wrote:
> Does anybody know if there's a Python method that gives or stores the
> complete list of ascii characters or unicode characters? The list of every
> single character available would be perfect.


There are only 128 ASCII characters (code points 0 through 127), so getting a list of them is trivial:

ascii = map(chr, range(128))  # Python 2
ascii = list(map(chr, range(128)))  # Python 3


or if you prefer a string:

ascii = ''.join(map(chr, range(128)))


If you don't like map(), you can use a list comprehension:

[chr(i) for i in range(128)]

The string module already defines some useful subsets of them:

py> import string
py> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
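
For a closer look at those subsets, here is a small sketch (Python 3) that prints a few of the other constants the string module defines, and works out which ASCII characters are *not* in string.printable. The names subsets, ascii_chars and non_printable are just ones I made up for the illustration:

```python
import string

# A few of the named ASCII subsets provided by the string module.
subsets = {
    "ascii_letters": string.ascii_letters,
    "digits": string.digits,
    "punctuation": string.punctuation,
    "whitespace": string.whitespace,
}
for name, chars in subsets.items():
    print(name, len(chars), repr(chars))

# All 128 ASCII characters, and the ones string.printable leaves out
# (the control characters other than the whitespace ones, plus DEL).
ascii_chars = ''.join(map(chr, range(128)))
non_printable = [c for c in ascii_chars if c not in string.printable]
print(len(string.printable), "printable,", len(non_printable), "not printable")
```

string.printable is simply the digits, letters, punctuation and whitespace put together, so everything it leaves out is a control character.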


There are 1114112 possible Unicode code points (0 through hexadecimal 0x10FFFF), but most of them are currently unassigned. Of those that are assigned, many are reserved for special purposes (surrogates, private-use areas, non-characters), and most fonts cannot display anything even close to the full range of the rest.
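
If you want to see how the code points divide up on your own system, one rough way (Python 3 only, and only a sketch) is to tally the general category of every code point with unicodedata.category(). Category "Cn" means unassigned, "Cs" is surrogates and "Co" is private use. The exact number of unassigned code points depends on which version of the Unicode database your Python ships with, so I have not quoted one here. The names counts, total and unassigned are mine:

```python
import sys
import unicodedata
from collections import Counter

# Tally the general category of every code point from 0 to sys.maxunicode.
# In Python 3.3 and later, sys.maxunicode is always 0x10FFFF.
counts = Counter(
    unicodedata.category(chr(cp)) for cp in range(sys.maxunicode + 1)
)

total = sum(counts.values())
unassigned = counts["Cn"]  # includes the non-characters

print("total code points: ", total)
print("unassigned (Cn):   ", unassigned)
print("surrogates (Cs):   ", counts["Cs"])
print("private use (Co):  ", counts["Co"])
print("everything else:   ", total - unassigned - counts["Cs"] - counts["Co"])
```

This takes a second or so to run, since it visits more than a million code points.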

If you spend some time on the Unicode web site, you will find lists of characters which are defined:

www.unicode.org

but beware, it is relatively heavy going. Wikipedia has a page showing all currently assigned characters, but font support is still lousy and many of them display as boxes:

http://en.wikipedia.org/wiki/List_of_Unicode_characters

You can generate the entire list yourself, using the same technique as for ASCII above:


# Python 2 (wide builds only; on a narrow build, unichr() raises
# ValueError for anything above 0xFFFF):
unicode = ''.join(map(unichr, xrange(1114112)))

# Python3:
unicode = ''.join(map(chr, range(1114112)))


although it will take a few seconds to generate the entire range. You can then get the name for each one using something like this:

import unicodedata
for c in unicode:
    try:
        print(c, unicodedata.name(c))
    except ValueError:
        # no name: unassigned, a control character, a surrogate,
        # or a reserved non-character
        pass
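
If you only care about part of the range, or you would rather not print the raw characters at all, here is a variation on the same idea (Python 3, a sketch only). It relies on the optional second argument to unicodedata.name(), which is returned instead of raising ValueError when a character has no name. The helper named_characters is a name I made up, not something in the standard library:

```python
import unicodedata

def named_characters(start, stop):
    """Yield (code point, name) for each named code point in range(start, stop)."""
    for cp in range(start, stop):
        # With a default given, name() returns it rather than raising ValueError.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield cp, name

# Print code point numbers instead of the characters themselves, so the
# output is plain ASCII and safe for any terminal.
for cp, name in named_characters(0x3B1, 0x3B6):
    print("U+%04X %s" % (cp, name))
```

Because it is a generator, you can pass range limits as large as you like without building one enormous string first.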


but remember that there are currently more than 100,000 defined characters in Unicode, and your terminal will probably not be able to print most of them. Expect to see a lot of boxes.




-- 
Steven

