[Python-Dev] Unicode database

Nick Maclaren nmm1 at cus.cam.ac.uk
Thu Aug 9 10:27:46 CEST 2007


"Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> > I think that you will find that you are using a non-standard
> > environment and set of Python sources.
>
> Please trust me that I didn't. See below.

I always trust people as much as I trust myself, but I do tend to
check up.  See below.

> Ah, the makefile. I don't think you use it to create the Unicode database.
>
> It's only good for generating the codecs (Lib/encodings).

Yes, but it DOES attempt to download the mappings, and is the ONLY
script which attempts to do so.

beelzebub$find Python-2.5.1 -type f | wc
   3458    3460  135981
beelzebub$find Python-2.5.1 -type f | xargs grep ftp.unicode.org
Python-2.5.1/Doc/lib/libunicodedata.tex:4.1.0 which is publicly available from \url{ftp://ftp.unicode.org/}.
grep: Python-2.5.1/Mac/Icons/Disk: No such file or directory
grep: Image.icns: No such file or directory
grep: Python-2.5.1/Mac/Icons/Python: No such file or directory
grep: Folder.icns: No such file or directory
Python-2.5.1/Misc/NEWS:  at ftp.unicode.org and contain a few updates (e.g. the Mac OS
Python-2.5.1/Tools/unicode/Makefile:# files available at ftp://ftp.unicode.org/
Python-2.5.1/Tools/unicode/Makefile:    ncftpget -R ftp.unicode.org . Public/MAPPINGS
Python-2.5.1/Tools/unicode/gencodec.py:site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec
Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#       ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT the
Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#       ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT
Python-2.5.1/Tools/unicode/python-mappings/KOI8-U.TXT:#       ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
Python-2.5.1/Tools/unicode/python-mappings/CP1140.TXT:#       ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT
Python-2.5.1/Modules/unicodedata.c:4.1.0 which is publically available from ftp://ftp.unicode.org/.\n
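The four "grep: ... No such file or directory" lines in that output evidently come from xargs splitting two paths that contain spaces ("Mac/Icons/Disk Image.icns" and "Mac/Icons/Python Folder.icns"), so those two files were never searched; "find ... -print0 | xargs -0 grep ..." avoids that. For anyone who wants to repeat the check portably, here is a minimal Python sketch of the same search that never splits a path on whitespace. The function name find_files_containing is hypothetical, and it simply reads each file whole, which is fine for a source tree of this size.

```python
import os


def find_files_containing(root, needle):
    """Return sorted paths of files under root whose bytes contain needle.

    Paths are handled as whole strings, so names with spaces
    (e.g. "Disk Image.icns") are searched rather than split apart.
    """
    needle_bytes = needle.encode("ascii")
    matches = []
    # os.walk does not follow directory symlinks by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip anything that is not a regular file (or a link to one),
            # roughly matching "find -type f".
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                if needle_bytes in f.read():
                    matches.append(path)
    return sorted(matches)
```

Usage: for p in find_files_containing("Python-2.5.1", "ftp.unicode.org"): print(p)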

> AFAICT, the mappings are still where they always were: at the
> location given in the Makefile. (e.g.
> ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT
> )

Then you DEFINITELY are using a non-standard set of files.  The output
above is from the Python 2.5.1 source that I have just downloaded.

> Did you really believe the Unicode consortium doesn't have the
> old versions of the character database online? Do you think
> they are complete fools?

Please don't be offensive.  I said that I had failed to find them,
after searching the Unicode Web site.  Now that you have given me
the actual file name, I can find them, but searching for the version
number and the name of that database leads only to unhelpful files.

> Googling for "unicode 3.2 ucd" gives me
> 
> http://unicode.org/Public/3.2-Update/
> 
> as the top hit (of course, you have to know that they call
> the character database "ucd" to invoke that query).

Generally, I distrust Google for such things, as it is as likely
to lead you to the wrong information as to the right one.  For example,
the hit you found was on a different logical server, and could
well be an incorrect version of the database.  It is VERY common
for such things to 'escape' into Google.

Have you checked with the Unicode Consortium whether that file is
correct?
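For reference, the unicodedata module documented for Python 2.5 carries two databases: the current one (4.1.0 in 2.5.1, as the grep output above shows) and a frozen 3.2 copy exposed as unicodedata.ucd_3_2_0 for applications such as IDNA that need that specific version. A minimal sketch, assuming only those two documented attributes, that reports which versions a given interpreter actually embeds (the helper name bundled_ucd_versions is hypothetical):

```python
import unicodedata


def bundled_ucd_versions():
    """Report the UCD versions compiled into this interpreter.

    "current" is the module-level database; "frozen" is the 3.2 snapshot
    kept for callers (e.g. IDNA) that require that exact version.
    """
    return {
        "current": unicodedata.unidata_version,
        "frozen": unicodedata.ucd_3_2_0.unidata_version,
    }
```

Comparing these strings against the directory names under the Consortium's Public/ area is one way to tell which published UCD release a build should correspond to.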


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

