Python usage numbers

Christian Heimes lists at cheimes.de
Sun Feb 12 19:00:14 EST 2012


On 12.02.2012 23:07, Terry Reedy wrote:
> But because of the limitation of ascii on a worldwide, as opposed to
> American basis, we ended up with 100-200 codings for almost as many
> character sets. This is because the idea of ascii was applied by each
> nation or language group individually to their local situation.

You really learn to appreciate Unicode when you have to deal with mixed
languages in texts and old databases from the 1970s and 1980s.

I'm working with books that contain medieval German, old German, modern
German, English, French, Latin, Hebrew, Arabic, ancient and modern
Greek, Rhaeto-Romanic, East European and more languages. Sometimes three
or four languages are used in a single book. Some books are more than
700 years old and contain glyphs that aren't covered by Unicode yet.
Without Unicode it would be virtually impossible to deal with them.

Metadata for these books comes from old, proprietary databases and is
stored in a format that is optimized for magnetic tape. Most people will
never have heard of ISO 5426 or ANSEL encoding, or of file formats
like MAB2, MARC or PICA. It took me quite some time to develop codecs to
encode and decode an old and partly undocumented variable-length multibyte
encoding that predates UTF-8 by about a decade. Of course every system
interprets the undocumented parts slightly differently ...
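The post does not show those codecs, so the following is only a toy sketch of the general idea, not the author's implementation. It assumes a made-up 8-bit encoding: bytes 0x00-0x7F are ASCII, and three high bytes (0xE1, 0xE2, 0xE8, hypothetical illustrative values, not taken from any standard) are non-spacing diacritics. As in ANSEL-style encodings, the diacritic precedes the letter it modifies, whereas Unicode puts combining marks after the base character, so the codec has to reorder them. It registers the codec with the standard `codecs.register` / `codecs.CodecInfo` machinery under the hypothetical name "toy-ansel".

```python
import codecs
import unicodedata

# HYPOTHETICAL byte values for illustration only; a real ISO 5426 / ANSEL
# codec needs the full mapping tables from the standard.
_DIACRITICS = {
    0xE1: "\u0300",  # combining grave accent
    0xE2: "\u0301",  # combining acute accent
    0xE8: "\u0308",  # combining diaeresis
}
_REVERSE = {mark: byte for byte, mark in _DIACRITICS.items()}


def toy_decode(data, errors="strict"):
    """Decode bytes, moving prefix diacritics after their base letter."""
    data = bytes(data)
    out = []
    pending = []  # diacritics seen but not yet attached to a base letter
    for i, b in enumerate(data):
        if b in _DIACRITICS:
            pending.append(_DIACRITICS[b])
        elif b < 0x80 or errors == "replace":
            # ASCII base character (or U+FFFD for an unmapped byte)
            out.append(chr(b) if b < 0x80 else "\ufffd")
            out.extend(pending)  # Unicode order: base first, then marks
            pending.clear()
        else:
            raise UnicodeDecodeError("toy-ansel", data, i, i + 1, "unmapped byte")
    out.extend(pending)  # dangling diacritics at the end of the input
    # Compose e.g. "u" + U+0308 into the precomposed U+00FC
    return unicodedata.normalize("NFC", "".join(out)), len(data)


def toy_encode(text, errors="strict"):
    """Encode text, emitting each letter's diacritics before the letter."""
    groups = []  # each group: one base character followed by its marks
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and groups:
            groups[-1].append(ch)
        else:
            groups.append([ch])
    out = bytearray()
    for base, *marks in groups:
        for mark in marks:
            if mark not in _REVERSE:  # only "strict" handling is sketched
                raise UnicodeEncodeError("toy-ansel", text, 0, len(text),
                                         f"cannot encode {mark!r}")
            out.append(_REVERSE[mark])
        if ord(base) >= 0x80:
            raise UnicodeEncodeError("toy-ansel", text, 0, len(text),
                                     f"cannot encode {base!r}")
        out.append(ord(base))
    return bytes(out), len(text)


def _search(name):
    # Python 3.9+ passes "toy_ansel"; older versions pass "toy-ansel".
    if name.replace("-", "_") == "toy_ansel":
        return codecs.CodecInfo(toy_encode, toy_decode, name="toy-ansel")
    return None


codecs.register(_search)

print(b"M\xe8uller".decode("toy-ansel"))  # Müller
print("M\u00fcller".encode("toy-ansel"))  # b'M\xe8uller'
```

Normalizing to NFC on decode and NFD on encode keeps the reordering logic simple: the codec only ever deals with one base character plus its trailing combining marks. Handling the undocumented corners that each system interprets differently is exactly what a sketch like this leaves out.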

Unicode and XML are bliss for metadata exchange and long term storage!



