An assessment of the Unicode standard
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Aug 30 01:46:12 EDT 2009
On Sun, 30 Aug 2009 03:07:17 +0000, Neil Hodgson wrote:
> Not sure if you are referring to the ☃ snowman character or Arctic
> region languages like Canadian Aboriginal syllabic writing like ᐲᐦᒑᔨᕽ
> which were added to Unicode 8 years after the initial version. I'd guess
> that was added from political rather than marketing motives. ☃ was
> required since it was present in Japanese character sets.
If I recall correctly, the snowman was specifically added at the request
of Japanese television producers, because it is a standard glyph used for
representing snow when showing the weather on TV.
Unicode's stated aim is to have a single universal standard for all
characters needed for communication. From the Unicode Consortium:
[quote]
What is Unicode?
Unicode provides a unique number for every character, no matter what the
platform, no matter what the program, no matter what the language.
...
Even for a single language like English no single encoding was adequate
for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two
encodings can use the same number for two different characters, or use
different numbers for the same character. Any given computer (especially
servers) needs to support many different encodings; yet whenever data is
passed between different encodings or platforms, that data always runs
the risk of corruption.
Unicode is changing all that!
Unicode provides a unique number for every character, no matter what the
platform, no matter what the program, no matter what the language.
[end quote]
And from the FAQs:
[quote]
Unicode covers all the characters for all the writing systems of the
world, modern and ancient. It also includes technical symbols,
punctuations, and many other characters used in writing text.
[end quote]
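To make the "unique number for every character" claim concrete, here is a small Python 3 sketch (my own illustration, not taken from the Consortium's material) that uses only the built-in ord() and the standard unicodedata module to print the code point and official name of a few characters, the snowman included:

```python
import unicodedata

# Each character has exactly one code point, regardless of platform.
for ch in ["A", "\u20ac", "\u2603"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  {ch}")

# The same code point can be serialised with different Unicode encodings,
# but it always decodes back to the same character.
snowman = "\u2603"
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    assert snowman.encode(codec).decode(codec) == snowman
```

The number (U+2603 for the snowman) is the same everywhere; only the byte serialisation differs between UTF-8, UTF-16 and UTF-32.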
It's not just about supporting languages used by foreigners too stupid to
speak English (sarcasm!). It's about supporting business users who want a
standard way of referring to dingbats and pictographs; historians who
need to deal with ancient writings and obsolete characters; scientists
and mathematicians who want to use mathematical symbols; editors and book
publishers who want to use their own typographic symbols; people who need
Braille or musical symbols; and even TV producers who want to include
snowmen on their weather charts.
The Unicode system replaces dozens of incompatible, clashing systems with
a single universal, extensible system. Why would anyone want to go back
to the Bad Old Days where you couldn't transfer data from one OS to
another, or even from one application to another, without quote marks
turning into mathematical symbols or boxes?
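As a rough illustration of that kind of corruption, the Python 3 sketch below writes curly quotes in one legacy encoding and reads the bytes back in another. The choice of Windows-1252 and Mac Roman is my own assumption for the example; any pair of incompatible 8-bit encodings shows the same effect.

```python
text = "\u201cquoted\u201d"  # a word wrapped in curly double quotes

# Windows-1252 stores the two curly quotes as the single bytes 0x93 and 0x94.
raw = text.encode("cp1252")

# Read back with a different legacy encoding, the same bytes stand for
# different characters, so the quote marks are silently replaced.
garbled = raw.decode("mac_roman")
print(repr(text), "->", repr(garbled))

# With a Unicode encoding on both ends, the text survives intact.
assert text.encode("utf-8").decode("utf-8") == text
```

Nothing raises an error in the legacy round trip, which is what makes this class of bug so easy to miss: the bytes are valid in both encodings, they just mean different things.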
--
Steven