Newbie question about text encoding

Terry Reedy tjreedy at udel.edu
Thu Feb 26 12:02:25 EST 2015


On 2/26/2015 8:24 AM, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rustompmody at gmail.com> wrote:
>> Wrote something up on why we should stop using ASCII:
>> http://blog.languager.org/2015/02/universal-unicode.html

I think that the main point of the post, that many Unicode chars are 
truly planetary rather than just national/regional, is excellent.

> From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional – as though they are playing
> games – that billions of computing devices, each having billions of
> storage words should have their storage wasted on blocks such as
> these.
> """
>
> The shift from Unicode as a 16-bit code to having multiple planes came
> in with Unicode 2.0, but the various blocks were assigned separately:
> * Egyptian hieroglyphs: Unicode 5.2
> * Cuneiform: Unicode 5.0
> * Shavian: Unicode 4.0
> * Deseret: Unicode 3.1
> * Mahjong Tiles: Unicode 5.1
> * Klingon: Not part of any current standard

You should add emoticons to that list, but not call them, or any of the
above, 'gibberish'. I think that this part of your post is more
'unprofessional' than the character blocks are. It is jarring and seems
contrary to your main point.

> However, I don't think historians will appreciate you calling all of
> these "gibberish". To adequately describe and discuss old texts
> without these Unicode blocks, we'd have to either do everything with
> images, or craft some kind of reversible transliteration system and
> have dedicated software to render the texts on screen. Instead, what
> we have is a well-known and standardized system for transliterating
> all of these into numbers (code points), and rendering them becomes a
> simple matter of installing an appropriate font.
>
> Also, how does assigning meanings to codepoints "waste storage"? As
> soon as Unicode 2.0 hit and 16-bit code units stopped being
> sufficient, everyone needed to allocate storage - either 32 bits per
> character, or some other system - and the fact that some codepoints
> were unassigned had absolutely no impact on that. This is decidedly
> NOT unprofessional, and it's not wasteful either.

I agree.
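
To make the storage point concrete, here is a minimal sketch, assuming
Python 3. The sample characters and the helper name encoded_sizes are my
own illustrative picks, not anything from the post or the thread. The
last sample is a code point in plane 5 that, as far as I know, is still
unassigned; it encodes to the same number of bytes as its assigned
neighbours in the supplementary planes.

```python
# Bytes needed per character under the three standard Unicode encoding forms.
# SAMPLES and encoded_sizes are hypothetical names chosen for this sketch.
SAMPLES = {
    "LATIN SMALL LETTER A": "\u0061",
    "DEVANAGARI LETTER A": "\u0905",
    "DESERET CAPITAL LETTER LONG I": "\U00010400",
    "EGYPTIAN HIEROGLYPH A001": "\U00013000",
    # Assumption: plane 5 has no assigned characters at the time of writing.
    "(unassigned, plane 5)": "\U00050000",
}


def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le so no BOM is counted
        "utf-32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {label}: {encoded_sizes(ch)}")
```

The size depends only on the numeric range the code point falls in, not
on whether the consortium has given that code point a meaning.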

-- 
Terry Jan Reedy




