unicode by default

harrismh777 harrismh777 at charter.net
Fri May 13 15:53:50 EDT 2011


jmfauth wrote:
>> to worry about encodings are when you're encoding unicode characters
>> to byte strings, or decoding bytes to unicode characters
>
> A small but important correction/clarification:
>
> In Unicode, "unicode" does not encode a *character*. It
> encodes a *code point*, a number, the integer associated
> to the character.
>

That is a huge code-point... pun intended.
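
A minimal Python 3 sketch of the distinction, as I understand it: the code 
point is just an integer, and an encoding such as UTF-8 is one way of 
writing that integer out as bytes.  The sample character is my own choice 
for illustration (a Greek alpha with two accents), not something taken 
from the texts discussed below.

```python
# Illustrative character (an assumption for this sketch):
# U+1F04 GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
ch = "\u1f04"

# The code point is the integer Unicode associates with the character.
code_point = ord(ch)

# Encoding turns that integer into bytes; UTF-8 is one of several encodings.
utf8_bytes = ch.encode("utf-8")

# Decoding the bytes with the same encoding recovers the original character.
round_trip = utf8_bytes.decode("utf-8")

print(f"U+{code_point:04X}", utf8_bytes, round_trip == ch)
```

Nothing here involves a font; whether the character shows up as a glyph 
or as a little box is decided later by whatever displays it.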

... and there is another point that I continue to be somewhat puzzled 
about, and that is the issue of fonts.

    One of my hobbies at the moment is ancient Greek (biblical studies, 
Septuaginta LXX, and Greek New Testament).  I have these texts on my 
computer in a folder in several formats... pdf, unicode 'plaintext', 
osis.xml, and XML.

    These texts may be found at http://sblgnt.com

    I am interested for the moment only in the 'plaintext' stream, 
because it is unicode.  (First, according to all the unicode docs, 
there is no such thing as 'plaintext', so keep that in mind.)

    When I open the text stream in one of my unicode editors I can see 
'most' of the characters in a rudimentary Greek font with accents; 
however, I also see many tiny square blocks indicating (I think) that 
my font does *not* have a corresponding glyph for the code point of 
that Greek symbol (whatever it is supposed to be).
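
One way to find out what those boxes are supposed to be is to ask Python 
for the name of each code point in the text.  The following is only a 
sketch: `summarize_code_points` is a helper name I made up, and the sample 
string is a short illustrative phrase rather than the actual file.  It 
uses only the standard `unicodedata` module.

```python
import unicodedata
from collections import Counter


def summarize_code_points(text):
    """Return (code point, Unicode name, count) for each non-ASCII character."""
    counts = Counter(ch for ch in text if ord(ch) > 0x7F and not ch.isspace())
    rows = []
    for ch, n in sorted(counts.items()):
        # Fall back to a placeholder when the code point has no name
        # in this Python's Unicode database.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", name, n))
    return rows


# Illustrative sample: three plain Greek letters and three accented ones.
sample = "\u1f18\u03bd \u1f00\u03c1\u03c7\u1fc7"

for code, name, n in summarize_code_points(sample):
    print(code, name, n)
```

With the real file one would read it using `open(path, encoding="utf-8")` 
and pass the contents in.  The code points that display as boxes can then 
be looked up by name, which at least tells me which characters the font 
is missing.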

    The point, or question, is: how does one go about making sure that 
there is a corresponding font glyph to match a specific unicode code 
point for display in a particular terminal (editor, browser, whatever)?

    The Unicode Consortium is very careful to make sure that thousands 
of symbols have a unique code point (that's great!), but how do these 
thousands of symbols actually get displayed if there is no font 
consortium?  Are there collections of 'standard' fonts for unicode that 
I am not aware of?  Is there a Unix/Linux package that can be installed 
that drops in at least 'one' default standard font that will be able to 
render all or 'most' (whatever I mean by that) code points in unicode? 
Is this a Python issue at all?


kind regards,
m harris






