Glyphs and graphemes [was Re: Cult-like behaviour]

Terry Reedy tjreedy at udel.edu
Mon Jul 16 15:28:51 EDT 2018


On 7/16/2018 1:11 PM, Richard Damon wrote:

> Many consider that UTF-32 is a variable-width encoding because of the combining characters. It can take multiple ‘codepoints’ to define what should be a single ‘character’ for display.

I hope you realize that this is not the standard meaning of 
'variable-width encoding', which is 'variable number of bytes for a 
codepoint'.  UTF-16 and UTF-8 are variable-width; UTF-32 is not.  If one 
expands the definition enough, ASCII is 'variable-width' because the 'fi' 
ligature is two bytes, or more realistically, because <= and >= are two 
characters each instead of one (as they can be in Unicode!).
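
To make the distinction concrete, here is a minimal sketch (the helper 
name bytes_per_codepoint is mine, purely for illustration).  It counts 
the bytes each codepoint occupies in each encoding, then shows that a 
combining sequence is a matter of codepoints per displayed character, 
which is a separate question.

```python
import unicodedata


def bytes_per_codepoint(text, encoding):
    """Return the number of bytes each codepoint of text occupies.

    Illustrative helper, not a library function. Use a BOM-free codec
    such as 'utf-16-le' so no byte-order mark is counted.
    """
    return [len(ch.encode(encoding)) for ch in text]


# One codepoint each: U+0041, U+00E9, U+20AC, U+1F600.
sample = "A\u00e9\u20ac\U0001F600"

for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, bytes_per_codepoint(sample, enc))

# 'e' followed by COMBINING ACUTE ACCENT: two codepoints, one displayed
# character.  UTF-32 still spends exactly 4 bytes on each codepoint.
combined = "e\u0301"
print(len(combined), bytes_per_codepoint(combined, "utf-32-le"))

# NFC normalization composes the pair into the single codepoint U+00E9.
composed = unicodedata.normalize("NFC", combined)
print(len(composed))

# '<=' is two ASCII characters; U+2264 is a single Unicode codepoint.
print(len("<=".encode("ascii")), len("\u2264"))
```

Only the first loop bears on 'variable-width' in the standard sense: the 
UTF-8 and UTF-16 counts differ from codepoint to codepoint, while the 
UTF-32 count is always 4.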

If one is using a broader definition than usual, it is clearer to say so.

-- 
Terry Jan Reedy




