Changing filenames from Greeklish => Greek (subprocess complain)

Ned Batchelder ned at nedbatchelder.com
Mon Jun 10 16:28:36 EDT 2013


On Monday, June 10, 2013 3:48:08 PM UTC-4, jmfauth wrote:
> A coding scheme works with three sets. A *unique* set
> of CHARACTERS, a *unique* set of CODE POINTS and a *unique*
> set of ENCODED CODE POINTS, unicode or not.
> 
> The relation between the set of characters and the set of the
> code points is a *human* table, created with a sheet of paper
> and a pencil, a deliberate choice of characters with integers
> as "labels".
> 
> The relation between the set of the code points and the
> set of encoded code points is a "mathematical" operation.
> 
> In the case of an "8bits" coding scheme, like iso-XXX,
> this operation is a no-op, the relation is an identity.
> In short: set of code points == set of encoded code points.
> 
> In the case of Unicode, the Unicode Consortium endorses
> three such mathematical operations called UTF-8, UTF-16 and
> UTF-32 where UTF means Unicode Transformation Format, a
> confusing wording meaning at the same time, the process
> and the result of the process. This Unicode Transformation does
> not produce bytes, it produces words/chunks/tokens of *bits* with
> lengths 8, 16, 32, called Unicode Transformation Units (from this
> the names UTF-8, -16, -32). At this level, only a structure has
> been defined (there is no computing). 

This is a really good description of the issues involved with character sets and encodings, thanks.
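
To make the three sets concrete, here is a small Python 3 sketch (mine, not from the quoted post). It takes one character, shows its code point, and shows the code units each UTF produces. The sample characters and the helper name `describe` are only illustrative, and I picked the big-endian codecs so that no byte-order mark appears in the output.

```python
def describe(ch):
    """Return the code point and some encoded forms of a single character."""
    return {
        "code_point": ord(ch),             # integer label from the Unicode table
        "utf-8": ch.encode("utf-8"),       # 8-bit code units
        "utf-16": ch.encode("utf-16-be"),  # 16-bit code units, big-endian, no BOM
        "utf-32": ch.encode("utf-32-be"),  # 32-bit code units, big-endian, no BOM
    }

info = describe("\u03b1")  # GREEK SMALL LETTER ALPHA
for name, value in info.items():
    print(name, value)

# In an 8-bit scheme such as ISO-8859-1 (whose 256 code points coincide with
# the first 256 of Unicode) the encoding step is the identity: the byte value
# equals the code point.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
print(ord(e_acute), e_acute.encode("latin-1")[0])
```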

> Very important, a healthy
> coding scheme works conceptually only with this *unique* set
> of encoded code points, not with bytes, characters or code points.
> 

You don't explain why it is important to work with encoded code points.  What's wrong with working with code points?
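
For reference, this is roughly what working with code points looks like from Python 3 code, as a sketch rather than a full argument. The sample string is arbitrary; the point is only that `len()`, indexing and slicing on a `str` count code points, while the encoded bytes are a separate object you get by calling `encode()`.

```python
s = "\u03b1\u03b2\u03b3"  # three Greek letters: alpha, beta, gamma

code_points = [ord(c) for c in s]  # what str indexing and len() operate on
utf8_bytes = s.encode("utf-8")     # one possible encoded form

print(len(s), code_points)
print(len(utf8_bytes), utf8_bytes)

# Slicing by code point never splits a character...
first = s[:1]

# ...whereas slicing the encoded bytes can leave an undecodable fragment.
fragment = utf8_bytes[:1]
try:
    fragment.decode("utf-8")
    fragment_ok = True
except UnicodeDecodeError:
    fragment_ok = False

print(first == "\u03b1", fragment_ok)
```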

> 
> The last step, the machine implementation: it is up to the
> processor, the compiler, the language to implement all these
> Unicode Transformation Units with, of course, their related
> specifics: char, wchar_t, int, long, endianness, rune (Go
> language), ...
> 
> Not too over-simplified or not too over-complicated and enough
> to understand one, if not THE, design mistake of the flexible
> string representation.
> 
> jmf

Again you've made the claim that the flexible string representation is a mistake.  But you haven't said WHY.  I can't tell if you are trolling us, or are deluded, or genuinely don't understand what you are talking about.
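
For anyone following along, the flexible string representation (PEP 393, CPython 3.3 and later) stores each string with 1, 2, or 4 bytes per code point, depending on the largest code point in it. The sketch below only illustrates that behavior. It assumes CPython 3.3+, since `sys.getsizeof` reports implementation-specific numbers, and the sample characters are arbitrary.

```python
import sys

n = 1000
ascii_s = "a" * n            # every code point below 256
bmp_s = "\u03b1" * n         # code points below 65536
astral_s = "\U0001F600" * n  # a code point above U+FFFF

# Memory footprint grows with the widest code point in the string
# (CPython-specific; exact numbers vary by version and platform).
sizes = [sys.getsizeof(x) for x in (ascii_s, bmp_s, astral_s)]
print(sizes)

# The length in code points is the same in all three cases.
lengths = [len(x) for x in (ascii_s, bmp_s, astral_s)]
print(lengths)
```

The storage width is an internal detail: at the Python level all three strings behave as sequences of 1000 code points.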

Some day you might explain yourself. I look forward to it.

--Ned.


