Changing filenames from Greeklish => Greek (subprocess complain)
Ned Batchelder
ned at nedbatchelder.com
Mon Jun 10 16:28:36 EDT 2013
On Monday, June 10, 2013 3:48:08 PM UTC-4, jmfauth wrote:
> A coding scheme works with three sets. A *unique* set
> of CHARACTERS, a *unique* set of CODE POINTS and a *unique*
> set of ENCODED CODE POINTS, unicode or not.
>
> The relation between the set of characters and the set of the
> code points is a *human* table, created with a sheet of paper
> and a pencil, a deliberate choice of characters with integers
> as "labels".
>
> The relation between the set of the code points and the
> set of encoded code points is a "mathematical" operation.
>
> In the case of an "8-bit" coding scheme, like iso-XXX,
> this operation is a no-op, the relation is an identity.
> In short: set of code points == set of encoded code points.
>
> In the case of Unicode, the Unicode Consortium endorses
> three such mathematical operations called UTF-8, UTF-16 and
> UTF-32 where UTF means Unicode Transformation Format, a
> confusing wording meaning at the same time, the process
> and the result of the process. This Unicode Transformation does
> not produce bytes, it produces words/chunks/tokens of *bits* with
> lengths 8, 16, 32, called Unicode Transformation Units (from this
> the names UTF-8, -16, -32). At this level, only a structure has
> been defined (there is no computing).
This is a really good description of the issues involved with character sets and encodings, thanks.
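For anyone following along, here is a minimal Python 3 sketch of the three sets described above. The sample character (Greek small alpha) and the variable names are arbitrary assumptions for illustration, not something from the quoted post.

```python
# Illustrative sketch only: the character chosen here is an assumption.
ch = "α"  # the CHARACTER: GREEK SMALL LETTER ALPHA

# The CODE POINT: the integer "label" assigned in the Unicode table.
code_point = ord(ch)  # 0x3B1

# The ENCODED CODE POINT: what each transformation format produces.
# Byte order is fixed to big-endian so that no BOM is emitted.
utf8 = ch.encode("utf-8")       # two 8-bit units
utf16 = ch.encode("utf-16-be")  # one 16-bit unit
utf32 = ch.encode("utf-32-be")  # one 32-bit unit

# An 8-bit scheme: in ISO-8859-7 the same character is numbered 0xE1,
# and its encoded form is that single byte.
iso = ch.encode("iso-8859-7")

print(hex(code_point), utf8, utf16, utf32, iso)
```

Note that the same character has a different number in each table (0x3B1 in Unicode, 0xE1 in ISO-8859-7), and a different encoded form in each transformation format.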
> Very important, a healthy
> coding scheme works conceptually only with this *unique* set
> of encoded code points, not with bytes, characters or code points.
>
You don't explain why it is important to work with encoded code points. What's wrong with working with code points?
>
> The last step, the machine implementation: it is up to the
> processor, the compiler, the language to implement all these
> Unicode Transformation Units with of course their related
> specificities: char, wchar_t, int, long, endianness, rune (Go
> language), ...
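The endianness point can be seen directly from Python 3. This is a small sketch with an arbitrarily chosen sample character (an assumption for illustration): the same 16-bit unit is stored as two different byte sequences depending on byte order, and both decode back to the same character.

```python
# Illustrative sketch only: the sample character is an assumption.
ch = "α"  # U+03B1, a single 16-bit unit in UTF-16

little = ch.encode("utf-16-le")  # low-order byte first
big = ch.encode("utf-16-be")     # high-order byte first

# Same unit, opposite byte order in memory or on the wire.
same_unit_reversed = (little == big[::-1])

# Both layouts decode to the same character when the order is known.
round_trip_ok = (little.decode("utf-16-le") == big.decode("utf-16-be") == ch)

print(little, big, same_unit_reversed, round_trip_ok)
```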
>
> Not too over-simplified or not too over-complicated and enough
> to understand one, if not THE, design mistake of the flexible
> string representation.
>
> jmf
Again you've made the claim that the flexible string representation is a mistake. But you haven't said WHY. I can't tell if you are trolling us, or are deluded, or genuinely don't understand what you are talking about.
Some day you might explain yourself. I look forward to it.
--Ned.