Ah Python, you have spoiled me for all other languages

Chris Angelico rosuav at gmail.com
Sun Jun 7 11:12:33 EDT 2015


On Mon, Jun 8, 2015 at 12:58 AM,  <random832 at fastmail.us> wrote:
> On Sun, Jun 7, 2015, at 07:42, Steven D'Aprano wrote:
>> The question of graphemes (what "ordinary people" consider letters and
>> characters, e.g. "ch" is two letters to an English speaker but one letter
>> to a Czech speaker) should be left to libraries.
>
> Do Czech speakers expect to be able to select and delete it as a single
> unit and never have the cursor in the middle of it? If not, then this is
> not really fundamentally the same thing as what we have with combining
> characters or certain sequences of Indic letters.

Not sure about Indic letters, but with combining characters, you *should*
select and delete a base character and all its combining characters as a
single unit, and you should never have the cursor in the middle of it. (Not
everything gets this right; SciTE, though otherwise a decent text editor,
does allow the cursor to go inside combining characters.) But I suspect
that the Czech "ch", like the Dutch "ij" and the German "oe" (when it's not
ö), should be treated as two separate characters.
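
A rough Python sketch of the difference, using only the standard library's
unicodedata module. The helper naive_clusters is hypothetical (not a stdlib
function) and only attaches combining marks (Unicode categories Mn, Mc, Me)
to the preceding character. It is not a full UAX #29 grapheme segmenter, so
Hangul jamo, emoji ZWJ sequences and many Indic clusters are not handled.

```python
import unicodedata

def naive_clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a crude approximation of user-perceived
    characters, not the full Unicode text-segmentation algorithm.
    """
    clusters = []
    for ch in text:
        # Category "M*" = mark; attach it to the previous cluster if any.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

decomposed = "o\u0308"   # 'o' + COMBINING DIAERESIS, rendered as "ö"
print(len(decomposed))                    # 2 code points
print(len(naive_clusters(decomposed)))    # 1 cluster
print(unicodedata.normalize("NFC", decomposed) == "\u00f6")  # True
print(naive_clusters("ch"))               # ['c', 'h']
```

The decomposed "ö" stays together as one unit for cursor movement and
deletion, while "ch" comes out as two ordinary characters.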

Digression: English has seventy phonograms, which are what words are really
built out of. Digraphs like "th" and "sh" represent single sounds despite
being spelled with multiple letters, but nobody ever expects them to be
treated as single character units just because other languages spell those
sounds "þ" or "ş". The English alphabet includes "q", which is not a
phonogram on its own ("qu" is), and it doesn't include the digraphs. Any
character-based representation of English should therefore work with
letters, not phonograms.

I don't know Czech well enough to say whether "ch" is more like a single
letter or a phonogram, but even if it basically functions as a letter, I
suspect that treating it as two characters will be no surprise to most
people.

ChrisA