Re: 'Straße' ('Strasse') and Python 2

Chris Angelico rosuav at gmail.com
Wed Jan 15 20:26:20 EST 2014


On Thu, Jan 16, 2014 at 11:43 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> Worse, linguists sometimes disagree as to what counts as a grapheme. For
> instance, some authorities consider the English "sh" to be a separate
> grapheme. As a native English speaker, I'm not sure about that. Certainly
> it isn't a separate letter of the alphabet, but on the other hand I can't
> think of any words containing "sh" that should be considered as two
> graphemes "s" followed by "h". Wait, no, that's not true... compound
> words such as "glasshouse" or "disheartened" are counter examples.

Digression: When I was taught basic English during my school days, my
mum used Spalding's book and the 70 phonograms. 25 of them are single
letters (Q is not a phonogram - QU is), and the others are mostly
pairs (there are a handful of 3- and 4-letter phonograms). Not every
instance of "s" followed by "h" is the phonogram "sh" - only the times
when it makes the single sound "sh" (which it doesn't in "glasshouse"
or "disheartened").

Thing is, you can't define spelling and pronunciation in terms of each
other, because you'll always be bitten by corner cases. Everyone knows
how "Thames" is pronounced... right? Well, no. There are (at least)
two rivers of that name, the famous one in London [1] and another one
further north [2]. The obscure one is pronounced the way the word
looks, the famous one isn't. And don't even get started on English
family names... Marjoribanks, Meux and Cholmondeley, as lampshaded [3]
in this song [4]! Even without names, though, there are the tricky
cases and the ones where different localities pronounce the same word
very differently; Unicode shouldn't have to deal with that by changing
whether something's a single character or two. Considering that
phonograms aren't even ligatures (though there is overlap, eg "Th"),
it's much cleaner to leave them as multiple characters.
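
A rough Python 3 sketch of that last point, using only the standard
unicodedata module (the variable names are just for illustration): "sh"
is two code points whether or not it is pronounced as one sound, while
a real encoded ligature such as U+FB01 is one code point that
compatibility normalization splits back into its letters.

```python
import unicodedata

# "sh" is stored as two separate code points, regardless of whether it
# is pronounced as a single sound ("ship") or as two ("glasshouse").
pairs = {}
for word in ("ship", "glasshouse"):
    i = word.index("sh")
    pairs[word] = [unicodedata.name(c) for c in word[i:i + 2]]
    print(word, pairs[word])

# For comparison, a ligature that Unicode does encode:
# U+FB01 LATIN SMALL LIGATURE FI is a single code point, and NFKC
# (compatibility) normalization turns it back into "f" followed by "i".
lig = "\ufb01"
expanded = unicodedata.normalize("NFKC", lig)
print(len(lig), len(expanded), expanded)
```

Nothing in the string itself records the pronunciation; that stays a
matter for dictionaries, not for the character encoding.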

ChrisA

[1] https://en.wikipedia.org/wiki/River_Thames
[2] Though it's better known as the Isis. https://en.wikipedia.org/wiki/The_Isis
[3] http://tvtropes.org/pmwiki/pmwiki.php/Main/LampshadeHanging
[4] http://www.stagebeauty.net/plays/th-arca2.html - "Mosh-banks",
"Mow", and "Chumley" are the pronunciations used


