Normalize a Polish L

Mike Orr sluggoster at gmail.com
Mon Oct 22 18:50:20 EDT 2007


On Oct 16, 9:51 am, Roberto Bonvallet <rbonv... at gmail.com> wrote:
> For example, in Spanish, "ü" (u with umlaut) should be represented as
> "u", but in German, it should be represented as "ue".
>
>     pingüino -> pinguino
>     Frühstück -> Fruehstueck
>
> I'd like that web applications (e.g. blogs) took into account these
> conventions when creating URLs from the title of an article.

Well, that gets into official vs unofficial conversions.  Does the
Spanish Academy really say 'ü' should be converted to 'u'?  In
German, 'ü' -> 'ue' is an official standard used by Germans themselves.
In contrast, I've heard that Swedish, unlike German, prefers 'o' rather
than 'oe' for 'ö', and Norwegian prefers 'o' for 'ø', even though
they're all etymologically the same letter as the German 'ö'.  Russian
has some four common ways to romanize/ASCII'ify their alphabet (sylniy
or sylnyj or silnii?  schi or shchi?  byt' or bit' -- the latter
creates a false homograph with bit'.  s"yest'?)  Yes, on my US-ASCII
keyboard I simply drop the accents unless I know there's a standard
conversion (German 'ß' to 'ss').  But whether that should be hardcoded
into a blog URL library is a different matter, and if it is, there should
probably be plugin tables for different preferred standards.
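
As a rough sketch of what such plugin tables might look like (the names
TABLES, DEFAULT and asciify are made up for illustration and don't come
from any existing library, and the table entries are just the examples
from this thread, not a vetted standard):

```python
import unicodedata

# Hypothetical per-language tables; a real library would let users
# register their own preferred standard here.
TABLES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
    "es": {},  # no overrides: accents are simply dropped
}

# Fallbacks for letters that Unicode does not decompose into
# base letter + combining mark.
DEFAULT = {"ß": "ss", "ł": "l", "Ł": "L", "ø": "o", "Ø": "O"}


def asciify(text, lang=None):
    """Return an ASCII approximation of text using the table for lang."""
    table = TABLES.get(lang, {})
    out = []
    # Compose first so that 'u' + combining diaeresis matches 'ü'.
    for ch in unicodedata.normalize("NFC", text):
        rule = table.get(ch, DEFAULT.get(ch))
        if rule is not None:
            out.append(rule)
        else:
            # Decompose and keep only the ASCII part, which drops
            # accents (and anything else with no ASCII equivalent).
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)
```

With these tables, asciify("pingüino", "es") gives "pinguino" and
asciify("Frühstück", "de") gives "Fruehstueck", while calling it with no
language falls back to plain accent dropping ("Fruhstuck").  Anything
without a table entry or a decomposition is silently dropped, which a
real library would probably want to handle more carefully.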

--Mike



