Normalize a Polish L

Thorsten Kampe thorsten at thorstenkampe.de
Mon Oct 15 14:20:30 EDT 2007


* Peter Bengtsson (Mon, 15 Oct 2007 16:33:26 -0000)
> In Unicode, \u0141 is a capital L with a little dash through it, as can be
> seen in this image:
> http://static.peterbe.com/lukasz.png
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> 
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it become a normal ASCII L before
> in other programs such as Thunderbird.

The 'Ł' is actually pronounced like the English "w"...
 
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.

>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}')
'0043 0327'

>>> unicodedata.normalize('NFKD', u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}').encode('ascii','ignore')
'C'

>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER L WITH STROKE}')
''
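The empty decomposition above is the whole story: NFKD can only strip accents from characters that Unicode decomposes into a base letter plus combining marks, and U+0141 has no such decomposition, so encode('ascii', 'ignore') simply drops it. Below is a minimal sketch, in Python 3 rather than the Python 2 of the session above, assuming you are willing to maintain your own table for letters like this. The FALLBACK table and the to_ascii name are hypothetical illustrations, and the table is deliberately incomplete.

```python
import unicodedata

# Hypothetical, deliberately incomplete fallback table for letters that have
# no Unicode decomposition, so NFKD cannot reduce them to a base letter.
FALLBACK = {
    "\N{LATIN CAPITAL LETTER L WITH STROKE}": "L",
    "\N{LATIN SMALL LETTER L WITH STROKE}": "l",
    "\N{LATIN CAPITAL LETTER O WITH STROKE}": "O",
    "\N{LATIN SMALL LETTER O WITH STROKE}": "o",
    "\N{LATIN CAPITAL LETTER D WITH STROKE}": "D",
    "\N{LATIN SMALL LETTER D WITH STROKE}": "d",
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_TABLE = str.maketrans(FALLBACK)


def to_ascii(text: str) -> str:
    """Approximate text in ASCII: fallback table first, then NFKD stripping."""
    # Replace letters that NFKD would leave untouched (e.g. U+0141).
    text = text.translate(_TABLE)
    # NFKD splits e.g. U+00E7 into 'c' + U+0327 COMBINING CEDILLA ...
    decomposed = unicodedata.normalize("NFKD", text)
    # ... and errors='ignore' then drops the combining marks (and anything
    # else outside ASCII that the table did not cover).
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # U+0141 has an empty decomposition, which is why NFKD alone fails.
    print(repr(unicodedata.decomposition("\u0141")))
    print(to_ascii("\u0141ukasz"))
    print(to_ascii("Fran\u00e7ois"))
```

Running it prints '', then Lukasz, then Francois. The table is applied before normalization so that decomposable characters still go through the generic NFKD path and only the exceptions need listing by hand.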


