Normalize a Polish L

John Machin sjmachin at lexicon.net
Mon Oct 15 17:57:18 EDT 2007


On Oct 16, 2:33 am, Peter Bengtsson <pete... at gmail.com> wrote:
> In UTF8, \u0141 is a capital L with a little dash through it as can be
> seen in this image: http://static.peterbe.com/lukasz.png
>
> I tried this:
>
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
>
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it becoming a normal ascii L before
> in other programs such as Thunderbird.
>
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
>
> What am I doing wrong?

The character in question is NOT composed (in the way that Unicode
means) of an 'L' and a little slash. U+0141 has no decomposition
mapping, so "normalization" and "decomposition" have nothing to work
on, whichever of the four forms you pick.
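A quick way to check this for yourself is to ask unicodedata for the
decomposition mapping directly. The sketch below assumes Python 3 (where
plain string literals are Unicode); the variable names are only for
illustration. An accented letter such as U+00C9 is shown for contrast,
since it does decompose into a base letter plus a combining mark.

```python
import unicodedata

l_stroke = '\u0141'   # the character from the original question
e_acute = '\u00c9'    # contrast case: a letter that really is "composed"

# Official Unicode name of the character.
l_name = unicodedata.name(l_stroke)

# decomposition() returns an empty string when there is no mapping.
l_decomp = unicodedata.decomposition(l_stroke)
e_decomp = unicodedata.decomposition(e_acute)

print(l_name)            # LATIN CAPITAL LETTER L WITH STROKE
print(repr(l_decomp))    # ''
print(repr(e_decomp))    # '0045 0301'
```

The empty result for U+0141 is why encode('ascii', 'ignore') after
NFKD leaves nothing behind: there is no ASCII base letter to keep.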

To "asciify" such text, you need to build a look-up table that suits
your purpose. unicodedata.decomposition() is (accidentally) useful in
providing *some* of the entries for such a table.
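As a rough sketch of what such a table might look like (assuming
Python 3): MANUAL_TABLE and asciify() are hypothetical names, and the
handful of entries is illustrative, not a complete list. Characters
missing from the hand-built table fall back to NFKD decomposition,
which covers the ordinary accented letters.

```python
import unicodedata

# Hand-built entries for characters that Unicode does not decompose.
# Illustrative only; extend it to suit your own data.
MANUAL_TABLE = {
    '\u0141': 'L',   # LATIN CAPITAL LETTER L WITH STROKE
    '\u0142': 'l',   # LATIN SMALL LETTER L WITH STROKE
    '\u00d8': 'O',   # LATIN CAPITAL LETTER O WITH STROKE
    '\u00f8': 'o',   # LATIN SMALL LETTER O WITH STROKE
}

def asciify(text):
    """Return an ASCII-only approximation of text (hypothetical helper)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Already ASCII: keep as-is.
            out.append(ch)
        elif ch in MANUAL_TABLE:
            # No Unicode decomposition exists; use our own mapping.
            out.append(MANUAL_TABLE[ch])
        else:
            # Decompose, then drop whatever is not ASCII
            # (combining marks, or the whole character if nothing maps).
            decomposed = unicodedata.normalize('NFKD', ch)
            out.append(decomposed.encode('ascii', 'ignore').decode('ascii'))
    return ''.join(out)

print(asciify('\u0141ukasz'))   # Lukasz
```

Note that anything covered by neither the table nor a decomposition is
silently dropped here; whether that is acceptable depends on your
purpose.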




