ascii to latin1
Luis P. Mendes
luis_lupe2XXX at netvisaoXXX.pt
Tue May 9 07:56:10 EDT 2006
Richie Hindle wrote:
> [Serge]
>> def search_key(s):
>>     de_str = unicodedata.normalize("NFD", s)
>>     return ''.join(cp for cp in de_str if not
>>                    unicodedata.category(cp).startswith('M'))
>
> Lovely bit of code - thanks for posting it!
>
> You might want to use "NFKD" to normalize things like LATIN SMALL
> LIGATURE FI and subscript/superscript characters as well as diacritics.
>
Thank you very much for your info. It's a very good approach.
When I used the "NFD" option, I got many errors on these and possibly
other codes: \xba, \xc9, \xcd.
When I tried "NFKD" instead, only about half a dozen errors remained,
out of more than 600,000 names, on code \xbf.
It looks like I will have to handle these cases with a search and
replace using regular expressions. Or is there a better way to do it?
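
One possible alternative to regular expressions, as a rough sketch only: strip
the combining marks after "NFKD" normalization, then send whatever non-ASCII
characters remain through a small fallback table. The names to_ascii and
FALLBACK and the table entries are hypothetical and not from this thread, and
the sketch assumes Python 3 text strings (in older Python the input would, as
far as I know, first have to be decoded from latin-1 into a unicode object).

```python
import unicodedata

# Hypothetical fallback table for characters that NFKD does not decompose
# into ASCII. Extend it with whatever codes show up in the real data.
FALLBACK = {
    "\xbf": "?",   # INVERTED QUESTION MARK
    "\xa1": "!",   # INVERTED EXCLAMATION MARK
    "\xdf": "ss",  # LATIN SMALL LETTER SHARP S
    "\xf8": "o",   # LATIN SMALL LETTER O WITH STROKE
}

def to_ascii(s, replacement=""):
    """Return an ASCII-only approximation of the text string s."""
    decomposed = unicodedata.normalize("NFKD", s)
    out = []
    for ch in decomposed:
        # Drop combining marks (accents) left behind by the decomposition.
        if unicodedata.category(ch).startswith("M"):
            continue
        if ord(ch) < 128:
            out.append(ch)
        else:
            # No ASCII form after NFKD: use the table, else the default.
            out.append(FALLBACK.get(ch, replacement))
    return "".join(out)

if __name__ == "__main__":
    for name in ["\xc9\xcdrico", "n.\xba 5", "\xbfQu\xe9?"]:
        print(repr(name), "->", repr(to_ascii(name)))
```

Characters missing from the table fall back to the replacement argument, so
nothing non-ASCII gets through; passing a visible marker such as "#" instead
of the empty string makes the leftover cases easy to spot in the output.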
Luis P. Mendes