Translation table to map Latin-1 to ASCII?

Rene Pijlman reageer.in at de.nieuwsgroep
Sun Jan 26 18:48:49 EST 2003


John Machin:
>Rene Pijlman:
>> accentstable = string.join(map(chr, range(192)), "") +
>> "AAAAAAACEEEEIIIIDNOOOOOxOUUUUYpBaaaaaaaceeeeiiiidnooooo/ouuuuypy"
>> 
>
>You have mapped 0xF0 (small eth) to o, instead of d. Is this
>deliberate?

No, just plain old ignorance.

>You have mapped 0xDE (capital thorn) to P and 0xFE (small thorn) to p.
>On a sound-alike basis rather than a look-alike basis, it may be more
>appropriate to map these to T & t (or TH and th).
>
>Likewise 0xDF (small sharp s) may be better mapped to s (or ss) than
>to B.

Wow! You people sure are paying attention :-)

>You may wish to contemplate the notion that a single mapping of one
>byte to one byte may not be the best way to go, but of course this
>depends on what you are trying to achieve.
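A minimal sketch of the one-to-many idea, assuming Python 3, where str.translate accepts a dict mapping code points to replacement strings of any length. It reuses the 64-character string from the revised table further down. The TH/th/ss overrides are the suggestions quoted above, not an established standard, and the names FOLD and fold_latin1 are hypothetical.

```python
# Sketch, assuming Python 3: str.translate accepts a dict that maps code
# points to replacement strings of any length, so one character can become two.
ACCENTS = "AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTsaaaaaaaceeeeiiiidnooooo/ouuuuyty"

# One-to-one look-alikes for U+00C0..U+00FF; characters not in the dict
# (including plain ASCII) are left unchanged by str.translate.
FOLD = {0xC0 + i: ch for i, ch in enumerate(ACCENTS)}

# Hypothetical overrides where a sound-alike reads better as two letters.
FOLD.update({
    0xDE: "TH",  # capital thorn
    0xFE: "th",  # small thorn
    0xDF: "ss",  # small sharp s
})


def fold_latin1(text: str) -> str:
    """Map accented Latin-1 letters to ASCII, allowing one-to-many replacements."""
    return text.translate(FOLD)
```

With this, fold_latin1("Straße") gives "Strasse". The output can be longer than the input, which a fixed 256-entry byte-to-byte table cannot express.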

This is just for search log analysis. It's not really a problem
when some Icelandic searches are 1% skewed statistically :-)

But here is the new and improved Latin2AsciiLossyMapping (t)
version 1.0 rc1:

# Bytes 0-191 map to themselves; 192-255 map to ASCII look-alikes.
accentstable = ("".join(map(chr, range(192))) +
                "AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTsaaaaaaaceeeeiiiidnooooo/ouuuuyty")
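The post does not show the table being applied. Here is a minimal sketch, assuming Python 3, that builds the same 256-entry table as bytes and applies it with bytes.translate to text encoded as Latin-1. The names TABLE and latin1_to_ascii are hypothetical, and turning characters outside Latin-1 into '?' is an assumption.

```python
# Sketch, assuming Python 3: build the 256-entry table as bytes and apply it
# with bytes.translate to text encoded as Latin-1.
ACCENTS = "AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTsaaaaaaaceeeeiiiidnooooo/ouuuuyty"

# Bytes 0-191 map to themselves; bytes 192-255 map to ASCII look-alikes.
TABLE = bytes(range(192)) + ACCENTS.encode("ascii")
assert len(TABLE) == 256


def latin1_to_ascii(text: str) -> str:
    """Lossily fold accented Latin-1 letters to ASCII, one byte to one byte."""
    # Assumption: characters outside Latin-1 cannot be encoded; they become '?'.
    data = text.encode("latin-1", errors="replace")
    # Decode as Latin-1 so untouched bytes 0x80-0xBF come back unchanged.
    return data.translate(TABLE).decode("latin-1")
```

For example, latin1_to_ascii("Þórður") returns "Tordur". Bytes 0x80-0xBF (such as ©) are left unchanged, as in the table above, so the result is not guaranteed to be pure ASCII.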

Thanks everyone for your help.

-- 
René Pijlman

What do you want to learn?  http://www.leren.nl



