utf - string translation

hg hg at nospam.com
Wed Nov 22 13:38:06 EST 2006


Hi,

I'm bringing over a thread that's currently running on f.c.l.python.

The point was to strip French accents from words.

We noticed that len('à') != len('a') and I found the hack below to fix
the "problem" ... yet I do not understand - especially since 'à' is
included in the extended ASCII table, and thus can be stored in one byte.
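For what it's worth, the length difference can be reproduced in isolation. This is only a sketch, written for Python 3 (the script further down is Python 2, where a plain string holds bytes), and the variable names are made up for the illustration. It compares the number of characters with the number of bytes under two encodings:

```python
# Python 3 sketch: character count versus encoded byte count for 'à'.
s = '\u00e0'  # the single character 'à'

n_chars = len(s)                     # counts characters (code points)
utf8_bytes = s.encode('utf-8')       # two bytes: 0xC3 0xA0
latin1_bytes = s.encode('latin-1')   # one byte: 0xE0

print(n_chars, len(utf8_bytes), len(latin1_bytes))
```

So the one-byte form exists in Latin-1, while the same character takes two bytes once the text is encoded as UTF-8.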

Any clue ?

hg





# -*- coding: utf-8 -*-
import string

def convert(mot):
    print len(mot)
    print mot[0]
    print '%x' % ord(mot[1])
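    # Byte-for-byte table: each accented letter becomes '\x00' followed by
    # its unaccented form, so both arguments have the same length.
    table = string.maketrans(
        'àâäéèêëîïôöùüû',
        '\x00a\x00a\x00a\x00e\x00e\x00e\x00e\x00i\x00i\x00o\x00o\x00u\x00u\x00u')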

    return mot.translate(table).replace('\x00','')


c = 'àbôö a '
print convert(c)
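
For comparison, here is a different technique from the hack above: a minimal sketch that works on Unicode text rather than bytes, assuming Python 3 and the standard unicodedata module. The function name strip_accents is hypothetical, chosen only for this example. It decomposes each accented letter into a base letter plus a combining mark, then drops the marks:

```python
import unicodedata

def strip_accents(text):
    # NFD decomposition turns 'à' into 'a' followed by U+0300
    # (combining grave accent).
    decomposed = unicodedata.normalize('NFD', text)
    # Keep every character that is not a combining mark.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

result = strip_accents('\u00e0b\u00f4\u00f6 a ')  # 'àbôö a '
print(result)
```

This needs no hand-written table, but it only helps for letters that decompose into a base letter plus marks; characters such as 'œ' or 'ø' would pass through unchanged.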


