UTF-8 usage in Python 2.0

François Granger francois.granger at free.fr
Fri Oct 27 17:52:25 EDT 2000


I am participating in the translation of the Python docs into French. I work
on a Mac (at home). Since this documentation is to be delivered in iso-8859-1,
I borrowed from other scripts and came up with a simple script which
translates my Mac charset to 8859.

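def macTo88591(s):
    # entitydefs (defined elsewhere) maps Mac characters to their
    # iso-8859-1 replacements.
    t = ""
    for c in s:
        # Replace mapped characters; copy everything else unchanged.
        t = t + entitydefs.get(c, c)
    return t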
    
The entitydefs dictionary is a hacked version of the one in the module
"""HTML character entity references.""" (htmlentitydefs), with Mac
characters replacing the HTML entity names as keys.
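
To make the table idea concrete, here is a minimal sketch of the same
lookup-and-replace loop. It is written for a current Python 3 interpreter
rather than Python 2.0, so it works on bytes. The names MAC_TO_LATIN1 and
mac_to_latin1 are made up for this example, and the three-entry table is
only a hypothetical excerpt, not the actual hacked table.

```python
# Hypothetical three-entry excerpt: Mac Roman byte -> ISO-8859-1 byte.
MAC_TO_LATIN1 = {
    b"\x8e": b"\xe9",  # e with acute accent
    b"\x88": b"\xe0",  # a with grave accent
    b"\x8d": b"\xe7",  # c with cedilla
}


def mac_to_latin1(data):
    """Replace every mapped Mac Roman byte with its ISO-8859-1 byte."""
    out = bytearray()
    for i in range(len(data)):
        c = data[i:i + 1]               # one-byte slice, usable as a dict key
        out += MAC_TO_LATIN1.get(c, c)  # unmapped bytes pass through unchanged
    return bytes(out)


# "Fran\x8dois" is "François" as stored in the Mac Roman charset.
converted = mac_to_latin1(b"Fran\x8dois")
```

With a complete table, the result should match what the built-in codecs
give for data.decode("mac_roman").encode("latin-1"), at least for
characters that exist in both charsets.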

On the professional side, I receive HTML files translated into French.
Their accented characters are encoded as HTML entities. I need to convert
them to UTF-8. I currently use Tidy to do this, but I need to make some
manual modifications afterwards.
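
As an illustration of the conversion described above, here is a rough
sketch. It assumes a current Python 3 interpreter (not Python 2.0) and
uses the name2codepoint table from the standard html.entities module. The
names entities_to_utf8, MARKUP_ENTITIES and ENTITY_RE are made up for this
example. Entities that protect the markup itself (&lt;, &gt;, &amp;,
&quot;) are deliberately left alone, which is an assumption about what the
output should look like.

```python
import re
from html.entities import name2codepoint

# Entities that must stay escaped so the markup itself is not altered.
MARKUP_ENTITIES = {"lt", "gt", "amp", "quot"}
MARKUP_CODES = {0x22, 0x26, 0x3C, 0x3E}

# Matches &name;, &#123; and &#x7B; style references.
ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _decode_entity(match):
    ref = match.group(1)
    if ref[0] == "#":
        digits = ref[1:]
        code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
    elif ref in MARKUP_ENTITIES or ref not in name2codepoint:
        return match.group(0)          # keep markup and unknown entities as-is
    else:
        code = name2codepoint[ref]
    # Leave references alone if they are markup characters, out of the
    # Unicode range, or surrogates (which cannot be encoded as UTF-8).
    if code in MARKUP_CODES or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return match.group(0)
    return chr(code)


def entities_to_utf8(html_text):
    """Return html_text as UTF-8 bytes with character entities decoded."""
    return ENTITY_RE.sub(_decode_entity, html_text).encode("utf-8")


sample = entities_to_utf8("<p>Fran&ccedil;ois &amp; &#233;t&eacute;</p>")
```

The input here is assumed to be text that has already been read and
decoded; a real file would also need its charset declaration (if any)
updated to say UTF-8.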

I looked through the new features of Python 2 but I did not find an
easy way to do something similar to what I did with this 8859
conversion.

Any ideas, or the beginning of a solution?

-- 
"Knowledge is the path to tolerance; this holds for everyone, in
every season."
- Raymond Page


