hard_decoding

Peter Maas peter at somewhere.com
Thu Feb 10 08:04:31 EST 2005


Tamas Hegedus wrote:
> Do you have a convenient, easy way to remove special characters from
> u'strings'?
> 
> Replacing:
> ÀÁÂÃÄÅ     => A
> èéêë    => e
> etc.
> 'L0xe1szl0xf3' => Laszlo
> or something like that:
> 'L\xc3\xa1szl\xc3\xb3' => Laszlo

 >>> ord(u'ë')
235
 >>> ord(u'e')
101
 >>> cmap = {235:101}
 >>> u'hello'.translate(cmap)
u'hello'
 >>> u'hëllo'.translate(cmap)
u'hello'

The inconvenient part is generating cmap. I suggest you write a small
helper class (call it genmap) that builds the table for you:

 >>> g = genmap()
 >>> g.add(u'ÀÁÂÃÄÅ', u'A')
 >>> g.add(u'àáâãäå', u'a')
 >>> g.add(u'èéêë', u'e')
 >>> g.add(u'òóôõö', u'o')
 >>> u'László'.translate(g.cmap())
u'Laszlo'
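
A minimal sketch of such a helper might look like the following. It is
only an illustration: genmap and its add/cmap methods are hypothetical
names taken from the session above, not an existing library class. It is
written for Python 3, where every str is already Unicode and the u''
prefix is optional. The last line assumes that the byte-string form from
the question is UTF-8 and decodes it before translating.

```python
class genmap:
    """Hypothetical helper that collects a table for str.translate()."""

    def __init__(self):
        # Maps code points (ints) to their replacement strings.
        self._table = {}

    def add(self, sources, target):
        # Map every character in `sources` to the single string `target`.
        for ch in sources:
            self._table[ord(ch)] = target

    def cmap(self):
        # Return a copy so callers cannot modify the internal table.
        return dict(self._table)


g = genmap()
g.add('ÀÁÂÃÄÅ', 'A')
g.add('àáâãäå', 'a')
g.add('èéêë', 'e')
g.add('òóôõö', 'o')

plain = 'László'.translate(g.cmap())

# Assumption: the escaped form in the question is UTF-8 encoded bytes.
decoded = b'L\xc3\xa1szl\xc3\xb3'.decode('utf-8').translate(g.cmap())
```

Characters that were never passed to add() are left unchanged by
translate(), so the table only needs the letters you actually expect.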

-- 
-------------------------------------------------------------------
Peter Maas,  M+R Infosysteme,  D-52070 Aachen,  Tel +49-241-93878-0
E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
-------------------------------------------------------------------


