String multi-replace

MRAB python at mrabarnett.plus.com
Thu Nov 18 06:41:40 EST 2010


On 18/11/2010 04:30, Benjamin Kaplan wrote:
> On Wed, Nov 17, 2010 at 11:21 PM, Sorin Schwimmer <sxn02 at yahoo.com> wrote:
>> Hi All,
>>
>> I have to eliminate diacritics in a fairly large file.
>>
>> Inspired by http://code.activestate.com/recipes/81330/, I came up with the following code:
>>
>> #! /usr/bin/env python
>>
>> import re
>>
>> nodia={chr(196)+chr(130):'A', # mamaliga
>>        chr(195)+chr(130):'A', # A^
>>        chr(195)+chr(142):'I', # I^
>>        chr(195)+chr(150):'O', # OE
>>        chr(195)+chr(156):'U', # UE
>>        chr(195)+chr(139):'A', # AE
>>        chr(197)+chr(158):'S',
>>        chr(197)+chr(162):'T',
>>        chr(196)+chr(131):'a', # mamaliga
>>        chr(195)+chr(162):'a', # a^
>>        chr(195)+chr(174):'i', # i^
>>        chr(195)+chr(182):'o', # oe
>>        chr(195)+chr(188):'u', # ue
>>        chr(195)+chr(164):'a', # ae
>>        chr(197)+chr(159):'s',
>>        chr(197)+chr(163):'t'
>>       }
>> name="R\xc3\xa2\xc5\x9fca"
>>
>> regex = re.compile("(%s)" % "|".join(map(re.escape, nodia.keys())))
>> print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
>>
>> But it won't work; I end up with:
>>
>> Traceback (most recent call last):
>>   File "multirep.py", line 25, in <module>
>>     print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
>>   File "multirep.py", line 25, in <lambda>
>>     print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
>> TypeError: 'type' object is not subscriptable
>>
>> What am I doing wrong?
>>
>> Thanks for your advice,
>> SxN
>>
>
> dict is a type, not a dict. Your dict is called nodia. I'm guessing
> that's what you meant to use.
>
Could I also suggest that you use:

     mo.group()

instead of:

     mo.string[mo.start():mo.end()]
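
Putting both suggestions together (looking up the match in nodia rather than dict, and using mo.group()), a minimal sketch might look like the following. It assumes Python 3 with bytes objects, whereas the original post is Python 2, where plain strings are byte strings. It uses only a subset of the original table, and the helper name strip_diacritics is hypothetical, introduced here for illustration.

```python
import re

# UTF-8 byte sequences mapped to plain ASCII replacements.
# Only a subset of the table from the original post is shown.
nodia = {
    b"\xc4\x82": b"A",  # chr(196)+chr(130) in the original
    b"\xc3\xa2": b"a",  # chr(195)+chr(162)
    b"\xc5\x9f": b"s",  # chr(197)+chr(159)
    b"\xc5\xa3": b"t",  # chr(197)+chr(163)
}

# One alternation that matches any key of the table.
regex = re.compile(b"|".join(map(re.escape, nodia)))


def strip_diacritics(data):
    """Replace every byte sequence listed in nodia (hypothetical helper)."""
    # mo.group() returns the matched text, which is used as the lookup key.
    return regex.sub(lambda mo: nodia[mo.group()], data)


name = b"R\xc3\xa2\xc5\x9fca"
print(strip_diacritics(name))  # b'Rasca'
```

With no argument, mo.group() returns the whole match, which is the same text the slice mo.string[mo.start():mo.end()] produces.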


