string.replace() or re.subn()

Duncan Booth duncan at rcp.co.uk
Fri Sep 1 11:29:36 EDT 2000


bragib at my-deja.com wrote in <8oogfa$mov$1 at nnrp1.deja.com>:

>Hi:
>
>I have the following problem where I am replacing periods in certain
>names in a text file by underscores.  So for example if I have
>these names in the file [set.1, set.1.1] I would like to replace
>them everywhere by set_1 and set_1_1.  Now the catch is I can
>have a line like this
>
>line = '1.0, 2.0, set.1, set.1.1'
>
>for name in ['set.1', 'set.1.1']:
>    line = string.replace(line, name, string.replace(name,'.','_'))
>    print line
>
>1.0, 2.0, set_1, set_1.1
>1.0, 2.0, set_1, set_1.1
>
>which is not what I wanted.  I moved away from using re.sub because the
>names can potentially contain characters such as
>!@#$%^&*()_-+={}[]\|~`?/<>.,
>
I'm not convinced you have given enough information here for a definitive 
answer. If your names include set.1 and 1.set, and the input line contains 
the text set.1.set, which of the two dots would you like replaced?

If the answer is both then try:

import re, string

for name in ['set.1', 'set.1.1']:
    # escape regex metacharacters, then let each dot match '.' or '_'
    pat = string.replace(re.escape(name), '\\.', '(\\.|_)')
    repl = string.replace(name, '.', '_')
    line = re.sub(pat, repl, line)
    print line

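which should handle all your funny characters correctly by first escaping 
them, and handles the overlapping replacements by letting each dot in the 
pattern match either . or _, so that set.1.1 is still found after an earlier 
pass has already turned it into set_1.1.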
Of course someone will point out that this produces the wrong result for an 
input that is already '1.0, 2.0, set_1, set_1.1': the trailing set_1.1 was 
never one of your names, yet it gets changed to set_1_1. If that worries you, 
I suggest you first search the original string for every location where a 
dot should become an underscore, remember those positions, and only then go 
back and do all the replacements once the searches are finished.
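
A minimal sketch of that search-first, replace-afterwards approach, in 
current Python 3. The function name underscore_names_two_pass is hypothetical, 
and I am assuming a plain substring search is enough, which also means no 
regular-expression escaping is needed at all.

```python
def underscore_names_two_pass(line, names):
    """Find every dot belonging to a known name, then replace them all at once."""
    dot_positions = set()
    for name in names:
        if not name:
            continue  # an empty name would match everywhere; skip it
        # Pass 1: search the ORIGINAL line, including overlapping matches.
        start = line.find(name)
        while start != -1:
            for offset, ch in enumerate(name):
                if ch == '.':
                    dot_positions.add(start + offset)
            start = line.find(name, start + 1)
    # Pass 2: rewrite only the remembered positions.
    return ''.join('_' if i in dot_positions else ch
                   for i, ch in enumerate(line))

names = ['set.1', 'set.1.1']
print(underscore_names_two_pass('1.0, 2.0, set.1, set.1.1', names))
# prints: 1.0, 2.0, set_1, set_1_1
print(underscore_names_two_pass('1.0, 2.0, set_1, set_1.1', names))
# prints: 1.0, 2.0, set_1, set_1.1   (left alone, since no name occurs in it)
```

Because all the searching is done against the unmodified line, the order of 
the names no longer matters, and text that merely looks like a partly 
converted name is not touched.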


