re sub help

Bengt Richter bokr at oz.net
Sat Nov 5 19:06:09 EST 2005


On 4 Nov 2005 22:49:03 -0800, s99999999s2003 at yahoo.com wrote:

>hi
>
>i have a string :
>a =
>"this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>
>inside the string, there are "\n". I don't want to substitute the '\n'
>in between
>the [startdelim] and [enddelim] to ''. I only want to get rid of the
>'\n' everywhere else.
>
>i have read the tutorial and came across negative/positive lookahead
>and i think it can solve the problem.but am confused on how to use it.
>anyone can give me some advice? or is there better way other than
>lookaheads ...thanks..
>

Sometimes splitting the string and processing the pieces selectively can be a solution.
E.g., if the delimiters are properly paired, splitting on them with a capturing group
(parens, so the matched delimiters are kept in the result) should give you a pattern
that repeats modulo 4:
     <"everywhere else" as you said><first delim><between><second delim> ...

 >>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
 >>> import re
 >>> splitter = re.compile(r'(?s)(\[startdelim\]|\[enddelim\])')
 >>> sp = splitter.split(a)
 >>> sp
 ['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n']
 >>> ''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)])
 'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'
 >>> print ''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)])
 thisisasentence[startdelim]this
 is
 another[enddelim]thisis
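
For anyone who finds the tuple-of-lambdas one-liner hard to read, here is a minimal
sketch of the same split-and-select idea as a function, written for Python 3. The names
SPLITTER and strip_newlines_outside are mine, just for illustration, and it still
assumes the delimiters are properly paired. (I left out the (?s) flag, since the
pattern has no '.' for it to affect.)

```python
import re

# The capturing group makes split() keep the delimiters in the result list.
SPLITTER = re.compile(r'(\[startdelim\]|\[enddelim\])')

def strip_newlines_outside(text):
    """Remove newlines everywhere except between [startdelim] and [enddelim].

    Assumes properly paired, non-nested delimiters (hypothetical helper).
    """
    pieces = SPLITTER.split(text)
    # With paired delimiters the pieces repeat as
    #   outside, start delim, between, end delim, outside, ...
    # so indices 0, 4, 8, ... are the "everywhere else" pieces.
    return ''.join(
        piece.replace('\n', '') if i % 4 == 0 else piece
        for i, piece in enumerate(pieces)
    )

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
result = strip_newlines_outside(a)
print(repr(result))
```

The i % 4 == 0 test is the same selection as not i%4 in the one-liner, only spelled out.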

I haven't checked for corner cases, but HTH.
Let's try it with two pairs of delimiters:

 >>> a += "2222\n33\n4\n55555555[startdelim]6666\n77\n8888888[enddelim]9999\n00\n"
 >>> sp = splitter.split(a)
 >>> print ''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)])
 thisisasentence[startdelim]this
 is
 another[enddelim]thisis222233455555555[startdelim]6666
 77
 8888888[enddelim]999900

which came from
 >>> sp
 ['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n2222\n33\n4\n55555555', '[startdelim]', '6666\n77\n8888888', '[enddelim]', '9999\n00\n']

The replacement was applied only to the pieces where not i%4 was true:

 >>> for i,s in enumerate(sp): print '%6s: %r'%(not i%4,s)
 ...
   True: 'this\nis\na\nsentence'
  False: '[startdelim]'
  False: 'this\nis\nanother'
  False: '[enddelim]'
   True: 'this\nis\n2222\n33\n4\n55555555'
  False: '[startdelim]'
  False: '6666\n77\n8888888'
  False: '[enddelim]'
   True: '9999\n00\n'
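
A different technique from the split above, in case it is useful: a single re.sub
with an alternation that matches either a whole delimited block or a lone newline,
plus a replacement function that puts the block back unchanged. This is only a sketch
for Python 3; PATTERN and strip_newlines_sub are made-up names, and like the split
version it assumes the delimiters are paired and not nested.

```python
import re

# Either a whole [startdelim]...[enddelim] block (captured, non-greedy),
# or a single newline outside such a block.
PATTERN = re.compile(r'(\[startdelim\].*?\[enddelim\])|\n', re.DOTALL)

def strip_newlines_sub(text):
    # If group 1 matched, return the block untouched; otherwise the match
    # was a bare '\n', group 1 is None, and it is replaced by ''.
    return PATTERN.sub(lambda m: m.group(1) or '', text)

a = ("this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
     "2222\n33\n4\n55555555[startdelim]6666\n77\n8888888[enddelim]9999\n00\n")
result = strip_newlines_sub(a)
print(result)
```

Note that a [startdelim] with no [enddelim] after it is just treated as ordinary text
here, so the newlines following it are removed too.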

Regards,
Bengt Richter


