F: How can I make re.sub() replace patterns across newlines

Josiah Carlson jcarlson at nospam.uci.edu
Sun Feb 1 20:27:13 EST 2004


Viktor Rosenfeld wrote:

> Hi,
> 
> I want to strip a Java file of /* */ style comments.  Unfortunately, the
> simple regexp "\/\*.*\*\/" only works on comments that are on one line.
> Is there a simple way to remove comments that span several lines with
> Python regexps?  I tried re.M to no avail.
> 
> Thanks,
> Viktor

Viktor,

Supply the DOTALL flag during the regular expression compile as 
described here: http://www.python.org/doc/current/lib/re-syntax.html
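
As a rough illustration of why re.M did not help: re.M (MULTILINE) only 
changes where ^ and $ match, while DOTALL is the flag that lets "." match 
a newline. The names text and pattern below are made up for this sketch.

```python
import re

text = "/* a\nb */"        # a comment that spans two lines
pattern = r"/\*.*?\*/"

# Without DOTALL, "." stops at the newline, so nothing matches.
no_flags = re.search(pattern, text)
# re.M only affects ^ and $, so it does not help here either.
multiline = re.search(pattern, text, re.M)
# With DOTALL, "." also matches "\n" and the whole comment is found.
dotall = re.search(pattern, text, re.DOTALL)

print(no_flags)          # None
print(multiline)         # None
print(dotall.group(0))
```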

You will also want to make the regular expression non-greedy. With 
DOTALL, a greedy .* runs from the first /* to the last */ in the file 
and swallows everything in between, code included.

 >>> import re
 >>> import pprint
 >>>
 >>> st = """
... /* this is a
... multi-line comment */
...
... /* this is a single-line comment */
...
... /* this /* has multiple
... starts */
... """
 >>> # non-greedy matching (raw string, so the backslashes reach re intact)
 >>> NonGreedy = re.compile(r"/\*.*?\*/", re.DOTALL)
 >>>
 >>> pprint.pprint(NonGreedy.findall(st))
['/* this is a\nmulti-line comment */',
  '/* this is a single-line comment */',
  '/* this /* has multiple\nstarts */']

 >>> # greedy matching
 >>> Greedy = re.compile(r"/\*.*\*/", re.DOTALL)
 >>> pprint.pprint(Greedy.findall(st))
['/* this is a\nmulti-line comment */\n\n/* this is a single-line comment */\n\n/* this /* has multiple\nstarts */']
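
Since the question was about re.sub(), here is a minimal sketch of the 
same non-greedy DOTALL pattern used for substitution. The names COMMENT, 
strip_block_comments and java are invented for this example, and it 
assumes no /* ... */ sequences appear inside string literals, which a 
plain regexp cannot tell apart from real comments.

```python
import re

# Non-greedy match of /* ... */, with "." allowed to cross newlines.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def strip_block_comments(source):
    """Remove /* ... */ comments, including ones that span lines."""
    return COMMENT.sub("", source)

java = """int a = 1; /* first
comment */ int b = 2; /* second */
"""

print(strip_block_comments(java))
```

If you call re.sub() directly instead of compiling first, pass the flag 
by keyword (flags=re.DOTALL) on Python versions that support it, because 
the fourth positional argument of re.sub() is count, not flags.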

  - Josiah


