replace c-style comments with newlines (regexp)

Peter Otten __peter__ at web.de
Fri Dec 21 04:06:01 EST 2007


Steven D'Aprano wrote:

> On Fri, 21 Dec 2007 00:00:47 +0000, lex __ wrote:
> 
>> I'm trying to use regexp to replace multi-line c-style comments  (like /*
>>  this /n */ ) with /n (newlines). I tried something like
>> re.sub('/\*(.*)/\*'  , '/n'   , file) but it doesn't work for multiple
>> lines.
 
> Regexes won't cross line boundaries unless you make them multiline with 
> re.MULTILINE.

re.MULTILINE only affects the behaviour of ^ and $; the flag that lets "." match newlines is re.DOTALL.
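
To make the difference between the two flags concrete, here is a small sketch. The sample text and the variable names are made up for illustration; only the re calls are standard.

```python
import re

text = "/* first\nsecond */"

# Without flags, "." stops at the newline, so the comment is not matched.
plain = re.search(r"/\*.*?\*/", text)

# re.MULTILINE only changes where ^ and $ match; "." still stops at "\n".
multiline = re.search(r"/\*.*?\*/", text, re.MULTILINE)

# re.DOTALL lets "." match "\n" too, so the whole comment is found.
dotall = re.search(r"/\*.*?\*/", text, re.DOTALL)

print(plain, multiline, dotall.group(0) == text)
```

Only the re.DOTALL search returns a match; the other two return None.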

> Also, I'm no expert on regexes, but it looks to me that your regex is 
> greedy. I think you need the non-greedy version, which by memory (and 

>>> re.compile(r"/\*(.*?)\*/", re.DOTALL).findall("/*a*/ /*b\nb*/ /*c/*c*/")
['a', 'b\nb', 'c/*c']

>>> def replace(match):
...     return "\n" * match.group(1).count("\n")
... 
>>> re.compile(r"(/\*.*?\*/)", re.DOTALL).sub(replace, "A /*a*/ BB /*b\nb*/ CCC /*c/*c*/")
'A  BB \n CCC '
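
For use outside the interactive prompt, the same idea might be wrapped in a function roughly like this. The names COMMENT and strip_comments are mine, and the sample input is invented; treat it as a sketch rather than a finished tool.

```python
import re

# Non-greedy match of one /* ... */ comment; DOTALL lets it span lines.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_comments(source):
    """Replace each comment with as many newlines as it contained."""
    return COMMENT.sub(lambda m: "\n" * m.group(0).count("\n"), source)


code = "int a; /* one\ntwo\nthree */ int b;\nint c; /* inline */\n"
stripped = strip_comments(code)

# The newline count is unchanged, so line numbers in later
# compiler messages still refer to the original file.
print(stripped.count("\n") == code.count("\n"))
```

Keeping the newlines is the point of the callback: a plain sub() with a fixed "\n" would shift every line after a multi-line comment.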

> Have you considered what happens when your C code includes a string 
> literal containing '/*'?

Indeed.
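
One possible way around that, as a rough sketch: let the regex match string and character literals as well as comments, and only replace the comments. The names TOKEN and strip_comments_keep_strings are hypothetical, and this still ignores // comments, line continuations and the preprocessor, so an apostrophe or quote inside a // comment would confuse it.

```python
import re

# Match a string literal, a character literal, or a /* ... */ comment,
# whichever starts first. Literals are matched so they can be skipped.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'     # double-quoted string, honouring backslash escapes
    r"|'(?:\\.|[^'\\])*'"    # character literal
    r"|/\*.*?\*/",           # block comment
    re.DOTALL,
)


def strip_comments_keep_strings(source):
    def replace(match):
        text = match.group(0)
        if text.startswith("/*"):
            # A real comment: keep only its newlines.
            return "\n" * text.count("\n")
        # A string or character literal: leave it untouched.
        return text

    return TOKEN.sub(replace, source)


sample = 'puts("/* not a comment */"); /* real\ncomment */ x = 1;'
result = strip_comments_keep_strings(sample)
print(repr(result))
```

Because the literal alternatives consume the whole string before the comment pattern gets a chance, a "/*" inside quotes is never treated as the start of a comment. For anything beyond simple input a real C tokenizer is the safer choice.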

Peter


