Regular expression that skips single line comments?

Tim Chase python.list at tim.thechases.com
Mon Jan 19 11:40:26 EST 2009


> I am trying to parse a set of files that have a simple syntax using
> RE. I'm interested in counting '$' expansions in the files, with one
> minor consideration. A line becomes a comment if the first non-white
> space character is a semicolon.
> 
> e.g.  tests 1 and 2 should be ignored
> 
> sInput = """
> ; $1 test1
>     ; test2 $2
>     test3 ; $3 $3 $3
> test4
> $5 test5
>    $6
>   test7 $7 test7
> """
> 
> Required output:    ['$3', '$3', '$3', '$5', '$6', '$7']

We're interested in two things: comment lines and "dollar-something"s.

  >>> import re
  >>> r_comment = re.compile(r'\s*;')
  >>> r_dollar = re.compile(r'\$\d+')
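
As a quick sanity check on r_comment, here is a small sketch of why
match() is enough: match() only succeeds at the start of the string, so
the pattern recognizes a semicolon only when nothing but whitespace
precedes it. The sample lines come from the question; the is_comment_*
names are just illustrative.

```python
import re

r_comment = re.compile(r'\s*;')

# match() is anchored at the start of the string, so only lines whose
# first non-whitespace character is ';' count as comments.
is_comment_1 = bool(r_comment.match('; $1 test1'))
is_comment_2 = bool(r_comment.match('    ; test2 $2'))
# Here the ';' comes after other text, so the line is not a comment.
is_comment_3 = bool(r_comment.match('    test3 ; $3 $3 $3'))

print(is_comment_1, is_comment_2, is_comment_3)  # True True False
```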

Then remove comment lines and find the matching '$' expansions:

  >>> [r_dollar.findall(line)
  ...  for line in sInput.splitlines()
  ...  if not r_comment.match(line)]
  [[], ['$3', '$3', '$3'], [], ['$5'], ['$6'], ['$7']]

Finally, roll each line's results into a single list by slightly
abusing sum(), which concatenates the per-line lists when given an
empty list as its start value:

  >>> sum((r_dollar.findall(line)
  ...      for line in sInput.splitlines()
  ...      if not r_comment.match(line)), [])
  ['$3', '$3', '$3', '$5', '$6', '$7']
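
If the sum() trick feels too clever, one alternative is
itertools.chain.from_iterable(), which flattens the per-line lists
without repeatedly copying the accumulated list. This is only a sketch
of a swapped-in technique; the names per_line and expansions are
illustrative, and the input is the sample from the question.

```python
import re
from itertools import chain

sInput = """
; $1 test1
    ; test2 $2
    test3 ; $3 $3 $3
test4
$5 test5
   $6
  test7 $7 test7
"""

r_comment = re.compile(r'\s*;')
r_dollar = re.compile(r'\$\d+')

# One list of matches per non-comment line (lazily generated).
per_line = (r_dollar.findall(line)
            for line in sInput.splitlines()
            if not r_comment.match(line))

# Flatten the per-line lists into a single list.
expansions = list(chain.from_iterable(per_line))
print(expansions)  # ['$3', '$3', '$3', '$5', '$6', '$7']
```

For a handful of short files either approach should be fine; the
chain() version mostly matters when there are many lines.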

Adjust r_dollar if your variable pattern differs (such as reverting
to your previous r'\$.' pattern if you prefer, or using r'\$\w+' for
named variables made up of letters, digits and underscores).
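
To make the difference between those patterns concrete, here is a
small comparison. The sample line is hypothetical (it is not from your
files), and the variable names in it are made up for illustration.

```python
import re

# Hypothetical sample line, not taken from the original files.
line = 'copy $src to $dest_dir using $10'

digits_only = re.findall(r'\$\d+', line)   # '$' followed by digits
any_one_char = re.findall(r'\$.', line)    # '$' plus exactly one character
word_names = re.findall(r'\$\w+', line)    # '$' plus letters/digits/underscores

print(digits_only)   # ['$10']
print(any_one_char)  # ['$s', '$d', '$1']
print(word_names)    # ['$src', '$dest_dir', '$10']
```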

-tkc