Regular expression that skips single line comments?
MRAB
google at mrabarnett.plus.com
Mon Jan 19 11:29:44 EST 2009
martinjamesevans at gmail.com wrote:
> I am trying to parse a set of files that have a simple syntax using
> RE. I'm interested in counting '$' expansions in the files, with one
> minor consideration. A line becomes a comment if the first
> non-whitespace character is a semicolon.
>
> e.g. tests 1 and 2 should be ignored
>
> sInput = """
> ; $1 test1
> ; test2 $2
> test3 ; $3 $3 $3
> test4
> $5 test5
> $6
> test7 $7 test7
> """
>
> Required output: ['$3', '$3', '$3', '$5', '$6', '$7']
>
>
> The following RE works fine but does not deal with the commented
> lines:
>
> re.findall(r"(\$.)", sInput, re.I)
>
> e.g. ['$1', '$2', '$3', '$3', '$3', '$5', '$6', '$7']
>
>
> My attempts at trying to use (?!;) type expressions keep failing.
>
> I'm not convinced this is suitable for a single expression, so I have
> also attempted to first find-replace any commented lines out without
> much luck.
>
> e.g. re.sub(r"^[\t ]*?;.*?$", r"", sInput, re.I+re.M)
>
>
> Any suggestions would be appreciated. Thanks
>
You could use:
>>> re.findall(r"^\s*;.*|(\$.)", sInput, re.M)
['', '', '$3', '$3', '$3', '$5', '$6', '$7']
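and then ignore the empty strings. findall returns one for each comment
line, because the first alternative matches the whole line without
filling the group.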
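As a rough sketch of how the filtering could look in practice, the snippet below wraps the pattern above in a helper that drops the empty strings. It also shows the two-pass idea from the question. In the quoted re.sub call the flags are passed as the fourth positional argument, which re.sub treats as count, so re.M never takes effect; passing flags= by keyword (available in current Python versions) is one way around that. The names S_INPUT, expansions_single_pass and expansions_two_pass are made up for this illustration.

```python
import re

S_INPUT = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4
$5 test5
$6
test7 $7 test7
"""

# First alternative swallows a whole comment line without capturing anything;
# second alternative captures '$' followed by one character.
SINGLE_PASS = re.compile(r"^\s*;.*|(\$.)", re.M)


def expansions_single_pass(text):
    """Return '$x' expansions, discarding the empty strings from comment lines."""
    return [m for m in SINGLE_PASS.findall(text) if m]


def expansions_two_pass(text):
    """Blank out comment lines first, then collect the expansions."""
    # flags must be given by keyword: the fourth positional argument is count.
    stripped = re.sub(r"^[\t ]*;.*$", "", text, flags=re.M)
    return re.findall(r"\$.", stripped)


print(expansions_single_pass(S_INPUT))
print(expansions_two_pass(S_INPUT))
```

Both calls should print ['$3', '$3', '$3', '$5', '$6', '$7'] for the sample input. The single-pass version avoids building an intermediate string; the two-pass version may be easier to read if the comment rule grows more complicated.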