Translating a Perl regex into Python

Sat Sep 8 09:12:29 EDT 2001

Stephan Tolksdorf wrote:
> # My attempt of a translation
> rex = re.compile("/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|(\"(\\.|[^\"\\])*\"|\'(\\.|[^\'\\])*\'|.[^/\"\'\\]*)", re.M | re.S)

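use a "raw" string (just add an "r" between "compile(" and the first quote).
otherwise Python processes the backslashes itself before the regex engine
sees them; "\\.", for example, arrives as "\.", which matches a literal dot
instead of a backslash followed by any character.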
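a quick illustration of why the prefix matters (a sketch for Python 3;
re.fullmatch needs 3.4 or later). the sample string is made up, and the
`cooked`/`raw` names are just for illustration:

```python
import re

cooked = "\\."  # ordinary literal: Python turns this into 2 characters, \ and .
raw = r"\\."    # raw literal: all 3 characters, \ \ and ., are kept

# a backslash followed by a double quote, as inside a C string like "say \"hi\""
escaped_quote = '\\"'

print(len(cooked), len(raw))                         # 2 3
print(re.fullmatch(cooked, escaped_quote) is None)   # True: \. only matches a literal dot
print(re.fullmatch(raw, escaped_quote) is not None)  # True: \\. matches backslash + any char
```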

> content = rex.sub('\2', content)

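this won't work: the pattern only sets group 2 when it matches code or a
string literal, not when it matches a comment, so you'll probably get an
"empty group" error (or "group did not contribute to match", depending on
Python version).  also, without an "r" prefix, '\2' is the control
character chr(2), not a group reference.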

(evidently, Perl treats an unset group as an empty string in this
context)

try this instead:

    content = rex.sub(lambda m: m.group(2) or "", content)
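here's a self-contained sketch of that approach in Python 3, using the
quoted pattern with just the "r" prefix added. the C snippet is a made-up
sample.  (on Python 3.5 and later, re.sub also substitutes an empty string
for an unmatched group, so r"\2" works there as well; the lambda works
everywhere.)

```python
import re

# the quoted pattern, unchanged except for the r prefix (split over two
# lines using implicit string concatenation). group 2 captures code and
# string/char literals; it stays unset when a comment matches.
rex = re.compile(r"/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|"
                 r"(\"(\\.|[^\"\\])*\"|\'(\\.|[^\'\\])*\'|.[^/\"\'\\]*)",
                 re.M | re.S)

# hypothetical C snippet, made up for this example
source = 'int x = 1; /* set x */\nchar *s = "a /* not a comment */ b"; // trailing\n'

# m.group(2) is None for comment matches, so "or" turns it into ""
content = rex.sub(lambda m: m.group(2) or "", source)
print(repr(content))
# 'int x = 1; \nchar *s = "a /* not a comment */ b"; \n'
```

note that the "/* not a comment */" inside the string literal survives,
since the string alternative consumes it as a whole.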

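an alternative (and probably much faster) solution is to use "(?:x)" for
all groups but the second, so that it's the only capturing group, and
then use "findall" to collect all code segments (comment matches leave
that group unset, and findall returns an empty string for them):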

    rex = re.compile(r'''(?ms)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\]*)''')
    content = "".join(rex.findall(content))
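a sketch showing what findall hands back for the same made-up C snippet
(Python 3; `source` and `pieces` are illustrative names):

```python
import re

# (?:...) everywhere except the code/literal group, so findall() returns that
# one group per match, or "" when a comment matched instead
rex = re.compile(r'''(?ms)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\]*)''')

# hypothetical C snippet, made up for this example
source = 'int x = 1; /* set x */\nchar *s = "a /* not a comment */ b"; // trailing\n'

pieces = rex.findall(source)
content = "".join(pieces)
print(pieces)
# ['int x = 1; ', '', '\nchar *s = ', '"a /* not a comment */ b"', '; ', '', '\n']
```

the two empty strings are the "/* set x */" and "// trailing" comments;
joining the list gives the same result as the sub-based version.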

hope this helps!

</F>
