Translating a Perl regex into Python

Fredrik Lundh fredrik at pythonware.com
Sat Sep 8 09:12:29 EDT 2001


Stephan Tolksdorf wrote:
> # My attempt of a translation
> rex =
> re.compile("/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|(\"(\\.|[^\"\\])*\"|\'(\\.
> |[^\'\\])*\'|.[^/\"\'\\]*)", re.M | re.S)

use a "raw" string (just add an "r" between compile( and the first quote),
so the backslashes reach the re module unchanged
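
a small sketch of why the "r" prefix matters (the sample strings and the names plain/raw are made up for illustration): in a normal string literal, Python interprets backslash escapes before the re module ever sees the pattern.

```python
import re

# In a normal string literal, "\2" is an octal escape: one control character.
assert "\2" == "\x02"
# In a raw string, the backslash survives: two characters, "\" and "2".
assert len(r"\2") == 2

# "\b" is a backspace character in a normal string,
# but a word-boundary assertion in a raw-string pattern.
plain = re.compile("\bfoo\b")
raw = re.compile(r"\bfoo\b")

plain_match = plain.search("a foo b")   # looks for literal backspaces
raw_match = raw.search("a foo b")       # looks for the word "foo"
```

the same reasoning applies to a replacement template such as '\2', which only reaches the re module as a group reference if it is written r'\2' or '\\2'.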

> content = rex.sub('\2', content)

this won't work: the pattern doesn't always set group 2, so
you'll probably get an "empty group" error (or "group did not
contribute to match", depending on Python version)

(obviously, Perl treats "no match" as an empty string in this
context)

try this instead:

    content = rex.sub(lambda m: m.group(2) or "", content)
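
to see the callable form in action, here is a sketch with a deliberately simplified two-group pattern (toy, content and result are hypothetical names, and the pattern is not the one from the original post; it only handles // comments and double-quoted strings). as far as I know, Python 3.5 and later substitute an empty string for unmatched groups in a template, so the callable matters mostly on older versions, but it works on both.

```python
import re

# Group 1 matches a // comment, group 2 matches code (a string literal
# or a run of characters containing neither "/" nor a double quote).
toy = re.compile(r'(//[^\n]*)|("[^"]*"|[^/"]+)')

content = 'x = "a//b"; // note\ny = 1;'

# For comment matches, group 2 did not participate and m.group(2) is None,
# so "or" falls back to an empty string and the comment is dropped.
result = toy.sub(lambda m: m.group(2) or "", content)
```

the "//" inside the string literal survives because the quoted string is consumed as a single code token before the comment alternative gets a chance to match there.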

an alternative (and probably much faster) solution is to use non-capturing
groups, "(?:x)", for all groups but the second, and use "findall" to collect
all code segments:

    rex = re.compile(r'''(?ms)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\]*)''')
    content = "".join(rex.findall(content))
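
a self-contained run of the findall approach, as a sketch: the pattern is the one above, while the C-like sample text and the names source and stripped are invented for illustration. with a single capturing group, findall returns that group for every match, and an empty string where a comment matched instead.

```python
import re

rex = re.compile(r'''(?ms)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\]*)''')

# Made-up input: a block comment, a string that merely looks like a
# comment, and a trailing line comment.
source = 'int a = 1; /* set a */\nchar *s = "/* not a comment */"; // trailing\n'

# Comment matches contribute "" to the list; code segments are kept.
segments = rex.findall(source)
stripped = "".join(segments)
```

the comments are gone, the string literal is untouched, and the whitespace that preceded each comment stays in place.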

hope this helps!

</F>

<!-- (the eff-bot guide to) the python standard library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->




