[Python-Dev] Re: pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching

Thu Aug 12 20:18:27 CEST 2004

Mike Coleman wrote:

> Re maintenance, yeah regexp is pretty terse and ugly.  Generally, though, I'd
> rather deal with a reasonably well-considered 80 char regexp than 100 lines of
> code that does the same thing.

well, the examples in your PEP can be written as:

    data = [line.rstrip("\n").split(":") for line in open(filename)]

and

    import ConfigParser

    c = ConfigParser.ConfigParser()
    c.read(filename)

    data = []
    for section in c.sections():
        data.append((section, c.items(section)))

both of which are shorter than your structparse examples.
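On Python 3 the module is spelled "configparser". The sketch below runs both replacements on made-up in-memory data so the resulting shapes are easy to see. The sample text and the names PASSWD_TEXT, INI_TEXT and sections are illustrative assumptions, not taken from the PEP.

```python
import configparser
import io

# Hypothetical sample inputs standing in for the PEP's data files.
PASSWD_TEXT = "root:x:0:0\nmike:x:500:500"  # note: no trailing newline
INI_TEXT = """\
[server]
host = example.org
port = 8080

[client]
retries = 3
"""

# Colon-separated records: rstrip("\n") keeps the last field intact even
# when the final line has no newline (line[:-1] would chop a character).
data = [line.rstrip("\n").split(":") for line in io.StringIO(PASSWD_TEXT)]
print(data)

# INI sections as (section, [(option, value), ...]) pairs.
c = configparser.ConfigParser()
c.read_string(INI_TEXT)
sections = [(section, c.items(section)) for section in c.sections()]
print(sections)
```

Each record becomes a list of string fields, and each section becomes a (name, list-of-pairs) tuple. Note that configparser lowercases option names by default.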

and most of the one-liners in your pre-PEP can be handled with a
combination of "match" and "finditer".  here's a 16-line helper that
parses strings matching the "a(b)*c" pattern into a prefix/list/tail tuple.

    import re

    def parse(string, pat1, pat2):
        """Parse a string having the form pat1(pat2)*"""
        m = re.match(pat1, string)
        i = m.end()
        a = m.group(1)
        b = []
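        # the catch-all "(?s:.)" has no group, so lastindex is None once
        # pat2 stops matching; (?s:...) lets it consume newlines too
        for m in re.compile(pat2 + r"|(?s:.)").finditer(string, i):
            if m.lastindex is None:
                break
            b.append(m.group(m.lastindex))
            i = m.end()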
        return a, b, string[i:]

>>> parse("hello 1 2 3 4 # 5", r"(\w+)", r"\s*(\d+)")
('hello', ['1', '2', '3', '4'], ' # 5')

tweak as necessary.
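Since "match" can do the whole job on its own, here is one possible tweak, sketched under the same pat1(pat2)* assumption. It steps through the string with anchored Pattern.match(string, pos) calls instead of a catch-all alternative, raises ValueError when pat1 doesn't match, and stops at the first text that isn't pat2. The name parse_with_match and the flags parameter are hypothetical additions, not part of the helper above.

```python
import re


def parse_with_match(string, pat1, pat2, flags=0):
    """Split a string of the form pat1(pat2)* into (prefix, items, tail).

    Hypothetical variant of parse(): pat1 must contain a group; each item
    is pat2's last matched group, or the whole match when no group took part.
    """
    m = re.match(pat1, string, flags)
    if m is None:
        raise ValueError("string does not start with pat1")
    prefix = m.group(1)
    item = re.compile(pat2, flags)
    items = []
    i = m.end()
    while True:
        # Pattern.match(string, pos) is anchored at pos, so unlike a
        # search it can never skip ahead to a later pat2 occurrence.
        m = item.match(string, i)
        if m is None or m.end() == i:
            break  # no pat2 here (an empty match would loop forever)
        items.append(m.group(m.lastindex or 0))
        i = m.end()
    return prefix, items, string[i:]


print(parse_with_match("hello 1 2 3 4 # 5", r"(\w+)", r"\s*(\d+)"))
# -> ('hello', ['1', '2', '3', '4'], ' # 5')
```

Because every step is anchored, input like "hello\n1\n2 x" yields ('hello', ['1', '2'], ' x'), with the unparsed remainder left in the tail.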

</F>