regex plea for help

Skip Montanaro skip at pobox.com
Fri Jun 27 16:00:30 EDT 2003


    >> I'm trying to process through an apache log file and bust up the
    >> individual sections into a list for further processing. There is a
    >> regex I got from a php example that matches an entire line, but
    >> obviously, that only returns a single element list. 

You probably want something like this, with one named group per field so
that a match hands back the pieces separately:

    #!/usr/bin/env python

    import re
    import sys

    logpat = re.compile(r"(?P<host>[^ ]+) "
                        r"(?P<dash>[^ ]+) "
                        r"(?P<user>[^ ]+) "
                        r"\[(?P<timestamp>[^]]+)\] "
                        r'"(?P<method>[^ ]+) '
                        r"(?P<path>[^ ]+) "
                        r'(?P<version>[^"]+)" '
                        r"(?P<response>[0-9]+) "
                        # Apache logs "-" here when no body was sent.
                        r"(?P<size>[0-9]+|-)$")

    for line in sys.stdin:
        mat = logpat.match(line.strip())
        if mat is not None:
            # Parenthesized so it works as a statement or a function call.
            print(mat.groups())

which, when run against my laptop's access_log, emits lines like this:

    ('127.0.0.1', '-', 'skip', '06/Jun/2003:11:41:44 -0500', 'GET', '/nagios/cgi-bin/status.cgi?hostgroup=all', 'HTTP/1.1', '200', '11778')
    ('127.0.0.1', '-', '-', '06/Jun/2003:11:41:44 -0500', 'GET', '/nagios/stylesheets/status.css', 'HTTP/1.1', '200', '7952')
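
Because the groups are named, you can also ask the match for a dictionary
instead of a tuple, which may be handier for further processing. The
following is only a sketch under a few assumptions: parse_line is a
made-up helper name, the size group is widened to accept "-", the
timestamp conversion needs Python 3 for %z, and %b expects English month
names.

```python
import re
from datetime import datetime

# Same field layout as the pattern above; size also accepts "-"
# (assumption: your logs may contain responses with no body).
LOGPAT = re.compile(r'(?P<host>[^ ]+) '
                    r'(?P<dash>[^ ]+) '
                    r'(?P<user>[^ ]+) '
                    r'\[(?P<timestamp>[^]]+)\] '
                    r'"(?P<method>[^ ]+) '
                    r'(?P<path>[^ ]+) '
                    r'(?P<version>[^"]+)" '
                    r'(?P<response>[0-9]+) '
                    r'(?P<size>[0-9]+|-)$')


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match.

    parse_line is a hypothetical helper, not part of any library.
    """
    mat = LOGPAT.match(line.strip())
    if mat is None:
        return None
    rec = mat.groupdict()  # keys are the group names in the pattern
    rec["response"] = int(rec["response"])
    # Treat a "-" size as zero bytes (a choice made for this sketch).
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    # %b is locale-dependent; this assumes English month abbreviations.
    rec["timestamp"] = datetime.strptime(rec["timestamp"],
                                         "%d/%b/%Y:%H:%M:%S %z")
    return rec


sample = ('127.0.0.1 - skip [06/Jun/2003:11:41:44 -0500] '
          '"GET /nagios/stylesheets/status.css HTTP/1.1" 200 7952')
rec = parse_line(sample)
print(rec["host"], rec["path"], rec["response"], rec["size"])
```

Lines that don't fit the pattern (a malformed request line, say) come
back as None, so the caller can count or skip them as it sees fit.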

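I can never remember what the second field is; I believe it is the remote
logname (%l in Apache's LogFormat, filled in by an identd lookup).  That
lookup is off by default, which would explain why the field has been a dash
in every logfile I've ever seen.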

Skip





