Re for Apache log file format

Neil Cerutti neilc at norwich.edu
Tue Oct 8 08:50:22 EDT 2013


On 2013-10-08, Sam Giraffe <sam at giraffetech.biz> wrote:
>
> Hi,
>
> I am trying to split up the re pattern for Apache log file format and seem
> to be having some trouble in getting Python to understand multi-line
> pattern:
>
> #!/usr/bin/python
>
> import re
>
> #this is a single line
> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0"
> 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
>
> #trying to break up the pattern match for easy to read code
> pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
>                      r'(?P<ident>\-)\s+'
>                      r'(?P<username>\-)\s+'
>                      r'(?P<TZ>\[(.*?)\])\s+'
>                      r'(?P<url>\"(.*?)\")\s+'
>                      r'(?P<httpcode>\d{3})\s+'
>                      r'(?P<size>\d+)\s+'
>                      r'(?P<referrer>\"\")\s+'
>                      r'(?P<agent>\((.*?)\))')

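I recommend using the re.VERBOSE flag when spelling out a long re
across several lines. With it, whitespace in the pattern is ignored
and # starts a comment, so a single triple-quoted raw string can
replace the concatenated pieces. It'll make your life incrementally
easier.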

pattern = re.compile(
     r"""(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+
         (?P<ident>\-)\s+
         (?P<username>\-)\s+
         (?P<TZ>\[(.*?)\])\s+    # You can even insert comments.
         (?P<url>\"(.*?)\")\s+
         (?P<httpcode>\d{3})\s+
         (?P<size>\d+)\s+
         (?P<referrer>\"\")\s+
         (?P<agent>\((.*?)\))""", re.VERBOSE)
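
A side note on the pattern itself, separate from the multi-line question: as posted, it does not appear to match the sample line. The referrer group expects an empty pair of quotes where the line has "-", and the agent group expects to start at a parenthesis where the line has a quoted string. Below is a minimal sketch of one way to adjust it, still using re.VERBOSE. The renamed groups (time, request) and the looser \S+ for ident and username are my own assumptions, not part of the original pattern.

```python
import re

# Sample line from the original post, joined back onto one line.
line = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# Group names "time" and "request" are illustrative choices, not from the post.
pattern = re.compile(
    r"""(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+
        (?P<ident>\S+)\s+             # "-" when absent
        (?P<username>\S+)\s+          # "-" when absent
        \[(?P<time>.*?)\]\s+          # timestamp, without the brackets
        "(?P<request>.*?)"\s+         # request line, without the quotes
        (?P<httpcode>\d{3})\s+
        (?P<size>\d+|-)\s+            # "-" when no body was sent
        "(?P<referrer>.*?)"\s+
        "(?P<agent>.*?)"
    """, re.VERBOSE)

match = pattern.match(line)
fields = match.groupdict() if match else {}
print(fields)
```

One thing to watch for with re.VERBOSE: a literal space or # in the pattern has to be escaped or put in a character class (for example [ ] or \#), otherwise it is ignored or treated as the start of a comment.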

-- 
Neil Cerutti


