Regex error in python (weird?)

Aleksandar Alimpijevic aa44 at uow.edu.au
Tue Aug 29 17:31:26 EDT 2000


Hi, I have a weird problem when using my regular expression to parse a
line from a web server log file. The program looks like it gets stuck in
some weird loop somewhere. The problem seems to appear in only one case.

This is the format of the lines in the file, and the regular expression is
meant to check that a line has exactly this format. It works fine for
everything except one case. This line matches:
        201.120.68.38 - - [05/Jun/2000:16:30:29 +1000] "HEAD /index.html HTTP/1.0" 304 -
but if I change it (only the HTTP/1.0 part) to
        201.120.68.38 - - [05/Jun/2000:16:30:29 +1000] "HEAD /index.html HaTTP/1.0" 304 -
the program seems to get lost on that line.

My regular expression is:
        CHKformat_rex = re.compile(r'(?P<IP>\d{1,3}(\.\d{1,3}){3,3}) \S+ \S+ '
                    r'(?P<date>\[\d{2,2}/[a-zA-z]{3,3}/\d{4,4}(:\d{2,2}){3,3} [+-]\d{4,4}\]) '
                    r'"(?P<request>GET|HEAD|POST) '
                    r'(?P<req_fname>/(\S+/?)*) HTTP/\d\.\d{1,2}" '
                    r'(?P<reply_code>\d{3,3}) (?P<reply_size>\d+|-)')
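
A possible suspect, offered as a hedged suggestion rather than a confirmed diagnosis: the `req_fname` group `/(\S+/?)*` nests one quantifier inside another. Because `/` is itself a non-space character, `(\S+/?)*` accepts exactly the same strings as plain `\S*`, but it can match them in many different ways. When something later in the pattern fails (here `HaTTP` instead of `HTTP`), the engine may retry every way of splitting the path before giving up. A path of n characters with no `/` can be split 2^(n-1) ways; for `index.html` that is only 512, so this may not be the whole story on this exact line, but the cost doubles with every extra character of a longer request path. Separately, `[a-zA-z]` looks like a typo: the range `A-z` also admits `[`, `\`, `]`, `^`, `_` and a backquote. This sketch (the rewritten pattern is an assumption, not the original program) uses `/\S*` for `req_fname`, `[a-zA-Z]` for the month and raw strings, then checks both sample lines:

```python
import re

# Assumed rewrite of the original pattern, for illustration:
#   * req_fname is /\S* instead of /(\S+/?)*; both accept the same strings,
#     but /\S* has only one way to match, so a failing line cannot make the
#     engine retry every split of the path.
#   * [a-zA-z] becomes [a-zA-Z]; the range A-z also covers [ \ ] ^ _ and `.
CHKformat_rex = re.compile(
    r'(?P<IP>\d{1,3}(\.\d{1,3}){3}) \S+ \S+ '
    r'(?P<date>\[\d{2}/[a-zA-Z]{3}/\d{4}(:\d{2}){3} [+-]\d{4}\]) '
    r'"(?P<request>GET|HEAD|POST) '
    r'(?P<req_fname>/\S*) HTTP/\d\.\d{1,2}" '
    r'(?P<reply_code>\d{3}) (?P<reply_size>\d+|-)'
)

good = '201.120.68.38 - - [05/Jun/2000:16:30:29 +1000] "HEAD /index.html HTTP/1.0" 304 -'
bad = '201.120.68.38 - - [05/Jun/2000:16:30:29 +1000] "HEAD /index.html HaTTP/1.0" 304 -'

for line in (good, bad):
    m = CHKformat_rex.match(line)
    # Print the selected groups, or None when the line has the wrong format.
    print(m.group('IP', 'request', 'req_fname', 'reply_code', 'reply_size') if m else None)
# prints ('201.120.68.38', 'HEAD', '/index.html', '304', '-') then None
```

With `/\S*` the path can only shrink one character at a time during backtracking, so the number of retries on a bad line grows linearly with the path length instead of exponentially.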

And the actual call that checks whether the line matches is:

            CHKformat_rex.match(tmp).group('IP', 'request', 'req_fname',
                                           'reply_code', 'reply_size')

It is inside a try/except block.
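
On the match call: `match()` returns `None` when the line does not have the expected format, so calling `.group(...)` on the result raises `AttributeError`, which the surrounding try/except then catches. That works, but a broad `except` can also hide unrelated errors. A minimal sketch with an explicit `None` check; `parse_line` and `demo_rex` are hypothetical names, and the short pattern is only a stand-in for the real `CHKformat_rex`:

```python
import re

# Stand-in pattern for illustration only; substitute the real CHKformat_rex.
demo_rex = re.compile(r'(?P<IP>\d{1,3}(\.\d{1,3}){3}) (?P<rest>.*)')

def parse_line(rex, line, *names):
    """Hypothetical helper: return the named groups, or None when the line
    does not match (match() returns None rather than raising)."""
    m = rex.match(line)
    if m is None:
        return None
    return m.group(*names)

print(parse_line(demo_rex, '201.120.68.38 - -', 'IP', 'rest'))  # ('201.120.68.38', '- -')
print(parse_line(demo_rex, 'not a log line', 'IP'))             # None
```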

I've been trying for a few days to figure out whether there is anything
wrong with my expression, but I can't see anything. So if anyone can
help...
Thanks
Aleks




More information about the Python-list mailing list