[Tutor] regex woes in finding an ip and GET string

Peter Lavelle lists at solderintheveins.co.uk
Sun Jun 19 13:36:06 CEST 2011


Looking at the regex you have to match an IP address, I think you would 
need to put a length limit on each of the four octets you are searching 
for, as each one is between 1 and 3 digits long (\d+ accepts any number 
of digits).

For example, this has worked for me:

    r = re.match(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", line)
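To make the effect of the {1,3} limit concrete, here is a minimal sketch. 
The second and third sample strings are made up for illustration, and 
IP_RE is just a name picked for this example. Note that the pattern only 
checks the shape of the address: it limits each octet to three digits but 
does not check that the value is 255 or less.

```python
import re

# Same pattern as above: four groups of 1-3 digits separated by dots.
IP_RE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

# Sample strings; the last two are invented for illustration only.
samples = [
    "193.6.135.21 - - [11/Jun/2011:13:58:01 +0000]",   # looks like an IP
    "1234.6.135.21 - - [11/Jun/2011:13:58:01 +0000]",  # first octet too long
    "999.999.999.999 - -",  # right shape, but not a valid address
]

# re.match only tries at the start of the string, which suits a log
# line that begins with the client address.
results = [bool(IP_RE.match(line)) for line in samples]
print(results)  # [True, False, True]
```

If the octet values matter too, one option is to capture the four groups 
and check int(octet) <= 255 in Python instead of making the regex more 
complicated.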

I am no expert on regex (it scares me!). I got the pattern above from:
http://www.regular-expressions.info/examples.html
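
On the ^ question in the quoted message below: by default ^ only matches 
at the very start of the string, so it behaves differently once the whole 
file has been read into one string with f.read(). The re.MULTILINE flag 
makes ^ match at the start of every line. A minimal sketch, assuming log 
lines shaped like the sample in the quoted code (the second log line and 
the names log_text and LINE_RE are invented for illustration, and using 
\S+ for the GET string is my own choice, not something from the original 
code):

```python
import re

# Two sample lines: the first is the one quoted in the original code,
# the second is invented for illustration.
log_text = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"\n'
    '10.0.0.7 - - [11/Jun/2011:13:59:10 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "-"\n'
)

# With re.MULTILINE, ^ is anchored at the start of each line.
# \S+ captures the request path without the surrounding spaces.
LINE_RE = re.compile(
    r'^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) .*?"GET (\S+) HTTP',
    re.MULTILINE,
)

# Same pattern text, once without the flag and once with it.
without_flag = re.findall(LINE_RE.pattern, log_text)
with_flag = LINE_RE.findall(log_text)

print(len(without_flag), len(with_flag))  # 1 2
for ip, path in with_flag:
    print(ip, path)
```

Looping over the file line by line and calling re.match on each line 
would be another way to get the same per-line anchoring.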


Hope my semi-coherent ramblings have been of some help

Regards

Peter

On 19/06/11 12:25, Gerhardus Geldenhuis wrote:
> Hi
> I am trying to write a small program that will scan my access.conf 
> file and update iptables to block anyone looking for stuff that they 
> are not supposed to.
>
> The code:
> #!/usr/bin/python
> import sys
> import re
>
> def extractoffendingip(filename):
>   f = open(filename,'r')
>   filecontents = f.read()
>   # Sample log line:
>   # 193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET 
>   # /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 
>   # (compatible; MSIE 6.0; Windows 98)"
>   tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', 
> filecontents)
>   iplist = []
>   for items in tuples:
>     (ip, getstring) = items
>     print ip,getstring
>     #print item
>     if ip not in iplist:
>       iplist.append(ip)
>   for item in iplist:
>     print item
>   #ipmatch = re.search(r'', filecontents)
>
> def main():
>   extractoffendingip('access_log.1')
>
> if __name__ == '__main__':
>   main()
>
> logfile=http://pastebin.com/F3RXDYBW
>
>
> I could probably have used ranges to be more correct about finding 
> IPs, but I thought that Apache should take care of that. I am assuming 
> a level of integrity in the log file with regard to data...
>
> The first problem I ran into was that I added a ^ to my search string:
> re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
>
> but that finds only two results, a lot fewer than I am expecting. I am a 
> little bit confused: at first I thought that it might be because the 
> string I am searching is now only one line because of the method of 
> loading, so the ^ should only find one instance, but instead it finds two?
>
> So removing the ^ works much better, and now I get mostly correct 
> results, but I also get a number of IPs with an empty GET string, even 
> though I thought there should be only one in the log file. I would really 
> appreciate any pointers as to what is going on here.
>
> Regards
>
> -- 
> Gerhardus Geldenhuis


-- 
LinkedIn Profile: http://linkedin.com/in/pmjlavelle
Twitter: http://twitter.com/pmjlavelle
