[Tutor] regex woes in finding an ip and GET string
Gerhardus Geldenhuis
gerhardus.geldenhuis at gmail.com
Sun Jun 19 13:25:07 CEST 2011
Hi
I am trying to write a small program that will scan my Apache access log and
update iptables to block anyone looking for things that they are not supposed
to.
The code:
#!/usr/bin/python
import sys
import re

def extractoffendingip(filename):
    f = open(filename,'r')
    filecontents = f.read()
    # Sample log line:
    #193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET
    #/admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible;
    #MSIE 6.0; Windows 98)"
    tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
    iplist = []
    for items in tuples:
        (ip, getstring) = items
        print ip,getstring
        #print item
        if ip not in iplist:
            iplist.append(ip)
    for item in iplist:
        print item
    #ipmatch = re.search(r'', filecontents)

def main():
    extractoffendingip('access_log.1')

if __name__ == '__main__':
    main()
logfile=http://pastebin.com/F3RXDYBW
I could probably have used ranges to be more precise about matching IPs, but
I thought that Apache should take care of that. I am assuming a level of
integrity in the log file with regard to its data...
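
For reference, the range check mentioned above could look roughly like the
sketch below. It is only an illustration and not part of the script: the name
is_valid_ipv4 is made up, and it only handles dotted-quad IPv4 text.

```python
import re

# Hypothetical helper, not part of the original script.
# \Z anchors at the very end, so a trailing newline is rejected too.
DOTTED_QUAD = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\Z')

def is_valid_ipv4(text):
    """Return True if text is four dot-separated numbers, each 0-255."""
    match = DOTTED_QUAD.match(text)
    if match is None:
        return False
    # \d{1,3} still allows 256-999, so check each octet's numeric range.
    return all(int(octet) <= 255 for octet in match.groups())
```

Each IP captured by the regex could be passed through such a check before it
is added to the list.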
The first problem I ran into came after I added a ^ to my search string:
re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
That finds only two results, far fewer than I am expecting. I am a
little bit confused. At first I thought it might be because the string I
am searching is now treated as a single line, given the way it is loaded, so
the ^ should match only one instance. But instead it finds two?
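
A small experiment may help narrow this down. By default ^ matches only at
the very start of the string, while with the re.MULTILINE flag it also matches
after each newline. The sketch below uses a two-line sample in place of
f.read(): the first line is the one quoted in the script's comment, the second
is invented purely for illustration.

```python
import re

# Stand-in for filecontents; the second log line is made-up sample data.
sample = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"\n'
    '192.0.2.7 - - [11/Jun/2011:13:58:02 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "-"\n'
)

pattern = r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP'

# Default mode: ^ can match only at position 0 of the whole string.
default_matches = re.findall(pattern, sample)

# re.MULTILINE: ^ also matches immediately after every newline.
multiline_matches = re.findall(pattern, sample, re.MULTILINE)
```

On this sample the default call returns one tuple and the re.MULTILINE call
returns two. Note also that the script prints each match once in the first
loop and then prints its IP again in the second loop, so a single match
produces two lines of output. Whether that explains the "two results" here is
only a guess.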
Removing the ^ works much better: I now get mostly correct results, but
I also get a number of IPs with an empty GET string, although I thought there
should be only one in the log file. I would really appreciate any pointers
as to what is going on here.
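
One way to investigate the empty GET strings might be to process the log one
line at a time, anchor the IP at the start of each line, and capture the
request path as non-whitespace only. The sketch below is an assumption about
what would help, not a verified fix: LINE_RE and extract_requests are made-up
names, and it has not been run against the pastebin log.

```python
import re

# Hypothetical pattern: IP at the start of the line, then the first
# "GET <path> HTTP" request, with the path captured as non-whitespace.
LINE_RE = re.compile(r'(\d+\.\d+\.\d+\.\d+) .*?"GET (\S+) HTTP')

def extract_requests(lines):
    """Return (ip, path) pairs for lines that look like GET requests,
    plus the lines that did not match so they can be inspected by hand."""
    requests = []
    unmatched = []
    for line in lines:
        match = LINE_RE.match(line)  # match() only tries at the line start
        if match:
            requests.append(match.groups())
        else:
            unmatched.append(line)
    return requests, unmatched
```

An open file object can be passed directly, since iterating over it yields
one line at a time. The unmatched list then shows exactly which lines do not
fit the expected shape.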
Regards
--
Gerhardus Geldenhuis