[Tutor] Fw: File handling: open a file at specified byte?

Mon Feb 20 09:01:31 CET 2006

Forwarding for list visibility

----- Original Message ----- 
From: "Brian Gustin" <brian at daviesinc.com>
To: "Alan Gauld" <alan.gauld at freenet.co.uk>
Sent: Monday, February 20, 2006 2:23 AM
Subject: Re: [Tutor] File handling: open a file at specified byte?


> 
> > look at the file tell() and seek() methods.
> >
> > They will tell you the current location and allow you to move to a
> > specific location.
> 
> 
> OK.. I did try using seek and tell, and couldn't get working code to do 
> what I needed it to. However, it did lead me to discover the fileinput 
> module, so.. I've tested it on my test file, and it works quite well. I'd 
> like to see if you can offer any better suggestions, keeping in mind a 
> log file can grow to as large as 3 GB, so memory management will be 
> important, as will execution time (I will need this parser to execute on 
> a file as large as 3 - 4 GB in under 10 minutes, ideally shooting 
> for less than 1 minute).
> 
> Code follows:
> ##START CODE ##########
> #!/usr/bin/python
> # for testing of tux parser
> # read "live" log file and parse it into separate domain files
> import string
> import re
> import fileinput
> 
> myfiles={}
> line=1
> last=0
> try:
>     bkmk = open('bookmark','r')
>     last = bkmk.readline()
>     bkmk.close()
> except:
>     pass
> for outputdata in fileinput.input('./testfile.tuxlog'):
>     #sourcelist.sort()
>     #print outputdata
>     if fileinput.filelineno() <= int(last):  # bookmark = last line already handled
>         continue
>     else:
>         info = re.search('(?<=GET )([a-zA-Z0-9\-\.]+)', outputdata)
>         try:
>             namecheck = info.group(0)
>         except AttributeError:
>             continue
>         try:
>             namecheck=namecheck.replace('www.','')
>             check = re.search('(\.[a-z]+$)',namecheck)
>             if check == None:
>                 domain = 'Errors'
>             else:
>                 res = re.search('(\ (301|404|403|302)\ 0)',outputdata)
>                 if res == None:
>                     domain = namecheck
>                 else:
>                     domain = '404_301errors'
>             outputdata=outputdata.replace(' '+domain+'/',' /')
>             if myfiles.has_key(domain):
>                 domhandle = myfiles.get(domain)
>             else:
>                 domhandle=open('/var/log/tuxp/'+domain+'-access.log.1','w+')
>                 myfiles[domain] = domhandle
> 
> 
>             domhandle.write(outputdata)
>         except:
>             continue
>     # get the last line no. handled. could this instead be run just
>     # before closing the handle?
>     bookmark = fileinput.lineno()
> rel = open('./bookmark','w')
> rel.write(str(bookmark))
> rel.close()
> #print "BOOKMARK: %s"%bookmark,
> #print domain+' - ',
> #print namecheck,
> #    line +=1
>         #print str(line)+"\n"
>     #print fileinput.filelineno()
> fileinput.close()
> 
> 
> ############ END CODE############
>
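
For comparison, here is a minimal sketch of the seek()/tell() approach suggested at the top of the thread. It is not the code above: it assumes the bookmark file stores a byte offset instead of a line number, so a later run can jump straight to the unread part of the log without re-reading the lines before it. The names process_new_lines and handle_line are made up for illustration, and the sketch uses Python 3 syntax, whereas the script above is Python 2.

```python
import os


def process_new_lines(log_path, bookmark_path, handle_line):
    """Pass every line added since the last run to handle_line().

    The bookmark file holds the byte offset where the previous run
    stopped (an assumption of this sketch, not the original script).
    Returns the number of lines handled.
    """
    try:
        with open(bookmark_path) as bkmk:
            offset = int(bkmk.read().strip())
    except (OSError, ValueError):
        offset = 0  # no usable bookmark yet: start at the beginning

    if offset > os.path.getsize(log_path):
        offset = 0  # log is shorter than the bookmark (rotated?): start over

    count = 0
    # Binary mode keeps seek()/tell() in plain byte offsets.
    with open(log_path, 'rb') as log:
        log.seek(offset)
        for raw in log:  # reads one line at a time, not the whole file
            handle_line(raw.decode('utf-8', errors='replace'))
            count += 1
        offset = log.tell()  # position after the last line read

    with open(bookmark_path, 'w') as bkmk:
        bkmk.write(str(offset))
    return count
```

The per-domain splitting from the script above would go inside whatever function is passed as handle_line. One caveat: if the log is being written while this runs, the last line may be incomplete, and this sketch does not guard against that.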