Parse ASCII log ; sort and keep most recent entries

Larry Bates lbates at swamisoft.com
Wed Jun 16 19:59:19 EDT 2004


Here's a quick solution.

Larry Bates
Syscon, Inc.


def cmpfunc(x,y):
    xdate=x[0]
    xtime=x[1]
    ydate=y[0]
    ytime=y[1]
    if xdate == ydate:
        #
        # If the two dates are equal, I must check the times
        #
        if xtime > ytime: return 1
        elif xtime == ytime: return 0
        else: return -1
    elif xdate > ydate: return 1
    return -1

fp=open(yourlogfilepath, 'r')
lines=fp.readlines()
fp.close()
list=[]
months={'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04',
        'MAY': '05', 'JUN': '06', 'JUL': '07', 'AUG': '08',
        'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}

logdict={}

for line in lines:
    if not line.strip(): continue
    #
    # split() with no argument tolerates runs of spaces or tabs
    #
    pid, name, date, time=line.split()
    #
    # Must zero pad time for proper comparison
    #
    stime=time.zfill(8)
    #
    # Must reformat the date as YYMMDD
    #
    sdate=date[-2:]+months[date[2:5]]+date[:2]
    list.append((sdate, stime, pid, name, date, time))

list.sort(cmpfunc)
list.reverse()

for sdate, stime, pid, name, date, time in list:
    if pid in logdict: continue
    logdict[pid]=(pid, name, date, time)

for key in logdict.keys():
    pid, name, date, time=logdict[key]
    print pid, name, date, time
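
For anyone reading this on Python 3, where list.sort() no longer accepts a
comparison function, here is a rough sketch of the same sort-then-keep-first
idea using a key instead. The names parse_line, most_recent_per_pid, MONTHS
and SAMPLE are made up for illustration. It assumes the DDMMMYY date and
H:MM:SS time layout shown in the sample log, and that all two-digit years
fall in the same century.

```python
# Sketch only: a Python 3 take on the sort-then-dedupe approach above.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}


def parse_line(line):
    """Split one log line and build a sortable timestamp tuple.

    Assumes DDMMMYY dates and H:MM:SS times, as in the sample log.
    """
    pid, name, date, time = line.split()
    stamp = (int(date[5:]), MONTHS[date[2:5].upper()], int(date[:2]))
    stamp += tuple(int(part) for part in time.split(":"))
    return stamp, pid, name, date, time


def most_recent_per_pid(lines):
    """Return {pid: (pid, name, date, time)} with the newest entry per PID."""
    entries = [parse_line(line) for line in lines if line.strip()]
    # Tuples compare element by element, so no comparison function is needed.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    latest = {}
    for _stamp, pid, name, date, time in entries:
        # Newest entries come first, so the first one seen per PID wins.
        latest.setdefault(pid, (pid, name, date, time))
    return latest


SAMPLE = """\
1234 williamstim 01AUG03 7:44:31
2348 williamstim 02AUG03 14:11:20
23 jonesjimbo 07AUG03 15:25:00
2348 williamstim 17AUG03 9:13:55
748 jonesjimbo 13OCT03 14:10:05
23 jonesjimbo 14OCT03 23:01:23
748 jonesjimbo 14OCT03 23:59:59
""".splitlines()

for pid, name, date, time in most_recent_per_pid(SAMPLE).values():
    print(pid, name, date, time)
```

Converting the fields to integers avoids the zero padding step, since the
numbers compare correctly without it. The original date and time strings are
carried along unchanged so the output looks like the input.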



"Nova's Taylor" <novastaylor at hotmail.com> wrote in message
news:fda4b581.0406161306.c5de18f at posting.google.com...
> Hi folks,
>
> I am a newbie to Python and am hoping that someone can get me started
> on a log parser that I am trying to write.
>
> The log is an ASCII file that contains a process identifier (PID),
> username, date, and time field like this:
>
> 1234 williamstim 01AUG03 7:44:31
> 2348 williamstim 02AUG03 14:11:20
> 23 jonesjimbo 07AUG03 15:25:00
> 2348 williamstim 17AUG03 9:13:55
> 748 jonesjimbo 13OCT03 14:10:05
> 23 jonesjimbo 14OCT03 23:01:23
> 748 jonesjimbo 14OCT03 23:59:59
>
> I want to read in and sort the file so the new list contains only
> the most recent entry for each PID (PIDs get reused often). In my example,
> the new list would be:
>
> 1234 williamstim 01AUG03 7:44:31
> 2348 williamstim 17AUG03 9:13:55
> 23 jonesjimbo 14OCT03 23:01:23
> 748 jonesjimbo 14OCT03 23:59:59
>
> So I need to sort by PID and date + time, then keep the most recent.
>
> Any help would be appreciated!
>
> Taylor
>
> NovasTaylor at hotmail.com




