Issues simply parsing a whitespace-delimited text file in a Python script

Damon Getsman dgetsman at amirehab.net
Wed May 21 11:59:57 EDT 2008


Okay, so I'm writing a Python script right now as a quick-and-dirty fix
for a problem we're having at work. Unfortunately, this is the first
really non-trivial script I've had to write in Python, and the book I
have on it really kind of sucks.

I'm having an issue parsing lines of 'last' output that I've stored in
a /tmp file. The first time I call .readline() I get the full line of
output, which I can then split() and work with field by field without
any problem. Unfortunately, the second time I call .readline() on the
file, I receive only the first character of the first field. Looking
through the /tmp file shows that it isn't corrupted at all; it's in
exactly the format it should be. Here's the relevant script:

----
    #parse
    Lastdump = open('/tmp/esd_tmp', 'r')

    #find out what the last day entry is in the wtmp
    cur_rec = Lastdump.readline()
    work = cur_rec.split()

    if debug == 1:
        print work
        print " is our split record line from /tmp/esd_tmp\n"

    startday = work[3]

    if debug == 1:
        print startday + " is the starting day\n"
        print days
        print " is our dictionary of days\n"
        print days[startday] + " is our ending day\n"

    for cur_rec in Lastdump.readline():
        work = cur_rec.split()

        if debug == 1:
            print "Starting table building pass . . .\n"
            print work
            print " is the contents of our split record line now\n"
            print cur_rec + " is the contents of cur_rec\n"

        #only go back 2 days

        while work[0] != days[startday]:
            tmp = work[1]
            if table.has_key(work[0]):
                continue
            elif tmp[0] != ':':
                #don't keep it if it isn't a SunRay terminal identifier
                continue
            else:
                #now we keep it
                table[work[0]] = tmp
----
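The script uses a days dictionary that isn't shown; judging by the debug output further down, it maps each weekday to the day two days earlier. A small sketch, assuming that is the intent, generates it from an ordered list of weekdays instead of typing it by hand (build_days_back and WEEKDAYS are hypothetical names). One thing worth checking: 'last' prints three-letter weekday abbreviations ('Wed' in the sample data), so hand-written keys such as 'Thurs' and 'Tues' would presumably never match a field read from the file.

```python
# Hypothetical helper: map each weekday abbreviation to the one N days earlier.
# Assumes the three-letter abbreviations that `last` prints (Mon, Tue, ...).
WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def build_days_back(days_back=2):
    """Return {day: the day that is days_back days earlier}."""
    table = {}
    for i, day in enumerate(WEEKDAYS):
        # % keeps the index in range, so Mon minus 2 days wraps around to Sat.
        table[day] = WEEKDAYS[(i - days_back) % len(WEEKDAYS)]
    return table

days = build_days_back(2)
print(days['Wed'])  # Mon
```

With days_back=2 this reproduces the mapping in the debug output, just with 'Thu'/'Tue' keys; going back a different number of days becomes a one-argument change.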

The first and second sets of debugging output look exactly as they
should. The third shows that the next line I'm working on (in cur_rec),
and therefore 'work' as well, holds only the first character of the
line. Here's the output:

----
Debugging run


Building table . . .

['dgetsman', 'pts/3', ':0.0', 'Wed', 'May', '21', '10:21', 'still', 'logged', 'in']
 is our split record line from /tmp/esd_tmp

Wed is the starting day

{'Wed': 'Mon', 'Sun': 'Fri', 'Fri': 'Wed', 'Thurs': 'Tues', 'Tues': 'Sun', 'Mon': 'Sat', 'Sat': 'Thurs'}
 is our dictionary of days

Mon is our ending day

Starting table building pass . . .

['d']
 is the contents of our split record line now

d is the contents of cur_rec

----
As a result, everything fails when I try to work with the individual
fields later in the script. Does anybody have an idea why this would
be happening?
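One likely explanation, for anyone hitting the same symptom: Lastdump.readline() returns a single line as a string, and a for loop over a string visits it one character at a time. That matches the debug output, where cur_rec is just 'd', the first character of the second line. Iterating over the file object itself yields whole lines. A minimal sketch, using io.StringIO as a stand-in for /tmp/esd_tmp (filled with two of the sample lines) so it runs without the real file:

```python
import io

# Stand-in for /tmp/esd_tmp, holding two of the sample `last` lines.
data = (u"dgetsman pts/3        :0.0             Wed May 21 10:21   still logged in\n"
        u"dgetsman pts/2        :0.0             Wed May 21 09:04   still logged in\n")

# What the posted loop does: readline() returns ONE line as a string,
# and a for loop over a string yields its individual characters.
f = io.StringIO(data)
f.readline()                      # consume line 1, as the script does
chars = [c for c in f.readline()]
print(chars[0])                   # d

# Iterating over the file object itself yields whole lines instead.
f = io.StringIO(data)
f.readline()                      # consume line 1 again
rest = [line.split() for line in f]
print(rest[0][:4])
```

In the posted script that corresponds to changing "for cur_rec in Lastdump.readline():" to "for cur_rec in Lastdump:".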

Oh, and in case it's relevant, here are the first few lines of the data file:

----
dgetsman pts/3        :0.0             Wed May 21 10:21   still logged in
dgetsman pts/2        :0.0             Wed May 21 09:04   still logged in
dgetsman pts/1        :0.0             Wed May 21 08:56 - 10:21  (01:24)
dgetsman pts/0        :0.0             Wed May 21 08:56   still logged in
----
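Going by these sample lines, split() puts the user at index 0, the tty (pts/3) at 1, the display (:0.0) at 2 and the weekday at 3. The posted table-building loop compares work[0], the user name, against the cutoff day and checks work[1], the tty, for a leading ':'. Its while loop also never changes work, so once entered it can never exit. Below is a sketch of that pass under a few assumptions: the loop is meant to stop at the first line from the cutoff day, to keep only the first (most recent) entry per user, and to keep only displays starting with ':' as SunRay identifiers. The function name build_table is hypothetical, and lines that don't follow the sample layout (reboot records, remote hosts) may put different text in those fields.

```python
# Hypothetical sketch of the table-building pass. Field positions are taken
# from the sample `last` lines: 0=user, 1=tty, 2=display, 3=weekday.
def build_table(lines, cutoff_day):
    """Map each user to their first ':'-prefixed display, stopping at cutoff_day."""
    table = {}
    for line in lines:
        work = line.split()
        if len(work) < 4:
            continue                  # skip blank or truncated lines
        user, display, weekday = work[0], work[2], work[3]
        if weekday == cutoff_day:
            break                     # assumed intent of "only go back 2 days"
        if user in table:
            continue                  # `last` lists newest first; keep that entry
        if not display.startswith(':'):
            continue                  # not a SunRay-style display identifier
        table[user] = display
    return table

sample = [
    "dgetsman pts/3        :0.0             Wed May 21 10:21   still logged in",
    "dgetsman pts/2        :0.0             Wed May 21 09:04   still logged in",
    "dgetsman pts/1        :0.0             Wed May 21 08:56 - 10:21  (01:24)",
    "dgetsman pts/0        :0.0             Wed May 21 08:56   still logged in",
]
print(build_table(sample, 'Mon'))  # {'dgetsman': ':0.0'}
```

Against the real file, the file object can be passed straight in, e.g. build_table(open('/tmp/esd_tmp'), days[startday]), since iterating over a file yields one line at a time.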

I would really appreciate any pointers or suggestions you can give.

Damon Getsman
Linux/Solaris System Administrator


