issues simply parsing a whitespace-delimited textfile in python script

Paul McGuire ptmcg at austin.rr.com
Wed May 21 12:15:04 EDT 2008


On May 21, 10:59 am, Damon Getsman <dgets... at amirehab.net> wrote:
> I'm having an issue parsing lines of 'last' output that I have stored
> in a /tmp file.  The first time it does a .readline() I get the full
> line of output, which I'm then able to split() and work with the
> individual fields of without any problem.  Unfortunately, the second
> time that I do a .readline() on the file, I am only receiving the
> first character of the first field.  Looking through the /tmp file
> shows that it's not corrupted from the format that it should be in at
> all...  Here's the relevant script:
>
> ----
>     #parse
>     Lastdump = open('/tmp/esd_tmp', 'r')
>
>     #find out what the last day entry is in the wtmp
>     cur_rec = Lastdump.readline()
>     work = cur_rec.split()
>
>     if debug == 1:
>         print work
>         print " is our split record line from /tmp/esd_tmp\n"
>
>     startday = work[3]
>
>     if debug == 1:
>         print startday + " is the starting day\n"
>         print days
>         print " is our dictionary of days\n"
>         print days[startday] + " is our ending day\n"
>
>     for cur_rec in Lastdump.readline():
>         work = cur_rec.split()
>
<snip>


    for cur_rec in Lastdump.readline():

is the problem.  readline() returns a string containing the next
line's worth of text, NOT an iterator over all the subsequent lines in
the file.  So your code is really saying:

    next_line_in_file = Lastdump.readline()
    for cur_rec in next_line_in_file:

which, of course, iterates over that one string character by character.
That is why cur_rec ends up holding a single character instead of a
whole record.
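
To see the effect in isolation, here is a minimal sketch. It uses
io.StringIO as a stand-in for the /tmp file, and the sample records are
made up for illustration (they are not from your wtmp):

```python
import io

# Hypothetical sample data standing in for /tmp/esd_tmp.
sample_text = (u"alice pts/0 host1 Wed May 21 10:00 still logged in\n"
               u"bob pts/1 host2 Tue May 20 09:30 - 10:15 (00:45)\n")
fake_file = io.StringIO(sample_text)

# First readline(): one complete line, which splits into fields as expected.
first = fake_file.readline()
work = first.split()

# Second readline() used as the loop source: the loop walks over the
# characters of that ONE line, not over the remaining lines of the file.
pieces = []
for cur_rec in fake_file.readline():
    pieces.append(cur_rec)

print(work[3])      # the day field of the first record
print(pieces[:3])   # the first three characters of the second record
```

Each cur_rec in that loop is a one-character string, so cur_rec.split()
can never return more than one short field.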

The file object that open() returns is already an iterator over the
lines of the file, so you can loop over it directly.  (Lastdump is not
great casing for a variable name, BTW - it looks like a class name with
that leading capital letter.)  Try this instead:

    lastdump = open('/tmp/esd_tmp', 'r')

    cur_rec = lastdump.next()

    ...

    for cur_rec in lastdump:

    ...

This should get you over the hump on reading the file.
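
As a rough, self-contained sketch of that pattern (again with io.StringIO
standing in for the real file and made-up records), the first record is
pulled off separately and the loop then continues from the second line.
This sketch uses the built-in next(), which needs Python 2.6 or later;
on older versions the .next() method is the equivalent spelling:

```python
import io

# Hypothetical sample data standing in for /tmp/esd_tmp.
fake_file = io.StringIO(
    u"alice pts/0 host1 Wed May 21 10:00 still logged in\n"
    u"bob pts/1 host2 Tue May 20 09:30 - 10:15 (00:45)\n"
    u"carol pts/2 host3 Tue May 20 08:00 - 08:05 (00:05)\n")

# Consume only the first record to find the starting day.
first_rec = next(fake_file)
startday = first_rec.split()[3]

# Iterating the file object yields each REMAINING line as a whole string.
users = []
for cur_rec in fake_file:
    work = cur_rec.split()
    users.append(work[0])

print(startday)
print(users)
```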

Also, may I suggest this method for splitting up each record line, and
assigning individual fields to variables:

    user,s1,s2,day,month,date,time,desc = cur_rec.split(None,7)
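
One caveat: the unpacking raises ValueError on any line with fewer than
eight fields (a trailer line, for instance).  A possible way to guard
against that is sketched below; parse_last_line is a hypothetical helper
name and the sample lines are made up, so adjust to whatever your file
really contains:

```python
def parse_last_line(line):
    """Split one record into named fields; return None for short lines."""
    # maxsplit=7 keeps everything after the seventh field together in desc.
    fields = line.split(None, 7)
    if len(fields) < 8:
        return None
    user, s1, s2, day, month, date, time, desc = fields
    return {
        "user": user,
        "day": day,
        "month": month,
        "date": date,
        "time": time,
        # With maxsplit, the last piece keeps its trailing newline.
        "desc": desc.rstrip(),
    }

rec = parse_last_line("alice pts/0 host1 Wed May 21 10:00 still logged in\n")
short = parse_last_line("wtmp begins Wed May 21\n")

print(rec["day"] + " / " + rec["desc"])
print(short)
```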

-- Paul



