issues simply parsing a whitespace-delimited textfile in python script
Paul McGuire
ptmcg at austin.rr.com
Wed May 21 12:15:04 EDT 2008
On May 21, 10:59 am, Damon Getsman <dgets... at amirehab.net> wrote:
> I'm having an issue parsing lines of 'last' output that I have stored
> in a /tmp file. The first time it does a .readline() I get the full
> line of output, which I'm then able to split() and work with the
> individual fields of without any problem. Unfortunately, the second
> time that I do a .readline() on the file, I am only receiving the
> first character of the first field. Looking through the /tmp file
> shows that it's not corrupted from the format that it should be in at
> all... Here's the relevant script:
>
> ----
> #parse
> Lastdump = open('/tmp/esd_tmp', 'r')
>
> #find out what the last day entry is in the wtmp
> cur_rec = Lastdump.readline()
> work = cur_rec.split()
>
> if debug == 1:
>     print work
>     print " is our split record line from /tmp/esd_tmp\n"
>
> startday = work[3]
>
> if debug == 1:
>     print startday + " is the starting day\n"
>     print days
>     print " is our dictionary of days\n"
>     print days[startday] + " is our ending day\n"
>
> for cur_rec in Lastdump.readline():
>     work = cur_rec.split()
>
<snip>
    for cur_rec in Lastdump.readline():

is the problem. readline() returns a string containing the next
line's worth of text, NOT an iterator over all the subsequent lines in
the file. So your code is really saying:

    next_line_in_file = Lastdump.readline()
    for cur_rec in next_line_in_file:

which, of course, iterates over a string character by character.
That is why each cur_rec.split() hands you only the first character
of the first field.
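
To see the effect in isolation, here is a small sketch. The two sample
lines are made up for illustration (they are not from your file), and
io.StringIO is only a stand-in for the opened /tmp file. It is written
for a current Python 3:

```python
import io

# Hypothetical sample data standing in for the contents of /tmp/esd_tmp.
sample = io.StringIO(
    "alice pts/0 host1 Wed May 21 10:59 still logged in\n"
    "bob pts/1 host2 Wed May 21 09:12 - 09:40 (00:28)\n"
)

first = sample.readline()    # a str holding the whole first line
second = sample.readline()   # a str holding the whole second line

# Looping over a str yields one character at a time, not one line at a time.
for cur_rec in second:
    work = cur_rec.split()   # cur_rec is the single character 'b'
    break

print(work)                  # ['b']
```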
The file object you get back from open() is already an iterator over
the lines of the file, so you can loop over it directly. (Lastdump is
not great casing for a variable name, BTW - it looks like a class name
with that leading capital letter.) Try this instead:

    lastdump = open('/tmp/esd_tmp', 'r')
    cur_rec = lastdump.next()
    ...
    for cur_rec in lastdump:
        ...

This should get you over the hump on reading the file.
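
Put together, the pattern might look like the sketch below. The sample
records are invented, io.StringIO again stands in for
open('/tmp/esd_tmp', 'r'), and the list named remaining is just a
hypothetical place to collect the split records. It uses the built-in
next(lastdump), which is the spelling of lastdump.next() that works on
Python 2.6 and later, including Python 3:

```python
import io

# Hypothetical stand-in for: lastdump = open('/tmp/esd_tmp', 'r')
lastdump = io.StringIO(
    "alice pts/0 host1 Wed May 21 10:59 still logged in\n"
    "bob pts/1 host2 Wed May 21 09:12 - 09:40 (00:28)\n"
    "carol pts/2 host3 Tue May 20 17:03 - 17:45 (00:42)\n"
)

# Pull the first record off the iterator by hand.
first_rec = next(lastdump)
startday = first_rec.split()[3]

# The for loop picks up where next() left off, one full line per pass.
remaining = []
for cur_rec in lastdump:
    work = cur_rec.split()
    if not work:             # skip blank lines
        continue
    remaining.append(work)
```

Because the file object keeps track of its own position, the loop
starts at the second line rather than re-reading the first.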
Also, may I suggest this method for splitting up each record line, and
assigning individual fields to variables:

    user, s1, s2, day, month, date, time, desc = cur_rec.split(None, 7)

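
One caveat with that unpacking: split(None, 7) returns fewer than
eight items for a short line, and the assignment then raises
ValueError. The last piece also keeps its trailing newline unless you
strip the line first. A minimal sketch of one way to guard against
both, assuming you simply want to skip short lines; parse_record is a
hypothetical helper name and both sample lines are made up:

```python
def parse_record(line):
    """Hypothetical helper: return the 8 fields, or None for a short line."""
    # strip() removes the trailing newline before the bounded split.
    parts = line.strip().split(None, 7)
    if len(parts) < 8:
        return None
    return parts

rec = parse_record("alice pts/0 host1 Wed May 21 10:59 still logged in\n")
user, s1, s2, day, month, date, time, desc = rec

# A line with only seven whitespace-separated fields is rejected.
short = parse_record("wtmp begins Thu May  1 08:00:01 2008\n")
```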
-- Paul