[Tutor] Confused about lists...
Chris Watson
chris@voodooland.net
Fri, 23 Feb 2001 14:27:36 -0600 (CST)
> Commenting is a good thing, but commenting every line is a little bit
> of... overkill. *grin*
True :-) I just wanted to make clear what I was *trying* to do, so that
if I failed to explain it well, the comments would hopefully explain it :-)
> Let's take a look at the code itself:
>
> > f = open("/var/log/maillog", "r")
> > for i in f.readlines():
> >     if i[4] == 'open-sytems':
> >         f += 1
> >     print f
> > f.close()
> One thing I see is that you're using the name 'f' for two purposes: first,
> as a handle to some file, and second, as a counter. You might want to
> separate the usage of these into two variables. Let's call them 'f' for
> the file, and 'c' for the counter. If you have time, you might want to
> think of slightly more descriptive names for your variables to make things
> easier to read.
for i in f.readlines():
    if i[3] == 'open-sytems':
        counter += 1
    print counter
I changed the code to the above. But it fails to do what I want :-/ It
ignores the if i[3] line and just prints 0's down the screen. If I pipe it
to |wc -l it tells me it is counting each line in the maillog twice. If
there are 100 lines in the maillog it prints 200 0's. Weird.
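A quick check of what indexing a line actually returns may help explain the zeros. This is only a minimal sketch: the sample line is made up for illustration and is not taken from the real maillog, and the variable names are hypothetical.

```python
# Hypothetical sample line; the real maillog format may differ.
line = "Feb 19 04:02:11 c82114-a sendmail[1739]: EAA01739: to=<user@example.com>, delay=00:00:00"

# Indexing a string returns one character, not a whitespace-separated field.
fourth_char = line[3]

# A single character can never equal a multi-character hostname,
# so a counter guarded by this test would never be incremented.
matches = (fourth_char == 'open-sytems')

print(repr(fourth_char))
print(matches)
```

If the comparison is never true, the counter stays at 0 no matter how many lines are read.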
> > to write a simple parser for a maillog to count things like total messages
> > received/sent, connections/day, total time spent on connections, etc..
> > I had 'f +=1' and 'print f' changed to use i instead of f. But that
> > printed out TWICE as many lines as it should have. It seems to be just
>
> Can you explain more what you mean by twice? Oh! I think I see what you
> mean. This part of the code might be what's causing the duplicate
> printing:
I mean it's counting each line in the logfile twice. For some reason. :-/
> > if i[4] == 'open-sytems':
> >     c += 1 ## [some text changed from the original]
> > print c
>
>
> In this case, regardless of whether we see an 'open-sytems' or not, the program
> will print the value of c. This might lead to the following output:
>
> ###
> 0
> 0
> 1
> 1
> 2
> 2
> 2
> 3
> ###
Sort of. It's actually printing:
0
0
0
0
0
...
For each line in the log it's printing two 0's.
So it's counting each line twice and ignoring my if i[3] part. Hrmm.
> which would look like it's doubling up. You probably want to print out the
> value of your counter only if it's just recently changed. If so, try
> this:
>
> if i[4] == 'open-sytems':
>     c += 1 # let's change it to 'c'
>     print c # because 'f' sounds like a 'file'
for i in f.readlines():
    if i[3] == 'open-sytems':
        counter += 1
        print counter
Moving the print counter line inline with counter prints one 0. Hmm.
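For comparison, here is a small sketch of how the position of the print changes what comes out. The list of hosts is made-up sample data standing in for one field from each log line, and the list names are hypothetical helpers used only to record what each print placement would show.

```python
# Made-up sample values standing in for one field from each log line.
hosts = ['open-sytems', 'other-host', 'open-sytems']

per_line = []   # what a print inside the loop (but outside the if) would show
on_change = []  # what a print inside the if would show
counter = 0
for host in hosts:
    if host == 'open-sytems':
        counter += 1
        on_change.append(counter)   # runs only when the counter changes
    per_line.append(counter)        # runs once for every line

print(per_line)    # one value per input line
print(on_change)   # one value per match
print(counter)     # printed once, after the loop has finished
```

A print placed after the loop gives a single total, which is probably what a log summary wants.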
> The other thing you'll need to check involves this part:
>
> ###
> for i in f.readlines():
>     if i[4] == 'open-sytems':
> ###
>
> Could you show us an example of what your file would look like? The only
> thing that worries me is that 'i' will be a line, but 'i[4]' is going to
> be a single character --- Python will not automatically pull columns out
> of a string without some help. For example, say that '/var/log/maillog'
> contains the following line of text:
Yes. I was afraid of that too. I want the 4th field. In the log
below, note that each field is separated by whitespace. I.e. I think (or
want) i[1] to grab "delay=00:00:00", with 0 being the "to" field and 1 being
the delay. See what I mean?
> to=<dyoo@hkn.eecs.berkeley.edu>, delay=00:00:00, xdelay=00:00:00,\
> mailer=esmtp, relay=hkn.eecs.berkeley.edu. [128.32.138.117],\
> stat=Sent (EAA01739 Message accepted for delivery)"
>
> i[4] looks like the '1' from the date 'Feb 19', and not 'c82114-a'.
Well, but it's ignoring the if i[n] line anyway, because it isn't
counting the hostname. It's just counting the whole line.
> We need to tell Python how we break up a string into columns. We could
> separate things between commas, or between spaces---but we need to give
> Python a "delimiter" character that separates the columns.
Hrmm, I guess that's true. Since the log is already pre-formatted I
didn't think it was necessary. Maybe that's goofing things up.
> You might want to play around with string.split():
>
> ###
> >>> import string
> >>> string.split('this is a short string', ' ')
> ['this', 'is', 'a', 'short', 'string']
> >>> string.split('i,could,be a line,from a,,csf file', ',')
> ['i', 'could', 'be a line', 'from a', '', 'csf file']
> ###
>
> I think I went a little fast though; if you have any questions, please
> feel free to ask the tutor list again. Good luck to you.
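One detail that may matter for log files: syslog-style timestamps usually pad single-digit days with an extra space, so splitting on a single space and splitting on runs of whitespace can put the hostname at different positions. The sketch below uses the string method split() rather than string.split(), and the sample line is made up for illustration.

```python
# Made-up line: note the two spaces between "Feb" and "9".
line = "Feb  9 04:02:11 open-sytems sendmail[1739]: stat=Sent"

by_space = line.split(' ')     # splits at every single space, keeps empty fields
by_whitespace = line.split()   # splits on runs of whitespace, no empty fields

print(by_space[3])
print(by_whitespace[3])
```

With no argument, split() keeps the field positions stable regardless of the padding, which is likely the safer choice here.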
I'm about to try Bob's example and see if string.split(i)... works.
I still would like to know why it's ignoring my if i[4] line though and
counting each whole line twice. Hrmm. I really appreciate the explanation
of things so far. I should have used better counter names :-) It does make
things much easier to know what's going on.
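Putting the pieces together, a counter along these lines might do what the original loop was aiming for. This is a sketch under assumptions, not a tested parser: the function name, the field_index parameter, and the sample lines are all hypothetical, and it assumes the hostname is the 4th whitespace-separated field.

```python
def count_field_matches(lines, wanted, field_index=3):
    """Count lines whose whitespace-separated field at field_index equals wanted."""
    counter = 0
    for line in lines:
        fields = line.split()   # split on runs of whitespace
        # Skip short or blank lines instead of raising IndexError.
        if len(fields) > field_index and fields[field_index] == wanted:
            counter += 1
    return counter

# Made-up sample lines; real maillog entries may be laid out differently.
sample = [
    "Feb 19 04:02:11 open-sytems sendmail[1739]: EAA01739: stat=Sent",
    "Feb 19 04:02:12 other-host sendmail[1740]: EAA01740: stat=Sent",
    "Feb 19 04:02:13 open-sytems sendmail[1741]: EAA01741: stat=Sent",
    "",
]

total = count_field_matches(sample, 'open-sytems')
print(total)
```

For the real log, the result of f.readlines() could be passed in place of the sample list, with the total printed once after the call.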
--
=============================================================================
-Chris Watson (316) 326-3862 | FreeBSD Consultant, FreeBSD Geek
Work: scanner@jurai.net | Open Systems Inc., Wellington, Kansas
Home: scanner@deceptively.shady.org | http://open-systems.net
=============================================================================
WINDOWS: "Where do you want to go today?"
LINUX: "Where do you want to go tomorrow?"
BSD: "Are you guys coming or what?"
=============================================================================
irc.openprojects.net #FreeBSD -Join the revolution!
ICQ: 20016186