[Tutor] Confused about lists...

Chris Watson chris@voodooland.net
Fri, 23 Feb 2001 14:27:36 -0600 (CST)


> Commenting is a good thing, but commenting every line is a little bit
> of... overkill.  *grin*

	True :-) I just wanted to make clear what I was *trying* to do, so
that if I failed to explain it well, hopefully the comments would explain it :-)

> Let's take a look at the code itself:
> 
> > f = open("/var/log/maillog", "r")
> > for i in f.readlines():
> >     if i[4] == 'open-sytems':
> >         f += 1
> >     print f
> > f.close()

> One thing I see is that you're using the name 'f' for two purposes: first,
> as a handle to some file, and second, as a counter.  You might want to
> separate the usage of these into two variables.  Let's call them 'f' for
> the file, and 'c' for the counter.  If you have time, you might want to
> think of slightly more descriptive names for your variables to make things
> easier to read.
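
A minimal sketch of why sharing one name for both roles causes trouble,
written for Python 3 rather than the Python 2 syntax used in this thread.
io.StringIO is only a stand-in for the real /var/log/maillog, and the log
text is invented for illustration:

```python
import io

# Hypothetical stand-in for open("/var/log/maillog", "r"); contents are invented.
f = io.StringIO("first line\nsecond line\n")

try:
    f += 1               # 'f' is a file object, not a number
    reused_ok = True
except TypeError:
    reused_ok = False    # file objects do not support += with an int

# Separate names keep the two roles apart.
counter = 0
for line in f:
    counter += 1
f.close()

print(reused_ok, counter)
```

With one name for the file and another for the counter, the loop can count
lines without touching the file handle.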

for i in f.readlines():
    if i[3] == 'open-sytems':
        counter += 1
    print counter

I changed the code to the above, but it still fails to do what I want :-/ It
ignores the if i[3] line and just prints 0's down the screen. If I pipe the
output to wc -l, it shows each line in the maillog being counted twice: if
there are 100 lines in the maillog, it prints 200 0's. Weird.

> > to write a simple parser for a maillog to count things like total messages
> > received/sent, connections/day, total time spent on connections, etc..
> > I had 'f +=1' and 'print f' changed to use i instead of f. But that
> > printed out TWICE as many lines as it should have. It seems to be just
> 
> Can you explain more what you mean by twice?  Oh!  I think I see what you
> mean.  This part of the code might be what's causing the duplicate
> printing:

I mean it's counting each line in the logfile twice, for some reason. :-/

> >     if i[4] == 'open-sytems':
> >         c += 1                 ## [some text changed from the original]
> >     print c
> 
> 
> In this case, regardless of whether we see an 'open-sytems' or not, the
> program will print the value of c.  This might lead to the following output:
> 
> ###
> 0
> 0
> 1
> 1
> 2
> 2
> 2
> 3
> ###

	Sort of. It's actually printing:

0
0
0
0
0
...

	And for each line in the log it's printing two 0's.
So it's counting each line twice and ignoring my if i[3] part. Hrmm.

> which would look like it's doubling up.  You probably want to print out the
> value of your counter only if it's just recently changed.  If so, try
> this:
> 
>      if i[4] == 'open-sytems':
>          c += 1                  # let's change it to 'c'
>          print c                 # because 'f' sounds like a 'file'

for i in f.readlines():
    if i[3] == 'open-sytems':
        counter += 1
        print counter

Indenting the print counter line to the same level as counter += 1 prints
just one 0. Hmm.

> The other thing you'll need to check involves this part:
> 
> ###
> for i in f.readlines():
>      if i[4] == 'open-sytems':
> ###
> 
> Could you show us an example of what your file would look like?  The only
> thing that worries me is that 'i' will be a line, but 'i[4]' is going to
> be a single character --- Python will not automatically pull columns out
> of a string without some help.  For example, say that '/var/log/maillog'
> contains the following line of text:

	Yes. I was afraid of that too. I want the 4th field. In the log
below, note that each field is separated by whitespace. I.e. I'd want i[1]
to grab "delay=00:00:00": 0 being the "to" field, 1 being
the delay. See what I mean?

> to=<dyoo@hkn.eecs.berkeley.edu>, delay=00:00:00, xdelay=00:00:00,\
> mailer=esmtp, relay=hkn.eecs.berkeley.edu. [128.32.138.117],\
> stat=Sent (EAA01739 Message accepted for delivery)"
> 
> i[4] looks like the '1' from the date 'Feb 19', and not 'c82114-a'.
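
A small sketch of the point above, in Python 3. The sample line is
hypothetical: only 'Feb 19' and 'c82114-a' come from this thread, and the
timestamp and process fields are invented so that the example is complete.

```python
# Hypothetical log line; only 'Feb 19' and 'c82114-a' come from the thread.
line = "Feb 19 04:02:11 c82114-a sendmail[1739]: EAA01739: stat=Sent"

char = line[4]                        # one character, not the fourth column
matches = (line[4] == 'open-sytems')  # a 1-character string never equals a longer one

print(repr(char), matches)
```

Indexing a string always yields a single character, so a comparison against
a whole hostname can never succeed and the counter never increments.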

	Well, but it's ignoring the if i[n] line anyway because it isn't
counting the hostname. It's just counting the whole line.

> We need to tell Python how we break up a string into columns.  We could
> separate things between commas, or between spaces---but we need to give
> Python a "delimiter" character that separates the columns.

	Hrmm, I guess that's true. Since the log is already pre-formatted I
didn't think it was necessary. Maybe that's goofing things up.

> You might want to play around with string.split():
> 
> ###
> >>> string.split('this is a short string', ' ')                
> ['this', 'is', 'a', 'short', 'string']
> >>> string.split('i,could,be a line,from a,,csf file', ',')
> ['i', 'could', 'be a line', 'from a', '', 'csf file']
> ###
> 
> I think I went a little fast though; if you have any questions, please
> feel free to ask the tutor list again.  Good luck to you.
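
For whitespace-separated log fields, a sketch along these lines may help.
It is written for Python 3, where the string method line.split() is used
instead of the string.split() function from the string module shown above.
The sample line is hypothetical; its leading fields are invented.

```python
# Hypothetical log line; the leading fields are invented for illustration.
line = ("Feb 19 04:02:11 c82114-a sendmail[1739]: "
        "to=<dyoo@hkn.eecs.berkeley.edu>, delay=00:00:00")

fields = line.split()   # no argument: split on any run of whitespace
hostname = fields[3]    # fourth whitespace-separated column

print(fields[:4])
print(hostname)
```

Calling split() with no argument treats any run of spaces or tabs as one
delimiter, which suits log lines whose columns are padded unevenly.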

I'm about to try Bob's example and see if string.split(i)... works.
I would still like to know why it's ignoring my if i[4] line, though, and
counting each whole line twice. Hrmm. I really appreciate the explanation
of things so far. I should have used better counter names :-) It does make
it much easier to know what's going on.
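
Putting the suggestions from this thread together, a rough Python 3 sketch
might look like the following. The function name count_host, the column
index, and the sample data are all assumptions made for illustration;
io.StringIO stands in for the real maillog, whose exact layout is not shown
in full here.

```python
import io

# Invented sample data standing in for /var/log/maillog.
sample_log = io.StringIO(
    "Feb 19 04:02:11 open-systems sendmail[1739]: stat=Sent\n"
    "Feb 19 04:02:15 c82114-a sendmail[1740]: stat=Sent\n"
    "\n"
    "Feb 19 04:03:02 open-systems sendmail[1741]: stat=Sent\n"
)

def count_host(log_file, host, column=3):
    """Count lines whose whitespace-separated field at `column` equals `host`."""
    counter = 0
    for line in log_file:
        fields = line.split()
        # Skip blank or short lines instead of raising IndexError.
        if len(fields) > column and fields[column] == host:
            counter += 1
    return counter

total = count_host(sample_log, "open-systems")
print(total)   # printed once, after the loop has finished
```

Printing after the loop gives a single total; printing inside the loop
would emit one number per log line whether or not the host matched.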

--
=============================================================================
-Chris Watson         (316) 326-3862 | FreeBSD Consultant, FreeBSD Geek 
Work:              scanner@jurai.net | Open Systems Inc., Wellington, Kansas
Home:  scanner@deceptively.shady.org | http://open-systems.net
=============================================================================