[Tutor] Hi, First question

Steve Willoughby steve at alchemy.com
Sun Jun 16 18:32:33 CEST 2013


On 16-Jun-2013, at 09:21, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:

> On 16/06/2013 16:55, Chris “Kwpolska” Warrick wrote:
>> On Sat, Jun 15, 2013 at 7:22 AM, Patrick Williams <pdw0005 at gmail.com> wrote:
>>> Hi so I am making a bit of code to extract a bit of numbers data from a file
>>> and then find the average of that data, however while I can get the code to
>>> extract each specific piece of data I need, I can't seem to get the numbers
>>> to add separately  so I can get a proper average. My sum1 variable seems to
>>> only take the last bit of data entered. I was just wondering if anyone knows
>>> what I'm doing wrong, the course I'm following hadn't started using regex
>>> (or even proper lists) at this point, so there must be a way to do it
>>> without. here's the code. the average of the data should be 0.6789 or
>>> something, but I get 0.0334343 or something.
>>> 
>>> count=0
>>> lst=list()
>> 
>> `lst = []` is the preferred syntax.
>> 
>>> fname='mbox-short.txt'
>>> fhand=open(fname)
>>> for line in fhand:
>>>     if line.startswith('X-DSPAM-Confidence:'):
>>>         count=count+1
>>>         colpos=line.find(':')
>>>         zpos=line.find('0',colpos)
>>>         num=float(line[zpos:50])
>>>         sum1=0+num
>>>         avg=float(sum1)/int(count)
> 
> I'll assume unless someone tells me differently that sum1 does not need reinitialising every time, and that avg needs to be calculated when the loop has finished.
> 
>>> print 'Count-', count,'--', 'Average-', avg
>>> 
>>> Any help at all is appreciated, and thanks in advance.
>>> 
>> 
>> I don’t know what file you used, but the message you sent got this
>> header from Gmail, and the format doesn’t seem to be much different:
>> 
>>> X-Spam-Evidence: '*H*': 0.79; '*S*': 0.00; 'separately': 0.09;
>>>        'wrong,': 0.09; 'subject:question': 0.10; 'code.': 0.18;
>>>        'variable': 0.18; 'bit': 0.19; 'advance.': 0.19; 'seems': 0.21;
>>>        '8bit%:5': 0.22; 'print': 0.22; 'skip:l 30': 0.24; '\xa0so': 0.24;
>>> [snip 11 more lines]
>> (replaced tabstops with spaces)
>> 
>> Can you guess what’s wrong in your code?
>> 
>> You are reading only the first line.
> 
> What does "for line in fhand:" do then?

I think that comment assumed you're reading mail header lines from that file, and that a single header can be split across multiple lines (see the example cited above).  If that's the case, then "for line in fhand" will still iterate over every line in the file, but you're only looking for lines which start with "X-Spam-..", and that would only be the FIRST line of the header if it's folded like that.

If your file is NOT organized like that, then your situation is different.  However, if your files are like that, you're going to silently miss data whenever the fields you're looking for don't happen to be on the first line of the multi-line header.
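
To make the folded-header point concrete, here is a minimal sketch. The sample text and the helper names (naive_lines, unfolded_headers) are made up for illustration and are not from your code. It is written for Python 3 and assumes continuation lines begin with a space or tab, as they do in the header quoted above.

```python
import io

# Hypothetical sample: one header folded over three physical lines.
sample = (
    "X-Spam-Evidence: '*H*': 0.79; '*S*': 0.00;\n"
    "       'wrong,': 0.09; 'code.': 0.18;\n"
    "       'variable': 0.18; 'bit': 0.19;\n"
    "Subject: Hi, First question\n"
)

def naive_lines(fhand, prefix):
    """Return only the physical lines that start with prefix."""
    return [line.rstrip("\n") for line in fhand if line.startswith(prefix)]

def unfolded_headers(fhand):
    """Join continuation lines (leading space or tab) onto the header they continue."""
    headers = []
    for line in fhand:
        line = line.rstrip("\n")
        if line[:1] in (" ", "\t") and headers:
            headers[-1] += " " + line.strip()
        else:
            headers.append(line)
    return headers

# The startswith() test alone sees just the first physical line of the header.
naive = naive_lines(io.StringIO(sample), "X-Spam-Evidence:")

# After unfolding, the same test sees the whole logical header.
full = [h for h in unfolded_headers(io.StringIO(sample))
        if h.startswith("X-Spam-Evidence:")]
```

With this sample, naive holds only the first physical line, while full holds the complete header including the 'bit' entry from the third line.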

Now if you are reading RFC-822 (et al) standard mail messages in those files, there are bits of the Python standard library (the email package, for one) which will be immensely useful to you in parsing out those headers rather than trying to do it yourself.  That's something you're going to find to be the case frequently with Python.
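
As a rough sketch of what that can look like, the standard email package will parse a message and hand back each header as one value, folded or not. The two messages below are invented sample data, and average_confidence is a hypothetical helper name; I'm also assuming Python 3 and that every message of interest carries an X-DSPAM-Confidence header. Note that the total is accumulated across messages and the average is computed once, after the loop.

```python
import email

# Invented sample messages; the second header of the first one is folded.
raw_messages = [
    "From: a@example.com\n"
    "X-DSPAM-Confidence: 0.5\n"
    "X-Spam-Evidence: '*H*': 0.79; '*S*': 0.00;\n"
    "       'wrong,': 0.09; 'code.': 0.18\n"
    "\n"
    "Body of the first message.\n",
    "From: b@example.com\n"
    "X-DSPAM-Confidence: 0.75\n"
    "\n"
    "Body of the second message.\n",
]

def average_confidence(raw_list, header="X-DSPAM-Confidence"):
    """Average a numeric header over several messages; None if it never appears."""
    total = 0.0
    count = 0
    for raw in raw_list:
        msg = email.message_from_string(raw)
        value = msg[header]          # None when the header is absent
        if value is not None:
            total += float(value)    # keep a running total
            count += 1
    # Divide once, after all messages have been seen.
    return total / count if count else None

avg = average_confidence(raw_messages)

# A folded header comes back as a single value, continuation included.
evidence = email.message_from_string(raw_messages[0])["X-Spam-Evidence"]
```

If the file turns out to be a Unix mbox file, the standard mailbox module (mailbox.mbox) can iterate over the messages in it for you, so you would not need to split them apart by hand.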
 
> 
> -- 
> "Steve is going for the pink ball - and for those of you who are watching in black and white, the pink is next to the green." Snooker commentator 'Whispering' Ted Lowe.
> 
> Mark Lawrence
> 
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor


