Problem with reading CSV file from URL, last record truncated.

MRAB python at mrabarnett.plus.com
Mon Aug 3 19:47:23 EDT 2009


KB wrote:
> On Aug 3, 3:54 pm, KB <ke... at nekotaku.com> wrote:
>> Hi,
>>
>> I am trying to download from a URL, a CSV using the following:
>>
>> import re
>> import urllib, urllib2, cookielib
>> import mechanize
>> import csv
>> import numpy
>> import os
>>
>> def return_ranking():
>>
>>         cj = mechanize.MSIECookieJar(delayload=True)
>>         cj.load_from_registry()  # finds cookie index file from registry
>>
>>         # set things up for cookies
>>
>>         opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
>>
>>         urllib2.install_opener(opener)
>>
>>         reply = opener.open('http://ichart.finance.yahoo.com/table.csv?s=CSCO&a=00&b=01&c=2009&d=01&e=2&f=2010&g=d&ignore=.csv').read()
>>
>>         fout=open('csco.csv','wb')
>>         fout.write(reply)
>>         fout.close

This should be:

          fout.close()

>>
> 
>> return_ranking()
>>
[snip]

> By moving:
>>         fin=open('csco.csv','rb')
>>         table = csv.reader(fin)
>>         fin.close

This should be:

          fin.close()

>>
>>         for row in table:
>>                 print row
> 
> outside of the routine and into the mainline, it works like a charm.
> 
> Would like to know why though, so would love to hear any clues!
> 
The parentheses aren't optional; without them you're just referring to
the method, not calling it.
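
For example (a quick illustration, not taken from the original post):

          fout = open('csco.csv', 'wb')
          fout.close    # evaluates to the bound method object and discards it
          fout.close()  # actually calls the method, flushing and closing the file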

Because you weren't closing the file, the buffered text wasn't all
written to disk. When return_ranking() returns there's no longer any
reference to 'fout', so the file object becomes available for
collection by the garbage collector. When the file object is collected
it flushes the remaining text to disk and closes the file. In CPython
the file object is collected as soon as there are no references to it,
but in other implementations that might not be the case.
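
Here's a sketch of how that part of the routine could look so that
everything works inside return_ranking() (untested, based on the code
you posted):

          fout = open('csco.csv', 'wb')
          fout.write(reply)
          fout.close()      # flush the buffered data to disk before re-reading

          fin = open('csco.csv', 'rb')
          table = csv.reader(fin)
          for row in table:
                  print row
          fin.close()       # close only after the reader has consumed the file

Note that fin.close() has also moved to after the loop: csv.reader
reads from the file lazily, so the file has to stay open while you
iterate over the rows. You could also use 'with' blocks (Python 2.6,
or 2.5 with a __future__ import) to handle the closing automatically.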


