Downloading multiple files based on info extracted from CSV

Thu Dec 12 17:20:59 EST 2013

On Fri, Dec 13, 2013 at 8:43 AM, Matt Graves <tunacubes at gmail.com> wrote:
> ###This SHOULD plug in the URL for F, and the client name for G.
> def downloadFile(urls, clientname):
>     urllib.urlretrieve(f, "%g.csv") % clientname
>
> downloadFile(f,g)
>
> When I run it, I get : AttributeError: 'file' object has no attribute 'strip'

When showing errors like this, you really need to copy and paste the
whole traceback. Fortunately, I can see where the problem is here:
inside downloadFile you're still referencing f, which by now is a
closed file object, instead of the parameter urls.

But you're also passing f and g as parameters, instead of urls and
clientname. In fact, the downloadFile function isn't really achieving
much; you'd do better to simply inline its code into the main routine
and save yourself the hassle.

While you're at it, there are two more problems in that line of code.
Firstly, you're going to save everything into a file literally called
"%g.csv", and then try to apply the % operator to the return value of
urlretrieve and clientname; I think you want the close parens at the
very end of that line. And secondly, %g is a floating-point format
code - you want %s here, or simply use string concatenation:

urllib.urlretrieve(urls, clientname + ".csv")

Except that urls and clientname are your lists, not single strings,
so that won't work without another change. We'll fix that later...
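
To see the formatting point on its own, here is a small sketch. It is
written for Python 3 and uses a made-up client name ("acme"), so treat
it as an illustration rather than part of the original script:

```python
# Python 3 sketch; "acme" is a made-up client name.
client = "acme"

# %s formats any object as a string, which is what a filename needs.
with_percent_s = "%s.csv" % client

# Plain concatenation gives the same result.
with_concat = client + ".csv"

# %g expects a number, so giving it a string raises TypeError.
try:
    "%g.csv" % client
    g_error = None
except TypeError as exc:
    g_error = exc

print(with_percent_s, with_concat, type(g_error).__name__)
```

Either working form has to be built inside the call's parentheses, so
that the finished filename is what gets passed as the second argument.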

> ###This will set column 7 to be a list of urls
> with open('clients.csv', 'r') as f:
>     reader = csv.reader(f)
>     for column in reader:
>         urls.append(column[7])
>
> ###And this will set column 0 as a list of client names
> with open('clients.csv', 'r') as g:
>     reader = csv.reader(g)
>     for column in reader:
>         clientname.append(column[0])

You're reading the file twice. There's no reason to do that; you can
read both columns at once. (By the way, what you're iterating over is
actually rows; for each row that comes out of the reader, do something
with one element from it. So calling it "column" is a bit confusing.)

So now we come to a choice. Question: Is it okay to hold the CSV file
open while you do the downloading? If it is, you can simplify the code
way way down:

import urllib
import csv

# You actually could get away with not using a with
# block here, but may as well keep it for best practice
with open('clients.csv') as f:
    for client in csv.reader(f):
        urllib.urlretrieve(client[7], client[0] + ".csv")

Yep, that's it! That's all you need. But retrieving all that might
take a long time, so it might be better to do all your CSV reading
first and only *then* start downloading. In that case, I'd make a
single list of tuples:

import urllib
import csv

clients = []
with open('clients.csv') as f:
    for client in csv.reader(f):
        clients.append((client[7], client[0] + ".csv"))

for client in clients:
    urllib.urlretrieve(client[0], client[1])

And since the "iterate and append to a new list" idiom is so common,
it can be simplified down to a list comprehension; and since "call
this function with this tuple of arguments" is so common, it has its
own syntax. So the code looks like this:

import urllib
import csv

with open('clients.csv') as f:
    clients = [(client[7], client[0]+".csv") for client in csv.reader(f)]

for client in clients:
    urllib.urlretrieve(*client)

Again, it's really that simple! :)
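
The code above is Python 2. On Python 3, urlretrieve lives in
urllib.request instead, so a rough equivalent might look like the
sketch below. The helper names read_clients and download_all, the
fetch parameter, and the sample rows are all my own inventions for
illustration; the fetch parameter is only there so the example can be
run without touching the network.

```python
import csv
import io
from urllib.request import urlretrieve  # Python 3 home of urlretrieve


def read_clients(f):
    """Return a list of (url, filename) tuples from an open CSV file."""
    return [(row[7], row[0] + ".csv") for row in csv.reader(f)]


def download_all(clients, fetch=urlretrieve):
    """Call fetch(url, filename) for every tuple in clients."""
    for client in clients:
        fetch(*client)  # unpacks the tuple into two positional arguments


# Stand-in for clients.csv: column 0 is the name, column 7 the URL.
sample = io.StringIO(
    "acme,,,,,,,http://example.com/acme\n"
    "globex,,,,,,,http://example.com/globex\n"
)
clients = read_clients(sample)

# Record the calls instead of downloading anything.
calls = []
download_all(clients, fetch=lambda url, filename: calls.append((url, filename)))
print(calls)
```

With the real file you would open it with open('clients.csv',
newline='') as the csv docs recommend on Python 3, pass it to
read_clients, and call download_all with the default fetch.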

Enjoy!

ChrisA