[SQL] Pick random rows from SELECT?

Peter Otten __peter__ at web.de
Mon Sep 21 06:52:58 EDT 2009


Gilles Ganault wrote:

> I have a working Python script that SELECTs rows from a database to
> fetch a company's name from a web-based database.
> 
> Since this list is quite big and the site is the bottleneck, I'd like
> to run multiple instances of this script, and figured a solution would
> be to pick rows at random from the dataset, check in my local database
> if this item has already been taken care of, and if not, download
> details from the remote web site.
> 
> If someone's done this before, should I perform the randomization in
> the SQL query (SQLite using the APSW wrapper
> http://code.google.com/p/apsw/), or in Python?
> 
> Thank you.
> 
> Here's some simplified code:
> 
> sql = 'SELECT id,label FROM companies WHERE activity=1'
> rows=list(cursor.execute(sql))
> for row in rows:
>         id = row[0]
>         label = row[1]
> 
>         print strftime("%H:%M")
>         url = "http://www.acme.com/details.php?id=%s" % id
>         req = urllib2.Request(url, None, headers)
>         response = urllib2.urlopen(req).read()
>         
>         name = re_name.search(response)
>         if name:
>                 name = name.group(1)
>         sql = 'UPDATE companies SET name=? WHERE id=?'
>         cursor.execute(sql, (name,id) )
 
I don't think you need to randomize the requests. Instead, you could spread
the downloads over a pool of worker processes using the multiprocessing
module:

http://docs.python.org/library/multiprocessing.html
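
Here is a minimal sketch of that idea. It assumes Python 3 and the companies table from the quoted code. It uses multiprocessing.dummy.Pool, which is the thread-backed variant with the same Pool API, because the work is I/O-bound. multiprocessing.Pool should work in its place for real processes if the calling code sits under an `if __name__ == "__main__":` guard. RE_NAME and the User-Agent header are hypothetical placeholders for re_name and headers, which the quoted code does not show. The pool hands each id to exactly one worker, so there is no need to pick rows at random or to check whether another instance already took them.

```python
import re
import sqlite3
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing API
from urllib.request import Request, urlopen

# Hypothetical pattern standing in for re_name from the original post.
RE_NAME = re.compile(rb"<h1>(.*?)</h1>")
URL = "http://www.acme.com/details.php?id=%s"

def fetch_name(company_id):
    """Download one details page in a worker and return (name, id)."""
    # Hypothetical header; the original post does not show its headers dict.
    req = Request(URL % company_id, headers={"User-Agent": "example-script"})
    body = urlopen(req, timeout=30).read()
    match = RE_NAME.search(body)
    name = match.group(1).decode("utf-8", "replace") if match else None
    return name, company_id

def update_names(conn, fetch=fetch_name, workers=4):
    """Fetch names for all active companies in parallel and store them."""
    ids = [row[0] for row in
           conn.execute("SELECT id FROM companies WHERE activity=1")]
    with Pool(workers) as pool:
        # Workers only download; this loop runs in the calling thread,
        # so the SQLite connection is never shared between workers.
        for name, company_id in pool.imap_unordered(fetch, ids):
            conn.execute("UPDATE companies SET name=? WHERE id=?",
                         (name, company_id))
    conn.commit()
```

Usage would be something like update_names(sqlite3.connect("companies.db")), where the file name is only an example. Raise workers until the remote site, not your script, becomes the limit.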

Peter
