Help me optimize my feed script.

Jason Scheirer jason.scheirer at gmail.com
Thu Jun 26 17:56:56 EDT 2008


On Jun 26, 12:30 pm, bsag... at gmail.com wrote:
> I wrote my own feed reader using feedparser.py but it takes about 14
> seconds to process 7 feeds (on a windows box), which seems slow on my
> DSL line. Does anyone see how I can optimize the script below? Thanks
> in advance, Bill
>
> # UTF-8
> import feedparser
>
> rss = [
> 'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
> 'http://www.techmeme.com/index.xml',
> 'http://feeds.feedburner.com/slate-97504',
> 'http://rss.cnn.com/rss/money_mostpopular.rss',
> 'http://rss.news.yahoo.com/rss/tech',
> 'http://www.aldaily.com/rss/rss.xml',
> 'http://ezralevant.com/atom.xml'
> ]
> s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'
>
> s += '<style>\n'\
>      'h3{margin:10px 0 0 0;padding:0}\n'\
>      'a.x{color:black}'\
>      'p{margin:5px 0 0 0;padding:0}'\
>      '</style>\n'
>
> s += '</head>\n<body>\n<br />\n'
>
> for url in rss:
>         d = feedparser.parse(url)
>         title = d.feed.title
>         link = d.feed.link
>         s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
>         # aldaily.com has weird feed
>         if link.find('aldaily.com') != -1:
>                 description = d.entries[0].description
>                 s += description + '\n'
>         for x in range(0,3):
>                 if link.find('aldaily.com') != -1:
>                         continue
>                 title = d.entries[x].title
>                 link = d.entries[x].link
>                 s += '<a href="'+ link +'">'+ title +'</a><br />\n'
>
> s += '<br /><br />\n</body>\n</html>'
>
> f = open('c:/scripts/myFeeds.htm', 'w')
> f.write(s)
> f.close()
>
> print
> print 'myFeeds.htm written'

I can 100% guarantee you that the long run time is network I/O bound:
the script fetches the seven feeds one after another, so you wait out
each server's response time in sequence. Investigate loading the feeds
in parallel, one thread per feed.
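
If you want to see that for yourself before changing anything, time
each parse call in isolation; a minimal sketch (it reuses your rss
list, and the output format is just illustrative):

import time

for url in rss:
  t0 = time.time()
  feedparser.parse(url)  # one serial fetch + parse
  print '%.2f seconds for %s' % (time.time() - t0, url)

Nearly all of the time should show up in these calls rather than in
the HTML string building.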
Some code you might be able to shim in:

# Extra imports
import threading
import Queue

# Worker: fetch and parse one feed, then push the result onto the queue
def parse_and_put(url, queue_):
  parsed_feed = feedparser.parse(url)
  queue_.put(parsed_feed)

# A queue to collect results and a list to keep track of the threads
my_queue = Queue.Queue()
threads = []

# Set up a thread for fetching each URL
for url in rss:
  url_thread = threading.Thread(target=parse_and_put, name=url,
                                args=(url, my_queue))
  threads.append(url_thread)
  url_thread.setDaemon(False)
  url_thread.start()

# Wait for threads to finish
for thread in threads:
  thread.join()

# Drain the parsed feeds from the queue into a list
feeds_list = []
while not my_queue.empty():
  feeds_list.append(my_queue.get())

# Do what you were doing before, replacing 'for url in rss' with
# 'for d in feeds_list'
for d in feeds_list:
        title = d.feed.title
        link = d.feed.link
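
One caveat: the queue hands results back in whatever order the threads
happen to finish, so feeds_list may not match the order of your rss
list. If the order of sections on the page matters, a minimal
variation (ordering by position in rss is my assumption about what you
want) is to queue (url, feed) pairs and sort them afterwards:

def parse_and_put(url, queue_):
  # Tag each parsed feed with its URL so it can be reordered later
  queue_.put((url, feedparser.parse(url)))

# ... start and join the threads exactly as above, then:
pairs = []
while not my_queue.empty():
  pairs.append(my_queue.get())
pairs.sort(key=lambda pair: rss.index(pair[0]))
feeds_list = [feed for (url, feed) in pairs]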



