Help me optimize my feed script.
Jason Scheirer
jason.scheirer at gmail.com
Thu Jun 26 17:56:56 EDT 2008
On Jun 26, 12:30 pm, bsag... at gmail.com wrote:
> I wrote my own feed reader using feedparser.py but it takes about 14
> seconds to process 7 feeds (on a windows box), which seems slow on my
> DSL line. Does anyone see how I can optimize the script below? Thanks
> in advance, Bill
>
> # UTF-8
> import feedparser
>
> rss = [
> 'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
> 'http://www.techmeme.com/index.xml',
> 'http://feeds.feedburner.com/slate-97504',
> 'http://rss.cnn.com/rss/money_mostpopular.rss',
> 'http://rss.news.yahoo.com/rss/tech',
> 'http://www.aldaily.com/rss/rss.xml',
> 'http://ezralevant.com/atom.xml'
> ]
> s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'
>
> s += '<style>\n'\
>      'h3{margin:10px 0 0 0;padding:0}\n'\
>      'a.x{color:black}'\
>      'p{margin:5px 0 0 0;padding:0}'\
>      '</style>\n'
>
> s += '</head>\n<body>\n<br />\n'
>
> for url in rss:
>     d = feedparser.parse(url)
>     title = d.feed.title
>     link = d.feed.link
>     s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
>     # aldaily.com has weird feed
>     if link.find('aldaily.com') != -1:
>         description = d.entries[0].description
>         s += description + '\n'
>     for x in range(0, 3):
>         if link.find('aldaily.com') != -1:
>             continue
>         title = d.entries[x].title
>         link = d.entries[x].link
>         s += '<a href="'+ link +'">'+ title +'</a><br />\n'
>
> s += '<br /><br />\n</body>\n</html>'
>
> f = open('c:/scripts/myFeeds.htm', 'w')
> f.write(s)
> f.close()
>
> print
> print 'myFeeds.htm written'
I can 100% guarantee you that the extended run time is network I/O
bound. Investigate using a thread pool to load the feeds in parallel.
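If you want to confirm that before rewriting anything, a quick timing wrapper (not from the original script; `timed` is just a name I picked) will show that nearly all the wall-clock time is spent inside `feedparser.parse()` waiting on the network, not in the string building:

```python
import time

def timed(fn, *args):
    # Run fn(*args) and report how long it took in seconds.
    start = time.time()
    result = fn(*args)
    return result, time.time() - start

# e.g. for each feed:
#   d, secs = timed(feedparser.parse, url)
#   print url, secs
```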
Some code you might be able to shim in:
# Extra imports
import threading
import Queue

# Function that fetches a feed and pushes the result onto a shared queue
def parse_and_put(url, queue_):
    parsed_feed = feedparser.parse(url)
    queue_.put(parsed_feed)

# Set up some variables
my_queue = Queue.Queue()
threads = []

# Set up and start a thread for fetching each URL
for url in rss:
    url_thread = threading.Thread(target=parse_and_put, name=url,
                                  args=(url, my_queue))
    threads.append(url_thread)
    url_thread.setDaemon(False)
    url_thread.start()

# Wait for all the threads to finish
for thread in threads:
    thread.join()

# Drain the queue into a list
feeds_list = []
while not my_queue.empty():
    feeds_list.append(my_queue.get())

# Then do what you were doing before, replacing "for url in rss"
# with "for d in feeds_list"
for d in feeds_list:
    title = d.feed.title
    link = d.feed.link
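Strictly speaking the code above spawns one thread per URL rather than a fixed-size pool, which is fine for 7 feeds but won't scale to hundreds. If you ever need a real bounded pool, here's a minimal sketch of the same idea with N workers draining a shared task queue. The names (`fetch_all`, `num_workers`) are mine, not from any library; pass `feedparser.parse` as the `fetch` argument. The `try`/`except` import makes it run on both Python 2 and 3:

```python
import threading
try:
    import Queue as queue  # Python 2
except ImportError:
    import queue           # Python 3

def fetch_all(urls, fetch, num_workers=4):
    # Bounded worker pool: num_workers threads pull URLs from a shared
    # task queue and push (url, result) pairs onto a results queue.
    tasks = queue.Queue()
    results = queue.Queue()
    for url in urls:
        tasks.put(url)

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, thread exits
            results.put((url, fetch(url)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Collect results keyed by URL so callers can match feeds back up
    out = {}
    while not results.empty():
        url, parsed = results.get()
        out[url] = parsed
    return out

# Usage in your script would look something like:
#   feeds = fetch_all(rss, feedparser.parse, num_workers=4)
```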
More information about the Python-list mailing list