[Tutor] Feedparser and Google News feeds

David Kim dkdropbox at gmail.com
Thu Mar 11 03:17:12 CET 2010


I have been working through some of the examples in the Programming
Collective Intelligence book by Toby Segaran. I highly recommend it, btw.

Anyway, some of the exercises use feedparser to pull in RSS/Atom feeds from
different sources (before doing more interesting things). The algorithm
stuff I pretty much follow, but one thing is driving me CRAZY: I can't seem
to pull more than 10 items from a Google News feed. For example, I'd like to
pull 1000 Google News items (using some search term, let's say
'lightsabers'). The associated Atom feed URL, however, only holds ten items.
And it's hard to do some of the clustering analysis with only ten items!

Anyway, I imagine this must be a straightforward thing and I'm being a
moron, but I don't know where else to ask this question (none of my friends
are web-savvy programmers). I did see some posts about an n=100 term one can
add to the URL (the limit seems to be 100 items), but it only seems to
affect the webpage view and not the feed. I've also tried subscribing to the
feed in Google Reader and making the feed public, but I seem to be running
into the same problem. Is this a feedparser thing or a Google thing?
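
One workaround I can imagine is to request several narrower feeds (for
example, the same search over different date ranges) and merge the results,
dropping duplicates. Below is a minimal sketch of that idea in Python 3
syntax. The helper name collect_entries and its parse argument are
hypothetical; the only feedparser call assumed is feedparser.parse(url),
whose result exposes an entries list. Whether Google actually serves
different items for different URLs is untested here.

```python
def collect_entries(urls, parse=None):
    """Fetch each feed URL and return its entries, de-duplicated by link.

    `parse` is any callable that takes a URL and returns an object with an
    `entries` list of dict-like items. It defaults to feedparser.parse.
    """
    if parse is None:
        # Third-party library; imported lazily so the helper can be
        # exercised with a stand-in parser when feedparser is not installed.
        import feedparser
        parse = feedparser.parse

    seen_links = set()
    merged = []
    for url in urls:
        for entry in parse(url).entries:
            link = entry.get("link")
            # Skip entries without a link and entries already collected.
            if link is None or link in seen_links:
                continue
            seen_links.add(link)
            merged.append(entry)
    return merged
```

Calling len(collect_entries([url])) on a single feed URL would also show how
many items feedparser sees in that feed, which may help tell a feedparser
limit from a server-side one.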

The URL I'm using is
http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&as_scoring=r&as_maxm=3&q=health+information+exchange&as_qdr=a&as_drrb=q&as_mind=8&as_minm=2&cf=all&as_maxd=100&output=rss
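
To make it easier to see which parameters are in play and to try
variations, the query string can be taken apart and rebuilt with the
standard library. This is a minimal sketch in Python 3 syntax (an
assumption, since I'm on Python 2.6); query_pairs and with_param are
hypothetical helper names, and nothing here shows that any particular
parameter changes the number of items in the feed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

FEED_URL = (
    "http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&as_scoring=r"
    "&as_maxm=3&q=health+information+exchange&as_qdr=a&as_drrb=q"
    "&as_mind=8&as_minm=2&cf=all&as_maxd=100&output=rss"
)


def query_pairs(url):
    """Return the query string as an ordered list of (name, value) pairs."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def with_param(url, name, value):
    """Return url with name=value set, replacing any existing occurrences."""
    parts = urlsplit(url)
    pairs = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    pairs.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


if __name__ == "__main__":
    # List every parameter in the order it appears in the URL.
    for name, value in query_pairs(FEED_URL):
        print(name, "=", value)
```

Listing the pairs this way also shows that cf=all appears twice in the URL.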

Can anyone help me? I'm tearing my hair out and want to choke my computer.
It's probably not relevant, but I'm running Snow Leopard and Python 2.6
(actually EPD 6.1).