retrieving ATOM/RSS feeds

_spitFIRE timid.gentoo at gmail.com
Mon Aug 13 06:07:14 EDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lawrence Oluyede wrote:
> If the content producer doesn't provide the full article via RSS/ATOM
> there's no way you can get it from there. Search for full content feeds
> if any, otherwise get the article URL and feed it to BeautifulSoup to
> scrape the content.
> 

For the same feed (where the content producer doesn't provide the full
article), I was able to see the complete post in other RSS aggregators such
as Blam. I would like to know how they manage to retrieve the full content.
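
One possible explanation, which is only an assumption and not something
confirmed for this particular feed: some RSS feeds carry a short
<description> plus the full post in a <content:encoded> element from the RSS
content module, and an aggregator that reads the latter shows the whole
article. A minimal sketch using only the standard library follows; the
sample feed, best_body() and read_items() are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this illustration.
SAMPLE_RSS = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
      <description>Short teaser...</description>
      <content:encoded><![CDATA[<p>The complete text of the post.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second-post</link>
      <description>Only a summary here.</description>
    </item>
  </channel>
</rss>"""

# ElementTree spells namespaced tags as "{namespace-uri}localname".
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def best_body(item):
    """Return (kind, text): the full content if present, else the summary."""
    full = item.findtext(CONTENT_ENCODED)
    if full:
        return "full", full
    return "summary", item.findtext("description", default="")


def read_items(rss_text):
    """Return a list of (title, kind, text) tuples, one per <item>."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"),) + best_body(item)
            for item in root.iter("item")]


if __name__ == "__main__":
    for title, kind, text in read_items(SAMPLE_RSS):
        print(title, kind, text)
```

If an item has no such element, as in the second sample item, only the
summary is available from the feed itself.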

I am fairly sure an aggregator cannot write a separate screen scraper for
each and every blog, so there has to be a standard way, or at least blogs
must keep to a standard template for rendering posts. If every site offered
only partial content and the rest had to be scraped from a page with a
non-standard structure (which is the more likely case), then IMHO it would
be impossible for any aggregator to show full posts.

I shall for now try with BeautifulSoup, though I'm still doubtful about it.
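
A rough sketch of what the scraping step could look like. To keep it free of
dependencies it uses the standard library's html.parser rather than
BeautifulSoup, and it assumes the post body sits in an element with class
"post". That class name, the sample page and the helper names are all
hypothetical; a real site would need its own selector, which is exactly the
per-site problem described above.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PostTextExtractor(HTMLParser):
    """Collect text inside the first element whose class list has css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # 0 means we are outside the target element
        self.done = False    # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_post_text(html, css_class):
    """Return the text of the first matching element, or "" if none matches."""
    parser = PostTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical article page, invented for this illustration.
PAGE = """<html><body>
<div class="sidebar">Archive links</div>
<div class="post entry"><h1>First post</h1><p>The complete<br>text.</p></div>
<div class="footer">Copyright</div>
</body></html>"""

if __name__ == "__main__":
    print(extract_post_text(PAGE, "post"))
```

In practice the page would be fetched from the entry's link in the feed, and
the selector would have to be worked out separately for each site.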

- --
_ _ _]{5pitph!r3}[_ _ _
__________________________________________________
“I'm smart enough to know that I'm dumb.”
  - Richard P Feynman
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGwC1SA0th8WKBUJMRAs4eAJ0bLJVzEZls1JtE6e8MUrqdapXGPwCfVO02
yYzezvhJFY1SDHUGxrJdR5M=
=rfLo
-----END PGP SIGNATURE-----


