Downloading the feed using feedparser

Irmen de Jong irmen.NOSPAM at xs4all.nl
Wed Sep 4 14:26:17 EDT 2013


On 4-9-2013 13:12, mukesh tiwari wrote:
> Hello all, I am trying to download the feed of http://blogs.forrester.com/feed but I
> am stuck with a problem.
> 
> >>> import feedparser
> >>> d = feedparser.parse('http://blogs.forrester.com/feed')
> >>> d.etag
> u'"1378291653-1"'
> >>> d.modified
> 'Wed, 04 Sep 2013 10:47:33 +0000'
> 
> >>> feedparser.parse('http://blogs.forrester.com/feed', etag=d.etag,
> ...                  modified=d.modified).status
> 200
> 
> When I run this, shouldn't the status be 304? (Either the content can't change that
> fast, or this server is not configured properly.) If I rely on this, then whenever I
> run the code I will download the content regardless of whether it has changed. Could
> someone please suggest how to avoid the duplicate download?

No, it's correct, because repeatedly downloading that URL gives me a different etag and
last-modified header in the server's response each time. Their server is very likely
generating the data on the fly every time you retrieve that location. Why do you assume
this can't change so fast? It is very likely not a static file that is being retrieved,
but rather content that their server application generates for every request.
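
A quick way to check this yourself is to fetch the feed twice and compare the two
validators. This is only a rough sketch: validators_are_stable is a made-up helper name,
and the optional parse argument exists just so the check can be tried without the
network; by default it uses feedparser.parse.

```python
def validators_are_stable(url, parse=None):
    """Fetch url twice; return (same_etag, same_modified) for the two responses."""
    if parse is None:
        import feedparser  # third-party package
        parse = feedparser.parse

    first = parse(url)
    second = parse(url)

    # A validator that is missing from the response counts as "not stable".
    same_etag = (first.get("etag") is not None
                 and first.get("etag") == second.get("etag"))
    same_modified = (first.get("modified") is not None
                     and first.get("modified") == second.get("modified"))
    return same_etag, same_modified
```

If both values come back False, sending etag and modified on the next request cannot
produce a 304, because the server never sees a validator it still considers current.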


> 
> The one below works fine: if I try to download it again, I get a 304 response,
> since no data has changed on the server.
> 
> >>> d = feedparser.parse("feed://feeds.huffingtonpost.com/HP/MostPopular"); d.etag

http, I presume.............^^^^

But yeah, that URL gives the same etag and last-modified header in the response when
downloaded repeatedly. This is probably a static file that is updated once in a
while.
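
For a feed whose validators change on every request, one possible workaround is to
compare the entries yourself. The sketch below is an assumption on my part, not
something feedparser does for you: fetch_if_changed, entries_fingerprint and the state
dict are hypothetical names. It still sends etag and modified, so a well-behaved server
can answer 304, and otherwise it hashes the entry ids to decide whether anything is new.
Note that in the second case the download still happens; only the reprocessing is
skipped.

```python
import hashlib


def entries_fingerprint(entries):
    """Hash the entry ids (or links as a fallback) of a parsed feed."""
    h = hashlib.sha256()
    for entry in entries:
        key = entry.get("id") or entry.get("link") or ""
        h.update(key.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def fetch_if_changed(url, state, parse=None):
    """Return the parsed feed if it changed since the last call, else None.

    state is a plain dict that is updated in place with the keys
    'etag', 'modified' and 'fingerprint'; persist it between runs.
    """
    if parse is None:
        import feedparser  # third-party package
        parse = feedparser.parse

    d = parse(url, etag=state.get("etag"), modified=state.get("modified"))

    # The server confirmed that nothing changed: keep the old state.
    if d.get("status") == 304:
        return None

    fingerprint = entries_fingerprint(d.get("entries", []))
    unchanged = fingerprint == state.get("fingerprint")
    state.update(etag=d.get("etag"), modified=d.get("modified"),
                 fingerprint=fingerprint)
    return None if unchanged else d
```

Call it with an empty dict the first time, for example
fetch_if_changed('http://blogs.forrester.com/feed', state), and store state somewhere
(a JSON file would do) so the next run can reuse it.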

Irmen


