Testing for changes on a web page (was: how to find difference in number of characters)

Stefan Behnel stefan_ml at behnel.de
Sat Oct 9 08:41:27 EDT 2010


harryos, 09.10.2010 14:24:
> I am trying to determine if a web page is updated by x number of
> characters. The Mozilla Firefox plugin 'Update Scanner' has similar
> functionality. A user can specify the x. I think this would be done
> by reading from the same URL at two different times and finding the
> change in body text.

"Number of characters" sounds like a rather useless measure here. I'd
rather apply an XPath, CSS selector or PyQuery expression to the parsed
page and check whether the interesting subtree has changed at all,
potentially disregarding purely structural changes by stripping all tags
and normalising the resulting text to ignore whitespace and case
differences.
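
As a rough illustration of the idea, here is a minimal sketch. It is
only one possible way to do it and makes a few assumptions: to stay
within the standard library it uses html.parser and picks the subtree by
its "id" attribute instead of a real XPath or CSS selector, it assumes
the tags inside that subtree are properly closed, and the names
SubtreeText and fingerprint are made up for this example.

```python
import hashlib
from html.parser import HTMLParser


class SubtreeText(HTMLParser):
    """Collect the text inside the first element whose id is target_id.

    Hypothetical helper: a stand-in for an XPath/CSS selection.
    """

    # Void elements never get an end tag, so they must not change depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0      # > 0 while inside the selected subtree
        self.done = False   # set once the selected subtree is closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Tags are dropped simply by keeping only the text nodes.
        if self.depth:
            self.chunks.append(data)


def fingerprint(html, target_id):
    """Hash the normalised text of the selected subtree."""
    parser = SubtreeText(target_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse whitespace runs and ignore case before hashing.
    normalised = " ".join(text.split()).casefold()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Two snapshots of the same (made-up) page, taken at different times.
before = '<div id="news"><p>Hello   <b>World</b></p></div><p>ad 1</p>'
after = '<div id="news"><span>hello\n world</span></div><p>ad 2</p>'
changed = fingerprint(before, "news") != fingerprint(after, "news")
```

Storing only the hash from the previous visit is enough to tell whether
the interesting part has changed; markup changes and anything outside
the selected element do not affect it.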

Stefan



