help!! *extra* tricky web page to extract data from...

Diez B. Roggisch deets at nospam.web.de
Tue Mar 13 19:08:10 EDT 2007


Paul Rubin wrote:
> "Diez B. Roggisch" <deets at nospam.web.de> writes:
>> Nice idea, but not really helpful in the end. Besides the rather nasty
>> parts of the DOMs that make JS programming the PITA it is, I think the
>> whole event-based stuff makes this basically impossible.
> 
> Obviously the Python interface would need ways to send events into the
> DOM, simulating timer ticks, mouse clicks, and so forth, just like
> urllib in a sense simulates a user navigating a browser.

Obviously this wouldn't really help, as you can't predict which events a
website actually wants, or in which order. Especially if the site does
not _want_ to be scrapable: think of a simple "click on the images in
the order of the numbers shown on them" captcha.

Most of the time it's easier to sniff the HTTP stream & grab the data directly.
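
A rough sketch of what that can look like, assuming the sniffed traffic
shows the page's JavaScript fetching a JSON document. The URL, the
header values and the {"rows": [...]} layout are all made up for
illustration; the real ones have to be read off the captured stream.

```python
import json
import urllib.request

# Hypothetical endpoint: stands in for whatever URL the sniffed traffic shows.
DATA_URL = "http://example.com/data.json"


def build_request(url, referer=None):
    """Build a request resembling the one the page's JavaScript sends."""
    # Assumed headers; copy the real ones from the captured request.
    headers = {
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
    }
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def parse_rows(body):
    """Decode a JSON body of the assumed shape {"rows": [...]}."""
    return json.loads(body.decode("utf-8"))["rows"]


def fetch_rows(url=DATA_URL, referer=None):
    """Fetch the data URL directly, skipping the page and its scripts."""
    request = build_request(url, referer)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_rows(response.read())
```

Calling fetch_rows() with the real URL then returns the data without
ever running the page's scripts. Keeping parse_rows() separate makes it
easy to try against a response body saved from the sniffer first.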

Diez


