When to use mechanize and Windmill library during WebScraping ?

Javier Collado javier.collado at gmail.com
Sat Dec 12 06:52:27 EST 2009


Hello,

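If a script that uses mechanize fails to find an HTML node that you
have identified with Firebug, the node has probably been generated by
JavaScript after the page loaded (provided that the expression used to
get the node is correct). Firebug shows the page after its scripts
have run, whereas mechanize only sees the HTML sent by the server.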

To verify this, you can download the HTML page and open it in your
favourite editor. If some of the nodes that you can see in your
browser are missing or empty, then one of the JavaScript scripts in
the page must have created or populated them.
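
The same check can be automated. The following is only a minimal
sketch using the Python 3 standard library, assuming the node you are
looking for has an id attribute; NodeFinder, node_status and fetch are
names made up for this example, not part of mechanize or windmill.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NodeFinder(HTMLParser):
    """Records whether an element with a given id exists and collects its text."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False
        self.text = ""
        self._depth = 0  # > 0 while the parser is inside the target element

    def handle_starttag(self, tag, attrs):
        is_target = not self._depth and dict(attrs).get("id") == self.target_id
        if is_target:
            self.found = True
        if tag in VOID_TAGS:
            return
        if self._depth or is_target:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text += data


def node_status(html, target_id):
    """Return 'missing', 'empty' or 'present' for the node in the raw HTML."""
    finder = NodeFinder(target_id)
    finder.feed(html)
    finder.close()
    if not finder.found:
        return "missing"
    return "present" if finder.text.strip() else "empty"


def fetch(url):
    """Download the page as the server sends it, without running any JavaScript."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For a node that Firebug shows as populated, node_status(fetch(url),
"some-id") returning "missing" or "empty" suggests that the node is
filled in by JavaScript.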

If you're in doubt, start with mechanize and, if you run into
problems such as the one described above, move to windmill or some
other tool that executes JavaScript code before trying to get the
desired data.
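
That "try the simple tool first" approach could look roughly like the
sketch below. The two fetchers are passed in as plain functions, so
nothing here depends on a particular library: fetch_static would wrap
mechanize or urllib, and fetch_rendered would wrap windmill, selenium
or whatever tool you pick. All the names are hypothetical.

```python
def scrape_with_fallback(url, extract, fetch_static, fetch_rendered):
    """Try the static fetcher first; use the JavaScript-capable one only if needed.

    extract(html) must return the desired data, or None when the node is
    missing or empty. Returns a (data, source) tuple, where source tells
    which fetcher produced the HTML that was finally used.
    """
    data = extract(fetch_static(url))
    if data is not None:
        return data, "static"
    # The static HTML did not contain the data, so it is probably
    # generated by JavaScript: retry with a tool that runs the scripts.
    return extract(fetch_rendered(url)), "rendered"
```

The second fetcher is only called when the first attempt fails, which
matters because starting a real browser is much slower than a plain
HTTP request.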

Best regards,
    Javier

2009/12/11 Raji Seetharaman <sraji.me at gmail.com>:
> Hi
>
> For 'Webscraping with Python', the mechanize or urllib2 and the windmill or
> selenium libraries are used to download the webpages.
>
> http://www.packtpub.com/article/web-scraping-with-python
>
> The above link makes use of mechanize library to download the web pages.
>
> The below link uses windmill library to download the web pages.
>
> http://www.packtpub.com/article/web-scraping-with-python-part-2
>
> I don't know when to use the mechanize or the windmill library.
>
> It has been said that the Windmill library is used when the HTML file is
> generated by JavaScript code.
>
> Also, I don't know how to identify whether the HTML file is generated by
> JavaScript code or not.
>
> Suggest me
>
> Thanks
>
> Raji. S
> http://sraji.wordpress.com/


