internet searching program

Michael Tobis mtobis at gmail.com
Sat Aug 9 12:07:25 EDT 2008


I think you are talking about "screen scraping".

Your program can get the HTML for the page and search it for an
appropriate pattern.

Look at the source for a YouTube view page and you will see a string

         var embedUrl = 'http://....

You can write code to search for that in the HTML text.
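
As a rough sketch of that search, something like the following would
do. The sample page text and the example.com address are made up for
illustration (they are not what YouTube actually serves), and
find_embed_url is just a name I picked.

```python
import re

# Matches a line such as:  var embedUrl = 'http://...';
# and captures whatever sits between the single quotes.
EMBED_URL_PATTERN = re.compile(r"var\s+embedUrl\s*=\s*'([^']*)'")


def find_embed_url(html):
    """Return the quoted value assigned to embedUrl, or None if absent."""
    match = EMBED_URL_PATTERN.search(html)
    if match is None:
        return None
    return match.group(1)


# Made-up page fragment, for illustration only.
sample_html = """
<script type="text/javascript">
    var embedUrl = 'http://example.com/v/abc123';
</script>
"""

print(find_embed_url(sample_html))
```

If the site renames the variable or switches to double quotes, the
function simply returns None, which is your cue to go back and look at
the page source again.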

But embedUrl is not a standard JavaScript item; it is part of the
YouTube design. You will be relying on that, and if YouTube changes
how they provide such a string, your code will stop working until you
rewrite at least part of it.

In other words, screen scraping relies on the site not doing a major
reworking and on you doing a little reverse engineering of the page
source.

So how do you get the HTML into your program? In Python, the answer is
urllib.

http://docs.python.org/lib/module-urllib.html
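
For instance, a minimal sketch of fetching a page. In Python 3 the
function is urllib.request.urlopen (in Python 2 it was urllib.urlopen).
The helper name fetch_html, the timeout, and the UTF-8 fallback are my
own choices, not anything the library requires.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download the resource at url and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset the server declares; fall back to UTF-8
        # (an assumption) when the headers do not name one.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage (needs network access; the address is a placeholder):
#   html = fetch_html("http://example.com/")
#   print(html[:200])
```

The returned string is what you would then search for your pattern.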

mt


