web page text extractor
Jon Rosebaugh
jon at turnthepage.org
Thu Jul 12 10:22:51 EDT 2007
On 2007-07-12 04:42:25 -0500, kublai <restycena at gmail.com> said:
> For a project, I need to develop a corpus of online news stories. I'm
> looking for an application that, given the url of a web page, "copies"
> the rendered text of the web page (not the source HTML text), opens a
> text editor (Notepad), and displays the copied text for the user to
> examine and save into a text file. Graphics and sidebars should be
> ignored. The examples I have come across are much too complex for me
> to customize for this simple job. Can anyone point me in the right
> direction?
You may find BeautifulSoup or templatemaker to be of assistance:
http://www.crummy.com/software/BeautifulSoup/
http://www.holovaty.com/blog/archive/2007/07/06/0128
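
To give a rough idea of the shape of such a tool, here is a minimal
sketch. It is not BeautifulSoup or templatemaker code: it uses only the
standard library's html.parser as a stand-in, so it runs without any
extra install. The names TextExtractor, html_to_text, fetch_text and
open_in_notepad are made up for this example, and it assumes Python 3
on Windows for the Notepad step. It drops script and style content but
does nothing clever about sidebars, which would need per-site rules.

```python
import subprocess
import tempfile
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_text(url):
    """Download a page and return its extracted text."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


def open_in_notepad(text):
    """Write text to a temporary .txt file and open it in Notepad (Windows only)."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(text)
        path = f.name
    subprocess.Popen(["notepad.exe", path])
    return path
```

Usage would be something like open_in_notepad(fetch_text(url)) with the
url of a news story; the user can then review the text and save it
wherever the corpus lives. With BeautifulSoup the parsing step gets
shorter and copes better with broken markup.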