web page text extractor

kublai restycena at gmail.com
Thu Jul 12 05:42:25 EDT 2007


Hello,

For a project, I need to build a corpus of online news stories. I'm
looking for an application that, given the URL of a web page, copies
the rendered text of the page (not the HTML source), opens a text
editor (Notepad), and displays the copied text so the user can examine
it and save it to a text file. Graphics and sidebars should be
ignored. The examples I have come across are much too complex for me
to customize for this simple job. Can anyone point me in the right
direction?
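A minimal sketch of one way to do this with only the Python standard library. It assumes a few things the question does not specify. First, "rendered text" is approximated by stripping tags from the downloaded HTML; no JavaScript is executed. Second, sidebars and menus are assumed to sit in nav, aside, or footer elements, or to carry "sidebar" in their class or id. Third, Notepad is launched only on Windows. The helper names (TextExtractor, extract_text, fetch_html, show_in_notepad) are hypothetical.

```python
import os
import re
import subprocess
import sys
import tempfile
import urllib.request
from html.parser import HTMLParser

# Elements whose contents are never article text. ASSUMPTION: sidebars and
# menus use these tags, or have "sidebar" somewhere in their class/id.
SKIP_TAGS = {"script", "style", "noscript", "nav", "aside", "footer", "form"}
# Elements that end a line of text, so paragraphs stay on separate lines.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "title"}
# Elements with no closing tag; never start skipping on one of these.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping scripts, styles and sidebar-like elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.chunks = []
        self._skip_tag = None   # name of the element currently being skipped
        self._skip_depth = 0    # how deeply that element name is nested

    def handle_starttag(self, tag, attrs):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth += 1
            return
        marker = " ".join(v or "" for k, v in attrs if k in ("class", "id")).lower()
        if tag in SKIP_TAGS or ("sidebar" in marker and tag not in VOID_TAGS):
            self._skip_tag, self._skip_depth = tag, 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth -= 1
                if self._skip_depth == 0:
                    self._skip_tag = None
            return
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_tag is None:
            self.chunks.append(data)


def extract_text(html):
    """Return the page text, one non-empty line per block, whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def fetch_html(url):
    """Download a page and decode it using the charset the server reports."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def show_in_notepad(text):
    """Write text to a temporary .txt file and open it for the user to review."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    # utf-8-sig writes a BOM so Notepad detects the encoding correctly.
    with os.fdopen(fd, "w", encoding="utf-8-sig") as f:
        f.write(text)
    if sys.platform.startswith("win"):
        subprocess.Popen(["notepad.exe", path])  # user can "Save As" from here
    else:
        print(text)  # no Notepad outside Windows; fall back to stdout
    return path


if __name__ == "__main__" and len(sys.argv) > 1:
    show_in_notepad(extract_text(fetch_html(sys.argv[1])))
```

Saved as, say, grab_text.py (a hypothetical name), it would be run as `python grab_text.py <url>`. The sidebar rule is a heuristic: real news sites vary, so the SKIP_TAGS set and the "sidebar" check will probably need tuning per site.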

Thanks,
gk




More information about the Python-list mailing list