[Tutor] extract plain english words from html

Marc Buehler marc_buehler at yahoo.com
Sat Oct 15 00:50:07 CEST 2005


hi.

i have a ton of html files from which i want to
extract the plain english words, and then write
those words into a single text file.

example:
<html>
<head>
<... all kinds of html tags ...>
<font color=99cccc size=5>
this is text
</font>

from the above, i want to extract the string 
'this is text' and write it out to a text file.
note that all of the html files have the same 
format, i.e. the text is always surrounded by the same
html tags.
also, i am sorting through thousands of
html files, so whatever i do needs to be
fast.
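
to make the question concrete, here is a rough sketch of the kind of
thing i have in mind. it assumes the text always sits inside exactly
the font tag shown in the example, with no other tags nested inside it.
the names extract_text and extract_all are made up for illustration,
and a regular expression is not a real html parser, so this would only
hold up because every file has the same format.

```python
import glob
import re

# assumption: the wanted text is always wrapped in exactly this tag pair,
# as in the example above; any other markup in the file is ignored
FONT_RE = re.compile(
    r"<font color=99cccc size=5>(.*?)</font>",
    re.DOTALL | re.IGNORECASE,
)


def extract_text(html):
    """Return the stripped text of every matching font block in one html string."""
    return [m.strip() for m in FONT_RE.findall(html) if m.strip()]


def extract_all(pattern, out_path):
    """Hypothetical helper: scan every file matching a glob pattern and
    write each extracted piece of text on its own line in out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            # errors="replace" so one badly encoded file does not stop the run
            with open(path, encoding="utf-8", errors="replace") as f:
                for text in extract_text(f.read()):
                    out.write(text + "\n")
```

something like extract_all("*.html", "words.txt") would then do the
whole batch (both names are placeholders). compiling the pattern once
and reading each file in one go should keep it reasonably quick, but i
have not timed it on thousands of files.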

any ideas?

marc


---------------------------------------------------------------------------------------
The apocalyptic vision of a criminally insane charismatic cult leader 

   http://www.marcbuehler.net
----------------------------------------------------------------------------------------


		

