[Tutor] extract plain english words from html
Marc Buehler
marc_buehler at yahoo.com
Sat Oct 15 00:50:07 CEST 2005
Hi.
I have a ton of HTML files from which I want to
extract the plain English words and then write
those words into a single text file.
Example:
<html>
<head>
<... all kinds html tags ...>
<font color=99cccc size=5>
this is text
</font>
From the above, I want to extract the string
'this is text' and write it out to a text file.
Note that all of the HTML files have the same
format, i.e. the text is always surrounded by the same
HTML tags.
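
One way the extraction step could look, as a minimal sketch: it assumes
Python 3 and its standard html.parser module, and assumes the wanted text
always sits inside a <font> element as in the example. The names
FontTextExtractor and extract_font_text are made up for illustration.

```python
from html.parser import HTMLParser


class FontTextExtractor(HTMLParser):
    """Collect text that appears inside <font> ... </font> tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <font> tags currently open
        self.chunks = []  # non-empty text pieces seen inside <font>

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so "FONT" also matches.
        if tag == "font":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "font" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_font_text(html):
    """Return the text found inside <font> tags, joined by spaces."""
    parser = FontTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Sample input modelled on the example in the message (elided tags omitted).
sample = """<html>
<head>
<font color=99cccc size=5>
this is text
</font>
"""
print(extract_font_text(sample))  # prints: this is text
```

A parser-based approach like this tolerates unquoted attributes and
mixed-case tags; if the surrounding tags differ from the example, the
tag name checked in the handlers would need to change.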
Also, I am sorting through thousands of
HTML files, so whatever I do needs to be
fast.
Any ideas?
marc
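
For the batch part of the question (thousands of files, one output file),
a possible sketch follows. It assumes Python 3, assumes the files end in
.html and live in one directory, and swaps in a precompiled regular
expression because the message says every file uses the same tags. A regex
is not a general HTML parser, so this only holds while that assumption does.
The names FONT_TEXT, extract_words and collect are hypothetical.

```python
import re
from pathlib import Path

# Compiled once and reused for every file. Matches the text between
# <font ...> and </font>, across line breaks, ignoring tag case.
FONT_TEXT = re.compile(r"<font\b[^>]*>(.*?)</font>", re.IGNORECASE | re.DOTALL)


def extract_words(html):
    """Return the whitespace-separated words found inside <font> tags."""
    words = []
    for body in FONT_TEXT.findall(html):
        words.extend(body.split())
    return words


def collect(src_dir, out_path):
    """Write the words of each *.html file in src_dir as one line of out_path.

    Returns the number of files that contributed at least one word.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.html")):
            # errors="replace" keeps going if a file has odd bytes.
            html = path.read_text(encoding="utf-8", errors="replace")
            words = extract_words(html)
            if words:
                out.write(" ".join(words) + "\n")
                count += 1
    return count
```

It could be called as collect("html_files", "words.txt"), where both names
are placeholders. Each file is read once and the output file is opened
once, so most of the run time should go to disk reads.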
---------------------------------------------------------------------------------------
The apocalyptic vision of a criminally insane charismatic cult leader
http://www.marcbuehler.net
----------------------------------------------------------------------------------------