Taking data from a text file to parse html page

Fredrik Lundh fredrik at pythonware.com
Thu Aug 24 11:58:19 EDT 2006


DH wrote:

> I have a plain text file containing the html and words that I want
> removed (keywords) from the html file. After processing the html file, it
> would save it as a plain text file.
> 
> So the program would import the keywords, remove them from the html
> file and save the html file as something.txt.
> 
> I would post the data but it's secret. I can post an example:
> 
> index.html (html page)
> 
> "
> <div><p><em>"Python has been an important part of Google since the
> beginning, and remains so as the system grows and evolves.
> "</em></p>
> <p>-- Peter Norvig, <a class="reference"
> "
>  
> replace.txt (keywords)
> "
> <div id="quote" class="homepage-box">
> 
> <div><p><em>"
> 
> "</em></p>
> 
> <p>-- Peter Norvig, <a class="reference"
> 
> "
> 
> something.txt (file after editing)
> 
> "
> 
> Python has been an important part of Google since the beginning, and
> remains so as the system grows and evolves.
> "

reading and writing files is described in the tutorial; see

     http://pytut.infogami.com/node9.html

(scroll down to "Reading and Writing Files")
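
as a rough sketch of the file handling (not taken from the tutorial itself, and written for current Python 3 rather than the version current in 2006): the helper names read_text and write_text are made up for this example, and the demo writes to a temporary directory instead of touching real files.

```python
import os
import tempfile


def read_text(path):
    # Read the whole file into a single string.
    with open(path, encoding="utf-8") as f:
        return f.read()


def write_text(path, text):
    # Create (or overwrite) the file and write the string to it.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Demo in a scratch directory; the file names mirror the ones in the question.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "index.html")
dst = os.path.join(workdir, "something.txt")

write_text(src, "<p>hello</p>\n")   # stand-in for the real html page
page = read_text(src)               # load the page as one string
write_text(dst, page)               # save the (here unmodified) text
```

the processing step would go between the read_text and write_text calls.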

to do the replacement, you can use repeated calls to the "replace" method

     http://pyref.infogami.com/str.replace

but that may cause problems if the replacement text itself contains one of
the strings to be replaced, since a later call would then alter text that an
earlier call inserted.  for an efficient way to do a "parallel" replace, see:

     http://effbot.org/zone/python-replace.htm#multiple
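
to make the difference concrete, here is a minimal sketch of both approaches. it is one common way to do this with the re module, not necessarily the code on the page linked above; the function names remove_sequential and replace_parallel and the sample strings are invented for the example.

```python
import re


def remove_sequential(text, keywords):
    # Repeated str.replace calls: each keyword is removed in turn.
    for kw in keywords:
        text = text.replace(kw, "")
    return text


def replace_parallel(text, mapping):
    # Single pass over the text: every match is looked up in the mapping,
    # so text produced by one replacement is never replaced again.
    keys = [k for k in mapping if k]          # ignore empty keys
    if not keys:
        return text
    keys.sort(key=len, reverse=True)          # prefer longer keys on overlap
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Removing keywords, as in the question (shortened sample text).
html = '<div><p><em>"Python has been an important part of Google."</em></p>'
keywords = ['<div><p><em>"', '"</em></p>']
cleaned = remove_sequential(html, keywords)

# Swapping two words shows why the order of sequential calls can matter.
swapped_seq = "a b".replace("a", "b").replace("b", "a")
swapped_par = replace_parallel("a b", {"a": "b", "b": "a"})
```

when every replacement is the empty string, as in the question, the sequential version is usually enough; the parallel version matters once replacements can themselves contain keywords.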


</F>



