Taking data from a text file to parse html page

DH dylanhughes at gmail.com
Thu Aug 24 10:58:36 EDT 2006


Frederic,
Good points...

I have a plain text file containing the html fragments and words
(keywords) that I want removed from the html file. After processing
the html file, the program would save the result as a plain text file.

So the program would import the keywords, remove them from the html
file and save the result as something.txt.
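
A minimal sketch of that workflow in Python, assuming each non-blank
line of the keyword file is one literal string to remove and that
matching is exact (a keyword that is wrapped or spaced differently in
the html will not match). The function names and the path parameters
are made up for illustration, and UTF-8 encoding is an assumption.

```python
def load_keywords(text):
    """Return the non-blank lines of the keyword text, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def strip_keywords(html, keywords):
    """Remove every literal occurrence of each keyword from html."""
    # Remove longer keywords first so a short keyword cannot break up
    # a longer one before it has had a chance to match.
    for keyword in sorted(keywords, key=len, reverse=True):
        html = html.replace(keyword, "")
    return html


def process_files(html_path, keywords_path, out_path):
    """Read both input files, strip the keywords, and write the result."""
    # Assumption: all files are UTF-8 text.
    with open(keywords_path, encoding="utf-8") as f:
        keywords = load_keywords(f.read())
    with open(html_path, encoding="utf-8") as f:
        html = f.read()
    with open(out_path, "w", encoding="utf-8") as f:
        # strip() drops the blank lines left behind by removed keywords
        # at the start and end of the text.
        f.write(strip_keywords(html, keywords).strip() + "\n")
```

With this sketch, process_files("index.html", "replace.txt",
"something.txt") would read the two input files and write the cleaned
text. Plain str.replace is used because the keywords are literal
strings; nothing here parses the html.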

I would post the data but it's secret. I can post an example:

index.html (html page)

"
<div><p><em>"Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.
"</em></p>
<p>-- Peter Norvig, <a class="reference"
"


replace.txt (keywords)
"
<div id="quote" class="homepage-box">

<div><p><em>"

"</em></p>

<p>-- Peter Norvig, <a class="reference"

"

something.txt (file after editing)

"

Python has been an important part of Google since the beginning, and
remains so as the system grows and evolves.
"


Larry,

I've looked into using BeautifulSoup but came to the conclusion that my
idea would work better in the end.


Thanks for the help.


Anthra Norell wrote:
> DH,
>       Could you be more specific describing what you have and what you want? You are addressing people, many of whom are good at
> stripping useless junk once you tell them what 'useless junk' is.
>       Also it helps to post some of the data that you need to process and a sample of the same data as it should look once it is
> processed.
>
> Frederic
>
> ----- Original Message -----
> From: "DH" <dylanhughes at gmail.com>
> Newsgroups: comp.lang.python
> To: <python-list at python.org>
> Sent: Thursday, August 24, 2006 2:11 AM
> Subject: Taking data from a text file to parse html page
>
>
> > Hi,
> >
> > I'm trying to strip the html and other useless junk from an html page.
> > I'd like to create something like an automated text editor, where it
> > takes the keywords from a txt file and removes them from the html page
> > (replaces the words in the html page with blank space). I'm new to python
> > and could use a little push in the right direction. Any ideas on how to
> > implement this?
> >
> > Thanks!
