is there a safe/easy way to search and replace text w/in an html page?

Phil Mayes nospam at bitbucket.com
Sun Nov 28 03:06:49 EST 1999


Levi Cook wrote in message <3840593B.2D1A2E30 at earthlink.net>...
>Hello,
>
>I'm new to python and I've been tasked w/ writing a script that fetches
>a web-page and highlights the term that the user searched for. Passing
>the URL and search term to my script was a cinch and fetching the page
>was also quite easy.
>
>The crux of my problem lies in establishing a strategy for finding
>the user's term and replacing it with CSS-styled tags surrounding the
>original term.
>At first glance the re module seemed sufficient (and maybe still is),
>but it doesn't really have any knowledge of HTML semantics.

Wrapping the found text in a <TAG> of your choice should work fine.
One caveat: tags may appear within the target text. For example, Netscape
pages used to bump the font size of the first letter of a title by one:
  <FONT SIZE=+1>N</FONT>etscape
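
As a rough sketch of the wrapping approach (written for present-day
Python 3, not something from the original exchange), the page can be split
into tags and text with re, and the term replaced only in the text pieces,
so attribute values such as href="python.html" are left alone. The function
name highlight and the <span class="hit"> wrapper are placeholders of my
own; comments, <script> blocks and a stray "<" in the text are not handled.

```python
import re

def highlight(html, term, open_tag='<span class="hit">', close_tag='</span>'):
    """Wrap each occurrence of term found outside of tags (hypothetical helper)."""
    # The capturing group makes re.split keep the tags in the result list.
    parts = re.split(r'(<[^>]*>)', html)
    # re.escape stops characters like "+" or "." in the term acting as regex syntax.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    out = []
    for part in parts:
        if part.startswith('<'):
            out.append(part)  # a tag: copy it through untouched
        else:
            # Plain text: wrap each match, preserving its original case.
            out.append(pattern.sub(
                lambda m: open_tag + m.group(0) + close_tag, part))
    return ''.join(out)
```

Calling highlight(page, 'python') then returns the page with every visible
"python" wrapped. The caveat above still applies: a term that is broken up
by a tag, as in the Netscape example, is simply not found.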

>My next
>venture was to look at the htmllib module. This appears to parse the
>HTML fine, but I'm still at a bit of a loss as to what this actually does
>for me, particularly how I get my parser to output my page.
>
>Does anyone know where I can find code that does anything remotely like
>this?
>Does anyone know where I can find a good reference on the htmllib module?
>The Python library reference is pretty bleak in this area.

> [code sample snipped]

The htmllib module is fairly primitive, e.g. it doesn't handle tables,
frames, images or CSS, so unless you are able to control the HTML contents
pretty strictly, you would be better off asking a competent browser to
display your modified page: write it somewhere and load file://...
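
One possible way to do the "write it somewhere and load it" step in
present-day Python 3 is sketched below. It uses the standard tempfile,
pathlib and webbrowser modules; the function names save_page and show_page
are my own, and which browser actually opens depends on the machine.

```python
import tempfile
import webbrowser
from pathlib import Path

def save_page(html, directory=None):
    """Write html to a new .html file and return its file:// URL."""
    # mkstemp creates the file and returns an open descriptor plus its path.
    fd, name = tempfile.mkstemp(suffix='.html', dir=directory)
    with open(fd, 'w', encoding='utf-8') as f:  # closes the descriptor on exit
        f.write(html)
    return Path(name).resolve().as_uri()

def show_page(html):
    """Save the page, then ask the default browser to load it."""
    return webbrowser.open(save_page(html))
```

show_page(modified_html) is the whole call; the temporary file is left on
disk so the browser can read it, and cleaning it up is left to the caller.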

What browser to talk to, and how, is going to be machine-specific.
On Win98, Internet Explorer can be controlled with PythonWin.
Look at win32com\test\testExplorer.py
--
Phil Mayes    pmayes AT olivebr DOT com
Olive Branch Software

More information about the Python-list mailing list