Best way to match everything between tags

Mark Pilgrim f8dy at my-deja.com
Wed Jan 31 20:16:20 EST 2001


In article <DA0e6.94$o3.4354 at news.world-online.no>,
  "Henning VON ROSEN" <hvrosen at world-online.no> wrote:
> Hi!
> I am learning regular expressions.
>
> What is the natural way to match everything that is not "something"?
> For example, I want to manipulate all the text of an HTML document, but
> none of the tags.

Regular expressions may not be the best solution for this.  Try subclassing
SGMLParser (from the sgmllib module) and passing the document through the
parser.  The parser calls a specific method on your class for each start
tag, each end tag, and each block of text in between.  This gives you free
rein to manipulate the text between the tags any way you like, and just pass
the start/end tags through unaltered.
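
As a rough sketch of that idea: sgmllib was removed in Python 3, so the
example below assumes html.parser.HTMLParser as a stand-in, since it offers
the same kind of per-tag and per-text callbacks, including handle_data.  The
names TextTransformer and transform_text are made up for illustration and
are not from the original example.

```python
from html.parser import HTMLParser


class TextTransformer(HTMLParser):
    """Re-emit an HTML document, transforming only the text between tags.

    Hypothetical helper class; HTMLParser is used here as an assumed
    modern replacement for sgmllib.SGMLParser.
    """

    def __init__(self, transform):
        # convert_charrefs=False keeps entities such as &amp; out of
        # handle_data, so they can be passed through untouched.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag as it appeared in the source.
        self.pieces.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also passed through verbatim.
        self.pieces.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # End tags are rebuilt from the (lowercased) tag name.
        self.pieces.append("</%s>" % tag)

    def handle_data(self, data):
        # The only place where the document is actually changed.
        self.pieces.append(self.transform(data))

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_comment(self, data):
        self.pieces.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.pieces.append("<!%s>" % decl)

    def output(self):
        return "".join(self.pieces)


def transform_text(html, transform):
    """Apply transform() to every block of text in html, leaving tags alone."""
    parser = TextTransformer(transform)
    parser.feed(html)
    parser.close()
    return parser.output()
```

For instance, transform_text('<p class="x">Hello <b>world</b></p>', str.upper)
should give back '<p class="x">HELLO <b>WORLD</b></p>'.  Note that this sketch
also hands the contents of script and style elements to handle_data, so a real
version would probably want to skip those.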

See http://www.faqts.com/knowledge_base/view.phtml/aid/4200/fid/549 for a
working example.  The method you'd want to do your manipulation in is
"handle_data".

-M
--
You're smart; why haven't you learned Python yet?  http://diveintopython.org/




