Buffering HTML as HTMLParser reads it?

chrispwd at gmail.com
Sun Aug 5 14:10:13 EDT 2007


On Aug 1, 4:08 pm, Paul McGuire <pt... at austin.rr.com> wrote:
> On Aug 1, 1:31 pm, chris... at gmail.com wrote:
> <snip>
>
>
>
> > I'm thinking maybe somehow have HTMLParser append each character it
> > reads except for data inside tags in some kind of buffer? This way I
> > can have the HTML contents read into a buffer, then when I do my own
> > handle_ overrides, I can also append to that buffer with the
> > transformed data. Once the HTML page is finished parsing, ideally I
> > would be able to print the contents of the buffer and the HTML would
> > be identical except for the string transformations.
>
> > I also need to make sure that all newlines, tags, spacing, etc are
> > kept intact -- this part is a requirement for other reasons.
>
> > Thanks!
>
> What you describe is almost exactly how pyparsing implements
> transformString.  See below:
>
> from pyparsing import *
>
> boldStart,boldEnd = makeHTMLTags("B")
>
> # convert <B> to <div class="emphatic"> and </B> to </div>
> boldStart.setParseAction(replaceWith('<div class="emphatic">'))
> boldEnd.setParseAction(replaceWith('</div>'))
> converter = boldStart | boldEnd
>
> html = "Display this in <b>bold</b>"
> print converter.transformString(html)
>
> Prints:
>
> Display this in <div class="emphatic">bold</div>
>
> All text not matched by a pattern in the converter is left as-is.  (My
> CSS style/form may not be up to date, but I hope you get the idea.)
>
> -- Paul

Hello,

Sorry for the delay in replying, and thank you for the info. Though, I
think either I am misunderstanding your post or it's not the solution
I'm looking for.

How does this fit into what I'm looking to do with HTMLParser?
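
To make the question concrete, here is a rough sketch of the buffering
idea from my earlier message. It assumes Python 3's html.parser module
rather than the older HTMLParser module, and the names BufferingParser
and transform_html are made up for illustration. Every handler re-emits
what it saw into a list, and only the text data goes through the
transform. One known gap: end tags are rebuilt from the lowercased tag
name, so "</B>" would come out as "</b>", and the contents of script and
style elements also pass through the transform.

```python
from html.parser import HTMLParser


class BufferingParser(HTMLParser):
    """Re-emit the input into a buffer, transforming only the text data."""

    def __init__(self, transform):
        # convert_charrefs=False keeps entities such as &amp; as separate
        # events instead of decoding them into the text data.
        super().__init__(convert_charrefs=False)
        self.transform = transform  # hypothetical callable: str -> str
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written.
        self.buffer.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also copied verbatim.
        self.buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The raw end tag text is not exposed, so rebuild it (lowercased).
        self.buffer.append("</%s>" % tag)

    def handle_data(self, data):
        # The only place where the string transformation is applied.
        self.buffer.append(self.transform(data))

    def handle_entityref(self, name):
        self.buffer.append("&%s;" % name)

    def handle_charref(self, name):
        self.buffer.append("&#%s;" % name)

    def handle_comment(self, data):
        self.buffer.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.buffer.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.buffer.append("<?%s>" % data)


def transform_html(html, transform):
    """Parse html and return it with transform applied to text data only."""
    parser = BufferingParser(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.buffer)
```

Calling transform_html(page, str.upper) would then uppercase the visible
text while copying tags, newlines and spacing through unchanged, apart
from the end-tag caveat above.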

Thanks!



