[XML-SIG] Round-tripping HTML fragment to XML node

Thomas B. Passin tpassin@comcast.net
Sat, 26 Apr 2003 13:20:55 -0400


[Andrew Ittner]

> I have an HTML fragment: <P>this is<BR>a paragraph</P>
> I want to convert it to XHTML: <p>this is<br/>a paragraph</p>
> And store it as a Node in an XML document.
>
> Then, I want to pull the Node back out and convert back to an HTML
> fragment.
>
> I want to do this automatically (not using regexp, etc.) because:
> -each HTML fragment is a separate weblog entry (for Yet Another Weblog
> Maker (c))
> -I store it in XML to publish using XSL

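If you are going to use XSLT to produce the results, you do not have to do
anything different except use the html output method in your stylesheet
(<xsl:output method="html"/>).  That will output the HTML that you want,
with br and img written as plain start tags rather than as <br/> and <img/>.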
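
If you want to see the same effect outside of XSLT, here is a rough sketch
using ElementTree from the Python standard library.  This is a stand-in for
the stylesheet approach, not XSLT itself, and the variable names are made up
for illustration.  It assumes the stored fragment is already well-formed XML.

```python
import xml.etree.ElementTree as ET

# The XHTML fragment as it would be stored in the XML document
# (illustrative data taken from the question above).
stored = "<p>this is<br/>a paragraph</p>"

# Parse the stored fragment back into an element tree.
node = ET.fromstring(stored)

# method="html" serializes HTML empty elements such as br and img
# as a bare start tag, with no end tag and no "/>".
html_fragment = ET.tostring(node, encoding="unicode", method="html")
print(html_fragment)
```

The serializer decides which elements are empty from a built-in list of HTML
element names, so nothing in the stored XML has to change.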

Otherwise, I would try adding a single space inside those normally empty
elements, e.g.

<img src='...'> </img>

Strictly speaking, the img and br elements are supposed to be empty, but
most browsers will accept a space, and I bet the wxWindows HTML viewer
(which wxPython wraps) will too.  With the space present, an XML serializer
writes separate start and end tags instead of the <img/> form.  That would
be a lot easier than fussing around or writing a custom serializer.
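
A minimal sketch of the single-space trick, assuming the fragment is held in
a minidom tree.  The helper name pad_empty_elements and the default list of
element names are my own choices for illustration, not part of any library.

```python
from xml.dom import minidom


def pad_empty_elements(node, names=("br", "img")):
    """Append a one-space text node to each empty element listed in names."""
    # Recurse into child elements first.
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            pad_empty_elements(child, names)
    # Then pad this element if it is one of the named, still-empty ones.
    if (node.nodeType == node.ELEMENT_NODE
            and node.tagName.lower() in names
            and not node.hasChildNodes()):
        node.appendChild(node.ownerDocument.createTextNode(" "))


doc = minidom.parseString("<p>this is<br/>a paragraph</p>")
pad_empty_elements(doc.documentElement)

# The element now has content, so it is written with separate start and
# end tags instead of the <br/> form.
padded = doc.documentElement.toxml()
print(padded)
```

Whether the viewer tolerates the resulting end tags is something you would
have to test; the code only shows how to get them out of the serializer.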

> -even though I'm probably not going to use any other singletons besides
> <BR> & <IMG>, I want the parser to handle conversion to well-formed XML
> automagically
> -my HTML viewer (courtesy wxPython) needs HTML and cannot understand XHTML
>

Cheers,

Tom P