Is there an HTML parser that can reconstruct the original HTML EXACTLY?

Stefan Behnel stefan.behnel-n05pAM at web.de
Wed Jan 23 10:18:01 EST 2008


Hi,

kliu wrote:
> what I really need is the mapping between each DOM node and
> the corresponding original source segment.

I don't think that will be easy to achieve. You might get away with a parser
that reports the position of each element in the source, and then map your
changes back into the original document. But that won't work well when the
parser inserts or deletes content to fix up broken structure.
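A minimal sketch of the position-based approach, assuming Python 3's standard
html.parser module (not something from the original thread). HTMLParser builds
no tree and never repairs markup, so every start-tag span it reports points at
text that really exists in the source. The class TagSpanRecorder and the
helper splice() are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class TagSpanRecorder(HTMLParser):
    """Record the exact source span (start, end) of every start tag.

    html.parser does not repair markup, so each recorded span points at
    text that really exists in the original document.
    """

    def __init__(self, source):
        super().__init__(convert_charrefs=True)
        # Absolute index at which each line of `source` begins, used to turn
        # getpos()'s (line, column) pair into a single string offset.
        self.line_starts = [0]
        for index, char in enumerate(source):
            if char == "\n":
                self.line_starts.append(index + 1)
        self.spans = []  # (tag name, start offset, end offset)

    def handle_starttag(self, tag, attrs):
        lineno, column = self.getpos()  # 1-based line, 0-based column of "<"
        start = self.line_starts[lineno - 1] + column
        raw = self.get_starttag_text()  # the tag exactly as written
        self.spans.append((tag, start, start + len(raw)))


def splice(source, span, replacement):
    """Replace one recorded tag, leaving every other character untouched."""
    _tag, start, end = span
    return source[:start] + replacement + source[end:]


if __name__ == "__main__":
    doc = '<div class="a">\n  <P  id=x>Hello<br/>world\n</div>'
    recorder = TagSpanRecorder(doc)
    recorder.feed(doc)
    recorder.close()
    for tag, start, end in recorder.spans:
        print(tag, start, end, repr(doc[start:end]))
    # Edit only the <P> tag; odd spacing and casing elsewhere survive as-is.
    print(splice(doc, recorder.spans[1], '<P  id=x class="new">'))
```

Because the edit is a string splice on the original text, the rest of the
document is reproduced byte for byte. The catch is the one described above: a
repairing parser that, say, closes an unclosed <p> for you creates nodes that
have no span in the source at all, so there is nothing to map them back to.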

Anyway, the usual goal when parsing broken HTML is to fix the source
document, not to write a broken document back out unchanged. Maybe we could
help you better if you explained what you are actually trying to do?

Stefan



More information about the Python-list mailing list