[XML-SIG] lxml - html entities
Stefan Behnel
stefan_ml at behnel.de
Tue Jul 29 07:43:28 CEST 2008
(this is being discussed on the lxml mailing list)
spencer.c wrote:
> I am using lxml to process some xhtml files. The files have html character
> codes embedded in them. For instance: &#39; rather than a '. When I parse
> the files, edit them, and then write them back out, I want my edits to be
> the only changes in the output files, but lxml is replacing the character
> codes with the actual characters they are supposed to represent as well.
>
> So if I have:
> It& #39;s an example. <-- Space inserted to help readability.
>
> It is writing out:
> It's an example.
>
> I've tried setting resolve_entities to False, à la:
> tree = etree.parse(input, etree.XMLParser(resolve_entities=False))
>
> But this seems to have no effect.
>
> Is there a way to tell lxml to ignore these/leave them as is?
>
> Thanks.
>
> -s
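
For reference, the behaviour described above can be reproduced in a few lines. &#39; is a numeric character reference rather than a named entity, and the parser decodes those regardless of resolve_entities, which only governs named entity references. The second half is only a rough sketch of one possible workaround, not an lxml feature: it hides the references from the parser before parsing and restores them after serializing. The helper names protect_char_refs and restore_char_refs are made up for this illustration.

```python
from lxml import etree

source = "<p>It&#39;s an example.</p>"

# Part 1: reproduce the report. The character reference is decoded
# even though resolve_entities=False is passed to the parser.
parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring(source, parser)
text = root.text                   # the apostrophe is already a plain character
serialized = etree.tostring(root)  # bytes; the reference is not restored


# Part 2: hypothetical workaround (an assumption, not part of lxml).
def protect_char_refs(xml_text):
    """Escape the '&' of numeric character references so they parse as literal text."""
    return xml_text.replace("&#", "&amp;#")


def restore_char_refs(xml_text):
    """Undo protect_char_refs on serialized output."""
    return xml_text.replace("&amp;#", "&#")


protected_root = etree.fromstring(protect_char_refs(source))
# ... edit protected_root here; text nodes contain the literal "&#39;" ...
round_tripped = restore_char_refs(etree.tostring(protected_root, encoding="unicode"))
```

Caveats of the sketch: input that already contains a literal "&amp;#" is changed on the way out, references inside CDATA sections are not handled, and while the tree is being edited the text nodes hold the raw "&#39;" string instead of the character, so any edits that search or compare text have to account for that.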