[XML-SIG] elementtree and uncomplete parsing

Jean-Marc Chourot jeanmarc.chourot at free.fr
Sat Jun 21 10:23:08 CEST 2008


> Hi,
> 
> jeanmarc.chourot at free.fr wrote:
> > <node>
> > This text <thistag> is completely crap </thistag> because <anothertag> blabla
> > </anothertag>
> > </node>
> > <node>
> > This is another <thisnotag> node </thisnotag> with <anothertaggy> random tags
> > </anothertaggy>
> > </node>
> > 
> > I would like to retrieve what is between the tags <node> ...</node> as
> > strings, with the "subelements" treated as plain text and not processed
> > by ElementTree.
> 
> You are trying to make an XML parser not parse XML, that's bound to fail.
> 
> 
> > In other words, this could be badly formed HTML, left unprocessed,
> > embedded in well-formed XML tags.
> 
> If you really have something like "embedded HTML", it must be escaped in your
> data to be parsable. There is no way an XML parser can return what you want
> without modifying your 'data' (at least losing whitespace etc.).
> 
> I think the easiest option (if you have it) is to talk to the idiots who sent
> you the data and have them fix it.
> 
> Stefan
> 
Thanks for your help.
The real problem is not about "badly formed HTML": each node will
correspond to a leaf of a wx.TreeCtrl, and the data associated with the
leaf will be the content of a wx.RichTextCtrl. When saving the whole
tree content in one file, I want to be able to get the structure of the
tree back and reattach the data to each leaf, and definitely not touch
the content, which is parsed by the wx.RichTextCtrl.
I was hoping ElementTree could help with this, but maybe I am wrong and
should think of a simpler system of tags in the text.
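
For illustration, here is a minimal sketch of the escaping approach
suggested in the quoted reply, assuming Python 3 and the standard
xml.etree.ElementTree module. The function names save_nodes and
load_nodes and the <tree> root element are hypothetical, chosen only for
this example; the content strings stand in for whatever each leaf's
control holds, however that is obtained.

```python
import xml.etree.ElementTree as ET


def save_nodes(contents):
    """Serialize a list of opaque content strings, one <node> per string.

    The <tree> root and the flat layout are assumptions for this sketch.
    """
    root = ET.Element("tree")
    for content in contents:
        node = ET.SubElement(root, "node")
        # Assigning to .text stores the string as character data;
        # ElementTree escapes <, > and & when it serializes the element.
        node.text = content
    return ET.tostring(root, encoding="unicode")


def load_nodes(xml_text):
    """Return the content strings back, unescaped by the parser."""
    root = ET.fromstring(xml_text)
    # An empty <node/> has .text set to None, so fall back to "".
    return [node.text or "" for node in root.findall("node")]


if __name__ == "__main__":
    original = [
        "This text <thistag> is opaque </thistag> because <anothertag> blabla\n</anothertag>",
        "This is another <thisnotag> node </thisnotag> with <anothertaggy> random tags\n</anothertaggy>",
    ]
    xml_text = save_nodes(original)
    print(xml_text)
    print(load_nodes(xml_text) == original)
```

With this layout the inner tags never reach the parser as markup: they
are written as &lt;thistag&gt; and come back as the original string, so
the content does not need to be well formed. Nesting <node> elements to
mirror the tree hierarchy would be a natural extension, but it is not
shown here.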



