[Expat-discuss] & symbol workaround

Enrico Weigelt weigelt at metux.de
Mon Feb 23 20:55:41 CET 2009


* Brad Causey <bradcausey at gmail.com> wrote:
> All,
> 
> It sounds like the consensus is that I need to mod the incoming badly 
> formatted xml. This is my solution, and it worked for what I needed it for:
> 
>    fileo = open(i,'r')
>    file = open('buffer.xml','w')
>    unfixml = fileo.read()
>    fixml = string.replace(unfixml,'&',' ')
      ^^^^^^^

This will cause trouble as soon as the input contains a properly
escaped entity (e.g. &amp;), because replacing every & breaks those
references too. So you'll have to find each &, check what comes after
it, and then decide whether to fix it up or let it pass.
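
A rough sketch of that check in Python, in case it helps. It escapes
a bare & as &amp; (rather than replacing it with a space, as in the
quoted code) and leaves anything that already looks like an entity or
character reference alone. The function name and the regex are my own
assumptions, not something from Brad's script or from Expat:

```python
import re

# Assumption: an "&" counts as already escaped when it starts something
# shaped like a named entity (&name;), a decimal character reference
# (&#38;) or a hex character reference (&#x26;). Everything else is
# treated as a bare ampersand.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_:][\w.\-:]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Return text with every bare '&' rewritten as '&amp;'."""
    return _BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    print(escape_bare_ampersands("AT&T &amp; friends & more"))
```

Note that this only looks at the shape of the reference. Something
like &nbsp; passes through untouched, and the parser will still
complain about it unless the entity is actually declared.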

BTW: is there any way to hook into the parser (some callback) to
catch those errors and then continue parsing?
That would allow building an auto-fixing parser, especially for
cases like Brad's.
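
Lacking such a hook, the closest thing I can think of is to catch the
error outside the parser, repair the input and parse again from the
start. A minimal sketch using Python's xml.parsers.expat binding; the
helper names are hypothetical, and it restarts the parse instead of
continuing it, so it is not the callback asked about above:

```python
import re
import xml.parsers.expat

# Same assumption as before: "&" is bare unless it starts something
# shaped like &name;, &#38; or &#x26;.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_:][\w.\-:]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def _element_names(doc):
    """Parse doc with a fresh parser and return its start-tag names."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(doc, True)  # raises ExpatError if doc is not well-formed
    return names


def parse_with_fixup(text):
    """Return (element names, whether a fixup pass was needed)."""
    try:
        return _element_names(text), False
    except xml.parsers.expat.ExpatError:
        # Throw away the partial result, escape bare ampersands and
        # retry once. A second failure is passed on to the caller.
        fixed = _BARE_AMP.sub("&amp;", text)
        return _element_names(fixed), True


if __name__ == "__main__":
    print(parse_with_fixup("<a><b>x & y</b></a>"))
```

Each attempt collects its results in a fresh list, so handlers that
already ran before the error do not leave duplicates behind.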


cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service - http://www.metux.de/
---------------------------------------------------------------------
 Please visit the OpenSource QM Taskforce:
 	http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for dozens of packages in dozens of versions:
	http://patches.metux.de/
---------------------------------------------------------------------

