Limited XML tidy

Toby White tow21 at cam.ac.uk
Fri Aug 26 05:22:35 EDT 2005


uche.ogbuji at gmail.com writes:

>> The problem is that when the sax handler raises an exception,
>> I can't see how to find out why. What I want to do is for
>> DodgyErrorHandler to do something different depending on
>> where we are in the course of parsing. Is there any way
>> to get that information back from xml.sax (or indeed from
>> any other sax handler?)
>
> You can get raw location information, yes.  See:
>
> http://www.xml.com/pub/a/2004/11/24/py-xml.html
>
> But I don't think this is enough for you.  You also need recovery,
> which you're implementing in crude form.

If you're referring to the Locator objects, yes, I'm aware
that's possible. But what I want is not my location in the
document, but for the parser to say "this is an error because
I am in the middle of a tag & the document ended", or "I
was in the middle of a text section and the document ended", or
"I was in the middle of an attribute value and the document
ended", etc, so that I can then construct a simple end to the
document, inserting quote marks, finishing the tag, and closing 
all unclosed tags as appropriate.

I have just realised that I might be able to grab the message
that the exception gives me, look at the expat source code 
and work out what parsing events cause which error messages.
Which is a bit round the houses, but I think ought to work.
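
A minimal sketch of that idea, assuming Python 3 and going to
xml.parsers.expat directly rather than through xml.sax. The function
name truncation_kind and its labels are made up for illustration. As
far as I can tell expat gives the same "unclosed token" message for a
half-finished tag and a half-finished attribute value, so telling
those two apart would still mean looking at the tail of the input.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def truncation_kind(data):
    """Return a coarse label for why `data` fails to parse, or None if it parses.

    `truncation_kind` and its labels are hypothetical; only the expat
    calls and error constants come from the standard library.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError as exc:
        # Turn the numeric error code back into expat's message string.
        message = xml.parsers.expat.ErrorString(exc.code)
        if message == errors.XML_ERROR_UNCLOSED_TOKEN:
            return "ended inside markup"
        if message == errors.XML_ERROR_NO_ELEMENTS:
            return "ended inside content"
        return "other: " + message
    return None
```

Comparing against the constants in xml.parsers.expat.errors avoids
hard-coding the message text taken from the expat source.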


> I tend to agree with Magnus that using an SGML parser might be your
> best bet.  You might even be able to turn that SGML into XML using a
> tool such as James Clark's SX:
>
> http://www.jclark.com/sp/sx.htm

If I can't get my scheme above to work, I'll have a go. But I was
hoping to do this without requiring additional packages. And in
any case, it doesn't need to be perfectly robust. As long as it
handles 99% of cases, I'll be happy.
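
A rough sketch of such a standard-library-only repair, assuming
Python 3, UTF-8 input passed as bytes, and that the only damage is
truncation. The name close_truncated is made up for illustration.
Note that it drops an unfinished tag instead of completing it with
quote marks, which is cruder than the scheme described above.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def close_truncated(data):
    """Return bytes `data` with a crude ending appended so that it parses.

    Hypothetical helper: assumes UTF-8 input that is merely cut short.
    """
    open_tags = []  # names of elements started but not yet ended
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: open_tags.append(name)
    parser.EndElementHandler = lambda name: open_tags.pop()
    try:
        parser.Parse(data, True)  # True: this is the final chunk
        return data  # already well-formed, nothing to do
    except xml.parsers.expat.ExpatError as exc:
        message = xml.parsers.expat.ErrorString(exc.code)
        if message == errors.XML_ERROR_UNCLOSED_TOKEN:
            # Cut the input back to where the unfinished token starts.
            data = data[:parser.ErrorByteIndex]
        elif message != errors.XML_ERROR_NO_ELEMENTS:
            raise  # some other error: not a simple truncation
    # Close whatever is still open, innermost element first.
    closers = "".join("</%s>" % name for name in reversed(open_tags))
    return data + closers.encode("utf-8")
```

For example, close_truncated(b'<a><b>text') returns
b'<a><b>text</b></a>'. Input that ends before the root element has
even started comes back empty, so that case would need separate
handling.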

-- 
Dr. Toby White 
Dept. of Earth Sciences, Downing Street, Cambridge CB2 3EQ. UK
Email: <tow21 at cam.ac.uk>


