[XML-SIG] Error handling in PyExpat

Radestock, Guenter guenter.radestock@sap.com
Wed, 21 Mar 2001 16:57:23 +0100


> From: Martin v. Loewis [mailto:martin@loewis.home.cs.tu-berlin.de]
> Sent: Wednesday, 21 March 2001 14:59
> To: Radestock, Guenter
> Cc: XML-SIG@python.org; Faerber, Franz
> Subject: Re: [XML-SIG] Error handling in PyExpat
>
>
> > I am using PyExpat to parse XML files and sometimes these files are
> > not correct.  If I find an error in my handler (start_element,
> > end_element or characters), I raise an exception and abort
> > processing the XML file.  If I raise the exception myself in the
> > handler, parser.ErrorLineNumber (and other variables describing the
> > error position) are not available to my code (ErrorLineNumber
> > contains a random value); that is in the exception handler that
> > catches my exception.
>
> Yes, expat does not support user-identified error lines. However, it
> should be possible to propagate such information with the exception
> that you raise.

Sorry - I missed it somehow.  ErrorLineNumber gave me numbers outside
the document - probably because I read it only after parsing had
finished - but ErrorByteIndex has the right value, at least before I
raise the exception.  I guess the values are incorrect in the exception
handler because the parser continues.  Probably parsing continues but
my handlers are not called anymore, because PyExpat (not Expat itself)
knows about the exception?
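
A minimal sketch of carrying the position along with the exception,
as suggested above.  It assumes the xml.parsers.expat module of current
Python releases, whose parser objects expose CurrentLineNumber,
CurrentColumnNumber and CurrentByteIndex inside a handler; HandlerError
and parse_checked are hypothetical names used only for illustration.

```python
import xml.parsers.expat


class HandlerError(Exception):
    """Hypothetical exception that carries the parser position."""

    def __init__(self, message, line, column, byte_index):
        super().__init__(f"{message} (line {line}, column {column})")
        self.line = line
        self.column = column
        self.byte_index = byte_index


def parse_checked(data, forbidden="bad"):
    """Parse data; raise HandlerError if a forbidden element appears."""
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        if name == forbidden:
            # Read the position here, inside the handler, while the
            # parser is still positioned at the offending element.
            raise HandlerError(
                f"unexpected element <{name}>",
                parser.CurrentLineNumber,
                parser.CurrentColumnNumber,
                parser.CurrentByteIndex,
            )

    parser.StartElementHandler = start_element
    parser.Parse(data, True)


try:
    parse_checked("<root>\n  <bad/>\n</root>")
except HandlerError as exc:
    print("handler failed:", exc, "byte index", exc.byte_index)
```

The position is copied into the exception at the moment it is raised,
so the code that catches it never has to ask the parser afterwards.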


> > Unfortunately the (C level) handlers are void functions so there
> > must be another way to tell expat that processing has failed.
>
> I don't think so. This is C, so there is no means of exception
> handling. Once a callback is invoked, it is safe to assume that the
> XML in itself is correct. You have to let expat finish parsing before
> it returns to you (AFAIK).

OK, so there is no way to stop Expat when things go south in a C-level
handler (they could have declared the handlers as int instead of void
and stopped parsing when one returned -1 ...).
It seems PyExpat can't do any better given this.

Thanks a lot.

- Guenter

PS: If you stopped calling handlers after a handler has raised an
exception, you could freeze ErrorLineNumber, ErrorColumnNumber and
ErrorByteIndex at the values they had when the (Python) handler
returned to you.  But it seems you don't stop calling handlers.
Probably I should do something like this in my script.