[XML-SIG] Re: SAX exceptions are odd

Jeremy Hylton jeremy@beopen.com
Thu, 5 Oct 2000 16:59:08 -0400 (EDT)


[Lars M. writes:]
>* Jeremy Hylton
>| 
>| If I call parse on an empty file, I get no exception.  Is this
>| desirable?  I assume it means that "" is well-formed XML, but that
>| doesn't seem like a very helpful definition.  Is this right?
>
>No, it's not right.  You should get an error telling you that the
>document element is required.

Ok.  Then consider it a bug report :-).  Can you fix this and add a
test case to the test suite?
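
A test along these lines would pin down the expected behavior.  It is
only a sketch: it assumes the xml.sax API of a current Python 3 with
the default expat driver, and the helper name parse_error is made up
for illustration.

```python
import xml.sax
from xml.sax import SAXParseException
from xml.sax.handler import ContentHandler


def parse_error(data):
    """Hypothetical helper: return the parse error for data, or None."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except SAXParseException as exc:
        return exc
    return None


def test_empty_document_is_rejected():
    # An empty document has no document element, so it is not well-formed.
    assert parse_error(b"") is not None


def test_minimal_document_is_accepted():
    # A single empty element is the smallest well-formed document.
    assert parse_error(b"<doc/>") is None
```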
> 
>| If I get almost any other exception I get an error message that says
>| something like: "not well-formed at None:1:7"
>
>Expat is not very good at providing informative error messages, so I
>don't think you can expect much more.  If you want better error
>messages you should probably use xmlproc or xmllib.

I think the explanation part of the error message is okay; it could be
better, but it's not terrible.  The part that's confusing is the
formatting.

>As for the None that should imply that you just gave the parser a
>string to parse and didn't provide it with a system identifier (ie:
>URL or file name).

How does it know when I pass it a string and when I pass it a system
identifier?  In Python, system identifiers are strings too, aren't
they?  What if I have a file called "<foo>"?  Will it open that file
or attempt to parse the name as a string?
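
For reference, the xml.sax module in current Python separates the two
cases by entry point rather than by inspecting the string: parse()
takes a file name, URL or file object, while parseString() takes the
document text itself.  The sketch below assumes that Python 3 API; the
NameCollector handler and the two wrapper functions are hypothetical
names used only for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NameCollector(ContentHandler):
    """Hypothetical handler: records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def names_from_file(path):
    # The argument is a system identifier (here, a file name).
    handler = NameCollector()
    xml.sax.parse(path, handler)
    return handler.names


def names_from_text(data):
    # The argument is the document itself, passed as bytes.
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names
```

With this split, a string handed to parse() is always treated as a
location, so a file name is never parsed as markup.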

>| Why is None being printed?  It gave me the initial impression that my
>| error was in not setting up the parse call correctly.  I assumed that the None
>| was the cause of the exception and that under normal circumstances it
>| would have said something like "not well-formed at foo.xml:1:7".
>
>If you told it that you were parsing from foo.xml it should definitely
>return that information in the error message.  Can you show us the
>exact call to parse?

I have an empty file named foo in my current directory.  I list it and
then fire up Python:
c> ls -l foo
-rw-rw-r--   1 jeremy   admin           0 Oct  5 16:57 foo
c> python
Python 2.0b2 (#18, Oct  5 2000, 09:53:11) 
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from xml.sax import parse, ContentHandler
>>> parse("foo", ContentHandler())
>>>

>| What is a system identifier and why should it be reported in an
>| exception when it is None?
>
>The system identifier is SGML-speak (and XML-speak) for the location
>of the document being parsed. I guess we could leave it out in the
>cases where it is None, if people prefer that. (I personally have no
>opinion on that.)

I personally prefer that.

> 
>| I also think the format is odd.  There are three different pieces of
>| information separated by colons.  I am accustomed to the notation
>| filename:line number, but not another colon for the cursor position.
>| It would have been clearer, I think, if the message were more
>| verbose and explained what each field was.
>
>How about this:
>
>  "Not well-formed in foo.xml at line %d, column %d."
>
>If you prefer that I'd be happy to change both that and the lost
>system identifier (if that is indeed the problem).

I would like this a lot better.  It will be appreciated by novice
programmers and whiners like me.
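
To make the proposal concrete, here is a rough sketch of a formatter
that labels each field and leaves out the system identifier when it is
None.  It relies on the accessor methods of SAXParseException in
current Python; format_parse_error and FixedLocator are hypothetical
names, not part of the library.

```python
from xml.sax import SAXParseException
from xml.sax.xmlreader import Locator


def format_parse_error(exc):
    """Hypothetical formatter: name each field, skip a missing system id."""
    where = "line %d, column %d" % (exc.getLineNumber(),
                                    exc.getColumnNumber())
    system_id = exc.getSystemId()
    if system_id is None:
        return "%s at %s." % (exc.getMessage(), where)
    return "%s in %s at %s." % (exc.getMessage(), system_id, where)


class FixedLocator(Locator):
    """Hypothetical locator with a fixed position, for demonstration."""

    def __init__(self, system_id, line, column):
        self._system_id = system_id
        self._line = line
        self._column = column

    def getSystemId(self):
        return self._system_id

    def getLineNumber(self):
        return self._line

    def getColumnNumber(self):
        return self._column


# The exception copies the position out of the locator it is given.
error = SAXParseException("not well-formed", None,
                          FixedLocator("foo.xml", 1, 7))
print(format_parse_error(error))
```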

Jeremy