[XML-SIG] Big Bug? (was:Pretty-printing DOM trees)

Christian Tismer tismer@appliedbiometrics.com
Sun, 24 Jan 1999 14:22:55 +0100


Greg Stein wrote:
[but the file was from my course, and I'm correcting their homework]

> Push back against where the file came from. What if somebody sent you a
> bad executable? Do you try to correct it? What if they send a bad MSFT
> Word file? Do you try to correct it? Makefiles with spaces instead of
> tabs? crontab files with a missing column? etc. etc.

:-) Of course, I usually don't correct them. Never executables.
Word files: sometimes, when people come to me whining about
their only copy of a Word file that is broken. I can give
them the plain text back in most cases, and that is OK.

> Well, the same for XML. If it is bad, then you ask for a correct one. Why
> should XML be any different than the multitude of documents that you deal
> with every day?

I'd say it is not so simple, since XML is not binary but very
redundant ASCII, which I can read and most often also understand
and correct by hand. You could just as well throw a faulty C program
away because it is not proper C. Instead, I correct it.
Well, this was a bit far-fetched, but the truth is somewhere in between.

...
> By default, it should not correct it. That simply continues to encourage
> poor XML authoring. As a programmer, if you want to try to auto-correct,
> then okay, but I would not recommend it.

150% agreed.

[correcting parser]
> No. No. No. No....
> 
> HTML is a huge mess because people started writing parsers that were
> flexible and would correct things for you. Go try to write an HTML parser
> that works against all the stuff out on the Internet. It is frightening
> how difficult that is. There is just so much crap out there because people
> said, "well, we can just correct that for them." Mismatched tags. Missing
> quotes. Illegal characters. Missing close brackets. Simply crap.

Yes, I also don't want this again. You are right.

> With XML, the designers said, "No way. The document has to be correct, or
> it gets rejected. Tough shit for the authors of bad documents."
> 
> Yes, I'm rather fascist on this one :-). I simply cannot condone or
> recommend *any* allowance of flexibility in parsers. That will just lead
> us back to the horrible situation that we are in now with HTML.

OK, let me put it differently, since what I had in mind was something else.

I don't want bad XML to be corrected automatically.
Instead, when it is rejected, I thought of generating a
different document, say an "error document" which gives
a description of the errors. This is a new (well-formed:)
XML document which wraps the source, inserts comments
or anything where the parsing broke, leaves correct
passages intact so far, but of course does not try
to produce correct XML from wrong XML. I'd apply this
tool to a file after I know it is wrong, for debugging
purposes. A little like a compiler listing.
Maybe it would suffice to escape the wrong parts and add the
XML error code and message to the error doc.
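
A minimal sketch of such an error document, assuming Python's
bundled expat bindings. The function name error_document and the
element names (error-document, error, line) are hypothetical and
only serve the illustration. Nothing is repaired: the source is
escaped line by line and annotated with the code and message that
expat reports. Expat stops at the first error, so the report holds
at most one, and the sketch assumes the source contains no control
characters that XML forbids.

```python
from xml.parsers import expat
from xml.sax.saxutils import escape


def error_document(source):
    """Wrap `source` in a well-formed XML report; never repair it."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(source, True)
    except expat.ExpatError as err:
        # expat reports a numeric code, a 1-based line, a 0-based column.
        code, bad_line, column = err.code, err.lineno, err.offset
        message = expat.ErrorString(code)
    else:
        # Well-formed input: nothing to report.
        return '<error-document errors="0"/>'

    out = ['<error-document errors="1">']
    out.append('  <error code="%d" line="%d" column="%d">%s</error>'
               % (code, bad_line, column, escape(message)))
    for number, text in enumerate(source.splitlines(), start=1):
        if number < bad_line:
            status = "ok"        # parsed without complaint
        elif number == bad_line:
            status = "error"     # where the parser gave up
        else:
            status = "unparsed"  # never reached by the parser
        # escape() turns &, < and > into entities, so the broken
        # markup becomes harmless character data in the report.
        out.append('  <line n="%d" status="%s">%s</line>'
                   % (number, status, escape(text)))
    out.append('</error-document>')
    return "\n".join(out)
```

Calling error_document("<a>\n<b>\n</a>") yields a report with one
error element followed by three line elements, which any XML
parser can then read.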

This was my reason to write the little indenter - debugging.

Thanks for your commitment, we're on the same side - chris

-- 
Christian Tismer             :^)   <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101   :    *Starship* http://starship.skyport.net
10553 Berlin                 :     PGP key -> http://pgp.ai.mit.edu/
     we're tired of banana software - shipped green, ripens at home