XML

Alan Kennedy alanmk at hotmail.com
Sat Jun 21 10:54:50 EDT 2003


Alan Kennedy wrote:

>>Of course the point remains that XML is hugely resource inefficient for
>>many problems. But maybe in 5 years we'll have so many terahertz and
>>gigabytes and megabits-per-sec that we won't care.

Roman Suzi wrote:

> Well, when XML was entering the arena, it promised to _lessen_
> the burden on telecommunications... Where is that claim now?

I've been "following" XML since seeing Jon Bosak lecture on the subject in May
1997. I've been subscribed, periodically, to xml-dev and xsl-list, and I don't
think I've ever heard, read or seen anyone seriously claim that XML would lessen
the bandwidth used for communicating information. Is that what you meant? 

And the same argument could be made about HTML: it is just as resource
inefficient as XML, perhaps more so. So why is HTML the most widely used
document format in human history?

If any claims have been made about communications, then I think they would be
more likely to be made in relation to how much easier it would be to *process*
XML, compared to something like EDI, for example.

> And one more point: processing of XML is much less
> reliable than that of ASCII or even simple Python literals.

I have to disagree with that statement. I don't think that a character encoding
can be compared to XML: for example, XML can be stored in ASCII. "Processing"
ASCII is easy, but extracting semantics from an ASCII byte stream is harder than
extracting semantics from XML, i.e. deciding on your tokens, lexing them,
writing a grammar, translating that grammar to running code that builds data
structures, etc, etc. Processing simple strings like "2+(2*2)" is easy: what
about "I would like a deep pan 12-inch pizza with bacon and banana, two portions
of garlic bread and 2 litres of sparkling mineral water delivered to my house
please".
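
To make the pizza point concrete, here is a minimal sketch of the same order
once it has been marked up. The element and attribute names are invented for
illustration and belong to no real vocabulary; the parser is the Python
standard library's xml.etree.ElementTree. No lexer or grammar has to be
written: the structure is already in the markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical order markup: every element and attribute name here is an
# assumption made up for this example.
ORDER = """\
<order deliver-to="home">
  <pizza base="deep pan" size-inches="12">
    <topping>bacon</topping>
    <topping>banana</topping>
  </pizza>
  <side quantity="2">garlic bread</side>
  <drink litres="2">sparkling mineral water</drink>
</order>
"""

def summarise(order_xml):
    """Pull the order details out of the markup into plain Python data."""
    root = ET.fromstring(order_xml)
    pizza = root.find("pizza")
    return {
        "base": pizza.get("base"),
        "size_inches": int(pizza.get("size-inches")),
        "toppings": [t.text for t in pizza.findall("topping")],
        "sides": [(s.text, int(s.get("quantity"))) for s in root.findall("side")],
        "drinks": [(d.text, float(d.get("litres"))) for d in root.findall("drink")],
        "deliver_to": root.get("deliver-to"),
    }

print(summarise(ORDER))
```

Doing the same from the free-text sentence would mean deciding on tokens and
writing a grammar first, which is the harder job described above.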

And can you point me to code/programs/utilities to parse Python literals in
Java, VB, Lisp, Smalltalk, JavaScript, OCaml, etc, etc? Character encoded Python
literals don't travel as well as XML, as of today anyway.

And processing XML is completely reliable, as long as your XML tools library is
compliant with standards. Non-compliant tools don't last long, because people
stop using them.

Now, it is definitely the case that there is a lot of "XML" out there that isn't
really XML at all: it's XML structures with broken HTML embedded inside. For
example, have you tried to process any RSS feeds lately (it's excruciatingly
painful)? But that's more of a social issue (people not being strict with their
RSS parsing) than a technical issue. If they stuck by the strict rules of XML,
their processing would be very easy, but they would lose 50% or more of the
information sources that are syntactically broken. 

As an aside, I wouldn't bother processing RSS that is not well-formed. If the publisher
can't be bothered to check for compliance of their published documents with a
simple standard like XML, then they probably haven't put much effort into
assuring the quality of their information either. But that's a social decision
for me, not a technical one:

try:
    parse(RSS)
except XMLIsNotWellFormedError:
    if not wantToSpend3MonthsWritingAParserForTheMorassThatIsRSS():
        bin(RSS)
        markOriginatingFeedAsRubbish()

> The reliability is what worries me more. It's too easy to get into some
> trap (I remember talks here or in XML-SIG when it was discussed
> what is better None or "" for representing namespace).

But the latter is a minor, Python-specific concern, and one that was
satisfactorily resolved relatively quickly by the excellent PyXML people
(thanks Martin von L et al :-). Once that empty namespace gets serialised into
an XML byte stream and then processed in another language, its meaning
travels intact. XML generally travels much more reliably,
across the wide range of available platforms, than any other non-trivial
data/document representation.
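
As a small illustration, the sketch below uses the standard library's
xml.dom.minidom (an assumption: it is not the PyXML code discussed in that
thread) to show that an element in no namespace reports None as its
namespaceURI, and that this survives a serialise-and-reparse round trip. The
sample document and the helper names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical document: one element in the urn:example namespace, one in none.
DOC = '<root xmlns:p="urn:example"><p:item/><plain/></root>'

def namespace_of(xml_text, tag_name):
    """Return the namespaceURI of the first element with this qualified name."""
    doc = minidom.parseString(xml_text)
    return doc.getElementsByTagName(tag_name)[0].namespaceURI

def round_trip(xml_text):
    """Serialise the parsed document back to text, as another tool would receive it."""
    return minidom.parseString(xml_text).toxml()

print(namespace_of(DOC, "plain"))
print(namespace_of(DOC, "p:item"))
print(namespace_of(round_trip(DOC), "plain"))
```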

> So far all arguments pro-XML in this thread are like "XML is good
> because X, M and L are already here" (be it SGML, javascript, Java,
> developers expertise or whatever). But I wonder if there are pure
> technical merits of XML itself apart
> from it being involved. 

Well, I suppose where you and I differ is that I believe that something being
easy for ordinary people to understand and work with *is* a technical merit.

I think that if we are to progress further, you would have to define what you
mean by "technical merit".

> XML is not well-based scientifically (like RDBMS)

Can you give an example of something comparable to XML that is "well-based
scientifically"? I don't understand what you mean.

> XML for ? is like CSV for a RDBMS
> 
> However, I have no idea what the question stands for...

RDBMS is a very broad term. Try comparing something like SQLite or MS Access on
a LAN with a group of large multi-processor, multi-machine, data-replicating,
distributed (across time zones) transaction processing Sybase servers, for
example.

XML database technology has a long way to go before it challenges RDBMS as a
reliable, fast and powerful storage mechanism for information. But there are
products, projects and languages out there. Compared to your statement above,
perhaps statements like these could be made:

 o XML is to a DOM as CSV is to an array
 o XML is to Apache Xindice as CSV is to a MySQL server
 o XML is to an XML-based content server as CSV is to an RDBMS + a raft of
   content->relational table mappings
 o XPath/XQuery/XPointer is to an XML repository as SQL is to an RDBMS
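
The last analogy can be sketched side by side. This is only an illustration
under stated assumptions: the catalogue data is invented, ElementTree supports
just a subset of XPath, and an in-memory SQLite database stands in for "an
RDBMS". The same question is asked of each store in its own query language.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical catalogue, invented for this example.
CATALOGUE = """\
<catalogue>
  <book year="1997"><title>SGML Basics</title></book>
  <book year="2003"><title>XML Today</title></book>
  <book year="2003"><title>Python and XML</title></book>
</catalogue>
"""

def titles_via_xpath(xml_text, year):
    """Query the XML document with an XPath-style expression."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall("./book[@year='%d']/title" % year)]

def titles_via_sql(xml_text, year):
    """Load the same records into a relational table and query them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
    rows = [(b.findtext("title"), int(b.get("year")))
            for b in ET.fromstring(xml_text).findall("book")]
    conn.executemany("INSERT INTO book VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT title FROM book WHERE year = ? ORDER BY rowid", (year,))
    titles = [row[0] for row in result]
    conn.close()
    return titles

print(titles_via_xpath(CATALOGUE, 2003))
print(titles_via_sql(CATALOGUE, 2003))
```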

For me, the most important aspect of XML is as a social phenomenon, not a
technical one.

-- 
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan



