XML

Sat Jun 21 13:22:49 EDT 2003

On Sat, 21 Jun 2003, Alan Kennedy wrote:

>Alan Kennedy wrote:
>
>>>Of course the point remains that XML is hugely resource inefficient for
>>>many problems. But maybe in 5 years we'll have so many terahertz and
>>>gigabytes and megabits-per-sec that we won't care.
>
>Roman Suzi wrote:
>
>> Well, when XML was entering the arena, it promised to _lessen_
>> the burden on telecommunications... Where is that claim now?
>
>I've been "following" XML since seeing Jon Bosak lecture on the subject in May
>1997. I've been subscribed, periodically, to xml-dev and xsl-list, and I don't
>think I've ever heard, read or seen anyone seriously claim that XML would lessen
>the bandwidth used for communicating information. Is that what you meant? 

Yes, because XML could be received once (as a recordset) and then processed
locally, with further subqueries, etc.
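Below is a minimal sketch of that idea. It uses Python's standard `xml.etree.ElementTree`, and the recordset layout is made up (`recordset`, `row`, `name`, `size` and `price` are hypothetical element names). The document is fetched and parsed once, and every later subquery runs against the in-memory tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical recordset, as it might arrive in a single response.
RECORDSET = """
<recordset>
  <row><name>margherita</name><size>12</size><price>7.50</price></row>
  <row><name>hawaiian</name><size>12</size><price>8.90</price></row>
  <row><name>hawaiian</name><size>16</size><price>11.20</price></row>
</recordset>
"""

# Parsed once; everything below works on this in-memory tree.
root = ET.fromstring(RECORDSET)

def rows_named(name):
    """Local subquery: all rows whose <name> matches, no further network traffic."""
    return [row for row in root.findall("row") if row.findtext("name") == name]

def cheapest():
    """Another local subquery over the same tree."""
    return min(root.findall("row"), key=lambda row: float(row.findtext("price")))
```

For example, `rows_named("hawaiian")` returns both Hawaiian rows without touching the network again.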

>And the same argument could be made about HTML: it is just as resource
>inefficient as XML, perhaps more so. So why is HTML the most widely used
>document format in human history?

Because Microsoft made Outlook Express use HTML for messages by default (and
Netscape did too). Oh, yes: spammers like HTML a lot. I do not have figures at
hand, but 95 percent of the HTML emails addressed to me are spam.

>If any claims have been made about communications, then I think they would be
>more likely to be made in relation to how much easier it would be to *process*
>XML, compared to something like EDI, for example.

Yes, I believe a BASIC interpreter is much easier to write than an Ada one...

>> And one more point: processing of XML is much less
>> reliable than that of ASCII or even simple Python literals.
>
>I have to disagree with that statement. I don't think that a character encoding
>can be compared to XML: for example, XML can be stored in ascii. "Processing"
>ascii is easy, but extracting semantics from an ascii byte stream is harder than
>extracting semantics from XML, i.e. deciding on your tokens, lexing them,
>writing a grammar, translating that grammar to running code that builds data
>structures, etc, etc. Processing simple strings like "2+(2*2)" is easy: what
>about "I would like a deep pan 12-inch pizza with bacon and banana, two portions
>of garlic bread and 2 litres of sparkling mineral water delivered to my house
>please".

Here we come to the whole point of having XML. If ASCII text can be compared
with a string or a list of strings, then XML is like a dict (probably with
subdicts).
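To make the "dict with subdicts" picture concrete, here is a small sketch that encodes the pizza order from the quoted text and folds it into nested Python dicts. The markup and the conversion rules are assumptions made up for illustration: the element and attribute names follow no real ordering schema, and `to_dict` is a hypothetical helper. It uses only the standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the quoted pizza order; names are invented.
ORDER = """
<order>
  <pizza base="deep pan" size-inches="12">
    <topping>bacon</topping>
    <topping>banana</topping>
  </pizza>
  <side item="garlic bread" portions="2"/>
  <drink item="sparkling mineral water" litres="2"/>
  <deliver-to>home</deliver-to>
</order>
"""

def to_dict(elem):
    """Attributes become keys, child elements become sub-dicts (or plain
    text for leaf elements); repeated child tags are collected into a list."""
    result = dict(elem.attrib)
    for child in elem:
        if len(child) or child.attrib:
            value = to_dict(child)
        else:
            value = (child.text or "").strip()
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

order = to_dict(ET.fromstring(ORDER))
```

The mapping is lossy. The interleaving of differently named siblings and any mixed text content are dropped. This is one place where the dict analogy stops holding.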

But I still can't imagine that structuring the above sentence will help in
placing an order with a robotized pizza delivery! I will always fear that my
garlic bread will end up swimming in mineral water.

XML is a purely formal language, and its applications are sometimes overdesigned.

For example, the above-mentioned 2+(2*2) could be as simple as

>>> 2+(2*2)
6

or very complex, requiring you to put everything into XML and apply XSLT to it
to get the answer.

BTW, I have almost the same pessimism toward OOrientation. Some time ago I came
across a program (I shall not name it here because it's quite a useful program
and I do not want to scare its author) which was very object oriented. I needed
to make changes to it because I wanted one of its functions to be dumbed down
and faster. I managed to do it by dropping many classes. Later I needed another
version to output a parameter I was interested in. Luckily, the program runs
only once per file, so after tracing the class structure to understand where I
could return the needed value, I just added a global variable which I fill in
one place and output in another! And now I want a dumbed-down version of
another function of the program: oh no... I'd better rewrite it in a simple
structured/modular way: it will be much shorter and much easier to "refactor".

Again, is it just me refusing to see how wonderful and intuitive OO technology
was meant to be in making my life happier? Or is it a misuse of the technology?

Another example is Linux. Slackware had simple and straightforward init
script(s). Now look at the RedHat initscripts. No wonder it takes so much time
to boot RedHat Linux... And if I needed to have, say, some network added in
Slackware, I put ifconfig in the place where I needed it. In RedHat I am at a
loss, so I put it into rc.local (which gets executed last).

I know why all that machinery was needed (to allow separation of services and
easier automated add/remove), but what they did is IMHO overkill...

So, in my opinion, XML is far from the KISS principle.

>And can you point me to code/programs/utilities to parse python literals in
>Java, VB, lisp, smalltalk, javascript, ocaml, etc, etc? Character encoded python
>literals don't travel as well as XML, as of today anyway.
>
>And processing XML is completely reliable, as long as your XML tools library is
>compliant with standards. Non-compliant tools don't last long, because people
>stop using them.
>
>Now, it is definitely the case that there is a lot of "XML" out there that isn't
>really XML at all: it's XML structures with broken HTML embedded inside. 

This means they are using broken tools to generate such XML... XML is fragile.
Much more fragile than ASCII text. Yes, XML serves more complex tasks,
but does it really need to be so unreliable?
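The fragility is easy to demonstrate. The XML specification treats a well-formedness violation as a fatal error, so one unescaped `&` or one unclosed tag costs the whole document, not just the bad line. The sketch below uses Python's `xml.etree.ElementTree`. The `<item>` documents are made up, and `try_parse` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical item with HTML pasted in unescaped: the bare "&" and the
# unclosed <br> are tolerated by HTML browsers but are not well-formed XML.
BROKEN = "<item><title>Fish & Chips</title><body>line one<br>line two</body></item>"

# The same content made well-formed: "&" escaped, <br> written as an empty element.
FIXED = "<item><title>Fish &amp; Chips</title><body>line one<br/>line two</body></item>"

def try_parse(text):
    """Return the root element, or a string describing why parsing failed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        return "ParseError: %s" % err

broken = try_parse(BROKEN)   # a string: the whole document is rejected
fixed = try_parse(FIXED)     # an Element: parsing succeeds
```

A plain-text consumer would simply have shown the stray `&`. The XML consumer gets nothing until the producer is fixed.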

>> The reliability is what worries me more. It's too easy to fall into some
>> trap (I remember discussions here or on XML-SIG about which is better,
>> None or "", for representing the namespace).
>
>But the latter is a minor, python-specific, concern, and one that was
>satisfactorily resolved relatively quickly, by the excellent PyXML people
>(thanks Martin von L et al :-). Once that empty namespace gets serialised into
>an XML byte stream, and then further processed in another language, the semantic
>travels seamlessly to other languages. XML travels generally much more reliably,
>across the wide range of available platforms, than any other non-trivial
>data/document representation.

It may travel, but I do not think this helps. If a robotized pizza delivery
doesn't have a recipe for sparkling mineral water, it will raise a ParsingError,
because you thought it had one and in reality it doesn't. So what is the
advantage of a flexible transport if the target isn't that flexible
(semantically, of course)? If my browser does not understand blink, what is it
supposed to do instead? Drop the whole text? Not blink it? Give a warning?
Segmentation fault?
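The sketch below makes the point concrete. The XML itself parses without complaint. The error comes from the receiving application, which must choose one of the policies listed above. It uses Python's `xml.etree.ElementTree`. `MENU`, `UnknownItem`, `accept` and the `strict` flag are hypothetical names, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu of the robotized pizza service: it knows no mineral water.
MENU = {"deep pan pizza", "garlic bread"}

ORDER = """<order>
  <item>deep pan pizza</item>
  <item>garlic bread</item>
  <item>sparkling mineral water</item>
</order>"""

class UnknownItem(Exception):
    """Raised by the (hypothetical) application, not by the XML parser."""

def accept(order_xml, strict=True):
    root = ET.fromstring(order_xml)   # succeeds: the document is well-formed
    accepted = []
    for item in root.iter("item"):
        name = item.text.strip()
        if name in MENU:
            accepted.append(name)
        elif strict:
            raise UnknownItem(name)   # policy 1: reject the whole order
        # policy 2 (strict=False): skip it silently, like a browser ignoring <blink>
    return accepted
```

Neither policy follows from the markup. Both are decisions the receiving program has to make on its own.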

>> So far all arguments pro-XML in this thread are like "XML is good
>> because X, M and L are already here" (be it SGML, javascript, Java,
>> developers expertise or whatever). But I wonder if there are pure
>> technical merits of XML itself apart
>> from it being involved. 
>
>Well, I suppose where you and I differ is that I believe that something being
>easy for ordinary people to understand and work with *is* a technical merit.

Hmmm... An XML document is a tree with labelled nodes and ordered children.
But is XML the best markup format for representing a tree in plain text?
Yes, we are all accustomed to it, so we think that is its technical merit.
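To make the question concrete, here is one small tree written two ways: as XML, and as a nested Python literal in S-expression style. The `(tag, attributes, children...)` tuple layout is an assumption chosen for illustration, and `from_xml` is a hypothetical helper. A few lines of Python show that both forms carry the same structure.

```python
import ast
import xml.etree.ElementTree as ET

# One small tree written two ways; the tuple layout is illustrative, not a standard.
AS_XML = '<pizza size="12"><topping>bacon</topping><topping>banana</topping></pizza>'
AS_LITERAL = "('pizza', {'size': '12'}, ('topping', {}, 'bacon'), ('topping', {}, 'banana'))"

def from_xml(elem):
    """Convert an element to (tag, attrs, *children); a leaf's text is its only child.
    Mixed content (text between child elements) is ignored in this sketch."""
    children = tuple(from_xml(child) for child in elem)
    if not children and elem.text:
        children = (elem.text,)
    return (elem.tag, dict(elem.attrib)) + children

tree_from_xml = from_xml(ET.fromstring(AS_XML))
tree_from_literal = ast.literal_eval(AS_LITERAL)   # safe literal parsing, no eval()
```

Only the concrete syntax differs. Whether angle brackets and named closing tags are the better choice is exactly the open question.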

>I think that if we are to progress further, you would have to define what you
>mean by "technical merit".

Technical merits could be of different kinds:
1- simplicity of processing
2- ease for "ordinary people to understand and work with"
3- flexibility and power of expression

>> XML is not well-based scientifically (like RDBMS)
>
>Can you give an example of something comparable to XML that is "well-based
>scientifically"? I don't understand what you mean.

Relational databases are based on well-defined set theory. Contemporary
digital computers and circuitry are based on mathematical logic. Programming
languages rest on lambda calculus and Turing machines, also well-known objects.
Regular expressions correspond to grammars and to finite automata (also
thoroughly researched things).

But what about XML? Where can I find an algebra of XML? Or a "truth table" of
XSLT? I have the same problem with OOP. What well-established mathematical
theory is behind OOP? Gurus' opinions? Maybe XML and OOP aren't science at all
but a special kind of literature: the first is a soap opera and the second...
science fiction.

>> XML for ? is like CSV for a RDBMS
>> 
>> However, I have no idea what the question stands for...
>
>RDBMS is a very broad term. Try comparing something like SQLite or MS Access on
>a LAN with a group of large multi-processor, multi-machine, data-replicating,
>distributed (across time zones) transaction processing Sybase servers, for
>example.
>
>XML database technology has a long way to go before it challenges RDBMS as a
>reliable, fast and powerful storage mechanism for information. But there are
>products, projects and languages out there. Compared to your statement above,
>perhaps statements like these could be made
>
> o XML is to a DOM as CSV is to an array
> o XML is to Apache Xindice as CSV is to a MySQL server.
> o XML is to an XML-based content server as CSV is to an RDBMS + a raft of
>content->relational table mappings.
> o Xpath/XQuery/XPointer is to an XML repository as SQL is to an RDBMS
>
>For me, the most important aspect of XML is as a social phenomenon, not a
>technical one.

That is why I wrote the initial message: I want to understand the
society of XMLists and learn.

Sincerely yours, Roman Suzi
-- 
rnd at onego.ru =\= My AI powered by GNU/Linux RedHat 7.3