XML overuse? (was Re: Python to XML to Python conversion)

Christopher Browne cbbrowne at acm.org
Sat Jul 13 12:25:57 EDT 2002


In the last exciting episode, pinard at iro.umontreal.ca (François Pinard) wrote:
> [Jonathan Hogg]
>> I really am willing to eat humble pie here and admit that I'm
>> mistaken if someone can give me a similar list of good reasons to
>> *not* use XML for off-line hierarchically structured data.
>
> Any file is a hierarchy of some sort.  We often see a file being a
> sequence of lines, a line being a sequence of fields or tokens, and
> tokens being a sequence of characters.  In many, many, really many
> applications, this organisation in lines and fields is wholly
> satisfactory.  Reusing the enumeration above, it is easy to parse,
> easy to validate, easy to edit, easy to query, easy to transform and
> easy to store.  Let's be honest.  People are comfortable with lines
> and fields, examples and tools merely _abound_.
>
> XML becomes more sensible when you have a _lot_ of structure,
> something which is complex, difficult, and which you have to
> exchange with outside parties.  For simple things, it is just annoying
> and heavy overkill, really...

Heavens, that is an _excellent_ description of what's going on.

I think that nicely describes the way that trees tend to get
nasty in SQL, too.

In fact, it more than likely characterizes why "object oriented
databases" are a controversial matter.

> Speaking for my own situation only, as a Python lover, XML is gross
> overkill even for quite complex things.  It is extremely simple to
> pickle rather complex structures, transmit them over wires to
> applications on other machines, and unpickle them there.  Using
> Python as an API for such usages is natural and very comfortable,
> and not to say, immensely faster than XML.

SOAP is a nice example of something that _sounds_ good, but whose
implementation turns out to be a lot uglier than you'd ideally want.

There's little point to it if you're trying to pass around
parameters/results looking like:
  P = [1, 4, 7, 27, 12341, "foo"]

Many simpler and far more efficient marshalling schemes are
available there.
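
As one illustration of such a scheme, here is a minimal sketch using
Python's standard pickle module on the list P from above.  The names
"wire" and "received" are just placeholders for this example, and the
transport (socket, file, pipe) is left out entirely.

```python
import pickle

# The parameter list from the message above.
P = [1, 4, 7, 27, 12341, "foo"]

# Serialize to a bytes object that could be written to a socket or file.
wire = pickle.dumps(P)

# The receiving side rebuilds an equal list from those bytes.
received = pickle.loads(wire)

print(received == P)
```

Pickles should only be loaded from a trusted peer, since unpickling
can execute arbitrary code.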

Where it gets more interesting is when you've got an XML
message looking like:

<hostlist>
  <host>
   <ip> 1.2.3.4 </ip>
   <mainname> foo.bar.com </mainname>
   <anothername> foo </anothername>
   <anothername> bar.com </anothername>
   <anothername> cache.bar.com </anothername>
  </host>
  <host>
   <ip> 1.2.3.5 </ip>
   <anothername> frobozz </anothername>
   <anothername> mail </anothername>
   <mainname> frobozz.bar.com </mainname>
  </host>
</hostlist>

or 
<contactlist>
 <entry> <surname> Pinard </surname> <firstname> Francois </firstname>
 </entry>
 <entry> <company> IBM Inc </company> <url> http://www.ibm.com/ </url>
 </entry>
 <entry> <company> Transmeta Inc </company> <surname> Torvalds
   </surname> <firstname> Linus </firstname> </entry>
</contactlist>

These are simple enough examples; the _problem_ is that the most
typical sort of SOAP handling is for these to respectively translate
into something like:

H = [ ["1.2.3.4", "foo.bar.com", "foo", "bar.com", "cache.bar.com"],
      ["1.2.3.5", "frobozz", "mail", "frobozz.bar.com"] ]

 and

C = [[ "Pinard", "Francois"],
     ["IBM Inc", "http://www.ibm.com/"],
     ["Transmeta Inc", "Torvalds", "Linus"]]

These are, in a sense, convenient enough ways to express the
information; however, you're left puzzling over what the actual
intended structure is.
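
To make the puzzle concrete, here is a small sketch.  The function
guess_mainname and the labelled "hosts" layout are made up for this
example; they are not what any particular SOAP binding produces.  A
reader who assumes "the second item is the main name" gets the first
host right and the second one wrong, because the second host listed
its main name last.

```python
# Flattened form, as in H above: position is the only clue to meaning.
H = [["1.2.3.4", "foo.bar.com", "foo", "bar.com", "cache.bar.com"],
     ["1.2.3.5", "frobozz", "mail", "frobozz.bar.com"]]


def guess_mainname(row):
    """Naive positional guess: treat the second item as the main name."""
    return row[1]


# Labelled form (hypothetical layout): field names travel with the data.
hosts = [
    {"ip": "1.2.3.4", "mainname": "foo.bar.com",
     "anothername": ["foo", "bar.com", "cache.bar.com"]},
    {"ip": "1.2.3.5", "mainname": "frobozz.bar.com",
     "anothername": ["frobozz", "mail"]},
]

guesses = [guess_mainname(row) for row in H]
actual = [host["mainname"] for host in hosts]

print(guesses)
print(actual)
```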

Perl's SOAP::Lite actually expresses the results of SOAP queries as
the full trees of elements and attributes, allowing you to walk the
tree.

Unfortunately, when it proves necessary to write a program to walk the
tree, that shows that the "S" for "Simple" part just got more than a
tad less "simple."
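
For a sense of what "walking the tree" involves, here is a rough
sketch in Python.  It uses xml.etree.ElementTree from the current
standard library rather than any SOAP binding, works on a trimmed
copy of the host list with matching closing tags, and the function
name walk_hosts and the dictionary layout are my own invention for
the example.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the host list, with matching closing tags.
XML = """\
<hostlist>
  <host>
   <ip> 1.2.3.4 </ip>
   <mainname> foo.bar.com </mainname>
   <anothername> foo </anothername>
   <anothername> bar.com </anothername>
  </host>
  <host>
   <ip> 1.2.3.5 </ip>
   <anothername> frobozz </anothername>
   <mainname> frobozz.bar.com </mainname>
  </host>
</hostlist>
"""


def walk_hosts(text):
    """Return one dict per <host>, keyed by child element name."""
    result = []
    for host in ET.fromstring(text).findall("host"):
        record = {"ip": None, "mainname": None, "anothername": []}
        for child in host:
            value = (child.text or "").strip()
            if child.tag == "anothername":
                # Repeated element: collect every occurrence.
                record["anothername"].append(value)
            else:
                record[child.tag] = value
        result.append(record)
    return result


hosts = walk_hosts(XML)
for host in hosts:
    print(host["ip"], host["mainname"], host["anothername"])
```

Even for a format this small, the walker has to know which elements
repeat and which do not, and that knowledge lives in the program, not
in the message.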

The Python SOAP bindings don't handle this terribly wonderfully. 

>> Perhaps I'm missing something blindingly obvious here, but what
>> benefits would I gain from coming up with my own format?
>
> For simple things?  Ease, speed, simplicity, readability.  Don't
> fear it.  The world will survive, you know, even if you sometimes
> don't use XML. :-)

I think the widespread lemming-rush to try to get _everything_ mapped
onto XML-based formats is a demonstration that a whole lot of people
have never even _looked at_ Lex and Yacc.

Not everything in this world should require writing your own recursive
descent parser.  But I rather suspect that there are a lot of
situations where the programming of parsing tasks might be handled
with less code, less debugging, and less overall effort (mental and
chronological) by building a Lex-based parser than is required to
integrate an XML library into an application and then add the hooks to
provide semantics for what it parses.
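
As a rough idea of the scale involved, here is a sketch of a
Lex-style tokenizer for a host list kept as plain lines and fields.
Python's re module stands in for Lex here, and the line format
("IP mainname alias ...", with "#" comments) as well as the names
TOKEN_RE, tokenize and parse_hosts are assumptions made up for the
example.

```python
import re

# Token classes, tried in order, in the spirit of a Lex specification.
TOKEN_RE = re.compile(r"""
    (?P<IP>\d+\.\d+\.\d+\.\d+)        # dotted-quad address
  | (?P<NAME>[A-Za-z][A-Za-z0-9.-]*)  # host name
  | (?P<COMMENT>\#.*)                 # comment to end of line
  | (?P<SPACE>\s+)                    # whitespace between tokens
""", re.VERBOSE)


def tokenize(line):
    """Yield (kind, text) pairs, skipping whitespace and comments."""
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError("unexpected character at column %d" % pos)
        pos = match.end()
        if match.lastgroup not in ("SPACE", "COMMENT"):
            yield match.lastgroup, match.group()


def parse_hosts(text):
    """Parse lines of the form: IP mainname [alias ...]."""
    hosts = []
    for line in text.splitlines():
        tokens = list(tokenize(line))
        if not tokens:
            continue  # blank or comment-only line
        kinds = [kind for kind, _ in tokens]
        if kinds[0] != "IP" or len(kinds) < 2 or "IP" in kinds[1:]:
            raise ValueError("bad host line: %r" % line)
        values = [value for _, value in tokens]
        hosts.append({"ip": values[0], "mainname": values[1],
                      "anothername": values[2:]})
    return hosts


SAMPLE = """\
# hosts
1.2.3.4 foo.bar.com foo bar.com cache.bar.com

1.2.3.5 frobozz.bar.com frobozz mail  # mail host
"""

parsed = parse_hosts(SAMPLE)
for host in parsed:
    print(host)
```

The semantics ("first field is the address, second is the main name")
sit right next to the token rules, with no separate library hooks to
wire up.
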
-- 
(reverse (concatenate 'string "moc.enworbbc@" "enworbbc"))
http://cbbrowne.com/info/linux.html
MICROS~1 has  brought the  microcomputer OS to  the point where  it is
more bloated than even OSes from what was previously larger classes of
machines   altogether.   This  is   perhaps  Bill's   single  greatest
accomplishment.


