XML overuse? (was Re: Python to XML to Python conversion)

Tue Jul 16 09:18:30 EDT 2002

Huaiyu Zhu wrote:
> holger krekel <pyth at devel.trillke.net> wrote:
> >Huaiyu Zhu wrote:
> >> Readability for machines does not have to come at the expense of readability
> >> for humans.  A few years back I experimented with an indentation based data
> >> format that is:
> >> 
> >> - as readable as emacs's outline mode
> >> - reduce to common conventions like this paragraph for simple cases
> >> - allow mixed nested structures of set, sequence, dictionary, and seqdict
> >> - can include binary data 
> >> - can handle different encodings/encryptions in different elements
> >> - with average less than 5% bloat, in contrast to XML's over 100% bloat
> >
> >do you have any code or design documents for this?  
> >
> >Sounds quite interesting.
> 
> The basic idea is quite simple: consider a data structure as a tree; denote
> the type of branching at each node; indent the subtrees.  It appears to me
> that indentation is easier to handle than quotes and escapes.  Here's a
> simple example:
>
> ...snipped...
>
> OK, hope this makes sense.

It does, and it's very interesting.  It sounds a lot like YAML
(http://yaml.org) to me, though (they even have an RFC).
Don't you think YAML might be a superset of your ideas?

Let me add some random thoughts/questions about your/YAML's scheme 
(I hope I am not missing something obvious):

- how is a binary data-stream's size determined? What about
  open-ended streams?  Embedding of arbitrary data-streams
  is very useful (IMO).

- somehow your and YAML's schemes remind me of today's wiki techniques.  
  E.g. wikis have methods of sequence detection (bullets ...) and they
  have a commitment to readability. Of course, they are generally more 
  concerned with graphical views than with being a concise persistence scheme.  

- Is there a canonical conversion between XML and your scheme/YAML?
  Shouldn't be too hard, anyway...

- how do you express external addresses akin to XPath? 
  Ideas:
    - Mappings are easy, just take the 'key'. 
    - Sequences are easy (take the sequence number) but not very robust
      to deletions and insertions of items.
    - Tag names (IDs) that can be attached to any item might be interesting,
      though readability would probably suffer.
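
To make the addressing ideas above concrete, here is a minimal Python
sketch.  It is only an illustration over plain nested dicts and lists;
the '/'-separated path syntax, the "#tag" step and the "id" key are all
my own assumptions, not anything defined by your scheme or by YAML.

```python
def resolve(node, path):
    """Follow a '/'-separated path through nested dicts and lists.

    Hypothetical conventions (assumptions for this sketch only):
      - a plain step on a mapping is taken as the key
      - a numeric step on a sequence is taken as the index
      - a '#name' step on a sequence picks the first mapping item
        whose "id" entry equals name
    """
    for step in path.strip("/").split("/"):
        if step == "":
            continue  # an empty path addresses the node itself
        if isinstance(node, dict):
            node = node[step]
        elif isinstance(node, list):
            if step.startswith("#"):
                tag = step[1:]
                matches = [item for item in node
                           if isinstance(item, dict) and item.get("id") == tag]
                if not matches:
                    raise KeyError("no item tagged %r" % tag)
                node = matches[0]
            else:
                node = node[int(step)]
        else:
            raise KeyError("cannot descend into a leaf at step %r" % step)
    return node


doc = {"project": {"members": [
    {"id": "holger", "lang": "python"},
    {"id": "friend", "lang": "perl"},
]}}

print(resolve(doc, "project/members/1/lang"))        # index-based address
print(resolve(doc, "project/members/#friend/lang"))  # tag-based address

# Inserting an item shifts every later index, so the index-based
# address now points at a different member; the tag-based one does not.
doc["project"]["members"].insert(0, {"id": "new", "lang": "c"})
print(resolve(doc, "project/members/1/lang"))
print(resolve(doc, "project/members/#friend/lang"))
```

The last two lookups show the robustness problem: after the insertion
the index-based path lands on a different item, while the tag-based
path still finds the same one, at the price of an extra "id" entry on
every addressable item.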

btw, I wonder whether some form of your and/or YAML's ideas should play a
role in the new persistence-SIG.  While the actual persistence mappings 
are not the focus there, there are certainly some interesting connections 
between the two areas.

>  If this is still interesting I'll dig the thing
> out.  I have documents and code (perl and python) at home, but I'll have to 
> ...

This sure is useful, especially for me, since I work with a (Perl) 
friend on a project that needs to address the persistence question, and
we want it to be interoperable, simple and fast.  I guess looking
at YAML might save you from having to dig too deep into old hard disks :-)

    holger