XML

Mel Wilson mwilson at the-wire.com
Thu Jun 26 12:50:40 EDT 2003


In article <slrnbfgddn.1sv.bignose-hates-spam at rose.localdomain.fake>,
Ben Finney <bignose-hates-spam at and-zip-does-too.com.au> wrote:
>On Mon, 23 Jun 2003 20:49:16 +0400 (MSD), Roman Suzi wrote:
>> OK. When I was talking about plain text, I had in mind that it has
>> some proprietary format. For example, I can easily write:
>> ------------------
>> foo = "123"
>> bar = "456"
>> zoo = "la\"lala"
>> ------------------
>> And it's not very hard to parse that.
>> In case of XML I will need something like
>>
>><?xml version="1.0"?>
>><foo>123</foo><bar>456</bar><zoo>la"lala</zoo>
                                     ^
No need to escape the quote character here.  That was
only necessary in the symbol=string protocol.  A
symbol=stuff\n protocol would be simpler still.
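
A rough Python sketch of that difference, reading the zoo value
both ways.  The <config> wrapper element is an assumption added
here (XML wants a single root element), and the hand-rolled
unescaping only handles \" and nothing else.

```python
import xml.etree.ElementTree as ET

# symbol=string form: the embedded quote has to be written as \"
line = r'zoo = "la\"lala"'
name, _, raw = line.partition(" = ")
# Strip the outer quotes, then undo the \" escape (the only escape handled here).
value = raw[1:-1].replace(r'\"', '"')

# XML form: a quote inside element content needs no escaping.
# <config> is a hypothetical wrapper so the document has one root element.
doc = '<config><foo>123</foo><bar>456</bar><zoo>la"lala</zoo></config>'
root = ET.fromstring(doc)
xml_value = root.find("zoo").text

print(name, value, xml_value)
```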

As a punch-card, what you have is

123456la"lala

> [ ... ]                    Hope I get the point across -- XML formats
>don't give any particular advantage over simple, record-based flat ASCII
>data; they do give advantage once the structure needs to become more
>complex.

   The true value, maybe the only value, of any communication
protocol is that others can interpret it.  If OP had to
document the symbol=string protocol fully (indicating, for
instance, that the \ doesn't really mean \, etc.) and
disseminate the documentation to all users, and
especially if the protocol were required to handle something
a bit more complicated than the sample above, OP might
be glad of XML.

   And there isn't much to XML.  There's a first line of
version information, of the sort all good protocol designers
define, after their second or third protocols.  Then there's
the XML object, of the form

        <bracket metadata> contents </bracket>

where you change the text "bracket" to denote the kind of
object it is. metadata is an optional blank-separated list
of symbol=string elements such as appeared earlier in this
post. contents is any stuff at all, even including another
XML object.  A rule is required in contents to escape '<'
and '&' (and, for safety, '>') so as not to inadvertently
make contents that looks like XML markup.
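
   As a small sketch of the object form just described, here is
one built with Python's standard xml.etree.ElementTree.  The
element and attribute names (item, colour, size, note) are made
up for illustration.  The library applies the escaping rule to
contents when it serializes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical names: "item" is the bracket, colour/size are the metadata.
item = ET.Element("item", {"colour": "red", "size": "3"})

# Contents is any stuff at all; markup characters get escaped on output.
item.text = "if a < b & c > d"

# Contents may also include another XML object.
note = ET.SubElement(item, "note")
note.text = "nested object"

serialized = ET.tostring(item, encoding="unicode")
print(serialized)

# The same escaping rule, applied by hand to a plain string.
print(escape("a < b"))
```

Parsing the serialized text back with ET.fromstring() returns the
original, unescaped contents.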

   Add a shortcut <bracket metadata /> for objects that
never have contents.
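
   A quick way to see the shortcut, again with ElementTree and a
hypothetical element name (marker).  Both spellings parse to the
same object; the short_empty_elements flag only changes how it is
written out.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with metadata but no contents.
marker = ET.Element("marker", {"id": "7"})

# Default serialization uses the shortcut form for empty elements.
short_form = ET.tostring(marker, encoding="unicode")

# The same object written with an explicit closing bracket.
long_form = ET.tostring(marker, encoding="unicode", short_empty_elements=False)

print(short_form)
print(long_form)
```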

   Round it out with some busy work that describes to
automated parsers what things are valid and invalid.  Say
what you think the character encodings mean.

   Many people seem to be confused about XML because there
is less there than they think there should be.

        Regards.        Mel.



