[XML-SIG] Anything else to go in?

Ken MacLeod ken@bitsko.slc.ut.us
Thu, 15 Oct 1998 13:40:56 -0500 (CDT)


> 	* A module to marshal simple Python data types into XML.
> There's still no obvious DTD to choose for this, though; I'm starting
> to think that I should drop xml/marshal.py, and wait until version 1.1
> of the package to add this; perhaps by then one DTD will have emerged
> as the standard for doing this.

I've just completed a draft DTD for Casbah/LDO XML serialization (below).
This DTD can be targeted, so you can be as specific about Python types
as pickle, or interoperable as with Casbah/LDO.

Note specifically that any object can be used as a key in a dictionary,
which Python (and Smalltalk, for example) supports.  Most other DTDs I've
seen only support string keys.
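
To make the non-string-key point concrete, here is a minimal Python sketch
of an encoder for the element names in the DTD below.  It is not the LDO
source: the function names `to_element' and `dumps', and the type names
"tuple" and "int" used in the `type' attribute, are assumptions made for
illustration only.

```python
import xml.etree.ElementTree as ET


def to_element(obj):
    """Convert a Python object into a dictionary/list/value element."""
    if isinstance(obj, dict):
        elem = ET.Element("dictionary")
        for key, val in obj.items():
            # A key is a full item, so a tuple key becomes a nested <list>.
            elem.append(to_element(key))
            elem.append(to_element(val))
        return elem
    if isinstance(obj, (list, tuple)):
        elem = ET.Element("list")
        if isinstance(obj, tuple):
            elem.set("type", "tuple")  # assumed type name, not from the DTD
        for item in obj:
            elem.append(to_element(item))
        return elem
    elem = ET.Element("value")
    if isinstance(obj, int) and not isinstance(obj, bool):
        elem.set("type", "int")  # assumed type name, not from the DTD
    elif not isinstance(obj, str):
        raise TypeError("unsupported type: %s" % type(obj).__name__)
    elem.text = str(obj)
    return elem


def dumps(obj):
    """Serialize obj to an XML string (no XML declaration or DOCTYPE)."""
    return ET.tostring(to_element(obj), encoding="unicode")


# A dictionary keyed by a tuple rather than by a string.
doc = dumps({("April", 5): "a day in the life"})
print(doc)
```

The tuple key comes out as a `list' element in the key position, which a
string-keyed format could not express.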

As an implementation note, LDO's Python binary serialization uses pickle's
`dump' and `load' methods; it can also act as a stream-head, so it supports
`flush' as well.  The source is in CVS at:

  CVSROOT=:pserver:anonymous@ntlug.org:/home/cbsrc/cvsroot
  password: anonymous
  module:   LDO

or viewable at <http://www.ntlug.org/cgi-bin/cvsweb/>.
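
For the XML format, a pickle-style `loads' might look roughly like the
Python sketch below.  This is an illustration under stated assumptions,
not code from the LDO module: the names `from_element' and `loads' and the
type name "int" are hypothetical, list-valued keys are turned into tuples
so that Python can hash them, and a `ref' is only resolved if the matching
`id' appears earlier in the document.

```python
import base64
import xml.etree.ElementTree as ET


def _remember(elem, obj, seen):
    """Record obj under the element's `id' attribute, if it has one."""
    ident = elem.get("id")
    if ident is not None:
        seen[ident] = obj


def from_element(elem, seen):
    """Rebuild a Python object from a dictionary/list/value/ref element."""
    if elem.tag == "ref":
        return seen[elem.get("ref")]  # backward references only
    if elem.tag == "dictionary":
        obj = {}
        _remember(elem, obj, seen)
        children = list(elem)
        # Children alternate key, value, key, value, ...
        for key_elem, val_elem in zip(children[0::2], children[1::2]):
            key = from_element(key_elem, seen)
            if isinstance(key, list):
                key = tuple(key)  # lists are not hashable in Python
            obj[key] = from_element(val_elem, seen)
        return obj
    if elem.tag == "list":
        obj = []
        _remember(elem, obj, seen)
        for child in elem:
            obj.append(from_element(child, seen))
        return obj
    if elem.tag == "value":
        text = elem.text or ""
        if elem.get("encoding") == "base64":
            obj = base64.b64decode(text)
        elif elem.get("type") == "int":  # assumed type name
            obj = int(text)
        else:
            obj = text
        _remember(elem, obj, seen)
        return obj
    raise ValueError("unexpected element: %s" % elem.tag)


def loads(text):
    """Parse an XML string and return the decoded Python object."""
    return from_element(ET.fromstring(text), {})


doc = """
<list>
  <dictionary>
    <value id="k1">month</value><value>April</value>
    <value>day</value><value type="int">5</value>
  </dictionary>
  <ref ref="k1"/>
  <value encoding="base64">aGk=</value>
</list>
"""
result = loads(doc)
print(result)
```

Untyped values stay strings here, matching the DTD's default; anything
more specific depends on what the application puts in `type'.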

This is meant to be an open spec; please feel free to comment on it
and make suggestions, either here, on the Casbah list, or
to me.

Thanks,

  -- Ken

-------- cut here --------
<!-- ...................................................................... -->
<!-- Lightweight Distributed Objects XML Serialization DTD V0.1 ........... -->
<!-- File ldo-xml.dtd ..................................................... -->
<!-- $Id: ldo-xml.dtd,v 1.1 1998/10/15 18:46:51 kmacleod Exp $ -->

<!-- Copyright 1998 The Casbah Project
     <http://www.ntlug.org/casbah/>

     Please direct all questions, bug reports, or suggestions for
     changes to the casbah@ntlug.org mailing list or to the maintainer:

     o Ken MacLeod
       <ken@bitsko.slc.ut.us>

-->

<!-- ...................................................................... -->

<!-- This DTD defines an object serialization format for use by
     messaging, remote procedure, distributed object system protocols,
     and language or application data marshaling needs.

     This DTD is intended to be minimal, flexible, reusable, and
     targetable.  Most applications will want to further specify how
     internal objects are represented as types, and languages will want
     to specify how language-specific features are encoded.

     One application that may be used to further specify
     representation is the Lightweight Distributed Objects (LDO)
     Request Encoding as Objects specification available at the Casbah
     LDO web page:

         <http://bitsko.slc.ut.us/~ken/casbah/ldo/>  (for now)
         <http://www.ntlug.org/casbah/ldo/>          (soon)

     XML Serialization provides four elements for encoding objects: a
     `dictionary', a `list', a `value', and a `ref' element.
     `dictionary', `list', and `value' elements support reusing
     content with an `id' attribute that can be referred to using the
     `ref' attribute of the `ref' element.  `dictionary', `list', and
     `value' elements support a `type' attribute to declare the type
     or class of the object.  `value' elements have an `encoding'
     attribute to declare their encoding (currently either `base64' or
     unspecified).  A `type' attribute on a `dictionary' element
     marks the dictionary as an object, with the keys as field or
     property names.

     Untyped dictionaries are unordered and may be keyed by any item
     and contain any item as values.

     Untyped lists are ordered sequences of any items.

     Untyped values are 8-bit strings.  Strings that contain
     characters that are not valid XML characters should be encoded
     using MIME BASE64 and the `encoding' attribute should be set to
     `base64'.

     Note also that Tim Bray is promoting the use of an
     `xml:packed="base64"' attribute for generic use.

     TBD: Extended attributes are allowed, with or without XML
     namespaces.  XML namespaces would naturally avoid name space
     conflict though :-).

     TBD: Element-level extension hasn't been evaluated yet, but we
     would like to support it.

     TBD: `dictionary', `list', and `value' elements support a
     `length' attribute that gives the number of pairs in a
     dictionary, the number of elements in a list, or the parsed or
     stored length of the data in a `value'.

     TBD: I'm not sure `value' is the perfect name to convey what it
     contains.  Alternatives are `data', `datum', `scalar', or
     `primitive'.

     TBD: In some cases, it will be desirable to use references
     (`ref') simply for compression (reuse of serialized data, such as
     dictionary keys) as well as for marshaling objects that are
     multiply-referenced.  The distinction is not clarified here.  One
     solution is that applications will use `ref' for simple data
     reuse and use an application defined object (via a dictionary) to
     store multiple references to an object.

     An example serialization of the following value:

         record = ( month: 'April', day: 5, year: 1997 ) 
         encode(record, "a day in the life") 

     would be:

         <?xml version="1.0"?>
         <!DOCTYPE list
           PUBLIC "-//The Casbah Project//DTD LDO XML Serialization V0.1//EN"
                  "ldo-xml.dtd">
         <list> 
           <dictionary> 
             <value>month</value><value>April</value> 
             <value>day</value><value>5</value> 
             <value>year</value><value>1997</value> 
           </dictionary> 
           <value>a day in the life</value> 
         </list> 


-->

<!-- ...................................................................... -->

<!ENTITY % item        "(dictionary | list | value | ref)">

<!ELEMENT dictionary   (%item;, %item;)* > 
<!ATTLIST dictionary
        type           CDATA           #IMPLIED
        length         CDATA           #IMPLIED
        id             ID              #IMPLIED
>

<!ELEMENT list         (%item;)*         > 
<!ATTLIST list
        type           CDATA           #IMPLIED
        length         CDATA           #IMPLIED
        id             ID              #IMPLIED
>

<!ELEMENT value        (#PCDATA)         > 
<!ATTLIST value
        type           CDATA           #IMPLIED
        length         CDATA           #IMPLIED
        id             ID              #IMPLIED
        encoding       CDATA           #IMPLIED
>

<!ELEMENT ref          EMPTY             > 
<!ATTLIST ref
        ref            IDREF           #REQUIRED
>