DOM as a flat dictionary

Paul Boddie paul at boddie.net
Tue Jul 29 07:58:34 EDT 2003


Neil Padgen <neil.padgen at mon.bbc.co.uk> wrote in message news:<bg3gf3$msl$1 at nntp0.reith.bbc.co.uk>...
> On Friday 25 July 2003 13:38, don't dash! wrote:
> 
> > My thought was to generate a flat dictionary representation of the
> > foreign and local formats with the absolute Xpath expression as
> > dictionary key.
> 
> You'll lose any ordering of the elements.

Not if you employ position indicators, as defined in XPath. Of course,
this might not make the "flat" descriptors very readable, but there
are other solutions.

> With the XML
> 
> <spam>
>   <eggs/>
>   <bacon/>
>   <lobster_thermidor accompaniment="crevettes" sauce="mornay"
>                      topping="fried_egg">
>     <more_spam/>
>   </lobster_thermidor>

Editing this to be a closing tag, of course...

> </spam>
> 
> translated into a flat dictionary
> 
> {
>   '/spam': True,
>   '/spam/eggs': True,
>   '/spam/bacon': True,
>   '/spam/lobster_thermidor': True,
>   '/spam/lobster_thermidor/more_spam': True,
> }
> 
> there is no way that you can tell whether /spam/eggs comes before
> /spam/bacon in the original XML.

You could employ something like this:

  /spam/*[1] -> refers to "eggs"
  /spam/*[2] -> refers to "bacon"

This isn't nice to read, as I noted above, and in practice you would
also need some kind of schema information to know in advance which
kind of element each position refers to. For the desired application,
I doubt that this is acceptable.
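
As a rough sketch of the positional form, assuming Python's
xml.etree.ElementTree and a hypothetical helper named
flatten_by_position, the dictionary could be built as below. Storing
the element name as the value is my own assumption here; it is one way
to sidestep the missing schema information, not a requirement of the
notation.

```python
import xml.etree.ElementTree as ET

XML = """<spam>
  <eggs/>
  <bacon/>
  <lobster_thermidor accompaniment="crevettes" sauce="mornay"
                     topping="fried_egg">
    <more_spam/>
  </lobster_thermidor>
</spam>"""


def flatten_by_position(element, path):
    """Return {path: tag}, addressing each child as *[n] (1-based)."""
    flat = {path: element.tag}
    # Iterating over an Element yields its child elements in document order.
    for index, child in enumerate(element, start=1):
        flat.update(flatten_by_position(child, "%s/*[%d]" % (path, index)))
    return flat


root = ET.fromstring(XML)
flat = flatten_by_position(root, "/" + root.tag)
for key, tag in flat.items():
    print(key, "->", tag)
```

The keys alone preserve the ordering; the values only say which
element sits at each position.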

You could make things more complicated:

  /spam/*[1][self::eggs] -> refers to "eggs" but only as the
                            first element in the sequence
  /spam/*[2][self::bacon] -> refers to "bacon" but only as the
                             second element in the sequence

This does indicate which element is being referred to and where that
element resides in the sequence of elements. The reconstruction of a
document from this information could be easy enough to achieve,
although the parsing of the conditional part is slightly more
complicated than other (non-XPath) notations.
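
To give an idea of the reconstruction, here is a minimal sketch. It
assumes every step after the root is spelled *[n][self::name], which
is one XPath form that pins down both the name and the position
amongst all siblings, and it assumes every ancestor has a key of its
own. The names parse_key and rebuild are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed key format: each step after the root looks like *[n][self::name].
STEP = re.compile(r"^\*\[(\d+)\]\[self::([\w.-]+)\]$")


def parse_key(key):
    """Split a key into the root name and a list of (position, name) steps."""
    parts = key.strip("/").split("/")
    steps = []
    for part in parts[1:]:
        match = STEP.match(part)
        if match is None:
            raise ValueError("unsupported step: %r" % part)
        steps.append((int(match.group(1)), match.group(2)))
    return parts[0], steps


def rebuild(keys):
    """Rebuild an element tree from an unordered collection of keys."""
    # Sorting puts parents before children and siblings in position order.
    parsed = sorted(parse_key(key) for key in keys)
    root = ET.Element(parsed[0][0])
    nodes = {(): root}
    for _, steps in parsed:
        if not steps:
            continue  # the root itself
        positions = tuple(position for position, _ in steps)
        parent = nodes[positions[:-1]]
        nodes[positions] = ET.SubElement(parent, steps[-1][1])
    return root


keys = {
    "/spam",
    "/spam/*[2][self::bacon]",
    "/spam/*[1][self::eggs]",
    "/spam/*[3][self::lobster_thermidor]",
    "/spam/*[3][self::lobster_thermidor]/*[1][self::more_spam]",
}
root = rebuild(keys)
print([child.tag for child in root])
```

The regular expression is the "slightly more complicated" part; a
simpler notation would need little more than a split on a separator.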

You could invent a simplified (non-XPath) notation:

  /spam/eggs:1 -> refers to "eggs" but only appearing first
  /spam/bacon:2 -> refers to "bacon" but only appearing second

In the past, I've adopted such notations myself in order to represent
hierarchies in rendered HTML forms. The position numbers can be
interpreted in more than one way, however: if you have a schema to
work from, you could decide to treat each number as the position of
an element amongst elements of only that kind, comparable to the
following XPath expressions:

  /spam/eggs[1] -> refers to the first "eggs" element
  /spam/bacon[1] -> refers to the first "bacon" element
                    (not giving any information about the relative
                     ordering of different elements)
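
The difference between the two interpretations of the name:n notation
is perhaps easiest to see side by side. This is only a sketch,
assuming xml.etree.ElementTree and a hypothetical helper colon_keys
whose per_kind flag switches between counting all siblings and
counting only siblings of the same name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def colon_keys(element, path, per_kind):
    """List name:n paths; n counts all siblings, or only same-named ones."""
    keys = [path]
    seen = Counter()
    for index, child in enumerate(element, start=1):
        seen[child.tag] += 1
        number = seen[child.tag] if per_kind else index
        child_path = "%s/%s:%d" % (path, child.tag, number)
        keys.extend(colon_keys(child, child_path, per_kind))
    return keys


root = ET.fromstring("<spam><eggs/><bacon/><eggs/></spam>")
overall = colon_keys(root, "/spam", per_kind=False)
by_kind = colon_keys(root, "/spam", per_kind=True)
print(overall)
print(by_kind)
```

With per-kind numbering the keys no longer say whether "bacon" came
before or after the second "eggs", so a schema (or some other
convention) has to supply that ordering.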

There are plenty of alternatives, so I hope one of them is useful. :-)

Paul



