XML DTD for Python source?

Neel Krishnaswami neelk at brick.cswv.com
Wed Mar 1 19:45:29 EST 2000


Greg Wilson <gvwilson at nevex.com> wrote:
>
> Has anyone defined an XML DTD for storing Python source code?  If
> so, I'd be grateful for a copy or pointer.  If not, I'd like to hear
> from anyone who'd be interested...

Can I ask what the purpose of this utility would be?

If you just want to store programs in a well-structured form, then
I'll point out that a Python script that the interpreter doesn't barf
on is *already* in a well-structured form. If you want to do program
analysis, then I'd recommend using John Aycock's SPARK parsing
framework to get a nice and pretty syntax tree to work with. (If you
don't care about pretty, you can use the AST module in the standard
distribution.) If you want to do metaprogramming, then use SPARK and
write something like HTMLgen that emits Python code.

This seems to cover all the possible uses of XML; am I just being
dense? XML is a way of defining external representations (the marked
text) plus a nice definition of an interface to internal
representations (the DOM). Do you really like the interface to the DOM
so much that you want to use it to write Python-writing programs with?
(Not a rhetorical question: I haven't used it enough to say.)

The weirdness of this question becomes more apparent if you ask the
same question about Scheme: what would an XML representation for
Scheme programs look like?

(define fact
  (lambda (n)
    (if (= n 0)
        1
        (* n (fact (- n 1))))))

This would become something like

<paren>
  <symbol value="define"/> 
  <symbol value="fact"/>
  <paren>
    <symbol value="lambda"/> 
    <paren>
      <symbol value="n"/>
    </paren>
    <paren>
      <symbol value="if"/> 
      <paren>
        <symbol value="="/>
        <symbol value="n"/>
        <literal representation="0"/>
      </paren>
      <literal representation="1"/>
      <paren>
        <symbol value="*"/> 
        <symbol value="n"/> 
        <paren>
          <symbol value="fact"/>
          <paren>
            <symbol value="-"/> 
            <symbol value="n"/>
            <literal representation="1"/>
          </paren>
        </paren>
      </paren>
    </paren>
  </paren>
</paren>

It's obvious that you don't get any additional structuring information
to work with. This fact is transparent in the case of Scheme, but no
less true for Python.
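
To make that concrete, here is a minimal sketch in Python of the
mechanical mapping between the parenthesized text and the
paren/symbol/literal form above. The function names (tokenize, read,
to_xml, sexpr_to_xml, xml_to_sexpr) are made up for illustration, and
the reader assumes the input contains only symbols and integers, which
happens to be enough for this example. It relies on the standard
xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def tokenize(text):
    """Split S-expression text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Parse one expression from the front of the token list."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return items
    return token


def to_xml(expr):
    """Map a nested list to <paren>, an atom to <symbol> or <literal>."""
    if isinstance(expr, list):
        node = ET.Element("paren")
        for item in expr:
            node.append(to_xml(item))
        return node
    try:
        int(expr)  # assumption: integers are the only literals
    except ValueError:
        return ET.Element("symbol", value=expr)
    return ET.Element("literal", representation=expr)


def sexpr_to_xml(source):
    return to_xml(read(tokenize(source)))


def xml_to_sexpr(node):
    """Inverse mapping: rebuild the parenthesized text from the XML."""
    if node.tag == "paren":
        return "(" + " ".join(xml_to_sexpr(child) for child in node) + ")"
    if node.tag == "symbol":
        return node.get("value")
    return node.get("representation")
```

Converting to XML and back gives the same token sequence, so the XML
carries exactly the structure the parentheses already did.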

Wait -- I just thought of a use of XML in this case. Do you have tools
that accept only XML, and you want to XML-ize Python code so that they
can eat it? In that case, it's probably straightforward to take SPARK
and define parser actions that emit XML tags instead of source
text. You can define a DTD straight from the Python grammar; each
production becomes an element declaration in the DTD.
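
As a rough sketch of the "emit XML tags from the syntax tree" idea,
here is one way it could look. Note the substitutions: this uses the
ast and xml.etree.ElementTree modules from a current Python standard
library rather than SPARK, and the element names are simply the ast
node class names, not the productions of the Python grammar. The
function names node_to_xml and python_to_xml are hypothetical.

```python
import ast
import xml.etree.ElementTree as ET


def node_to_xml(node):
    """Turn one ast node into an element named after its class."""
    elem = ET.Element(type(node).__name__)
    for name, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node goes in a wrapper named after the field.
            ET.SubElement(elem, name).append(node_to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(elem, name)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(node_to_xml(item))
                else:
                    ET.SubElement(wrapper, "value").text = repr(item)
        elif value is not None:
            # Plain data (names, constants) becomes an attribute.
            # Fields that are None are skipped, which also drops the
            # value of a literal None; good enough for a sketch.
            elem.set(name, repr(value))
    return elem


def python_to_xml(source):
    """Parse Python source text and return the root XML element."""
    return node_to_xml(ast.parse(source))


if __name__ == "__main__":
    code = "def fact(n):\n    return 1 if n == 0 else n * fact(n - 1)\n"
    print(ET.tostring(python_to_xml(code), encoding="unicode"))
```

A DTD for this output would need one element declaration per node
class and per field wrapper, which is the same bookkeeping as deriving
it from the grammar productions.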

Also, please let me know if I've gone completely off-base. This 
question strikes me as odd enough that there's a good chance I've
completely misunderstood you.


Neel


