From chrish at cryptocard.com  Mon Dec  1 13:51:07 2003
From: chrish at cryptocard.com (Chris Herborth)
Date: Mon Dec  1 13:49:38 2003
Subject: [XML-SIG] Provide your own SAX parser to the DOM?
Message-ID: <3FCB8D9B.9080900@cryptocard.com>

I've got PyXML 0.8.3 installed here, and I'm generating the DOM for some 
documents thusly:

reader = xml.dom.ext.reader.Sax2.Reader()

# snipped: setting up an external entity resolver and error handler

dom = reader.fromStream( file( an_xml_filename ) )

Is it possible to use a different SAX parser and still get the advantages of 
using the PyXML DOM goodness?  I'm thinking ahead to when I want to use a 
validating parser, although the xml.dom.ext.reader.Sax2.Reader() appears to 
already dig through my DTD...

The reason why I'm asking is because I'm using the resulting DOM to generate 
HTML 3.2 for JavaHelp.  My DTD uses XHTML 1.0 entities and, for the most 
part, I'd like to _not_ have the Sax2.Reader() translating the entities into 
their Unicode characters (I've referenced the XHTML 1.0 entities from my DTD)...

I want to be able to leave the entities in place and/or translate them into 
something myself.  For example, JavaHelp 2.0 implements (most of) the 
Latin-1 accented character entities, but almost none of the others, so I'll 
have to handle &trade; (for example) "by hand".
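
One possible way to do the "by hand" part, sketched under assumptions: let the parser expand the entities as usual, serialize the HTML, and then re-encode the non-ASCII characters in the serialized string. The function name and the BY_HAND table are hypothetical; only html.entities.codepoint2name is taken from the Python standard library.

```python
from html.entities import codepoint2name

# Hypothetical hand-written replacements for characters the viewer cannot show.
BY_HAND = {"\u2122": "(TM)"}

def encode_for_viewer(serialized_html):
    """Re-encode non-ASCII characters in already-serialized HTML."""
    out = []
    for ch in serialized_html:
        code = ord(ch)
        if ch in BY_HAND:
            # characters we have decided to translate ourselves
            out.append(BY_HAND[ch])
        elif code < 0x80:
            # plain ASCII, including the markup itself, passes through
            out.append(ch)
        elif code <= 0xFF and code in codepoint2name:
            # Latin-1 characters go back to their named entity
            out.append("&%s;" % codepoint2name[code])
        else:
            # everything else becomes a numeric character reference
            out.append("&#%d;" % code)
    return "".join(out)
```

Because ASCII passes through untouched, this is meant to run on the serializer's output, not on DOM text nodes (the serializer would escape the "&" again).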

-- 
Chris Herborth                                     chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp.      http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.


From dieter at handshake.de  Tue Dec  2 13:41:22 2003
From: dieter at handshake.de (Dieter Maurer)
Date: Tue Dec  2 14:45:45 2003
Subject: [XML-SIG] Provide your own SAX parser to the DOM?
In-Reply-To: <3FCB8D9B.9080900@cryptocard.com>
References: <3FCB8D9B.9080900@cryptocard.com>
Message-ID: <16332.56530.567033.265903@gargle.gargle.HOWL>

Chris Herborth wrote at 2003-12-1 13:51 -0500:
 > I've got PyXML 0.8.3 installed here, and I'm generating the DOM for some 
 > documents thusly:
 > 
 > reader = xml.dom.ext.reader.Sax2.Reader()
 > 
 > # snipped: setting up an external entity resolver and error handler
 > 
 > dom = reader.fromStream( file( an_xml_filename ) )
 > 
 > Is it possible to use a different SAX parser and still get the advantages of 
 > using the PyXML DOM goodness?

The "Reader" class has an optional "parser" argument.
Look at its source...
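
The same idea exists in the standard library, which may be easier to try than PyXML: xml.dom.minidom.parseString() and parse() also accept an optional "parser" argument. The sketch below uses that stdlib analogue, not the PyXML Reader class itself; the sample document is made up.

```python
import xml.sax
from xml.sax.handler import feature_external_ges
from xml.dom import minidom

# Build and configure the SAX parser yourself...
parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, False)  # do not fetch external entities

# ...then hand it to the DOM builder through the "parser" argument.
dom = minidom.parseString("<doc><p>hello</p></doc>", parser=parser)
print(dom.documentElement.tagName)
```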

-- 
Dieter

From juhtolv at cc.jyu.fi  Mon Dec  8 10:38:45 2003
From: juhtolv at cc.jyu.fi (Juhapekka Tolvanen)
Date: Mon Dec  8 10:38:49 2003
Subject: [XML-SIG] Any XBEL to OPML converters out there?
Message-ID: <20031208153844.GA11878@heresy.ainola.jyu.fi>


Some universal format for outline editors has been developed. It is called
OPML:

http://www.opml.org/

I'd like to find a way to convert my XBEL bookmarks to OPML, too. Do you
know of any software for that purpose? Or could you write it right now? It
should preferably be free (in the sense of freedom) software.

If I could convert my bookmarks to OPML format, I could participate in
this:

http://www.superopendirectory.com/

But hey, how about creating a system that is just like SuperOpenDirectory
but uses the XBEL format?

Here is some information about outline editors:

http://www.troubleshooters.com/tpromag/199911/199911.htm

http://www.outliners.com/


P.S: I don't subscribe to this list. I am smart enough to read archives
from WWW, but please, Cc: to me.


-- 
Juhapekka "naula" Tolvanen * http colon slash slash iki dot fi slash juhtolv
"Rakkaudesta ruikuttajat, halusta ulvojat kiertää kaupungin sydäntä vaanien
verta. Omiin synkkiin linnoihinsa vallitusten taa pelokkaammat piilee
hautomaan haamujaan."                                                    CMX

From walter at livinglogic.de  Mon Dec  8 15:47:27 2003
From: walter at livinglogic.de (Walter Dörwald)
Date: Mon Dec  8 15:47:32 2003
Subject: [XML-SIG] ANN: XIST 2.3
Message-ID: <3FD4E35F.5020403@livinglogic.de>

XIST 2.3 has been released!

What is it?
===========

XIST is an XML-based extensible HTML generator written in Python.
XIST is also a DOM parser (built on top of SAX2) with a very simple
and Pythonesque tree API. Every XML element type corresponds to a
Python class, and these Python classes provide a conversion method
to transform the XML tree (e.g., into HTML). XIST can be considered
"object oriented XSL".

What's new in version 2.3?
==========================

     * Namespace handling has been rewritten to be more standards
       compliant (no more namespace prefixes for entity references
       or processing instructions).

     * Global attributes will now always generate the appropriate
       xmlns attributes.

     * Support for uTidylib has been added and arguments
       can be passed to tidy now.

     * The HTMLParser can handle global attributes now.

     * When parsing from a URL the base URL will be correct now
       even if the request gets redirected
       (thanks to ll-url 0.11.6).

     * Various other small bugfixes and enhancements.

For changes in older versions see:
http://www.livinglogic.de/Python/xist/History.html

Where can I get it?
===================

XIST can be downloaded from http://ftp.livinglogic.de/xist/
or ftp://ftp.livinglogic.de/pub/livinglogic/xist/

Web pages are at
http://www.livinglogic.de/Python/xist/

ViewCVS access is available at
http://www.livinglogic.de/viewcvs/


Bye,
     Walter Dörwald

From tpassin at comcast.net  Tue Dec  9 22:29:28 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Tue Dec  9 22:28:27 2003
Subject: [XML-SIG] Any XBEL to OPML converters out there?
In-Reply-To: <20031208153844.GA11878@heresy.ainola.jyu.fi>
References: <20031208153844.GA11878@heresy.ainola.jyu.fi>
Message-ID: <3FD69318.3050000@comcast.net>

Juhapekka Tolvanen wrote:

> Some universal format for outline editors has been developed. It is called
> OPML:
> 
> http://www.opml.org/
> 
> I'd like to find a way to convert my XBEL-bookmarks to OPML, too. Do you
> know any software for that purpose? Or could you write it right now? It
> would better be free (in the sense of freedom) software.
> 

That should be fairly easy to do by means of an xslt stylesheet.  I do 
not know of any, but that is the way I would do it.  This has actually 
been the subject of a homework assignment - see

http://cscisl.dce.harvard.edu/assignments/2

OPML is not a particularly well-designed format, so I would not 
recommend it unless you plan to use it with some system that requires it 
(which it seems you do).
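
For anyone who would rather not write the stylesheet, here is a rough sketch of the same conversion in Python with ElementTree instead of XSLT. It assumes the common XBEL layout (folder and bookmark elements with a title child, bookmarks carrying an href attribute) and the OPML convention of outline elements with text, type="link" and url attributes; the function name is made up, and separators, aliases and descriptions are ignored.

```python
import xml.etree.ElementTree as ET

def xbel_to_opml(xbel_text):
    """Convert an XBEL document (string) to an OPML document (string)."""
    xbel = ET.fromstring(xbel_text)
    opml = ET.Element("opml", {"version": "1.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = xbel.findtext("title", default="Bookmarks")
    body = ET.SubElement(opml, "body")

    def convert(src, dest):
        # walk folders recursively; each folder or bookmark becomes an outline
        for child in src:
            title = child.findtext("title", default="")
            if child.tag == "folder":
                outline = ET.SubElement(dest, "outline", {"text": title})
                convert(child, outline)
            elif child.tag == "bookmark":
                ET.SubElement(dest, "outline", {
                    "text": title,
                    "type": "link",
                    "url": child.get("href", ""),
                })

    convert(xbel, body)
    return ET.tostring(opml, encoding="unicode")
```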

Cheers,

Tom P


From lalleman at mfps.com  Wed Dec 10 16:02:29 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Wed Dec 10 16:03:27 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>

Hi,

I'm working with an application that is very picky about the XML it accepts
(basically it's non-compliant).  The company's support team isn't giving me
many options.  Certain things that the XML spec says the parser shouldn't
care about, this utility cares about.  Things like the order of attributes
and whether an empty element is written as "<a></a>" or "<a/>" need to be
presented in a specific way.

Any ideas on how to work around some of these issues?  Python XML tools
would be preferred, but at this point all ideas and/or tools are welcome.
All I need is to be able to dictate the order in which the attributes appear
and whether or not empty elements should be written using the shortcut
('<a/>') form.

The changes I am making to the XML document are rather trivial.  I've
considered simply using a slew of string.replace() calls and a few regular
expressions to get the job done, but there may be a few cases where the DOM
approach would be preferable to the raw text manipulation approach.

FYI:  So far I have tried using minidom and 4DOM (the one from PyXML 0.8.2).
I haven't seen the flexibility that I require so far, but I'm not very
familiar with either parser.  minidom would be my preference, since it is
installed as part of the standard library.


Thanks in advance,

- Lowell Alleman


From rsalz at datapower.com  Wed Dec 10 16:15:59 2003
From: rsalz at datapower.com (Rich Salz)
Date: Wed Dec 10 16:10:12 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <3FD78D0F.9010304@datapower.com>

> Any ideas on how to work around some of these issues

You might take a look at the c14n code in dom/ext/c14n.py; it does more 
than what you want, but it shows how to walk a dom, sort attributes, etc.
	/r$


-- 
Rich Salz, Chief Security Architect
DataPower Technology                           http://www.datapower.com
XS40 XML Security Gateway   http://www.datapower.com/products/xs40.html
XML Security Overview  http://www.datapower.com/xmldev/xmlsecurity.html


From fredrik at pythonware.com  Thu Dec 11 02:31:52 2003
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu Dec 11 02:40:22 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <br96hj$5g5$2@sea.gmane.org>

Lowell Alleman wrote:

> I'm working with an application that is very picky about the XML it accepts
> (basically it's non-compliant).  The company's support team isn't giving me
> many options.  Certain things that the XML spec says the parser shouldn't
> care about, this utility cares about.  Things like the order of attributes
> and whether an empty element is written as "<a></a>" or "<a/>" need to be
> presented in a specific way.
>
> Any ideas on how to work around some of these issues.  Python XML tools
> would be preferred, but at this point all ideas and/or tools are welcome.
> All I need is to be able to dictate the order in which the attributes appear
> and whether or not empty elements should be written using the shortcut
> ('<a/>') form.

sounds like you need a custom XML writer.

a quick solution is to take a copy of the writexml() method from the
minidom's Element class and make it into a function (i.e. operate on
element nodes instead of self, change the recursive writexml method
call to a recursive function call, and use the _write_data from the
minidom module).

from xml.dom import minidom
from xml.dom import Node

def writexml(node, writer, indent="", addindent="", newl=""):

    if node.nodeType != Node.ELEMENT_NODE:
        # use standard serializer for everything but elements
        node.writexml(writer, indent, addindent, newl)
        return

    writer.write(indent+"<" + node.tagName)

    # write the attributes in sorted order
    attrs = node.attributes
    a_names = sorted(attrs.keys())

    for a_name in a_names:
        writer.write(" %s=\"" % a_name)
        # _write_data is a private minidom helper that escapes markup
        # characters; its signature may differ between Python versions
        minidom._write_data(writer, attrs[a_name].value)
        writer.write("\"")
    if node.childNodes:
        writer.write(">%s"%(newl))
        # use a separate loop variable so that "node" still names this
        # element when the end tag is written
        for child in node.childNodes:
            writexml(child,writer,indent+addindent,addindent,newl)
        writer.write("%s</%s>%s" % (indent,node.tagName,newl))
    else:
        writer.write("/>%s"%(newl))

usage example:

    import sys

    # parseString() returns a Document; pass its root element
    doc = minidom.parseString("<foo><bar/>hello</foo>")
    writexml(doc.documentElement, sys.stdout)

when this works, tweak the code (it's trivial) until it does exactly
what you want.
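
As one possible direction for that tweaking, here is a sketch for current Python 3 that stays on the public minidom API. It takes a caller-supplied attribute order per element and a set of element names that must never be written in the short form. The function name and both parameters are hypothetical, and indentation support is left out.

```python
from xml.dom import minidom, Node
from xml.sax.saxutils import escape, quoteattr

def serialize(node, attr_order=None, long_form=()):
    """Return markup for node with controlled attribute order and empty-tag form."""
    attr_order = attr_order or {}
    if node.nodeType == Node.TEXT_NODE:
        return escape(node.data)
    if node.nodeType != Node.ELEMENT_NODE:
        # comments, processing instructions etc.: use the standard serializer
        return node.toxml()

    names = list(node.attributes.keys())
    preferred = attr_order.get(node.tagName, [])
    # listed attributes first, in the requested order, then the rest sorted
    ordered = [n for n in preferred if n in names]
    ordered += sorted(n for n in names if n not in preferred)

    parts = ["<", node.tagName]
    for name in ordered:
        parts.append(" %s=%s" % (name, quoteattr(node.getAttribute(name))))

    if node.childNodes or node.tagName in long_form:
        parts.append(">")
        for child in node.childNodes:
            parts.append(serialize(child, attr_order, long_form))
        parts.append("</%s>" % node.tagName)
    else:
        parts.append("/>")
    return "".join(parts)

doc = minidom.parseString('<r a="2" z="1"><a/><b/></r>')
result = serialize(doc.documentElement, {"r": ["z", "a"]}, long_form={"a"})
```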

hope this helps!

</F>


From and-xml at doxdesk.com  Thu Dec 11 12:46:05 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Thu Dec 11 13:04:46 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <20031211174605.GA4930@doxdesk.com>

Lowell Alleman <lalleman@mfps.com> wrote:

> Certain things that the XML spec says the parser shouldn't
> care about, this utility cares about.  Things like the order of attributes

Urgh. Nasty.

Well, you could try pxdom:

  http://www.doxdesk.com/software/py/pxdom.html

A special feature of this DOM implementation is that it will maintain a
fixed order of attributes, so you can rely on the output being in the order
you want.

> and whether an empty element is written as "<a></a>" or "<a/>" need to be
> presented in a specific way.

Is it always one way or always the other, or a mix?

pxdom will use the short form where possible, unless you ask it to do
canonicalisation (using the DOM Level 3 'canonical-form' parameter).
Unfortunately if you did canonicalisation, the attribute order would be
changed. I might add a separate option as a non-standard extension to turn
off short-forms in 1.0 if anyone else would find it useful - alternatively,
hack line 4193 in version 0.9.

If you need to output short forms in some cases but not in others, that's a
bit more work. What you could do to fool the serialiser is put a Text node
of an empty string inside every element that you want to be output in the
longer form, eg.:

  element.appendChild(element.ownerDocument.createTextNode(''))

Just don't normalise it before you serialise or the empty text nodes will
disappear!

Actually, it looks like this trick works in minidom, too.
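
A small minidom sketch of that trick, for illustration; the helper name force_long_form and the sample document are made up.

```python
from xml.dom import minidom

def force_long_form(doc, tag_names):
    """Give every childless element with one of the given names an empty text node."""
    for name in tag_names:
        for element in doc.getElementsByTagName(name):
            if not element.hasChildNodes():
                element.appendChild(doc.createTextNode(""))

doc = minidom.parseString("<root><a/><b/></root>")
force_long_form(doc, ["a"])
# do not call doc.normalize() before serializing, or the empty nodes vanish
print(doc.documentElement.toxml())
```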

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From lalleman at mfps.com  Thu Dec 11 14:04:25 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Thu Dec 11 14:05:22 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>


> -----Original Message-----
> From: Andrew Clover [mailto:and-xml@doxdesk.com]
> Sent: Thursday, December 11, 2003 12:46 PM
> To: xml-sig@python.org
> Subject: Re: [XML-SIG] Working with non-compliant XML utilities
> 
> 
> > and whether an empty element is written as "<a></a>" or "<a/>" need to be
> > presented in a specific way.
> 
> Is it always one way or always the other, or a mix?

It is per-element.  For example element 'a' would always be <a></a>, but 'b'
would have to be shown as '<b/>'.  If 'a' was written as '<a/>' or 'b' as
<b></b>, the application chokes.  It's pretty annoying.

The good news is that when it comes down to actuality, only a few elements
need to be tweaked.  It's always in the form of forcing "<e/>" to be written
as "<e></e>", but never the other way around.


Thanks for your suggestions.

- Lowell

From Alexandre.Fayolle at logilab.fr  Thu Dec 11 15:21:22 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Thu Dec 11 15:21:27 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
Message-ID: <20031211202122.GE30399@calvin>

On Thu, Dec 11, 2003 at 02:04:25PM -0500, Alleman, Lowell wrote:
 
> It is per-element.  For example element 'a' would always be <a></a>, but 'b'
> would have to be shown as '<b/>'.  If 'a' was written as '<a/>' or 'b' as
> <b></b>, the application chokes.  It's pretty annoying.
> 
> The good news is that when it comes down to actuality, only a few elements
> need to be tweaked.  It's always in the form of forcing "<e/>" to be written
> as "<e></e>", but never the other way around.

This reminds me of DTD validation of EMPTY elements:
if an element is declared EMPTY in a DTD, then it has to use the
shortcut notation, otherwise the document is not valid. 

Now I agree that mandating some elements to use the <A></A> notation
denotes a severely broken parser. 

-- 
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From lalleman at mfps.com  Thu Dec 11 15:54:58 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Thu Dec 11 15:55:57 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>


Unfortunately, it looks like I have to do the exact opposite.  Most XML
writers automatically condense to the <e/> form.  I need to tell the writer
not to do so for certain elements.

The sad part about all of this really is that the tool that I'm having these
issues with is a data translation tool (sometimes called data mapping).
Its primary job is converting and processing data in various formats.  

Speaking of DTDs.... I have some new questions:

The order in which the attributes should appear happens to be the same order
in which they are listed in the <!ATTLIST> in the DTD.  I've tried to pull out
the DTD info using 4DOM and minidom, but haven't had much success.  (I
confess that I didn't spend too much time trying to find the appropriate
documentation.)  If I can pull out the information in the <!ATTLIST>, I can
quickly build a dictionary that maps each element to an ordered list of its
attributes.  (I've tested this idea building a small dictionary manually,
but it would be nice to do this using the DTD.)

FYI:  I tried pulling in the DTD info using an external reference as well as
placing it inline.  (I tried the inline DTD when using minidom.  I
assumed that minidom wouldn't pick it up automatically, as it is not a
validating parser, but I wasn't sure if it would simply ignore the DTD.)

I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
when the DTD was inline.  Can anyone confirm that?


Feel free to send URLs.

Thanks again,

- Lowell



From martin at v.loewis.de  Thu Dec 11 15:59:37 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Thu Dec 11 16:00:00 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <20031211202122.GE30399@calvin>
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
	<20031211202122.GE30399@calvin>
Message-ID: <m365gn135y.fsf@mira.informatik.hu-berlin.de>

Alexandre Fayolle <Alexandre.Fayolle@logilab.fr> writes:

> This reminds me of DTD validation of EMPTY elements:
> if an element is declared EMPTY in a DTD, then it has to use the
> shortcut notation, otherwise the document is not valid. 

That is not the case. In XML 1.0 (second edition), after clause 43, we
find the definitions

[Definition: An element with no content is said to be empty.] The
representation of an empty element is either a start-tag immediately
followed by an end-tag, or an empty-element tag.

So a <foo></foo> is also an empty element. After clause 44, we find

For interoperability, the empty-element tag should be used, and
should only be used, for elements which are declared EMPTY.

where "For interoperability" is defined as

for interoperability

    [Definition: Marks a sentence describing a non-binding
    recommendation included to increase the chances that XML documents
    can be processed by the existing installed base of SGML processors
    which predate the WebSGML Adaptations Annex to ISO 8879.]

So this is really "should", not "must".
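
A quick way to see that a conforming parser treats the two forms alike, using minidom purely as an illustration:

```python
from xml.dom import minidom

# The same empty element, written in both permitted forms.
long_form = minidom.parseString("<foo></foo>").documentElement
short_form = minidom.parseString("<foo/>").documentElement

# Neither tree has any child nodes, so the two are indistinguishable.
print(long_form.hasChildNodes(), short_form.hasChildNodes())
```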

Regards,
Martin

From martin at v.loewis.de  Thu Dec 11 16:07:25 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Thu Dec 11 16:07:51 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
Message-ID: <m3wu93ysfm.fsf@mira.informatik.hu-berlin.de>

"Alleman, Lowell" <lalleman@mfps.com> writes:

> The order that the attributes should appear happens to be the same order
> that they are listed in the <!ATTLIST> in the DTD.  I've tried to pull out
> the DTD info using 4DOM and minidom, but haven't had much success. 

You should explicitly use xmlproc, and install a DTDListener. The
add_attribute callbacks will come in the order of attribute declaration.
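
If installing xmlproc is not an option, a similar callback is available from the standard library's expat binding. The sketch below uses that instead of xmlproc's DTDListener: pyexpat's AttlistDeclHandler is called once per declared attribute. The function name attlist_order is made up, and the example only looks at declarations in the internal subset.

```python
from xml.parsers import expat

def attlist_order(xml_text):
    """Map each element name to its attribute names, in declaration order."""
    order = {}

    def on_attlist(elname, attname, type_, default, required):
        # called once for every attribute named in an <!ATTLIST> declaration
        order.setdefault(elname, []).append(attname)

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return order
```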

> I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
> when the DTD was inline.  Can anyone confirm that?

No. 4DOM only uses some underlying parser, so it will never choke
itself - if something chokes, it is the underlying parser.

Regards,
Martin

From Alexandre.Fayolle at logilab.fr  Fri Dec 12 03:05:35 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Fri Dec 12 03:05:41 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <m365gn135y.fsf@mira.informatik.hu-berlin.de>
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
	<20031211202122.GE30399@calvin>
	<m365gn135y.fsf@mira.informatik.hu-berlin.de>
Message-ID: <20031212080535.GA3080@calvin>

On Thu, Dec 11, 2003 at 09:59:37PM +0100, Martin v. Löwis wrote:
> Alexandre Fayolle <Alexandre.Fayolle@logilab.fr> writes:
> 
> > This reminds me of DTD validation of EMPTY elements:
> > if an element is declared EMPTY in a DTD, then it has to use the
> > shortcut notation, otherwise the document is not valid. 
> 
> That is not the case. In XML 1.0 (second edition), after clause 43, we
> find the definitions

<snip>
 
> So this is really "should", not "must".

Thanks a lot for the clarification, Martin. I don't remember where I got 
the impression of a 'must' here. I guess I should read XML 1.0 again
-- this is also really a 'should' ;-)

-- 
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From and-xml at doxdesk.com  Fri Dec 12 04:56:13 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Fri Dec 12 05:14:52 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
Message-ID: <20031212095613.GA26268@doxdesk.com>

Lowell Alleman <lalleman@mfps.com> wrote:

> I need to tell the writer not to do so for certain elements.

(Speaking of which: the empty-text-node trick seems to work with 4DOM too.
Yay!)

> The sad part about all of this really is that the tool that I'm having these
> issues with is a data translation tool

Aye, that's a pretty poor data translation tool.

> The order that the attributes should appear happens to be the same order
> that they are listed in the <!ATTLIST> in the DTD.  I've tried to pull out
> the DTD info using 4DOM and minidom, but haven't had much success.

No, they don't make this available; as Martin says, you'll need to fiddle
with a processor to get at this info.

Alternatively, in another tiresome plug for my own imp, pxdom does give one
access to the ATTLIST declarations, and guarantees the declarations will be
in document order. To get a list of attr names, you could say:

  decls= document.doctype.pxdomAttlists.getNamedItem('tagName').declarations
  attrNames= [decl.nodeName for decl in decls]

Or to sort an element's attributes in one go:

  def sortAttributesByAttlistOrder(element):
    doctype= element.ownerDocument.doctype
    if doctype is not None:
      attlist= doctype.pxdomAttlists.getNamedItem(element.tagName)
      if attlist is not None:
        for attdecl in attlist.declarations:
          attr= element.getAttributeNode(attdecl.nodeName)
          if attr is not None:
            # removing and re-adding moves the attribute to the end
            element.removeAttributeNode(attr)
            element.setAttributeNode(attr)

The drawback is that pxdom doesn't (currently) use external entities,
including the DTD external subset, so you'd have to cram the <!ATTLIST>s
into the internal subset for it to work.

> (I tried the inline DTD when using for minidom.  I assumed that minidom
> wouldn't pick it up automatically, as it is not a validating parser.

Yes, minidom also does not use external entities.

> I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
> when the DTD was inline.

Hmm. Using expat it (and minidom) seem to ignore parameter entities, but I
can't get it to choke as such. If you are getting an 'Illegal parameter
entity reference', that'll be because XML is stricter about where it allows
parameter entities in the internal subset than in an external DTD.

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From Alexandre.Fayolle at logilab.fr  Fri Dec 12 07:27:14 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Fri Dec 12 07:27:18 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <20031212122713.GF3080@calvin>

On Wed, Dec 10, 2003 at 04:02:29PM -0500, Alleman, Lowell wrote:
 
> FYI:  So far I have tried using minidom and 4DOM (the one from PyXML 0.8.2).
> I haven't seen the flexibility that I require so far, but I'm not very
> familiar with either parser.  minidom would be my preference, since it is
> installed as part of the standard library.

One way of getting what you need would probably be to use SAX to
translate the document you have into something your application will
understand. Getting the content handler to produce the text representation
of the contents read by the parser seems feasible. 

Some code to start from can be found in xml.sax.writer. The startElement
and endElement should be customized to produce attributes in the right
order, and to close elements correctly. 
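
As a rough illustration of that approach on current Python 3, here is a self-contained content handler written from scratch (it does not reuse PyXML's xml.sax.writer). The class name OrderedWriter and its attr_order and long_form parameters are hypothetical; comments, processing instructions and namespaces are ignored.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class OrderedWriter(xml.sax.ContentHandler):
    """Write SAX events back out with a fixed attribute order per element."""

    def __init__(self, out, attr_order=None, long_form=()):
        super().__init__()
        self.out = out
        self.attr_order = attr_order or {}
        self.long_form = set(long_form)
        self._pending = None  # name of a start tag whose ">" is not written yet

    def _flush(self):
        if self._pending is not None:
            self.out.write(">")
            self._pending = None

    def startElement(self, name, attrs):
        self._flush()
        names = attrs.getNames()
        preferred = self.attr_order.get(name, [])
        # listed attributes first, in the requested order, then the rest sorted
        ordered = [n for n in preferred if n in names]
        ordered += sorted(n for n in names if n not in preferred)
        self.out.write("<" + name)
        for n in ordered:
            self.out.write(" %s=%s" % (n, quoteattr(attrs.getValue(n))))
        self._pending = name

    def endElement(self, name):
        if self._pending == name and name not in self.long_form:
            # nothing was written since the start tag: use the short form
            self.out.write("/>")
            self._pending = None
        else:
            self._flush()
            self.out.write("</%s>" % name)

    def characters(self, content):
        if content:
            self._flush()
            self.out.write(escape(content))

out = io.StringIO()
handler = OrderedWriter(out, attr_order={"r": ["z", "a"]}, long_form=["a"])
xml.sax.parseString(b'<r a="2" z="1"><a/><b/>hi</r>', handler)
result = out.getvalue()
```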

The complexity of the task will depend on the genericity you want to
achieve, of course. 
 

-- 
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From nhs at llnl.gov  Fri Dec 12 12:51:42 2003
From: nhs at llnl.gov (Norman Samuelson)
Date: Fri Dec 12 12:51:51 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
In-Reply-To: <E1AUqhL-0004q0-TB@mail.python.org>
References: <E1AUqhL-0004q0-TB@mail.python.org>
Message-ID: <6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>

One way you may be able to do what you want with minimal effort would be to 
write the XML as usual, with whatever tool you care about, then process it 
with XSL to produce the strange results you need.

- Norm -


From tpassin at comcast.net  Fri Dec 12 18:26:04 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Fri Dec 12 18:24:59 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
In-Reply-To: <6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>
References: <E1AUqhL-0004q0-TB@mail.python.org>
	<6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>
Message-ID: <3FDA4E8C.3010604@comcast.net>

Norman Samuelson wrote:
> One way you may be able to do what you want with minimal effort would be 
> to write the XML as usual, with whatever tool you care about, then 
> process it with XSL to produce the strange results you need.
> 
  He can't do that - xslt will only produce normal xml, not the "strange 
results" - no control over attribute order or empty element form unless 
he writes his own serializer.

Cheers,

Tom P


From zhaoxinzhi at hotmail.com  Sat Dec 13 05:22:26 2003
From: zhaoxinzhi at hotmail.com (Xinzhi Zhao)
Date: Sat Dec 13 05:22:32 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
Message-ID: <Sea2-F11JgNgCKEjmY200058f6c@hotmail.com>

Hi,
My XML files have to use an encoding other than the default one, namely
'gb2312'. When I parse my XML files with DOM or SAX, errors occur.
Can't the Python xml packages handle this encoding yet? Is there any way
to get this done? How should I do it? Please help me.

Thanks,

Xinzhi Zhao
zhaoxinzhi@hotmail.com

-------------------------------------------------------------------------------
-- My xml file is shown as below,
----------------------------------------------

<?xml version = "1.0"  encoding = "gb2312"?>

<!-- This is my example. -->

<article>

<title> 简单的 XML</title>

<data>December 12, 2003</data>

<author>
    <name>Xinzhi Zhao</name>
</author>

<summary>Parsing XML</summary>

<content>This XML is available in IE6. However,parsing it in Python by DOM 
or SAX will be failed.How shall I do it?
</content>

</article>

_________________________________________________________________
Add photos to your messages with MSN 8. Get 2 months FREE*. 
http://join.msn.com/?page=features/featuredemail


From mike at skew.org  Sat Dec 13 08:14:13 2003
From: mike at skew.org (Mike Brown)
Date: Sat Dec 13 08:14:17 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
In-Reply-To: <Sea2-F11JgNgCKEjmY200058f6c@hotmail.com> "from Xinzhi Zhao at Dec
	13, 2003 10:22:26 am"
Message-ID: <200312131314.hBDDEDmi021838@chilled.skew.org>

Xinzhi Zhao wrote:
> Hi,
> My XML files have to use other encoding instead of the default one, i.e. 
> 'gb2312'. When I was parsing  my XML files by dint of DOM or SAX , some 
> errors occurred. The Python xml packages can't do it now? Is there any way 
> can finish my job? How shall I do it? Please help me.

Limitations of the underlying parser, Expat, prevent certain encodings from
being supported without an additional layer of code. GB2312 is among them.

I think you will have to transcode your document to one of the encodings that
is supported by Expat (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or
US-ASCII; you probably want UTF-8 or UTF-16), and then either rewrite the
encoding declaration in the XML, or find a way to make the declaration
externally. Expat does support external declaration of encoding, but I don't
know offhand how to do it from Python.
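
A minimal sketch of the transcoding route, assuming the document really is GB2312 and that Python has a codec for it: decode the bytes, rewrite the encoding declaration, re-encode as UTF-8 and parse. The function name parse_transcoded is made up, and the regular expression only handles a declaration at the start of the document.

```python
import re
from xml.dom import minidom

def parse_transcoded(raw_bytes, source_encoding="gb2312"):
    """Transcode a document to UTF-8 and parse it with minidom."""
    text = raw_bytes.decode(source_encoding)
    # rewrite the value of the encoding pseudo-attribute in the XML declaration
    text = re.sub(r'(<\?xml[^>]*encoding\s*=\s*["\'])[^"\']*',
                  r'\1utf-8', text, count=1)
    return minidom.parseString(text.encode("utf-8"))
```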

From martin at v.loewis.de  Sat Dec 13 08:45:12 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Sat Dec 13 08:45:34 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
In-Reply-To: <200312131314.hBDDEDmi021838@chilled.skew.org>
References: <200312131314.hBDDEDmi021838@chilled.skew.org>
Message-ID: <m3fzfoltlj.fsf@mira.informatik.hu-berlin.de>

Mike Brown <mike@skew.org> writes:

> I think you will have to transcode your document to one of the encodings that
> is supported by Expat (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or
> US-ASCII

Alternatively, you can use xmlproc, which supports any encoding for
which you have a Python codec.

Regards,
Martin


From zhaoxz at founder.com  Thu Dec 11 09:07:37 2003
From: zhaoxz at founder.com (=?ISO-8859-1?Q?=D5=D4=D0=C2=D6=BE?=)
Date: Sat Dec 13 09:41:54 2003
Subject: [XML-SIG] Parsing XML
Message-ID: <MAILSRVTANZwwPcMFlF0000018e@mailsrv.ecfounder.com>

Skipped content of type multipart/alternative

From fredrik at pythonware.com  Sat Dec 13 09:56:17 2003
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat Dec 13 09:56:21 2003
Subject: [XML-SIG] Re: Parsing XML
References: <MAILSRVTANZwwPcMFlF0000018e@mailsrv.ecfounder.com>
Message-ID: <brf9af$be4$1@sea.gmane.org>

zhaoxz@founder.com wrote:

> My XML files have to use encoding 'iso-8859-1', which is different
> from the default encoding 'utf-8'.
>
> When I was using the package from 4DOM (pyxml.sourceforge.net)
> to parse my XML files, errors occurred. The package for parsing xml
> only supports encoding 'utf-8', right?

if your XML files use ISO-8859-1 encoding, they should contain
an encoding directive in the ?xml header; see

    http://www.w3.org/TR/2000/REC-xml-20001006#NT-EncodingDecl
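
A small sketch of what the declaration changes, assuming a current Python 3
and the standard-library minidom (which parses with Expat). The sample
documents and variable names are made up for illustration; the byte 0xE9 is
"é" in ISO-8859-1 but is not valid UTF-8 on its own.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Same content twice: once with an encoding declaration, once without.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Andr\xe9</name>'
undeclared = b'<name>Andr\xe9</name>'

# With the declaration, the parser decodes 0xE9 as ISO-8859-1.
name = minidom.parseString(declared).documentElement.firstChild.data

# Without it, the parser assumes UTF-8 and rejects the stray byte.
try:
    minidom.parseString(undeclared)
    undeclared_ok = True
except ExpatError:
    undeclared_ok = False
```

Here name ends up as a Unicode string, and the undeclared variant is rejected
as not well-formed.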

</F>


From mike at skew.org  Sat Dec 13 10:11:32 2003
From: mike at skew.org (Mike Brown)
Date: Sat Dec 13 10:11:39 2003
Subject: [XML-SIG] Parsing XML
In-Reply-To: <MAILSRVTANZwwPcMFlF0000018e@mailsrv.ecfounder.com>
	"from =?ISO-8859-1?Q?=D5=D4=D0=C2=D6=BE?= at Dec 11, 2003 10:07:37 pm"
Message-ID: <200312131511.hBDFBW88022355@chilled.skew.org>

> My XML files have to use encoding 'iso-8859-1', which is different
> from the default encoding 'utf-8'.

Technically, there is no default, but conforming parsers assume utf-16 until
they see there's no byte-order mark (BOM) at the beginning, and then assume
utf-8 until they see something else declared in the prolog.
 
> When I was using the package from 4DOM (pyxml.sourceforge.net)
> to parse my XML files, errors occurred.

What errors, specifically? 

Are you sure your XML files are actually iso-8859-1 encoded? 

Note: it is the XML author's responsibility to ensure that the encoding
declaration in the prolog accurately reflects the actual encoding of the
document. If you had a gb2312 file and just changed the declaration to say
iso-8859-1, you didn't change the actual encoding of the document, you just
made the declaration wrong, which an XML parser is required to treat as a
fatal error.

> The package for parsing xml
> only supports encoding 'utf-8', right?

No, the parser that 4DOM uses (Expat) supports other encodings, as I mentioned
in my other message today. iso-8859-1 should work just fine.

If you are still trying to parse gb2312-encoded XML, you need to do more than
just replace 'gb2312' with 'iso-8859-1' in the encoding declaration. Use
Python's codecs module to wrap your gb2312 stream, decoding from gb2312 to
Unicode, at which point you can safely rewrite the declaration in the prolog
if necessary, and then wrap again, encoding from Unicode to utf-8 (or utf-16).
This is what I meant by 'transcode'. You won't need to rewrite the declaration
if you can figure out how to make Expat accept the external encoding
declaration from Python. I was hoping a PyExpat expert would suggest the
answer.
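
The transcoding steps above can be sketched as follows. This is a minimal
illustration under stated assumptions, not a tested recipe for PyXML: it is
written for a current Python 3, it reads the whole document into memory, and
it expects the XML declaration (if any) at the very start of the data. The
helper names transcode_to_utf8 and character_data, the regular expression,
and the sample document are all made up for the example. The second route
relies on the documented behaviour of xml.parsers.expat.ParserCreate, whose
encoding argument overrides the encoding declared in the document; whether
that is the same mechanism as Expat's external declaration is an assumption.

```python
import re
from xml.dom import minidom
from xml.parsers import expat

# Matches the encoding pseudo-attribute of a leading XML declaration.
_DECLARED_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2')


def transcode_to_utf8(raw, source_encoding="gb2312"):
    """Decode raw bytes, rewrite the declared encoding, re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    text = _DECLARED_ENCODING.sub(r"\g<1>\g<2>utf-8\g<2>", text, count=1)
    return text.encode("utf-8")


def character_data(raw, override_encoding):
    """Collect character data while telling Expat the encoding from outside."""
    chunks = []
    parser = expat.ParserCreate(override_encoding)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(raw, True)
    return "".join(chunks)


# Hypothetical sample: a gb2312-encoded document with two Chinese characters.
original = '<?xml version="1.0" encoding="gb2312"?><doc>\u4e2d\u6587</doc>'.encode("gb2312")

# Route 1: transcode and rewrite the declaration, then parse normally.
utf8_bytes = transcode_to_utf8(original)
dom = minidom.parseString(utf8_bytes)

# Route 2: transcode only; the stale gb2312 declaration is overridden.
stale = original.decode("gb2312").encode("utf-8")
text = character_data(stale, "utf-8")
```

Route 1 leaves a document that any Expat-based tool can read afterwards;
route 2 avoids touching the prolog but only helps where the parser is
created by hand.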

-Mike

From KSBeattie at lbl.gov  Mon Dec 22 22:00:02 2003
From: KSBeattie at lbl.gov (Keith Beattie)
Date: Mon Dec 22 22:00:17 2003
Subject: [XML-SIG] binding an unbound namespace prefix
Message-ID: <3FE7AFB2.50407@lbl.gov>

Hi all,

I'm trying to parse a string which is a segment of xml (in order to 
canonicalize it) which doesn't have all its namespaces bound in the segment 
I'm trying to parse.  How do I pass the namespaces into minidom.parseString(), 
or Domlette.NonvalidatingReader.parseString(), such that they'll be happy with 
the 'unbound prefix'?  I hoped to see an nsdict kw arg or some such, but no 
luck.  Is building the DOM myself the only way to do this?

Thanks,
ksb


From walter at livinglogic.de  Tue Dec 23 04:18:43 2003
From: walter at livinglogic.de (Walter Dörwald)
Date: Tue Dec 23 04:19:02 2003
Subject: [XML-SIG] binding an unbound namespace prefix
In-Reply-To: <3FE7AFB2.50407@lbl.gov>
References: <3FE7AFB2.50407@lbl.gov>
Message-ID: <3FE80873.2020102@livinglogic.de>

Keith Beattie wrote:

> Hi all,
> 
> I'm trying to parse a string which is a segment of xml (in order to 
> canonicalize it) which doesn't have all its namespaces bound in the 
> segment I'm trying to parse.  How do I pass the namespaces into 
> minidom.parseString(), or 
> Domlette.NonvalidatingReader.parseString(), such that they'll be happy 
> with the 'unbound prefix'?  I hoped to see an nsdict kw arg or some 
> such, but no luck.  Is building the DOM myself the only way to do this?

You could try XIST (http://www.livinglogic.de/Python/xist/), which
supports passing a prefix mapping to the parser:

from ll.xist import xsc, parsers
from ll.xist.ns import html, svg, fo

e = parsers.parseString(
    "<h:html><s:svg><block/></s:svg></h:html>",
    prefixes=xsc.Prefixes(fo, s=svg, h=html)
)

Unfortunately this doesn't return a standard DOM, but of course
you could convert it into one.

Bye,
    Walter Dörwald


From fdrake at acm.org  Tue Dec 23 08:35:05 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue Dec 23 08:35:26 2003
Subject: [XML-SIG] binding an unbound namespace prefix
In-Reply-To: <3FE7AFB2.50407@lbl.gov>
References: <3FE7AFB2.50407@lbl.gov>
Message-ID: <16360.17545.295816.495961@sftp.fdrake.net>


Keith Beattie writes:
 > I'm trying to parse a string which is a segment of xml (in order to
 > canonicalize it) which doesn't have all its namespaces bound in
 > the segment I'm trying to parse.  How do I pass the namespaces into
 > minidom.parseString(), or
 > Domlette.NonvalidatingReader.parseString(), such that they'll be
 > happy with the 'unbound prefix'?  I hoped to see an nsdict kw arg
 > or some such, but no luck.  Is building the DOM myself the only way
 > to do this?

No, but working around the current API to do this is pretty painful at
the moment.  Please file a feature request for better fragment
support; you can assign it to me if you like.

There is some code in xml.dom.expatbuilder that shows how to do this;
it may be a bit difficult to decipher.  The code is mine, so feel free
to ask questions about it here on the XML-SIG mailing list.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From csad7 at t-online.de  Tue Dec 23 12:27:02 2003
From: csad7 at t-online.de (c.)
Date: Tue Dec 23 12:28:49 2003
Subject: [XML-SIG] empty EntityResolver for SAX
Message-ID: <3FE87AE6.6070501@cdot.de>

hi,
(the following description is a bit convoluted, sorry about that. i hope 
you understand it anyway...)

i thought of providing an empty EntityResolver to my parse function so that 
if i encounter xml files with DTDs in them, these will not be processed.


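import xml.sax
import xml.sax.handler

class EmptyEntityResolver(xml.sax.handler.EntityResolver):

    def resolveEntity(self, publicId, systemId):
        # returning a system id string sends the parser to this (empty) file
        return "http://localhost/empty.txt"

p = xml.sax.make_parser()
p.setContentHandler(handler)  # handler: my ContentHandler, defined elsewhere
p.setEntityResolver(EmptyEntityResolver())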

i could use 
p.setFeature('http://xml.org/sax/features/external-general-entities',False)
of course but i thought something like the above might be better for my 
purpose.


my problem now is that something like
     return None
does not work. only the version above works, and it needs the dummy 
empty.txt file to be present.

is there a simpler way of returning an empty InputSource?
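
One possible approach, sketched here under assumptions rather than taken
from a reply on the list: resolveEntity may return an InputSource instead of
a system id string, so the resolver can hand back an in-memory stream with
nothing in it. The sketch is written for a current Python 3 standard
library. NameCollector, parse_without_dtd and the example.invalid URL are
hypothetical names used only for the demonstration. Recent Python releases
disable external general entities by default, so the feature is switched on
explicitly to make sure the resolver is consulted at all.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class EmptyEntityResolver(EntityResolver):
    """Answer every external entity request with an empty in-memory stream."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))  # nothing to read, nothing to fetch
        return source


class NameCollector(ContentHandler):
    """Hypothetical handler that records element names, for demonstration."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_dtd(xml_bytes, handler):
    p = xml.sax.make_parser()
    # Enabled only so that the resolver is actually asked for the DTD.
    p.setFeature(feature_external_ges, True)
    p.setContentHandler(handler)
    p.setEntityResolver(EmptyEntityResolver())
    p.parse(io.BytesIO(xml_bytes))


collector = NameCollector()
parse_without_dtd(
    b'<!DOCTYPE doc SYSTEM "http://example.invalid/doc.dtd"><doc><item/></doc>',
    collector,
)
```

No dummy file on a web server is needed with this variant, since the empty
stream takes the place of empty.txt.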


thanks a lot
chris


From shunting at etopicality.com  Tue Dec 23 16:32:42 2003
From: shunting at etopicality.com (Sam Hunting)
Date: Tue Dec 23 16:32:58 2003
Subject: [XML-SIG] Which version of PyXML do I install?
Message-ID: <Pine.LNX.4.58.0312231628520.30448@www1.martnet.com>

Here are the first few lines from dmesg:

   Linux version 2.4.23-xfs-031204 (...@...) (gcc version 2.95.4
   20011002 (Debian prerelease)) #1 SMP Thu Dec 4 17:08:50 CET 2003

I'd prefer to use an rpm if possible.

Sam Hunting
eTopicality, Inc.

---------------------------------------------------------------------------
Co-editor:  ISO Reference Model for Topic Maps
  Topic map consulting and training: www.etopicality.com
Free open source topic map tools:  www.gooseworks.org
  XML Topic Maps: Creating and Using Topic Maps for the Web.
Addison-Wesley, ISBN 0-201-74960-2.
---------------------------------------------------------------------------

From and-xml at doxdesk.com  Wed Dec 24 05:22:03 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Wed Dec 24 05:41:18 2003
Subject: [XML-SIG] binding an unbound namespace prefix
In-Reply-To: <3FE7AFB2.50407@lbl.gov>
References: <3FE7AFB2.50407@lbl.gov>
Message-ID: <20031224102203.GA29545@doxdesk.com>

Keith Beattie <KSBeattie@lbl.gov> wrote:

> How do I pass the namespaces into minidom.parseString(), or
> Domlette.NonvalidatingReader.parseString(), such that they'll be happy
> with the 'unbound prefix'?

I know of no convenient way of doing this with either minidom or domlette.
Probably the quickest solution is to hack the input content so it's
surrounded with an element declaring all the known namespaces, then ignore
the root element of the result.
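
A minimal sketch of that wrapper trick with the standard-library minidom,
assuming a current Python 3. The function name parse_fragment, the wrapper
element name and the sample fragment are illustrative only. The fragment
must not carry its own XML declaration.

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def parse_fragment(fragment, namespaces):
    """Parse an XML fragment whose prefixes are bound only by `namespaces`.

    `namespaces` maps prefix -> namespace URI. Returns the fragment's
    top-level nodes; the temporary wrapper element is not included.
    """
    declarations = " ".join(
        "xmlns:%s=%s" % (prefix, quoteattr(uri))
        for prefix, uri in sorted(namespaces.items())
    )
    wrapped = "<wrapper %s>%s</wrapper>" % (declarations, fragment)
    document = minidom.parseString(wrapped)
    return list(document.documentElement.childNodes)


nodes = parse_fragment(
    '<h:p>See <s:svg/></h:p>',
    {"h": "http://www.w3.org/1999/xhtml", "s": "http://www.w3.org/2000/svg"},
)
```

The namespace declarations live on the wrapper, not on the returned nodes,
so a canonicalizer that only looks at attributes of the fragment itself may
still need them passed in separately.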

Alternatively, the DOM Level 3 method parseWithContext would let you insert
directly into the relevant part of the document (with namespaces declared
above). pxdom supports this method and the domConfig parameter
'canonical-form', so that might be a possibility too.

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From list-matt at reprocessed.org  Wed Dec 24 06:52:19 2003
From: list-matt at reprocessed.org (Matt Patterson)
Date: Wed Dec 24 06:52:25 2003
Subject: [XML-SIG] testing a document for validity against a schema,
	not a DTD
Message-ID: <A07DD708-3607-11D8-9042-000393CBB978@reprocessed.org>

Hello all,

I'm looking for a way to validate an XML document against a schema: 
nothing fancy, just a simple yes/no response from the parser would 
probably do.

I can do it several ways with DTDs, but I'm unsure about XML Schema 
support in Python.

Can anyone enlighten me?

Many thanks,

Matt Patterson


From chrish at cryptocard.com  Wed Dec 24 11:06:28 2003
From: chrish at cryptocard.com (Chris Herborth)
Date: Wed Dec 24 11:03:28 2003
Subject: [XML-SIG] Validating parser
Message-ID: <3FE9B984.9040600@cryptocard.com>

I'm upgrading my XML application to use the validating parser; I've been 
fixing previously-hidden bugs in my DTD and my document instances as I go... 
but now I've gotten to one that is baffling me... must be the seasonal 
distraction. ;-)

Here's the error:

Invalid XML, unable to continue.
   book.xml, line 11, column 3: Not a valid name

And here are the first 11 lines of book.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book
	PUBLIC
	"-//CRYPTOCard//DTD CRYPTO-Doc 1.0//EN"
	"CRYPTOCard.dtd"
[
	<!ENTITY % book.entities
		PUBLIC "-//CRYPTOCard//ENTITIES Book Entities//EN"
		"book.ent">
	%book.entities;
]>

If I remove the book.ent bit, it still complains at the end of the DOCTYPE 
declaration, so I'm guessing there's an invalid name somewhere in my DTD. 
Although I'm not sure why this error wouldn't be reported until the end of 
the declaration, instead of during DTD parsing like my other DTD-related 
errors...

Any help is greatly appreciated, thanks!

-- 
Chris Herborth                                     chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp.      http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.


From xml-sig at thewrittenword.com  Mon Dec 29 18:04:20 2003
From: xml-sig at thewrittenword.com (Albert Chin)
Date: Mon Dec 29 18:04:28 2003
Subject: [XML-SIG] 4suite 1.0a3/PyXML 1.0a3 on HP-UX with Python 2.3.2]\
Message-ID: <20031229230420.GA56939@spuckler.il.thewrittenword.com>

I've installed PyXML 0.8.3 and 4Suite 1.0a3 on HP-UX 11.x and Solaris
2.x with GCC 3.3.2. The following program causes a failure on HP-UX
but works on Solaris:
  $ cat a.xml
<?xml version="1.0"?>
<A>
  <B/>
</A>
  $ cat a.py
#!/opt/TWWfsw/python232/bin/python

from xml.dom.ext.reader import PyExpat
from Ft.Xml.XPath import Evaluate

fd = open('a.xml', 'r')
reader = PyExpat.Reader()
dom = reader.fromStream(fd)

  $ python a.py
Traceback (most recent call last):
  File "./a.py", line 8, in ?
    dom = reader.fromStream(fd)
  File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 65, in fromStream
    success = self.parser.ParseFile(stream)
  File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 120, in startElement
    self._completeTextNode()
  File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 104, in _completeTextNode
    if self._currText and len(self._nodeStack) and self._nodeStack[-1].nodeType != Node.DOCUMENT_NODE:
AttributeError: 'NoneType' object has no attribute 'nodeType'

I posted to the 4Suite-dev mailing list but the problem appears to be
a PyXML one. Any ideas?

-- 
albert chin (china@thewrittenword.com)

From zhaoxinzhi at hotmail.com  Mon Dec 29 21:36:44 2003
From: zhaoxinzhi at hotmail.com (Xinzhi Zhao)
Date: Mon Dec 29 23:03:09 2003
Subject: [XML-SIG] Does Python support XQuery?
Message-ID: <Sea2-F18m9XFM24ovzV0000531b@hotmail.com>

Does Python support XQuery? If it does, would you please show me an example?

Many thanks.


--Xinzhi Zhao


