From sbaush at gmail.com  Fri Feb  3 11:09:47 2006
From: sbaush at gmail.com (Sbaush)
Date: Fri, 3 Feb 2006 11:09:47 +0100
Subject: [XML-SIG] Use DOM for do it
In-Reply-To: <fc5d4c490602030148h62def83cq@mail.gmail.com>
References: <fc5d4c490602030148h62def83cq@mail.gmail.com>
Message-ID: <fc5d4c490602030209y6225c8eby@mail.gmail.com>

Hi all.
I have this function that writes an XML string.
Is it possible to do it with DOM instead of ElementTree?
Thanks.

import sys
import elementtree.ElementTree as ET

root = ET.Element("manager")
req = ET.SubElement(root, "request")
app = ET.SubElement(req, "append")
app.set("mode", "INPUT")
met = ET.SubElement(app, "method")
met.set("type", "GOOD")
src = ET.SubElement(app, "source")
src.set("address", "127.0.0.1")
act = ET.SubElement(app, "action")
act.set("option", "OK")

tree = ET.ElementTree(root)
tree.write(sys.stdout)
print
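
For comparison, here is a minimal sketch of the same document built with the standard library's xml.dom.minidom instead of ElementTree. It is written for a current Python 3 interpreter rather than the Python 2 used in this thread, and add_child() is a hypothetical helper introduced only for this illustration, not part of the DOM API.

```python
from xml.dom.minidom import getDOMImplementation

# Create an empty document whose root element is <manager>.
doc = getDOMImplementation().createDocument(None, "manager", None)
root = doc.documentElement


def add_child(parent, tag, **attrs):
    """Hypothetical helper: append a new element to parent and set its attributes."""
    node = doc.createElement(tag)
    for name, value in attrs.items():
        node.setAttribute(name, value)
    parent.appendChild(node)
    return node


req = add_child(root, "request")
app = add_child(req, "append", mode="INPUT")
add_child(app, "method", type="GOOD")
add_child(app, "source", address="127.0.0.1")
add_child(app, "action", option="OK")

# Serialize the element tree to a string and print it.
xml_text = root.toxml()
print(xml_text)
```

In the DOM, elements are always created through the owning document (doc.createElement) and attached afterwards with appendChild, which is why the sketch wraps the two steps in one helper.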


--
Sbaush

From bob at redivi.com  Fri Feb  3 23:30:23 2006
From: bob at redivi.com (Bob Ippolito)
Date: Fri, 3 Feb 2006 14:30:23 -0800
Subject: [XML-SIG] PyXML 0.8.4 and expat byteorder
Message-ID: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>

Here's the PyXML patch that gets expat byteorder from pyconfig.h.  I  
don't know who the maintainer is nor do I have any interest in  
subscribing to xml-sig (this CC will probably bounce, or get stuck in  
mod queue for days/weeks/forever).  If you give a damn about PyXML  
please make sure to get the patch to the right person.

I've never even installed the 4Suite stuff, so I'm not going to put  
together a patch for that.  Such a patch should be roughly the same  
as this one.

-bob

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PyXML-0.8.4-byteorder.patch
Type: application/octet-stream
Size: 1002 bytes
Desc: not available
Url : http://mail.python.org/pipermail/xml-sig/attachments/20060203/ac81574d/attachment.obj 


From noreply at sourceforge.net  Fri Feb  3 23:59:45 2006
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri, 03 Feb 2006 14:59:45 -0800
Subject: [XML-SIG] [ pyxml-Patches-1423775 ] expat byteorder breaks for OS X
	universal binary builds
Message-ID: <E1F59u5-00030Q-Uf@sc8-sf-web1.sourceforge.net>

Patches item #1423775, was opened at 2006-02-03 17:59
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=1423775&group_id=6473

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: expat
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Taylor (code-bear)
Assigned to: Nobody/Anonymous (nobody)
Summary: expat byteorder breaks for OS X universal binary builds

Initial Comment:
I'm copying this patch from the pythonmac-sig mailing
list, where the issue was discussed.  The author of
the patch is not part of the PyXML community, and I
wanted to make sure the patch was noticed.

The patch mail entry:  

http://mail.python.org/pipermail/pythonmac-sig/2006-February/015878.html


----------------------------------------------------------------------


From bear42 at code-bear.com  Sat Feb  4 00:01:28 2006
From: bear42 at code-bear.com (bear)
Date: Fri, 03 Feb 2006 18:01:28 -0500
Subject: [XML-SIG] [Pythonmac-SIG] PyXML 0.8.4 and expat byteorder
In-Reply-To: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>
References: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>
Message-ID: <43E3E0C8.7080509@code-bear.com>

I've taken the patch and submitted it to the PyXML sourceforge project 
and included a link to your mailing list archive entry for reference.

http://sourceforge.net/tracker/index.php?func=detail&aid=1423775&group_id=6473&atid=306473


Bob Ippolito wrote:
> Here's the PyXML patch that gets expat byteorder from pyconfig.h.  I 
> don't know who the maintainer is nor do I have any interest in 
> subscribing to xml-sig (this CC will probably bounce, or get stuck in 
> mod queue for days/weeks/forever).  If you give a damn about PyXML 
> please make sure to get the patch to the right person.

From evdo.hsdpa at gmail.com  Sat Feb  4 01:38:04 2006
From: evdo.hsdpa at gmail.com (Robert Kim Wireless Internet Advisor)
Date: Fri, 3 Feb 2006 16:38:04 -0800
Subject: [XML-SIG] [Pythonmac-SIG] PyXML 0.8.4 and expat byteorder
In-Reply-To: <43E3E0C8.7080509@code-bear.com>
References: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>
	<43E3E0C8.7080509@code-bear.com>
Message-ID: <1ec620e90602031638u643d37f6wc3dec8b325cdcc33@mail.gmail.com>

verrrrry cool! thanks! - bk

On 2/3/06, bear <bear42 at code-bear.com> wrote:
> I've taken the patch and submitted it to the PyXML sourceforge project
> and included a link to your mailing list archive entry for reference.


--
Robert Q Kim, Wireless Internet Advisor
http://hsdpa-coverage.com
http://www.antennacoverage.com/cell-repeater.html

2611 S. Pacific Coast Highway 101
Suite 102
Cardiff by the Sea, CA 92007
206 984 0880

From sbaush at gmail.com  Mon Feb  6 18:35:25 2006
From: sbaush at gmail.com (Sbaush)
Date: Mon, 6 Feb 2006 18:35:25 +0100
Subject: [XML-SIG] problem in ElementTree SubElement
Message-ID: <fc5d4c490602060935l4c74efecl@mail.gmail.com>

Hi all.
I would like to get this element in XML:

<date month="02" day="06" />

I have written this:

date = ET.SubElement(idsreq, "date")
date.set("month", month)
date.set("day", day)

but I get this:

<date day="06" month="02" />

The attributes are not in my order!
How can I get the attributes in the right order?
Thanks all.

--
Sbaush

From radovan.chytracek at gmail.com  Mon Feb  6 19:21:18 2006
From: radovan.chytracek at gmail.com (Radovan Chytracek)
Date: Mon, 6 Feb 2006 19:21:18 +0100
Subject: [XML-SIG] problem in ElementTree SubElement
In-Reply-To: <fc5d4c490602060935l4c74efecl@mail.gmail.com>
References: <fc5d4c490602060935l4c74efecl@mail.gmail.com>
Message-ID: <e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>

Hi,

   you simply can't rely on the order of attributes unless your XML
data are in canonical form, which keeps attributes alphabetically
ordered. I guess this is a very simple way of saying that the SAX parser
likely to be running behind the ElementTree API layer does not preserve
the order of attributes. In general, SAX(2) does not have to. Please
correct me if I am wrong about this.

Cheers
                 Radovan



--
Radovan Chytracek CERN IT PSS
mailto:Radovan.Chytracek at cern.ch
phone: +41227674578 fax: +41227669830

From sbaush at gmail.com  Mon Feb  6 20:10:31 2006
From: sbaush at gmail.com (Sbaush)
Date: Mon, 6 Feb 2006 20:10:31 +0100
Subject: [XML-SIG] problem in ElementTree SubElement
In-Reply-To: <e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>
References: <fc5d4c490602060935l4c74efecl@mail.gmail.com>
	<e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>
Message-ID: <fc5d4c490602061110kc2d8f8fg@mail.gmail.com>

Is it possible to preserve the order when building the XML tree with DOM?



--
Sbaush

From fredrik at pythonware.com  Mon Feb  6 20:14:28 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 6 Feb 2006 20:14:28 +0100
Subject: [XML-SIG] problem in ElementTree SubElement
References: <fc5d4c490602060935l4c74efecl@mail.gmail.com><e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>
	<fc5d4c490602061110kc2d8f8fg@mail.gmail.com>
Message-ID: <ds876k$nub$1@sea.gmane.org>

Sbaush wrote:

> is possible to preserve the order building the XML tree with DOM?

no, because the order isn't important in XML.  if you want to invent your own
file format, you shouldn't call it XML, and you shouldn't use XML tools.

</F> 


From dkgunter at lbl.gov  Wed Feb  8 05:13:49 2006
From: dkgunter at lbl.gov (Dan Gunter)
Date: Tue, 07 Feb 2006 20:13:49 -0800
Subject: [XML-SIG] problem in ElementTree SubElement
In-Reply-To: <e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>
References: <fc5d4c490602060935l4c74efecl@mail.gmail.com>
	<e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>
Message-ID: <43E96FFD.4000105@lbl.gov>

Right, in general XML processors don't care about attribute order (I 
don't know much about canonicalization but that does sound like the 
obvious exception). The XML Infoset specifically says they are an 
unordered set: http://www.w3.org/TR/xml-infoset/#infoitem.element ; so, 
if you care about order, rather than canonicalizing everything, maybe 
you should switch to using elements, e.g.

<date><month>02</month><day>06</day></date>
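
A minimal sketch of that element-based layout, assuming the standard library's xml.etree.ElementTree on a current Python 3 (the thread itself uses the separate elementtree package on Python 2). Child elements, unlike attributes, are serialized in the order they were appended.

```python
import xml.etree.ElementTree as ET

# Build <date> with child elements instead of attributes.
date = ET.Element("date")
ET.SubElement(date, "month").text = "02"
ET.SubElement(date, "day").text = "06"

# encoding="unicode" returns a str rather than bytes.
xml_text = ET.tostring(date, encoding="unicode")
print(xml_text)
```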

-Dan



From fredrik at pythonware.com  Wed Feb  8 09:09:47 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 8 Feb 2006 09:09:47 +0100
Subject: [XML-SIG] problem in ElementTree SubElement
References: <fc5d4c490602060935l4c74efecl@mail.gmail.com><e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>
	<43E96FFD.4000105@lbl.gov>
Message-ID: <dsc90c$e6i$1@sea.gmane.org>

Dan Gunter wrote:

> Right, in general XML processors don't care about attribute order (I
> don't know much about canonicalization but that does sound like the
> obvious exception).

http://www.w3.org/TR/xml-c14n says to sort lexicographically on
(namespace uri, local tag).

(which, of course, is exactly what ET's default writer does)
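
To see the canonical attribute ordering in practice, here is a small sketch. It assumes Python 3.8 or later, whose standard-library xml.etree.ElementTree provides canonicalize() (an implementation of C14N 2.0); this is not the 2006 elementtree package discussed in this thread, whose writer may behave differently.

```python
import xml.etree.ElementTree as ET

source = '<date month="02" day="06" />'

# canonicalize() returns the canonical form as a string when no
# output file is given; attributes come out sorted by name.
canonical = ET.canonicalize(source)
print(canonical)
```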

</F>


From cesar.ortiz at gmail.com  Wed Feb  8 11:46:01 2006
From: cesar.ortiz at gmail.com (Cesar Ortiz)
Date: Wed, 8 Feb 2006 11:46:01 +0100
Subject: [XML-SIG] Encoding detection in the html parser from libxml2
Message-ID: <90255a70602080246xc182997s3c64229925e31133@mail.gmail.com>

Hi,

I am parsing html documents using the html parser from libxml2, and if
the encoding is included in the document it works perfectly but if it
is not, I think it does not work well (probably because I am doing
something wrong).

As stated in http://xmlsoft.org/encoding.html, the parser should
detect the encoding. So I tested it by putting a UTF-8 word in a file, and
it does not detect it (it generates a wrong string). Example:
reducci??n --> reducci???n.

I just use the parser as a SAX parser because I do not need a tree, so
to parse the file I use the htmlParseChunk() function and I create the
context with htmlCreatePushParserCtxt().

Is it possible that the encoding detection does not work with
htmlParseChunk()? If so, what method should I use?
Thanks, Cesar

From veillard at redhat.com  Wed Feb  8 12:55:31 2006
From: veillard at redhat.com (Daniel Veillard)
Date: Wed, 8 Feb 2006 06:55:31 -0500
Subject: [XML-SIG] Encoding detection in the html parser from libxml2
In-Reply-To: <90255a70602080246xc182997s3c64229925e31133@mail.gmail.com>
References: <90255a70602080246xc182997s3c64229925e31133@mail.gmail.com>
Message-ID: <20060208115531.GF30975@redhat.com>

On Wed, Feb 08, 2006 at 11:46:01AM +0100, Cesar Ortiz wrote:
> Hi,
> 
> I am parsing html documents using the html parser from libxml2, and if
> the encoding is included in the document it works perfectly but if it
> is not, I think it does not work well (probably because I am doing
> something wrong).

  Well, the first thing wrong is that this is not the libxml2 help mailing
  list; see
    http://xmlsoft.org/bugs.html

> As it is said in http://xmlsoft.org/encoding.html the parser should
> detect the encoding.

  autodetection is done on XML based on the XMLDecl and the default
values as specified by the XML specification. On HTML all bets are off
if you don't have a meta tag or if you didn't indicate the encoding to the
parser.

> So I tested it putting an utf-8 word in a file and
> it does not detect it (it generates a wrong string). Example:
> reducci??n --> reducci???n.

  encoding is an entity property (i.e. per file) not per word. So either
I don't understand your test or this just can't work.

  http://xmlsoft.org/html/libxml-HTMLparser.html#htmlCreatePushParserCtxt
  use the encoding argument when creating your parser.
For further information or help, subscribe to and use the libxml2 mailing list.

  thanks,

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard at redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From clerc at uni-bremen.de  Thu Feb  9 14:09:23 2006
From: clerc at uni-bremen.de (Daniel Clerc)
Date: Thu, 9 Feb 2006 14:09:23 +0100
Subject: [XML-SIG] problems with encoding and SAX
Message-ID: <82d07e750602090509p6e07a5dcwbf74f1f665d4a84d@mail.gmail.com>

Hi everybody!

I have some trouble with SAX and encodings...

When I try to parse the following XML-code:

<?xml version="1.0" encoding="WINDOWS-1252" ?>
</TRANSACTION>
<TRANSACTION TIME="03.04.2003 01:52:15" TIME_CODED="37714.0779513889"
DURATION="1001">
  <QUESTION>K'R&#174;</QUESTION>
                       ^^^^^^^^^^^^^^
...

I get this error message.

<SNIPP>
    self._err_handler.fatalError(exc)
  File "C:\Python24\Lib\site-packages\_xmlplus\sax\handler.py", line
38, in fatalError
    raise exception
SAXParseException: xml_temp.xml:3766:13: not well-formed (invalid token)
...

Here you can find the python-code I use:

   http://knopaste.de/index.php?module=hilight&id=142

Maybe the encoding of the content between the XML elements does not
match the encoding specified. As I have to parse quite a
lot of log files (~1GB zipped), and there are only a handful of such
errors, I would be very happy if I could find a way to tell SAX just
not to worry and write the string anyway.

Parsing the xml-code with the MS-XML-DOM, or a JAVA-based parser is
not a problem, but I would prefer a solution in Python.

Thanks,

Daniel

From uche.ogbuji at fourthought.com  Thu Feb  9 23:08:36 2006
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Thu, 09 Feb 2006 15:08:36 -0700
Subject: [XML-SIG] PyXML 0.8.4 and expat byteorder
In-Reply-To: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>
References: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>
Message-ID: <43EBBD64.9050502@fourthought.com>

Bob Ippolito wrote:
> Here's the PyXML patch that gets expat byteorder from pyconfig.h.  I
> don't know who the maintainer is nor do I have any interest in
> subscribing to xml-sig (this CC will probably bounce, or get stuck in
> mod queue for days/weeks/forever).  If you give a damn about PyXML
> please make sure to get the patch to the right person.
>
> I've never even installed the 4Suite stuff, so I'm not going to put
> together a patch for that.  Such a patch should be roughly the same as
> this one.

Never a worry.  4Suite developers track expat *very* closely (and even
contribute back to expat itself).  We came across and addressed this
issue months ago.


-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/


From uche.ogbuji at fourthought.com  Thu Feb  9 22:57:50 2006
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Thu, 09 Feb 2006 14:57:50 -0700
Subject: [XML-SIG] PyXML 0.8.4 and expat byteorder
In-Reply-To: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>
References: <11FE3731-82C4-4D1E-9ECF-AE50ABE314E4@redivi.com>
Message-ID: <43EBBADE.8050506@fourthought.com>

Bob Ippolito wrote:
> Here's the PyXML patch that gets expat byteorder from pyconfig.h.  I
> don't know who the maintainer is nor do I have any interest in
> subscribing to xml-sig (this CC will probably bounce, or get stuck in
> mod queue for days/weeks/forever).  If you give a damn about PyXML
> please make sure to get the patch to the right person.
>
> I've never even installed the 4Suite stuff, so I'm not going to put
> together a patch for that.  Such a patch should be roughly the same as
> this one.

You can relax.  4Suite developers track expat *very* closely (and even
contribute back to expat itself).  We came across and addressed this
issue months ago.


-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/


From uche.ogbuji at fourthought.com  Thu Feb  9 23:30:55 2006
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Thu, 09 Feb 2006 15:30:55 -0700
Subject: [XML-SIG] Canonical XML and attribute order
In-Reply-To: <dsc90c$e6i$1@sea.gmane.org>
References: <fc5d4c490602060935l4c74efecl@mail.gmail.com><e32b57660602061021h7f4e63ag6ebb65a90c177998@mail.gmail.com>	<43E96FFD.4000105@lbl.gov>
	<dsc90c$e6i$1@sea.gmane.org>
Message-ID: <43EBC29F.5050102@fourthought.com>

Fredrik Lundh wrote:
> Dan Gunter wrote:
>
>   
>> Right, in general XML processors don't care about attribute order (I
>> don't know much about canonicalization but that does sound like the
>> obvious exception).
>>     
>
> http://www.w3.org/TR/xml-c14n says to sort lexicographically on
> (namespace uri, local tag).
>
> (which, of course, is exactly what ET's default writer does)
>   

I just want to clarify that there is a lot more to canonicalization than
that.  There's surely no problem with adopting conventions from
Canonical XML, but it doesn't really make sense to treat that spec as an
authority in snippets.  Either you have Canonical XML or you don't.

FYI if you do want Canonical XML, you can use PyXML's c14n module, or
you can use PyGenx to generate XML:

http://software.translucentcode.org/pygenx/

PyGenx is based on Genx, which always creates Canonical XML.

Side note: I have a c14n module I've put together for Amara, and it's
intended for the next release.  It's based on 4Suite's fast SAX parser,
contrasting PyXML's, which is DOM-based (PyGenx is expat based, and thus
SAX-like).

Ob c14n reference: http://www.ibm.com/developerworks/xml/library/x-c14n/

All that having been said, the OP is looking to address a common problem
among makers of XML authoring tools--the need to respect the user's
choice of attribute order and other such lexical details.  It's not
really useful to repeat over and over that the XML spec states that
attribute order is not considered significant in determining the
conformance of a parser.  And it's very unfair to state that the OP is
somehow fudging the grand name of "XML".  Just as a fun exercise in
monkey-wrench  throwing, if you read carefully enough, there's the
little-known fact that XML 1.0 doesn't require parsers to report child
elements in any particular order, either.

It's more useful to say that most XML parsers do choose to ignore
attribute order, because they are based on an abstract information
model of XML (such as the Infoset, the XPath data model or the like)
rather than the lexical form of the entities.  For this reason most XML
editing tools rely on either specialized raw text frameworks, or a
hybrid of raw text with XML events (more usually the latter).  This does
not mean that they are not XML processors, but just that they do choose
to preserve details that the XML spec does not *require* them to
preserve.  The OP's best bet is to reuse another engine that already
gets this right, although I admit that I don't know of one available for
Python.  I certainly do not write such tools, but my colleague Simon
St.Laurent did have a go at such a generic tool for Java.
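
As a rough illustration of the "raw text" approach, the sketch below writes a tag by hand so that attributes appear exactly in the order supplied. It is only a sketch under stated assumptions: start_tag() is a hypothetical helper, not part of any library mentioned here, and it does no name validation or namespace handling. Only quoteattr() from the standard library's xml.sax.saxutils is real API.

```python
from xml.sax.saxutils import quoteattr


def start_tag(name, attrs, empty=False):
    """Hypothetical helper: render a start tag (or empty-element tag).

    attrs is a list of (name, value) pairs, written in list order.
    """
    parts = [name] + ["%s=%s" % (key, quoteattr(value)) for key, value in attrs]
    return "<" + " ".join(parts) + (" />" if empty else ">")


tag = start_tag("date", [("month", "02"), ("day", "06")], empty=True)
print(tag)
```

Passing the attributes as a list of pairs rather than a mapping is the design choice that matters here: the list carries the author's ordering through to the output, and quoteattr() takes care of escaping and quoting the values.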

Ob XML and information ordering reference:
http://www-128.ibm.com/developerworks/xml/library/x-eleord.html

-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/


From mike at skew.org  Thu Feb  9 23:26:45 2006
From: mike at skew.org (Mike Brown)
Date: Thu, 9 Feb 2006 15:26:45 -0700 (MST)
Subject: [XML-SIG] problems with encoding and SAX
In-Reply-To: <82d07e750602090509p6e07a5dcwbf74f1f665d4a84d@mail.gmail.com>
Message-ID: <200602092226.k19MQjRS084852@chilled.skew.org>

Daniel Clerc wrote:
> Hi everybody!
> 
> I have some trouble with SAX and encondings...
> 
> When I try to parse the following XML-code:
> 
> <?xml version="1.0" encoding="WINDOWS-1252" ?>
> </TRANSACTION>
> <TRANSACTION TIME="03.04.2003 01:52:15" TIME_CODED="37714.0779513889"
> DURATION="1001">
>   <QUESTION>K'R&#174;</QUESTION>
>                        ^^^^^^^^^^^^^^

Look where the closing tag for TRANSACTION is.
Copy-paste error in your email? Or does the XML actually
look like that?

You also seem to have a couple of illegal control characters in
your QUESTION element. My editor shows them as ^Y^N, so I guess
they are U+0019 and U+000E, respectively. Both are disallowed in
XML.

From clerc at uni-bremen.de  Fri Feb 10 13:20:03 2006
From: clerc at uni-bremen.de (Daniel Clerc)
Date: Fri, 10 Feb 2006 13:20:03 +0100
Subject: [XML-SIG] [SOLVED] Re: problems with encoding and SAX
In-Reply-To: <82d07e750602090509p6e07a5dcwbf74f1f665d4a84d@mail.gmail.com>
References: <82d07e750602090509p6e07a5dcwbf74f1f665d4a84d@mail.gmail.com>
Message-ID: <82d07e750602100420m30fe7cbevdb5b199b2a28379e@mail.gmail.com>

Hi!

Thanks for your help!

There are illegal characters in the XML file. See
http://www.w3.org/TR/2004/REC-xml-20040204/#charsets for the legal characters.

So I need to build a regexp in order to get rid of the unwanted characters.
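
One possible shape for such a regexp, as a minimal sketch for a current Python 3: it removes every character outside the Char production of XML 1.0 (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The function name is illustrative, and the sketch assumes the text has already been decoded to a str; numeric character references that point at illegal characters would need separate handling.

```python
import re

# Anything NOT matched by the XML 1.0 Char production is illegal.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Return text with all characters that XML 1.0 disallows removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# U+0019 and U+000E are control characters that XML 1.0 forbids.
cleaned = strip_illegal_xml_chars("K'R\x19\x0e")
print(repr(cleaned))
```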

 best regards,

 Daniel


On 2/9/06, Daniel Clerc <clerc at uni-bremen.de> wrote:
> Hi everybody!
>
> I have some trouble with SAX and encondings...
>
> When I try to parse the following XML-code:
>
> <?xml version="1.0" encoding="WINDOWS-1252" ?>
> </TRANSACTION>
> <TRANSACTION TIME="03.04.2003 01:52:15" TIME_CODED="37714.0779513889"
> DURATION="1001">
>   <QUESTION>K'R&#174;</QUESTION>
<SNAPP>

From guthrie at mum.edu  Sat Feb 11 18:34:30 2006
From: guthrie at mum.edu (Gregory Guthrie)
Date: Sat, 11 Feb 2006 11:34:30 -0600
Subject: [XML-SIG] python XML install problem..
Message-ID: <6.2.5.6.2.20060211113119.01cfb4e8@mum.edu>

  I am trying to use a package from the Python Cookbook:
     http://aspn.activestate.com/ASPN/WebServices/SWSAPI/pytut

It uses the XML package,
    so I got PyXML-0.8.4.

When I try to install it (on Windows) with
   python setup.py install

I get:
D:\Temp\Python\PyXML-0.8.4>python setup.py install
running install
running build
running build_py
running build_ext
error: The .NET Framework SDK needs to be installed before building 
extensions for Python.
Thanks.


-----------------------------------------------
Gregory Guthrie

MUM Faculty Mail - FM 1068
Fairfield, IA 52557

http://www.mum.edu/~guthrie
(641)472-7773
------------------------------------------------ 

From mike at skew.org  Sat Feb 11 22:25:50 2006
From: mike at skew.org (Mike Brown)
Date: Sat, 11 Feb 2006 14:25:50 -0700 (MST)
Subject: [XML-SIG] python XML install problem..
In-Reply-To: <6.2.5.6.2.20060211113119.01cfb4e8@mum.edu>
Message-ID: <200602112125.k1BLPoOO014285@chilled.skew.org>

> D:\Temp\Python\PyXML-0.8.4>python setup.py install
> running install
> running build
> running build_py
> running build_ext
> error: The .NET Framework SDK needs to be installed before building 
> extensions for Python.

On Windows, you don't need to build PyXML from source.

Go to http://sourceforge.net/project/showfiles.php?group_id=6473
and download the appropriate installer file.  For Python 2.4 you
just need PyXML-0.8.4.win32-py2.4.exe.

From ken.beesley at xrce.xerox.com  Sun Feb 12 14:20:49 2006
From: ken.beesley at xrce.xerox.com (Ken Beesley)
Date: Sun, 12 Feb 2006 14:20:49 +0100
Subject: [XML-SIG] Python 2.4.2, OS X, ucs4 build, unicodedata problem
In-Reply-To: <F8A4CF7E-B42B-49CA-9B75-DEBC93EA1AA8@free.fr>
References: <F8A4CF7E-B42B-49CA-9B75-DEBC93EA1AA8@free.fr>
Message-ID: <43EF3631.8050609@xrce.xerox.com>


Python 2.4.2, OS X, ucs4 build, unicodedata problem

I need a ucs4 build of Python to reliably handle XML files that
can contain supplemental Unicode characters (newer characters
beyond the Basic Multilingual Plane). I recently upgraded to OS X
10.4.4 and downloaded (from http://www.python.org/download) the
sources for Python 2.4.2. After detarring the package, I did

./configure --enable-framework --enable-unicode-ucs4
make
sudo make install

Which created and installed a 2.4.2 Python executable,
/Library/Frameworks/Python.framework/Versions/2.4/bin/python

I can run it, and I confirmed that it is a ucs4 build, e.g.

len(u'\U00010400')

returns 1, rather than the 2 returned by a ucs2 build. (Python 2.3.5,
supplied with 10.4, is a ucs2 build.)
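
The same check can be scripted. A minimal sketch, assuming a current Python 3 interpreter: sys.maxunicode reports the largest code point a build supports, and on Python 3.3 and later it is always 0x10FFFF, so the narrow/wide distinction only shows up on the older interpreters discussed in this message.

```python
import sys

# 0xFFFF on a narrow (UCS-2) build, 0x10FFFF on a wide (UCS-4) build.
is_wide_build = sys.maxunicode > 0xFFFF

# U+10400 lies outside the Basic Multilingual Plane; a narrow build
# stores it as a surrogate pair and reports a length of 2.
length = len("\U00010400")

print(hex(sys.maxunicode), is_wide_build, length)
```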

THE PROBLEM: when I try (manually, or in a script) to import the
unicodedata package, I get the traceback below, which seems to
complain about a symbol __PyUnicodeUCS2_ToNumeric not being
found when the unicodedata module is imported.

Has anyone out there seen or dealt with this problem?
Am I just doing something wrong?

Thanks,

Ken

*********************** Traceback ****************************

% python
Python 2.4.2 (#2, Oct 24 2005, 22:26:37)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: Failure linking new module: /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/lib-dynload/unicodedata.so:
Symbol not found: __PyUnicodeUCS2_ToNumeric
Referenced from: /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/lib-dynload/unicodedata.so
Expected in: dynamic lookup


From mike at skew.org  Mon Feb 13 00:28:48 2006
From: mike at skew.org (Mike Brown)
Date: Sun, 12 Feb 2006 16:28:48 -0700 (MST)
Subject: [XML-SIG] Python 2.4.2, OS X, ucs4 build, unicodedata problem
In-Reply-To: <43EF3631.8050609@xrce.xerox.com>
Message-ID: <200602122328.k1CNSmPG030138@chilled.skew.org>

Ken Beesley wrote:
> % python
> Python 2.4.2 (#2, Oct 24 2005, 22:26:37)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import unicodedata
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ImportError: Failure linking new module: /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/lib-dynload/unicodedata.so:
> Symbol not found: __PyUnicodeUCS2_ToNumeric
> Referenced from: /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/lib-dynload/unicodedata.so
> Expected in: dynamic lookup

Since it doesn't have anything directly to do with XML in Python, I suggest
posting to python-list / comp.lang.python:
  http://mail.python.org/mailman/listinfo/python-list

More people who can help you monitor that forum.
Good luck.

From inguin at gmx.de  Mon Feb 13 15:30:42 2006
From: inguin at gmx.de (Ingo van Lil)
Date: Mon, 13 Feb 2006 15:30:42 +0100
Subject: [XML-SIG] pyexpat: Comments before DOCTYPE
Message-ID: <20060213143042.GA10101@marvin.csn.tu-chemnitz.de>

Hello there,

I ran into a minor problem using the xml.dom.minidom XML parser: An XML
document having a comment before a DOCTYPE node seems to leave the DOM
data structures in an inconsistent state.

Let's say I have a little test.xml file:

    <?xml version="1.0"?>
    <!-- comment -->
    <!DOCTYPE test SYSTEM "test.dtd">
    <test> <tag2> Hello world </tag2> </test>

and a little Python program to parse it:

    from xml.dom.minidom import parse
    dom = parse("test.xml")
    print "document node:", dom
    print len(dom.childNodes), "children"
    print "first child:", dom.firstChild
    print "next sibling:", dom.firstChild.nextSibling

The output of that program is:

    document node: <xml.dom.minidom.Document instance at 0xb7b82b6c>
    3 children
    first child: <DOM Comment node " comment ">
    next sibling: None

I.e. the document node does have three children (a comment node, a
DocumentType instance and an element), but the first child's nextSibling
pointer isn't set correctly. This breaks my algorithm, which is supposed
to recursively walk the entire DOM tree, but stops after the first node
instead.
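
A possible workaround, sketched under assumptions: a tree walk that
iterates over each node's childNodes list does not depend on the
nextSibling pointers at all, so it visits every node whether or not the
sibling links are intact. The helper name walk and the inline copy of
test.xml are illustrative only, not part of the original program.

```python
from xml.dom.minidom import parseString

# Inline copy of the test.xml document shown above (assumption: same content).
XML = """<?xml version="1.0"?>
<!-- comment -->
<!DOCTYPE test SYSTEM "test.dtd">
<test> <tag2> Hello world </tag2> </test>
"""


def walk(node):
    """Yield node and all of its descendants in document order.

    Iterates over childNodes instead of following nextSibling, so a
    broken sibling pointer cannot cut the traversal short.
    """
    yield node
    for child in node.childNodes:
        for descendant in walk(child):
            yield descendant


dom = parseString(XML)
names = [n.nodeName for n in walk(dom)]
print(names)
```

The walk does not visit attribute nodes; those would have to be read from
each element separately if they are needed.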

I'm not entirely sure whether this really is a bug in pyexpat or an
error in my XML file. I haven't found any hints whether an XML document
is allowed to have a comment before the DOCTYPE declaration. xmllint
doesn't seem to complain about it, though.

        Cheers,
            Ingo


From inguin at gmx.de  Mon Feb 13 21:47:25 2006
From: inguin at gmx.de (Ingo van Lil)
Date: Mon, 13 Feb 2006 21:47:25 +0100
Subject: [XML-SIG] pyexpat: Comments before DOCTYPE
In-Reply-To: <20060213143042.GA10101@marvin.csn.tu-chemnitz.de>
References: <20060213143042.GA10101@marvin.csn.tu-chemnitz.de>
Message-ID: <20060213204725.GA6320@marvin.csn.tu-chemnitz.de>

On 13 Feb 2006, Ingo van Lil wrote:

> I ran into a minor problem using the xml.dom.minidom XML parser: An XML
> document having a comment before a DOCTYPE node seems to leave the DOM
> data structures in an inconsistent state.

Hi again. I had a look at the source code, and the reason for the effect
I observed isn't all that hard to spot: The start_doctype_decl_handler
in expatbuilder.py:240 directly manipulates the document's childNodes
vector rather than using the _append_child function responsible for
keeping all those nextSibling/previousSibling/parentNode pointers
up-to-date.
Unless the current behaviour is for some reason intentional (I doubt
it), the appended patch (against Python 2.4.2) should fix the problem.
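
To illustrate the difference between the two calls, here is a toy model,
not the minidom implementation: the Node class and append_child function
below are hypothetical stand-ins that only mimic the bookkeeping a helper
like _append_child has to do. A bare list append grows childNodes but
leaves the previous last child's nextSibling untouched.

```python
class Node(object):
    """Toy stand-in for a DOM node (hypothetical, not the minidom class)."""

    def __init__(self, name):
        self.name = name
        self.childNodes = []
        self.parentNode = None
        self.previousSibling = None
        self.nextSibling = None


def append_child(parent, child):
    """Append child to parent and keep the sibling/parent links consistent."""
    if parent.childNodes:
        last = parent.childNodes[-1]
        last.nextSibling = child
        child.previousSibling = last
    parent.childNodes.append(child)
    child.parentNode = parent


doc = Node("#document")
comment = Node("#comment")
doctype = Node("doctype")

append_child(doc, comment)

# Bare append: the list grows, but no pointer is updated.
doc.childNodes.append(doctype)
bare_link = comment.nextSibling      # still None

# Undo, then use the helper: the links are updated as well.
doc.childNodes.pop()
append_child(doc, doctype)
fixed_link = comment.nextSibling     # now the doctype node
```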

        Cheers,
            Ingo

-------------- next part --------------
--- Lib/xml/dom/expatbuilder.py.orig	2006-02-13 20:53:44.000000000 +0100
+++ Lib/xml/dom/expatbuilder.py	2006-02-13 20:55:29.000000000 +0100
@@ -242,7 +242,7 @@
         doctype = self.document.implementation.createDocumentType(
             doctypeName, publicId, systemId)
         doctype.ownerDocument = self.document
-        self.document.childNodes.append(doctype)
+        _append_child(self.document, doctype)
         self.document.doctype = doctype
         if self._filter and self._filter.acceptNode(doctype) == FILTER_REJECT:
             self.document.doctype = None

From ajay at infogridpacific.com  Mon Feb 20 14:33:44 2006
From: ajay at infogridpacific.com (Ajay Abhyankar)
Date: Mon, 20 Feb 2006 19:03:44 +0530
Subject: [XML-SIG] Namespace prefix being changed while saving file.
Message-ID: <43F9C538.4050409@infogridpacific.com>

Hi,

I was trying cElementTree for reading and updating an xml file. I am 
using iterparse to parse and make relevant changes to the xml as required.
Everything works very fine till I use a valid xml namespace in xml file. 
It is not giving any problems in manipulation of file content, but only 
changes the namespace prefix on its own to something like "ns0" and 
retains the original URL, when the file is written back after updates.
Can the namespace prefix be retained after manipulation? Am I doing 
something wrong or have I missed out on something?
Please help to understand and solve the problem.

Thanks in advance.
Ajay


From vincent at hydrosoft.com.br  Thu Feb 23 14:31:23 2006
From: vincent at hydrosoft.com.br (Vincent Buonomano)
Date: Thu, 23 Feb 2006 10:31:23 -0300
Subject: [XML-SIG] Constructing complex trees from relational data bases
	using XML schemas
Message-ID: <002001c6387d$73b5ce80$0100000a@star>

The XMLServer presents an XML view of a relational database which is defined by an XML schema with all the necessary information contained in the appinfo elements of the schema. It serves the constructed XML to a browser, Java or Mathematica. You may get a record by key and the next or previous one. 

There are currently a large number of products dedicated to this end (see Bourret). What distinguishes this product is its ability to construct arbitrarily complex trees.

Examples

You may download it from XMLServer Beta Version 1, and freely use and distribute it. 

From fredrik at pythonware.com  Tue Feb 28 08:31:19 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 28 Feb 2006 08:31:19 +0100
Subject: [XML-SIG] Namespace prefix being changed while saving file.
References: <43F9C538.4050409@infogridpacific.com>
Message-ID: <du0u88$45s$1@sea.gmane.org>

Ajay Abhyankar wrote:

> I was trying cElementTree for reading and updating an xml file. I am
> using iterparse to parse and make relevant changes to the xml as required.
> Everything works very fine till I use a valid xml namespace in xml file.
> It is not giving any problems in manipulation of file content, but only
> changes the namespace prefix on its own to something like "ns0" and
> retains the original URL, when the file is written back after updates.
> Can the namespace prefix be retained after manipulation? Am I doing
> something wrong or have I missed out on something?
> Please help to understand and solve the problem.

the standard ET parser throws away the prefix, and the standard
serializer generates new prefixes on the fly.

for many applications, this is not a problem -- it's the namespace
URL that matters in XML, not the prefix.

if you want to preserve namespaces under stock ET, your best bet
is to use iterparse's namespace events to collect prefix information,
and either update the _namespace_map dictionary:

    from elementtree import ElementTree

    # undocumented, guaranteed to be supported in all 1.2 releases
    ElementTree._namespace_map[url] = prefix
    ElementTree._namespace_map[url] = prefix
    ...
    ... the serializer now maps {url}foo to prefix:foo, for all url/prefix
    ... pairs in the namespace map
    ...

or use a custom serializer (or a postprocessing step).
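
a minimal sketch of the first approach, under some assumptions: it uses
the xml.etree.ElementTree module from the standard library rather than
the standalone elementtree 1.2 package, and it calls register_namespace
(available in later ElementTree versions) instead of writing to
_namespace_map directly. the sample document and the example.com URL are
made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document; the prefix "doc" is what we want to keep.
SOURCE = (b'<doc:root xmlns:doc="http://example.com/ns">'
          b'<doc:item>x</doc:item></doc:root>')

root = None
events = ("start-ns", "start")
for event, data in ET.iterparse(io.BytesIO(SOURCE), events=events):
    if event == "start-ns":
        # data is a (prefix, uri) pair for each namespace declaration
        prefix, uri = data
        ET.register_namespace(prefix, uri)
    elif root is None:
        # the first "start" event carries the root element
        root = data

# The serializer now reuses the registered prefix instead of ns0.
output = ET.tostring(root)
print(output)
```

note that the registration is global to the process, so it affects every
tree serialized afterwards, and that register_namespace rejects prefixes
of the form ns0, ns1, and so on.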

hope this helps!

</F>


From dieter at handshake.de  Tue Feb 28 19:40:53 2006
From: dieter at handshake.de (Dieter Maurer)
Date: Tue, 28 Feb 2006 19:40:53 +0100
Subject: [XML-SIG] Python 2.4.2, OS X, ucs4 build, unicodedata problem
In-Reply-To: <43EF3631.8050609@xrce.xerox.com>
References: <F8A4CF7E-B42B-49CA-9B75-DEBC93EA1AA8@free.fr>
	<43EF3631.8050609@xrce.xerox.com>
Message-ID: <17412.39221.220486.888777@gargle.gargle.HOWL>

Ken Beesley wrote at 2006-2-12 14:20 +0100:
> ...
>THE PROBLEM: when I try (manually, or in a script) to import the
>unicodedata package, I get the traceback below, which seems to
>complain about a symbol __PyUnicodeUCS2_ToNumeric not being
>found when the unicodedata module is imported.

It looks like you did not rebuild "unicodedata".

Whenever you change the build options for Python,
you usually need to rebuild all extensions (such as "unicodedata")
to ensure that they use the same options.

Usually, a "make clean" should be sufficient to get rid of the old
versions.
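
A quick way to check which Unicode width the running interpreter was
built with is to look at sys.maxunicode. This is only a sketch; the
function name unicode_build_width is made up for illustration. On
Python 2, a UCS2 (narrow) build reports 0xFFFF and a UCS4 (wide) build
reports 0x10FFFF. If the interpreter reports UCS4 while an extension asks
for a PyUnicodeUCS2 symbol, the extension was compiled against the other
configuration.

```python
import sys


def unicode_build_width():
    """Return 'UCS4' for a wide Unicode build, 'UCS2' for a narrow one."""
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


print(unicode_build_width())
```

Python 3.3 and later no longer have the narrow/wide distinction, so the
check always reports UCS4 there.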

-- 
Dieter