From martin at v.loewis.de  Sat Mar  8 13:11:52 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 08 Mar 2008 13:11:52 +0100
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com>	<472ADB06.3090907@v.loewis.de>
	<472AE76E.8060305@behnel.de>	<472AEA6A.9040102@v.loewis.de>
	<19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
Message-ID: <47D28288.7020403@v.loewis.de>

> What about changing the "XML" link on the Python homepage to point to a
> Wiki page? I think this one would come close:
> 
> http://wiki.python.org/moin/PythonXml

Ok, I changed it so.

Regards,
Martin

From martin at v.loewis.de  Mon Mar 10 08:06:31 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 10 Mar 2008 08:06:31 +0100
Subject: [XML-SIG] Converting XML Schema to data struture and then to XML
In-Reply-To: <479DB671.8000608@behnel.de>
References: <116467.61512.qm@web35906.mail.mud.yahoo.com>
	<479DB671.8000608@behnel.de>
Message-ID: <47D4DDF7.1070801@v.loewis.de>

> BTW, from the POV of objectify, generating Python classes from a schema would
> basically mean inferring a document instance from an XML Schema (sort of a
> meta-model to model transformation). I find that an interesting relation, but
> maybe that's just me...

It's just you. It would *not* be a meta-model to model transformation, 
but a meta-model-to-meta-model one. The schema defines a type system,
just as a set of Python classes does. Instances of the schema (i.e.
a document) then correspond to a set of instances of these classes.

It's a very natural thing to do, and has been done in other languages
dozens of times. It gives the term "document type" a true representation
in the programming language.

Regards,
Martin

From stefan_ml at behnel.de  Mon Mar 10 08:46:51 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 10 Mar 2008 08:46:51 +0100
Subject: [XML-SIG] Converting XML Schema to data struture and then to XML
In-Reply-To: <47D4DDF7.1070801@v.loewis.de>
References: <116467.61512.qm@web35906.mail.mud.yahoo.com>	<479DB671.8000608@behnel.de>
	<47D4DDF7.1070801@v.loewis.de>
Message-ID: <47D4E76B.1090606@behnel.de>

Hi,

Martin v. Löwis wrote:
>> BTW, from the POV of objectify, generating Python classes from a schema would
>> basically mean inferring a document instance from an XML Schema (sort of a
>> meta-model to model transformation). I find that an interesting relation, but
>> maybe that's just me...
> 
> It's just you. It would *not* be a meta-model to model transformation, 
> but a meta-model-to-meta-model one.

It obviously is a meta-to-meta model transformation to generate Python classes
from a schema, but "inferring a document instance from an XML Schema" is not.


> The schema defines a type system,
> just as a set of Python classes does. Instances of the schema (i.e.
> a document) then correspond to a set of instances of these classes.

Objectify doesn't generate code. Instead, it comes with an extensible
meta-model that resembles the basic Python type system, and which gets mapped
on a tree at runtime. So the act of inferring the classes from the schema is
actually linked to the instance, not the meta model. And the link is done
through validation, which assures that the document really is an instance. So
we end up with classes that represent an instance of a meta-model. There is no
intermediate step of a meta-to-meta model transformation.

Stefan

From somayeh.farnoush at gmail.com  Mon Mar 10 11:31:12 2008
From: somayeh.farnoush at gmail.com (Somayeh Farnoush)
Date: Mon, 10 Mar 2008 02:31:12 -0800
Subject: [XML-SIG] Pyxml
Message-ID: <b8a400250803100331y76be8b32u15e897730c56681d@mail.gmail.com>

Dear sir,

I have installed PyXML, but when I run this

$ rpm -qa | grep python-xml

it does not return anything. Does this mean that python-xml is missing? How
can I fix it?


regards,
SF

From stefan_ml at behnel.de  Mon Mar 10 12:05:34 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 10 Mar 2008 12:05:34 +0100
Subject: [XML-SIG] Pyxml
In-Reply-To: <b8a400250803100331y76be8b32u15e897730c56681d@mail.gmail.com>
References: <b8a400250803100331y76be8b32u15e897730c56681d@mail.gmail.com>
Message-ID: <47D515FE.7010200@behnel.de>

Hi,

Somayeh Farnoush wrote:
> Dear sir,

You've just missed half of the world population here.


> I have installed PyXML, but when I run this
> 
> $ rpm -qa | grep python-xml
> 
> it does not return anything. Does this mean that python-xml is missing? How
> can I fix it?

Depends on how you installed it. Did you use rpm for it? Is the package called
"python-xml" on your platform? What did rpm tell you when it installed it?

Did you also try

$ rpm -qa | grep -i python | grep -i xml

Stefan


From stefan_ml at behnel.de  Mon Mar 10 12:34:28 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 10 Mar 2008 12:34:28 +0100
Subject: [XML-SIG] Pyxml
In-Reply-To: <b8a400250803100414r53b8e891je2364476c0400493@mail.gmail.com>
References: <b8a400250803100331y76be8b32u15e897730c56681d@mail.gmail.com>	
	<47D515FE.7010200@behnel.de>
	<b8a400250803100414r53b8e891je2364476c0400493@mail.gmail.com>
Message-ID: <47D51CC4.9090003@behnel.de>

Hi,

Somayeh Farnoush wrote:
> I've installed it on Red Hat Enterprise 4 and used the
> PyXML-0.8.4.tar.gz<http://downloads.sourceforge.net/pyxml/PyXML-0.8.4.tar.gz?modtime=1101741917&big_mirror=0>
> from http://sourceforge.net/project/showfiles.php?group_id=6473

and you've done *what* with that tar.gz file?

In case you ran "setup.py install", note that you can also run

    python setup.py bdist_rpm

to build a .rpm file which you can then install with the normal rpm tool.

Stefan


From strangest at comcast.net  Mon Mar 10 12:49:53 2008
From: strangest at comcast.net (Gloria)
Date: Mon, 10 Mar 2008 07:49:53 -0400
Subject: [XML-SIG] Pyxml
In-Reply-To: <47D515FE.7010200@behnel.de>
References: <b8a400250803100331y76be8b32u15e897730c56681d@mail.gmail.com>
	<47D515FE.7010200@behnel.de>
Message-ID: <47D52061.5030100@comcast.net>

Stefan Behnel wrote:
> Hi,
>
> Somayeh Farnoush wrote:
>   
>> Dear sir,
>>     
>
> You've just missed half of the world population here.
>   
LOL!
>
>   
>> I have installed PyXML, but when I run this
>>
>> $ rpm -qa | grep python-xml
>>
>> it does not return anything. Does this mean that python-xml is missing? How
>> can I fix it?
>>     
>
> Depends on how you installed it. Did you use rpm for it? Is the package called
> "python-xml" on your platform? What did rpm tell you when it installed it?
>
> Did you also try
>
> $ rpm -qa | grep -i python | grep -i xml
>
> Stefan
>

From stefan_ml at behnel.de  Mon Mar 10 13:13:16 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 10 Mar 2008 13:13:16 +0100
Subject: [XML-SIG] Pyxml
In-Reply-To: <b8a400250803100439i3d479186qe1c45b975a61863b@mail.gmail.com>
References: <b8a400250803100331y76be8b32u15e897730c56681d@mail.gmail.com>	
	<47D515FE.7010200@behnel.de>	
	<b8a400250803100414r53b8e891je2364476c0400493@mail.gmail.com>	
	<47D51CC4.9090003@behnel.de>
	<b8a400250803100439i3d479186qe1c45b975a61863b@mail.gmail.com>
Message-ID: <47D525DC.4060303@behnel.de>

Hi,

please keep this discussion on the list.

Somayeh Farnoush wrote:
> I've just run
> setup.py build
> setup.py install
> .....
> I am trying to install Intel MPI, which needs Python and PyXML ... The
> troubleshooting part of the Intel MPI installation suggests using the command
> rpm -qa | grep python-xml
> to ensure that PyXML is properly installed.

That's ok, they just didn't know what they were doing either.

If you ran "setup.py install" as root (and it didn't fail), then PyXML should
be correctly installed. It's just that RPM doesn't know about it as you didn't
install it using the rpm command.

Stefan

From info at mosp-tech.se  Mon Mar 10 23:30:44 2008
From: info at mosp-tech.se (info at mosp-tech.se)
Date: Mon, 10 Mar 2008 23:30:44 +0100 (CET)
Subject: [XML-SIG] PyXML Howto
Message-ID: <63893.85.228.252.45.1205188244.squirrel@webmail01.one.com>

Hi!

I am writing to you to see if there is a tar.bz2 or tar.gz file of
http://pyxml.sourceforge.net/topics/howto/xml-howto.html available to
download. I am currently translating various Python-related tutorials and
howtos into Swedish and posting them, and I would like to do this with that
howto.

Best regards,

Mikael J


From stefan_ml at behnel.de  Tue Mar 11 08:34:51 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 11 Mar 2008 08:34:51 +0100
Subject: [XML-SIG] PyXML Howto
In-Reply-To: <63893.85.228.252.45.1205188244.squirrel@webmail01.one.com>
References: <63893.85.228.252.45.1205188244.squirrel@webmail01.one.com>
Message-ID: <47D6361B.6020402@behnel.de>

Hi,

info at mosp-tech.se wrote:
> I am writing to you to see if there is a tar.bz2 or tar.gz file of
> http://pyxml.sourceforge.net/topics/howto/xml-howto.html available to
> download. I am currently translating various Python-related tutorials and
> howtos into Swedish and posting them, and I would like to do this with that
> howto.

Hmmm, that's an old version (0.7.1) of a tutorial for a no-longer-maintained
library. I don't think there's much use in translating it.

If you want to translate an XML tutorial for Python, especially for people who
have little experience with XML processing, try this:

http://effbot.org/zone/element.htm

Or, ask in the comp.lang.python newsgroup what others consider a good "Python
and XML" tutorial that's worth being translated.

Stefan

From ht at inf.ed.ac.uk  Thu Mar 13 11:24:36 2008
From: ht at inf.ed.ac.uk (Henry S. Thompson)
Date: Thu, 13 Mar 2008 10:24:36 +0000
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <47C10103.20908@v.loewis.de> (Martin v. Löwis's message of
	"Sun, 24 Feb 2008 06:30:43 +0100")
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com>
	<472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de>
	<472AEA6A.9040102@v.loewis.de>
	<19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
	<47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de>
Message-ID: <f5bod9i53bv.fsf@hildegard.inf.ed.ac.uk>

Martin v. Löwis writes:

> If you found that validation is a processing need, I strongly recommend
> that you re-evaluate your processing needs (whether you use Python
> or not). IMHO, validation is much over-rated and over-used.

Strong words, which I strongly disagree with.  "Validate at trust
boundaries" is a long-standing and helpful mantra, IMO.  If you're
only processing XML you produce yourself, sure, validation is probably
unnecessary.  But if you're accepting XML from others, a validating
parser will simplify your code and give your users better error
reporting.

ht
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht at inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

From naeemkhans79 at hotmail.com  Sun Mar 16 19:45:43 2008
From: naeemkhans79 at hotmail.com (Khan1814)
Date: Sun, 16 Mar 2008 11:45:43 -0700 (PDT)
Subject: [XML-SIG]  XMI Access in Netbeans 5.5
Message-ID: <16082402.post@talk.nabble.com>


Hello everyone,

I have transformed a UML diagram into an XMI file and want to use it in
Netbeans to check the class diagram for logical errors. To that end, I
have also added the MDR library to the project. But I don't know how to
load my XMI file into it and access it from Java. Can anybody help me
make this possible?

Regards,
Khan



From spammb at gmail.com  Thu Mar 20 23:27:09 2008
From: spammb at gmail.com (Michael Becker)
Date: Thu, 20 Mar 2008 15:27:09 -0700 (PDT)
Subject: [XML-SIG] Issues with XMLTreeBuilder in cElementTree and
	ElementTree (Cross-post from comp.lang.python)
Message-ID: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com>

I had some XML files being output by an application whose formatting did
not allow for easy editing by humans, so I was trying to write a short
Python app to pretty-print XML files. Most of the data in these XML
files is in the attributes, so I wanted each attribute on its own line.
I wrote a short app using xml.etree.ElementTree.XMLTreeBuilder(). To
my dismay the attributes were getting reordered. I found that the
implementation of XMLTreeBuilder did not make proper use of the
ordered_attributes attribute of the expat parser (which it defaults
to). The constructor sets ordered_attributes = 1 but then the
_start_list method iterates through the ordered list of attributes and
stores them in a dictionary! This is incredibly unintuitive and seems
to me to be a bug. I would recommend the following changes to
ElementTree.py:

class XMLTreeBuilder:
...
    def _start_list(self, tag, attrib_in):
        fixname = self._fixname
        tag = fixname(tag)
        attrib = []
        if attrib_in:
            for i in range(0, len(attrib_in), 2):
                attrib.append((fixname(attrib_in[i]),
                               self._fixtext(attrib_in[i+1])))
        return self._target.start(tag, attrib)

class _ElementInterface:
...

    def items(self):
        try:
            return self.attrib.items()
        except AttributeError:
            return self.attrib

These changes would allow the user to take advantage of the
ordered_attributes attribute in the expat parser to use either ordered
or unordered attributes as desired. For backwards compatibility it might
be desirable to change XMLTreeBuilder to default to ordered_attributes
= 0. I've never submitted a bug fix to a python library so if this
seems like a real bug please let me know how to proceed.
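
A minimal sketch of the expat behaviour described above, written for
current Python 3 (an assumption; the thread itself uses Python 2.5). It
talks to xml.parsers.expat directly rather than to XMLTreeBuilder, and
the helper name ordered_attribute_pairs is made up for illustration.
With ordered_attributes set, expat hands the start handler a flat
[name, value, name, value, ...] list in document order, which can be
kept as pairs instead of being poured into a dictionary:

```python
import xml.parsers.expat


def ordered_attribute_pairs(xml_text):
    """Return [(tag, [(name, value), ...]), ...] in document order.

    Hypothetical helper, not part of ElementTree.
    """
    seen = []

    def start(tag, attrs):
        # With ordered_attributes enabled, attrs is a flat list:
        # [name1, value1, name2, value2, ...]
        pairs = list(zip(attrs[0::2], attrs[1::2]))
        seen.append((tag, pairs))

    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen


result = ordered_attribute_pairs('<root z="1" a="2" m="3"/>')
print(result)
```

Keeping the pairs in a list preserves the order in which the attributes
appear in the document; whether ElementTree itself should expose them
this way is the open question of this post.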

Secondly, I found a potential issue with the cElementTree module. My
understanding (which could be incorrect) of python C modules is that
they should work the same as the python versions but be more
efficient. The XMLTreeBuilder class in cElementTree doesn't seem to be
using the same parser as that in ElementTree. The following code
illustrates this issue:

>>> import xml.etree.cElementTree
>>> t1=xml.etree.cElementTree.XMLTreeBuilder()
>>> t1._parser.ordered_attributes = 1

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: _parser

>>> import xml.etree.ElementTree
>>> t1=xml.etree.ElementTree.XMLTreeBuilder()
>>> t1._parser.ordered_attributes = 1

In case it is relevant, here is the version and environment
information:
tpadmin at osswlg1{/tpdata/ossgw/config} $ python -V
Python 2.5.1
tpadmin at osswlg1{/tpdata/ossgw/config} $ uname -a
SunOS localhost 5.10 Generic_118833-33 sun4u sparc SUNW,Netra-240

From smcg4191 at frii.com  Mon Mar 24 04:56:59 2008
From: smcg4191 at frii.com (Stuart McGraw)
Date: Sun, 23 Mar 2008 21:56:59 -0600
Subject: [XML-SIG] lxml iterparse and comments
Message-ID: <AKEFJAHAPDBEDKIICNJKIEEBCEAA.smcg4191@frii.com>

Hello,

I am probably missing something elementary (I am new
to both xml and lxml), but I am having problems figuring
out how to get comments when using lxml's iterparse().
When I parse xml with parse() and iterate through the
result, I get the comments.  But when I try to do the
same thing (approximately I think) with iterparse, 
I don't see any comments.  See example code below.  
(lxml-2.02, Python-2.5.1)

(I was using the standard Python ElementTree but my 
understanding is that it doesn't save comments at all.  
If that's wrong I would go back to using it).

The real file is ~50MB and has about 1M nodes under the 
root so I have to use iterparse and I also have to process 
comments, so I would really appreciate a clue about how 
to do it.  Thanks.

Example code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import lxml.etree as ET
from cStringIO import StringIO

# XML data...
#=============================================
xmltxt = \
'''<?xml version="1.0" encoding="UTF-8"?>
<!-- Rev 1.06 
-->
<!DOCTYPE Test [
<!ELEMENT Test (entry*)>
<!--                                                                   -->
<!ELEMENT entry ANY>
	<!-- Description of <entry> element.
	-->
]>
<!-- File created: 2008-02-27 -->
<Test>
<!--  Chronosynclastic Infindibulum Listing -->
<entry>text 1</entry>
<!-- Deleted:  A1500477 -->
<entry>text 2</entry>
</Test>'''
#=============================================

print 'Parse:\n------'
et = ET.parse( StringIO (xmltxt))
for elem in et.iter():
    print elem

print '\nIterparse:\n----------'
xx = ET.iterparse( StringIO (xmltxt), ("start","end"))
for event, elem in iter(xx):
    print event, elem


From stefan_ml at behnel.de  Mon Mar 24 08:33:53 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 24 Mar 2008 08:33:53 +0100
Subject: [XML-SIG] lxml iterparse and comments
In-Reply-To: <AKEFJAHAPDBEDKIICNJKIEEBCEAA.smcg4191@frii.com>
References: <AKEFJAHAPDBEDKIICNJKIEEBCEAA.smcg4191@frii.com>
Message-ID: <47E75961.2040503@behnel.de>

Hi,

Stuart McGraw wrote:
> I am probably missing something elementary (I am new
> to both xml and lxml), but I am having problems figuring
> out how to get comments when using lxml's iterparse().
> When I parse xml with parse() and iterate through the
> result, I get the comments.  But when I try to do the
> same thing (approximately I think) with iterparse, 
> I don't see any comments.

While the comments end up in the tree that iterparse generates, they do not
show up in the events. Now that you mention it, I actually think that should
change. There should be events "comment" and "pi" that yield them if requested.


> I was using the standard Python ElementTree but my 
> understanding is that it doesn't save comments at all.

ElementTree strips comments in the parser, that's right.


> The real file is ~50MB and has about 1M nodes under the 
> root so I have to use iterparse and I also have to process 
> comments, so I would really appreciate a clue about how 
> to do it.  Thanks.

Have you tried the parser target interface? It's a SAX-like interface that
uses callbacks.

http://codespeak.net/lxml/parsing.html#the-target-parser-interface
http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interface

Stefan
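
A minimal sketch of the target interface mentioned above, using the
standard library's xml.etree.ElementTree under Python 3 as a stand-in
(an assumption; the links above describe the same interface for lxml
and ElementTree). The class name CollectingTarget is made up for
illustration. The parser calls start(), end(), data() and comment() on
the target as it goes, and close() returns whatever the target's
close() returns:

```python
import xml.etree.ElementTree as ET


class CollectingTarget:
    """Hypothetical target that records callbacks instead of building a tree."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, data):
        # Character data is ignored in this sketch.
        pass

    def comment(self, text):
        self.events.append(("comment", text))

    def close(self):
        # The parser's close() hands this value back to the caller.
        return self.events


parser = ET.XMLParser(target=CollectingTarget())
parser.feed("<Test><!-- note --><entry>text</entry></Test>")
events = parser.close()
print(events)
```

Because no tree is built, memory use stays flat regardless of the size
of the input, which is the point for a 50MB file.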

From jmaze at fas.harvard.edu  Tue Mar 25 01:22:02 2008
From: jmaze at fas.harvard.edu (Jero Maze)
Date: Mon, 24 Mar 2008 20:22:02 -0400
Subject: [XML-SIG] How do I test PyXML
Message-ID: <47E845AA.80809@fas.harvard.edu>

To whom it may concern,

I'm trying to use the "textext" extension for "Inkscape", which needs
"PyXML". I've been trying to install PyXML and then use "textext"
without success, so I don't know if I'm installing PyXML correctly.

My OS is Mac OS X version 10.5.2

When I run "python regrtest.py" I get the output below (this might be
helpful).

Sincerely,
Jero


test_c14n
test test_c14n skipped -- an optional feature could not be imported
test_dom
test test_dom skipped -- an optional feature could not be imported
test_domreg
test_encodings
test_expatreader
test test_expatreader failed -- Traceback (most recent call last):
  File "/Applications/MyApplications/PyXML-0.8.4/test/test_expatreader.py", line 21, in setUp
    self.parser.setFeature(handler.feature_namespace_prefixes, 1)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/sax/expatreader.py", line 157, in setFeature
    "expat does not report namespace prefixes")
SAXNotSupportedException: expat does not report namespace prefixes

test_filter
test test_filter failed -- Writing: u'<?xml version="1.0" ?><doc><e>text<e/>moreabc</e>xyz</doc>', expected: '<?xml version="1.0" ?>\n<doc><e>text<e/>moreabc</e>xyz</doc'
test_howto
test test_howto crashed -- <type 'exceptions.AttributeError'> : 'module' object has no attribute 'DefaultHandler'
test_htmlb
test test_htmlb skipped -- an optional feature could not be imported
test_javadom
test test_javadom skipped -- an optional feature could not be imported
test_marshal
test test_marshal skipped -- an optional feature could not be imported
test_minidom
test test_minidom failed -- Writing: 'Test Failed: ', expected: ''
test_ns
test test_ns skipped -- an optional feature could not be imported
test_pyexpat
test_sax
test test_sax skipped -- an optional feature could not be imported
test_sax2
test test_sax2 skipped -- an optional feature could not be imported
test_sax2_xmlproc
test_sax_xmlproc
test test_sax_xmlproc skipped -- an optional feature could not be imported
test_saxdrivers
test test_saxdrivers skipped -- an optional feature could not be imported
test_utils
test test_utils skipped -- an optional feature could not be imported
test_xmlbuilder
test test_xmlbuilder failed -- errors occurred; run in verbose mode for details
test_xmlproc
test test_xmlproc skipped -- an optional feature could not be imported
4 tests OK.
5 tests failed: test_expatreader test_filter test_howto test_minidom test_xmlbuilder
12 tests skipped: test_c14n test_dom test_htmlb test_javadom test_marshal test_ns test_sax test_sax2 test_sax_xmlproc test_saxdrivers test_utils test_xmlproc


From smcg4191 at frii.com  Tue Mar 25 05:19:15 2008
From: smcg4191 at frii.com (Stuart McGraw)
Date: Mon, 24 Mar 2008 22:19:15 -0600
Subject: [XML-SIG] lxml iterparse and comments
Message-ID: <47E87D43.1090802@frii.com>

Hello Stefan,

Thanks for your response.

> Stuart McGraw wrote:
> > I am probably missing something elementary (I am new
> > to both xml and lxml), but I am having problems figuring
> > out how to get comments when using lxml's iterparse().
> > When I parse xml with parse() and iterate through the
> > result, I get the comments.  But when I try to do the
> > same thing (approximately I think) with iterparse,
> > I don't see any comments.
>
> While the comments end up in the tree that iterparse generates, 
> they do not show up in the events. Now that you mention it, I
> actually think that should change. There should be events
>  "comment" and "pi" that yield them if requested.

That would be ideal, from my perspective.  It also seems
more consistent with the other interfaces (parse, parse target,
etc)

> > I was using the standard Python ElementTree but my
> > understanding is that it doesn't save comments at all.
>
> ElementTree strips comments in the parser, that's right.
>
> > The real file is ~50MB and has about 1M nodes under the
> > root so I have to use iterparse and I also have to process
> > comments, so I would really appreciate a clue about how
> > to do it.  Thanks.
>
> Have you tried the parser target interface? It's a SAX-like
> interface that uses callbacks.
>
> http://codespeak.net/lxml/parsing.html#the-target-parser-interface
> http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interface

Thanks for pointing that out.  I'd seen it in the docs but
hadn't appreciated that it was relevant.  However, I am
having trouble getting it to work.  Specifically, the test
code below produces the output I expected when run with
cElementTree, but with lxml, it is missing "end" callbacks,
the second "start(entry) " callback, and the resolved entity
text.  Am I doing something wrong?

Test code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#import xml.etree.cElementTree as ET
import lxml.etree as ET
from cStringIO import StringIO

# XML data...
#=============================================
xmltxt = \
'''<?xml version="1.0" encoding="UTF-8"?>
<!-- Rev 1.06
-->
<!DOCTYPE Test [
<!ELEMENT Test (entry*)>
<!ELEMENT entry (#PCDATA)>
	<!-- Description of <entry> element.
	-->
<!ENTITY ex "an existential entity">
]>
<!-- File created: 2008-02-27 -->
<Test>
<!--  Chronosynclastic Infindibulum Listing -->
<entry>text 1 is &ex;</entry>
<!-- Deleted:  A1500477 -->
<entry>text 2</entry>
</Test>'''
#=============================================

print '\nTargetParser:\n-------------'

try:                   XMLParser = ET.XMLParser
except AttributeError: XMLParser = ET.XMLTreeBuilder

class EchoTarget:
    def comment(self, tag):
        print "comment", tag
    def start(self, tag, attrib):
        print "start", tag, attrib
    def end(self, tag):
        print "end", tag
    def data(self, data):
        print "data", repr(data)
    def close(self):
        print "close"
        return "closed!"

parser = XMLParser( target = EchoTarget())
result = ET.parse( StringIO (xmltxt), parser)

From stefan_ml at behnel.de  Tue Mar 25 22:04:02 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 25 Mar 2008 22:04:02 +0100
Subject: [XML-SIG] lxml iterparse and comments
In-Reply-To: <47E87D43.1090802@frii.com>
References: <47E87D43.1090802@frii.com>
Message-ID: <47E968C2.6030905@behnel.de>

Hi,

Stuart McGraw wrote:
>> Stuart McGraw wrote:
>> > I am probably missing something elementary (I am new
>> > to both xml and lxml), but I am having problems figuring
>> > out how to get comments when using lxml's iterparse().
>> > When I parse xml with parse() and iterate through the
>> > result, I get the comments.  But when I try to do the
>> > same thing (approximately I think) with iterparse,
>> > I don't see any comments.
>>
>> While the comments end up in the tree that iterparse generates, they
>> do not show up in the events. Now that you mention it, I
>> actually think that should change. There should be events
>>  "comment" and "pi" that yield them if requested.
> 
> That would be ideal, from my perspective.  It also seems
> more consistent with the other interfaces (parse, parse target,
> etc)

Implemented on the trunk, will be in lxml 2.1.
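
A minimal sketch of what requesting a "comment" event looks like, using
the standard library's xml.etree.ElementTree.iterparse as a stand-in
(an assumption: this needs Python 3.8 or later, where the stdlib accepts
"comment" and "pi" as event names; it is not the lxml 2.1 code itself).
The input document is a cut-down version of the test data in this
thread:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = (
    b"<Test>"
    b"<!-- first --><entry>text 1</entry>"
    b"<!-- second --><entry>text 2</entry>"
    b"</Test>"
)

comments = []
entries = []
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end", "comment")):
    if event == "comment":
        # For a comment event, elem.text holds the comment body.
        comments.append(elem.text)
    elif elem.tag == "entry":
        entries.append(elem.text)
        # Drop the element's content once processed to keep memory flat.
        elem.clear()

print(comments)
print(entries)
```

Clearing each entry after use is what keeps iterparse usable on large
files like the 50MB one described above.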


>> Have you tried the parser target interface?
> I am having trouble getting it to work.  Specifically, the test
> code below produces the output I expected when run with
> cElementTree, but with lxml, it is missing "end" callbacks,
> the second "start(entry) " callback, and the resolved entity
> text.  Am I doing something wrong?
> 
> Test code:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> #import xml.etree.cElementTree as ET
> import lxml.etree as ET
> from cStringIO import StringIO
> 
> # XML data...
> #=============================================
> xmltxt = \
> '''<?xml version="1.0" encoding="UTF-8"?>
> <!-- Rev 1.06
> -->
> <!DOCTYPE Test [
> <!ELEMENT Test (entry*)>
> <!ELEMENT entry (#PCDATA)>
>     <!-- Description of <entry> element.
>     -->
> <!ENTITY ex "an existential entity">
> ]>
> <!-- File created: 2008-02-27 -->
> <Test>
> <!--  Chronosynclastic Infindibulum Listing -->
> <entry>text 1 is &ex;</entry>
> <!-- Deleted:  A1500477 -->
> <entry>text 2</entry>
> </Test>'''
> #=============================================
> 
> print '\nTargetParser:\n-------------'
> 
> try:                   XMLParser = ET.XMLParser
> except AttributeError: XMLParser = ET.XMLTreeBuilder
> 
> class EchoTarget:
>    def comment(self, tag):
>        print "comment", tag
>    def start(self, tag, attrib):
>        print "start", tag, attrib
>    def end(self, tag):
>        print "end", tag
>    def data(self, data):
>        print "data", repr(data)
>    def close(self):
>        print "close"
>        return "closed!"
> 
> parser = XMLParser( target = EchoTarget())
> result = ET.parse( StringIO (xmltxt), parser)

I can reproduce that. Seems to require an entity reference in the data,
though. I'll look into it.

Stefan

From stefan_ml at behnel.de  Tue Mar 25 23:04:37 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 25 Mar 2008 23:04:37 +0100
Subject: [XML-SIG] Issues with XMLTreeBuilder in cElementTree and
	ElementTree
In-Reply-To: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com>
References: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com>
Message-ID: <47E976F5.3020704@behnel.de>

Hi again,

Michael Becker wrote:
> These changes would allow the user to take advantage of the
> ordered_attributes attribute in the expat parser to use either ordered
> or unordered attributes as desired. For backwards compatibility it might
> be desirable to change XMLTreeBuilder to default to ordered_attributes
> = 0. I've never submitted a bug fix to a python library so if this
> seems like a real bug please let me know how to proceed.
> 
> Secondly, I found a potential issue with the cElementTree module. My
> understanding (which could be incorrect) of python C modules is that
> they should work the same as the python versions but be more
> efficient. The XMLTreeBuilder class in cElementTree doesn't seem to be
> using the same parser as that in ElementTree. The following code
> illustrates this issue:
> 
>>>> import xml.etree.cElementTree
>>>> t1=xml.etree.cElementTree.XMLTreeBuilder()
>>>> t1._parser.ordered_attributes = 1
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: _parser

(c)ET's XMLParser has an attribute "parser" that references the expat parser
instance. It was renamed in newer versions.

Stefan


From 2huggie at gmail.com  Wed Mar 26 08:12:28 2008
From: 2huggie at gmail.com (Timothy Wu)
Date: Wed, 26 Mar 2008 15:12:28 +0800
Subject: [XML-SIG] Content is split into two
Message-ID: <ebf8d36c0803260012q105cdd9fkbfe8e370dacc8025@mail.gmail.com>

Hi, I posted the following to the Python mailing list but no one responded, so
I'm posting it here again.

------------

Hi,

I have created a very, very simple parser for an XML document.

class FindGoXML2(ContentHandler):
    def characters(self, content):
        print content

I have made it simple because I want to debug. This prints out any content
enclosed by tags (right?).

The XML is publicly available here:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml

Here are a few lines from this XML:

              <Gene-commentary_source>
                <Other-source>
                  <Other-source_src>
                    <Dbtag>
                      <Dbtag_db>GO</Dbtag_db>
                      <Dbtag_tag>
                        <Object-id>
                          <Object-id_id>3824</Object-id_id>
                        </Object-id>
                      </Dbtag_tag>
                    </Dbtag>
                  </Other-source_src>
                  <Other-source_anchor>catalytic activity</Other-source_anchor>
                  <Other-source_post-text>evidence: IEA</Other-source_post-text>
                </Other-source>
              </Gene-commentary_source>

Notice the third line before the last. I expect my content printout to print
out "evidence: IEA".
However, this is what I get.

-------------------------
catalytic activity  ==> this is the printout from the line before


e
vidence: IEA
-------------------------

I don't understand why a few blank lines were printed after "catalytic
activity", but that doesn't matter. What matters is that the string
"evidence: IEA" is split into two printouts: first it prints only "e",
then "vidence: IEA". I parsed 825 such XML files without a problem; this
occurs on my 826th.

Any explanations??

From jcd at unc.edu  Wed Mar 26 14:39:21 2008
From: jcd at unc.edu (J. Cliff Dyer)
Date: Wed, 26 Mar 2008 09:39:21 -0400
Subject: [XML-SIG] Content is split into two
In-Reply-To: <ebf8d36c0803260012q105cdd9fkbfe8e370dacc8025@mail.gmail.com>
References: <ebf8d36c0803260012q105cdd9fkbfe8e370dacc8025@mail.gmail.com>
Message-ID: <1206538761.3328.3.camel@aalcdl07.lib.unc.edu>

On Wed, 2008-03-26 at 15:12 +0800, Timothy Wu wrote:
> Hi, I posted the following to the Python mailing list but no one
> responded, so I'm posting it here as well.
> 
> ------------
> 
> Hi,
> 
> I have created a very, very simple parser for an XML file.
> 
> class FindGoXML2(ContentHandler):
>     def characters(self, content):
>         print content
> 
> I have made it simple because I want to debug. This prints out any
> content enclosed by tags (right?).
> 
> The XML is publicly available here:
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml
> 
> Here are a few lines from this XML:
> 
>               <Gene-commentary_source>
>                 <Other-source>
>                   <Other-source_src>
>                     <Dbtag>
>                       <Dbtag_db>GO</Dbtag_db>
>                       <Dbtag_tag>
>                         <Object-id>
>                           <Object-id_id>3824</Object-id_id>
>                         </Object-id>
>                       </Dbtag_tag>
>                     </Dbtag>
>                   </Other-source_src>
>                   <Other-source_anchor>catalytic activity</Other-source_anchor>
>                   <Other-source_post-text>evidence: IEA</Other-source_post-text>
>                 </Other-source>
>               </Gene-commentary_source>
> 
> Notice the third line before the last. I expect my content printout to
> print out "evidence: IEA".
> However, this is what I get.
> 
> -------------------------
> catalytic activity  ==> this is the print out the line before
> 
> 
> 
> e
> vidence: IEA
> -------------------------
> 
> I don't understand why a few blank lines were printed after "catalytic
> activity", but that doesn't matter. What matters is that the string
> "evidence: IEA" is split into two printouts: first it prints only "e",
> then "vidence: IEA". I parsed 825 such XML files without a problem; this
> occurs on my 826th.
> 
> Any explanations??

The parser retrieves input in chunks of unspecified size, so there is
no guarantee that a text block will be delivered in a single
characters() call.  The split shows up as separate lines because the
print statement adds a newline after each chunk it prints.  If you want
to see the text itself, without phantom newlines, try replacing print
with sys.stdout.write().
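
A minimal sketch of that behaviour, written for present-day Python 3
rather than the Python 2 used in the question. The names ChunkRecorder
and parse_in_pieces and the shortened <post-text> element are made up
for illustration. Feeding the document in two pieces mimics a chunk
boundary that falls inside a text node; how many characters() calls
result is up to the parser, so the only thing to rely on is that the
pieces add up to the full text.

```python
import sys
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkRecorder(ContentHandler):
    """Record every piece of text exactly as the parser delivers it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # One text node may arrive spread over several calls.
        self.chunks.append(content)


def parse_in_pieces(pieces):
    """Feed the parser several byte strings; return the reported text chunks."""
    parser = xml.sax.make_parser()
    handler = ChunkRecorder()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)
    parser.close()
    return handler.chunks


if __name__ == "__main__":
    chunks = parse_in_pieces([b"<post-text>e", b"vidence: IEA</post-text>"])
    # write() adds no newline of its own, so the pieces run together again.
    for chunk in chunks:
        sys.stdout.write(chunk)
    sys.stdout.write("\n")
```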

Cheers,
Cliff


From smcg4191 at frii.com  Wed Mar 26 17:11:10 2008
From: smcg4191 at frii.com (Stuart McGraw)
Date: Wed, 26 Mar 2008 10:11:10 -0600
Subject: [XML-SIG] lxml iterparse and comments
Message-ID: <47EA759E.1080103@frii.com>

Stefan Behnel wrote:
[...re adding comment and pi events to iterparse...]
> Implemented on the trunk, will be in lxml 2.1.

Thanks.

[... re missing callbacks from target parser...]
> I can reproduce that. Seems to require an entity reference in the data,
> though. I'll look into it.
[and (from lxml-dev)]
> Fixed for 2.0.3.

Thanks again!

From martin at v.loewis.de  Wed Mar 26 20:54:19 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 26 Mar 2008 20:54:19 +0100
Subject: [XML-SIG] How do I test PyXML
In-Reply-To: <47E845AA.80809@fas.harvard.edu>
References: <47E845AA.80809@fas.harvard.edu>
Message-ID: <47EAA9EB.8020205@v.loewis.de>

> I'm trying to use the "Inkscape" extension "textext", which needs
> "PyXML". I've been trying to install PyXML and then use "textext"
> without success, so I don't know if I'm installing PyXML correctly.
> 
> My OS is Mac OS X version 10.5.2
> 
> When I run "python regrtest.py" I got the message below (this might be 
> helpful)

Did you install PyXML, using "setup.py install"? It seems you are not
picking up the installed copy, but the standard XML packages from your
Python 2.5 installation.
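
One quick way to check which copy is being picked up is to ask the
interpreter where it loaded the xml package from. This is only a sketch;
the helper name xml_package_location is made up. As far as I recall,
PyXML installs itself as a package called _xmlplus, so a path containing
"_xmlplus" would suggest the PyXML copy is in use, while a path inside
the standard library directory would suggest the stock package. Treat
that naming as an assumption and compare it against your installation.

```python
import xml


def xml_package_location():
    """Return the file the interpreter loaded for the top-level xml package."""
    return xml.__file__


if __name__ == "__main__":
    print(xml_package_location())
```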

Regards,
Martin

From 2huggie at gmail.com  Thu Mar 27 05:01:38 2008
From: 2huggie at gmail.com (Timothy Wu)
Date: Thu, 27 Mar 2008 12:01:38 +0800
Subject: [XML-SIG] Content is split into two
In-Reply-To: <1206538761.3328.3.camel@aalcdl07.lib.unc.edu>
References: <ebf8d36c0803260012q105cdd9fkbfe8e370dacc8025@mail.gmail.com>
	<1206538761.3328.3.camel@aalcdl07.lib.unc.edu>
Message-ID: <ebf8d36c0803262101q75d9283avd26f654f18049bae@mail.gmail.com>

On Wed, Mar 26, 2008 at 9:39 PM, J. Cliff Dyer <jcd at unc.edu> wrote:

> The parser will retrieve input in chunks of unspecified size.  There is
> no guarantee that a text block will all get returned at once.  You are
> seeing this problem because the print statement adds a newline after it
> prints.  If you want to see the text itself, without phantom newlines,
> try replacing print with sys.stdout.write().
>
> Cheers,
> Cliff


Thanks for the help.

Now I see that on page

http://pyxml.sourceforge.net/topics/howto/node14.html

"You also shouldn't assume that all the characters are passed in a single
function call."

Wow, totally unexpected. Wonder why it's designed as it is? This is
especially weird to me since the string size isn't big (small buffer) and
this adds a bit of complexity to the text processing. Now I have to set a
flag to make sure that I finish off the text when moving out of the tag.

This all sounds kind of like déjà vu; maybe I ran into this before. =/ I
don't process XML that often.

Timothy

From stefan_ml at behnel.de  Thu Mar 27 09:23:54 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 27 Mar 2008 09:23:54 +0100
Subject: [XML-SIG] Content is split into two
In-Reply-To: <ebf8d36c0803262101q75d9283avd26f654f18049bae@mail.gmail.com>
References: <ebf8d36c0803260012q105cdd9fkbfe8e370dacc8025@mail.gmail.com>	<1206538761.3328.3.camel@aalcdl07.lib.unc.edu>
	<ebf8d36c0803262101q75d9283avd26f654f18049bae@mail.gmail.com>
Message-ID: <47EB599A.1060802@behnel.de>

Hi,

Timothy Wu wrote:
> "You also shouldn't assume that all the characters are passed in a single
> function call."
> 
> Wow, totally unexpected. Wonder why it's designed as it is? This is
> especially weird to me since the string size isn't big (small buffer) and

For you maybe, but nothing keeps an XML document from having text entries of a
couple of megabytes, possibly separated by entity references. Aggregating all
that in memory could be quite expensive, so it's a good design choice not to
require that in the parser.


> this adds a bit of complexity to the text processing.

Not that much. The usual pattern is: append text content to a list and join it
when you see something that's not text. That works very well unless your
strings are really long.
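
A minimal sketch of that pattern, written for present-day Python 3. The
names TextJoiningHandler and collect_text are made up for illustration,
and the sketch assumes the text of interest sits in leaf elements (an
element containing both text and child elements would need more care).
Text pieces are appended to a list and joined once when the element
ends.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextJoiningHandler(ContentHandler):
    """Collect the complete text of each element, however it was chunked."""

    def __init__(self):
        super().__init__()
        self._parts = []   # text pieces seen since the last tag
        self.texts = []    # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # A new element starts: forget any text (e.g. indentation) before it.
        self._parts = []

    def characters(self, content):
        # Cheap append; no string concatenation per call.
        self._parts.append(content)

    def endElement(self, name):
        # Something that is not text: join what has been collected so far.
        text = "".join(self._parts).strip()
        if text:
            self.texts.append((name, text))
        self._parts = []


def collect_text(xml_bytes):
    """Parse a bytes document and return (element name, text) pairs."""
    handler = TextJoiningHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


if __name__ == "__main__":
    sample = (
        b"<Other-source>"
        b"<Other-source_anchor>catalytic activity</Other-source_anchor>"
        b"<Other-source_post-text>evidence: IEA</Other-source_post-text>"
        b"</Other-source>"
    )
    for name, text in collect_text(sample):
        print(name, "=", text)
```

With this in place, the handler no longer cares whether "evidence: IEA"
arrives in one call or several.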

Stefan


From debian-users-admin at debian.or.jp  Thu Mar 27 15:28:00 2008
From: debian-users-admin at debian.or.jp (debian-users-admin at debian.or.jp)
Date: Thu, 27 Mar 2008 23:28:00 +0900
Subject: [XML-SIG] Subscribe request result (debian-users ML)
References: <20080327142754.BA478C2DFF@osdn.debian.or.jp>
Message-ID: <200803272328.FMLAAA13658.debian-users@debian.or.jp>

Hi, I am the fml ML manager for the ML <debian-users at debian.or.jp>.


--debian-users at debian.or.jp, Be Seeing You!    

************************************************************
If you have any questions or problems,
   please contact debian-users-admin at debian.or.jp


************************************************************


From fredrik at pythonware.com  Sun Mar 30 15:28:38 2008
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sun, 30 Mar 2008 15:28:38 +0200
Subject: [XML-SIG] Issues with XMLTreeBuilder in cElementTree and
	ElementTree
In-Reply-To: <47E976F5.3020704@behnel.de>
References: <697339f4-c549-4022-b945-434f2909cdfc@13g2000hsb.googlegroups.com>
	<47E976F5.3020704@behnel.de>
Message-ID: <fso4i7$1et$1@ger.gmane.org>

Stefan Behnel wrote:

> (c)ET's XMLParser has an attribute "parser" that references the expat parser
> instance. It was renamed in newer versions.

cElementTree doesn't use the pyexpat API, and the expat binding it uses 
doesn't support the ordered_attributes nonsense (*) at all.

</F>

*) it's an XML parser, after all.  bugs in downstream tools should be 
fixed in those tools, or by post-processing, not by hacking XML tools
to produce things that are not XML.


From HDoran at air.org  Mon Mar 31 19:34:58 2008
From: HDoran at air.org (Doran, Harold)
Date: Mon, 31 Mar 2008 13:34:58 -0400
Subject: [XML-SIG] Learning to use elementtree
Message-ID: <2323A6D37908A847A7C32F1E3662C80E017BDC9A@dc1ex01.air.org>

Dear List:

I am brand new to XML and have some experience using Python to
parse text files. Now, however, I need to use Python to parse
some XML files. I am working with ElementTree right now and am
able to make it work on some toy examples.

But now I am trying to apply the code I have written to a real XML file
I need to work with, and I have hit a roadblock. Is anyone on
this list willing to look at an XML file I can send them and work
through a small example with me to see if I can get this to work?

I am working with Python 2.5.2 on Windows XP.

Harold
