From stefan_ml at behnel.de  Sun Jul  1 17:38:36 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 01 Jul 2007 17:38:36 +0200
Subject: [XML-SIG] lxml 1.3 released
In-Reply-To: <4687BAFA.1080201@comcast.net>
References: <467E51B2.4020207@behnel.de> <4687BAFA.1080201@comcast.net>
Message-ID: <4687CA7C.9010409@behnel.de>

Hi,

Gloria W wrote:
> There's no chance of getting an extension to this module which supports
> DOM2, is there? I cannot work with the current PyXML DOM2 support. It is
> inflexible (does not allow subtree construction/insertion), is buggy,
> and bloated.

Well, "bloat" is a word I would use for any DOM implementation.

lxml is actually quite the opposite of the three: extremely flexible, safe and
simple.


> I wrote my own, but I don't have time to implement the
> range() functionality. Let me know if there are plans to extend this. It
> would be great.

No. lxml will not support the DOM API. It already has a (mostly?) equivalent
API that is much simpler in spirit (and thus much easier to use), so there is
no reason for us to take the step back to the impressively un-pythonic DOM API.
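
To illustrate the subtree construction and insertion asked about above, here is a minimal sketch using the ElementTree API. It uses the standard library's xml.etree.ElementTree so that it runs without lxml installed; lxml.etree follows the same API for these calls. The element and attribute names are made up for illustration.

```python
from xml.etree import ElementTree as etree

def build_document():
    # Build a detached subtree first ...
    event = etree.Element("event", name="example")
    etree.SubElement(event, "location").text = "somewhere"

    # ... then create the target tree and insert the subtree into it.
    root = etree.Element("schedules")
    etree.SubElement(root, "event", name="existing")
    root.insert(0, event)                             # insert at a position
    root.append(etree.Element("event", name="last"))  # or append at the end
    return root

root = build_document()
print(etree.tostring(root, encoding="unicode"))
```

Elements behave much like Python lists here, which is why no separate document or node factory objects are needed.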

If you really want a W3C-DOM compatible API and want to use libxml2, there is
a project that implements DOM on top of it: libxml2dom.

http://www.boddie.org.uk/python/libxml2dom.html

I assume this is for porting existing code? But even then, you may consider
rewriting the XML parts in lxml. We had a couple of comments on the list that
make me believe that this is a) not that hard (depending on your code
size/architecture) and b) worth it, at least in the cases I heard about.

Oh, and for the really hard-to-port stuff, you can still use Python's DOM support
through lxml's SAX interface:

http://codespeak.net/lxml/sax.html

Stefan

From stefan_ml at behnel.de  Sun Jul  1 18:02:33 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 01 Jul 2007 18:02:33 +0200
Subject: [XML-SIG] lxml 1.3 released
In-Reply-To: <4687BAFA.1080201@comcast.net>
References: <467E51B2.4020207@behnel.de> <4687BAFA.1080201@comcast.net>
Message-ID: <4687D019.1030109@behnel.de>


Gloria W wrote:
> There's no chance of getting an extension to this module which supports
> DOM2, is there? I cannot work with the current PyXML DOM2 support. It is
> inflexible (does not allow subtree construction/insertion), is buggy,
> and bloated. I wrote my own, but I don't have time to implement the
> range() functionality. Let me know if there are plans to extend this. It
> would be great.

Ah, I forgot to say that lxml.etree is obviously flexible enough to support a
DOM compatible implementation on top of itself. It's just that no-one has done
it and it is unlikely that someone will take the time to actually do it. It
would not add any functionality that isn't there already; it would only expose
it through a less pythonic API.

In case you consider starting such a thing, here's how to do it:

http://codespeak.net/lxml/element_classes.html

Stefan


From robert.rawlins at thinkbluemedia.co.uk  Mon Jul  2 13:20:21 2007
From: robert.rawlins at thinkbluemedia.co.uk (Robert Rawlins - Think Blue)
Date: Mon, 2 Jul 2007 12:20:21 +0100
Subject: [XML-SIG] Help Needed (Will pay if someone is interested)
Message-ID: <002701c7bc9a$fa7e5fc0$ef7b1f40$@rawlins@thinkbluemedia.co.uk>

Hello Chaps,

 
I'm looking for some help with XML parsing. I've been playing around with
this over the past few days, and the only solution I can come up with seems
to be a little slow and also leaves what I think is a memory leak in my
application, which causes all kinds of problems.

I have a very simple XML file whose elements I need to loop over to extract
their attribute information, but the loop must be conditional as the
attributes must meet certain criteria.

My current solution uses minidom, which I've read isn't one of the better
parsers. If anyone knows of any that are better for the task, I would love
to hear it. The content is extracted regularly, so whatever we choose needs
to be quick; validation isn't so important. Take a look at this brief
example of the XML we're dealing with:

 
<schedules name="Default event" location="this is the location of the event">
    <event name="This is an event" location="At my house" type="1"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
    <event name="And Another" location="At work" type="2"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
    <event name="This is some more" location="At the cafe" type="1"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
    <event name="And one last one" location="At my house" type="3"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
</schedules>

 
This file details events which may occur over the next couple of weeks. What
I need is a function called 'getCurrentEvent()' which returns any events that
should be occurring at this point in time, i.e. now(). The 'type' attribute
details how often the event is likely to recur, 1 being daily, 2 being weekly
and so on. If no elements are found which are occurring at this time and
date, I would like it to return the default event, which is defined in the
attributes of the 'schedules' tag.

The current solution I have put together uses minidom to loop over the
elements from the XML and then checks each one against a Python module
called 'period.py'. This works OK, but it's very slow and also contains a
memory leak. I need something better and I have no real idea or experience
of how to achieve it, which is why I'm here with you good gentlemen to try
and find a solution.

I appreciate this could be quite a challenging task, so I would be happy to
pay someone for their time to solve this for me. You may want to contact me
off list to talk about that, though; we'd be hoping to get this sorted ASAP.

 
Thanks guys,

 
Rob

 

From robert.rawlins at thinkbluemedia.co.uk  Mon Jul  2 16:21:19 2007
From: robert.rawlins at thinkbluemedia.co.uk (Robert Rawlins - Think Blue)
Date: Mon, 2 Jul 2007 15:21:19 +0100
Subject: [XML-SIG] Help Needed (Will pay if someone is interested)
Message-ID: <005501c7bcb4$42646780$c72d3680$@rawlins@thinkbluemedia.co.uk>

Hello Chaps,

 
I'm looking for some help with XML parsing. I've been playing around with
this over the past few days, and the only solution I can come up with seems
to be a little slow and also leaves what I think is a memory leak in my
application, which causes all kinds of problems.

I have a very simple XML file whose elements I need to loop over to extract
their attribute information, but the loop must be conditional as the
attributes must meet certain criteria.

My current solution uses minidom, which I've read isn't one of the better
parsers. If anyone knows of any that are better for the task, I would love
to hear it. The content is extracted regularly, so whatever we choose needs
to be quick; validation isn't so important. Take a look at this brief
example of the XML we're dealing with:

 
<schedules name="Default event" location="this is the location of the event">
    <event name="This is an event" location="At my house" type="1"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
    <event name="And Another" location="At work" type="2"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
    <event name="This is some more" location="At the cafe" type="1"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
    <event name="And one last one" location="At my house" type="3"
           start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
</schedules>

 
This file details events which may occur over the next couple of weeks. What
I need is a function called 'getCurrentEvent()' which returns any events that
should be occurring at this point in time, i.e. now(). The 'type' attribute
details how often the event is likely to recur, 1 being daily, 2 being weekly
and so on. If no elements are found which are occurring at this time and
date, I would like it to return the default event, which is defined in the
attributes of the 'schedules' tag.

The current solution I have put together uses minidom to loop over the
elements from the XML and then checks each one against a Python module
called 'period.py'. This works OK, but it's very slow and also contains a
memory leak. I need something better and I have no real idea or experience
of how to achieve it, which is why I'm here with you good gentlemen to try
and find a solution.

I appreciate this could be quite a challenging task, so I would be happy to
pay someone for their time to solve this for me. You may want to contact me
off list to talk about that, though; we'd be hoping to get this sorted ASAP.

I'm thinking maybe some form of XQuery instead of the iteration? I really
don't know, it's up to you.

 
Thanks guys,

 
Rob

 

From stefan_ml at behnel.de  Mon Jul  2 17:12:31 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 02 Jul 2007 17:12:31 +0200
Subject: [XML-SIG] Help Needed (Will pay if someone is interested)
In-Reply-To: <002701c7bc9a$fa7e5fc0$ef7b1f40$@rawlins@thinkbluemedia.co.uk>
References: <002701c7bc9a$fa7e5fc0$ef7b1f40$@rawlins@thinkbluemedia.co.uk>
Message-ID: <468915DF.8060701@behnel.de>


Robert Rawlins - Think Blue wrote:
> I'm looking for some help with XML parsing, I've been playing around
> with this over the past few days and the only solution I can come up
> with seems to be a little slow and also leaves what I think is a memory
> leak in my application, which causes all kinds of problems.
> 
>  
> 
> I have a very simple XML file which I need to loop over the elements and
> extract the attribute information from, but the loop must be conditional
> as the attributes must meet a certain criteria.
> 
>  
> 
> My current solution is using minidom,

That's not the solution, that's the problem. Use cElementTree.


> which I've read isn't one of the
> better parsers, if anyone knows of any that are better for the task I
> would love to hear it, the content is extracted regularly so whatever we
> chose needs to be quick, and validation isn't so important. Take a look
> at this brief example of the XML we're dealing with:
> 
>  
> 
> <schedules name="Default event" location="this is the location of the
> event">
> 
>                 <event name="This is an event" location="At my house"
> type="1" start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
>                 <event name="And Another" location="At work" type="2"
> start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
>                 <event name="This is some more" location="At the cafe"
> type="1" start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
>                 <event name="And one last one" location="At my house"
> type="3" start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
> </schedules>
> 
>  
> 
> Now this file details events which are possibly going to occur over the
> next couple of weeks. Now what I need to do is have a function which is
> called 'getCurrentEvent()' which will return any events that should be
> occurring at this point in time, or now().

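  from xml.etree import cElementTree as et # Python 2.5

  # untested
  # The timestamps share one fixed format, so plain string comparison
  # orders them chronologically.
  search_date = "2007-02-03 00:00:00"
  for _, element in et.iterparse("event-file.xml"):
      if element.tag == "event":
          start = element.get("start")
          end   = element.get("end")
          # skip events that have not started yet
          if start > search_date:
              continue
          # skip events that are already over
          if end != start and end < search_date:
              continue
          print et.tostring(element)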

or something like that. You'll love the performance.


> The 'Type' attribute details
> how often the event it likely to reoccur, 1 being daily, 2 being weekly
> and so on, if no elements are found which are occurring in this time and
> date then I would like it to return the default event which is defined
> in the attributes of the 'schedules' tag.

That's much harder, as it requires real date calculation in general. Are you
sure you want an XML tree as a database? Why not read the file into a more
suitable in-memory data structure and search from there?
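
As a rough sketch of what such an in-memory structure could look like: parse the file once into plain dictionaries with real datetime values, then search that list. The function names, the sample times and the choice of dictionaries are assumptions for illustration, and the recurrence rules behind the "type" attribute are deliberately left out.

```python
from datetime import datetime
from xml.etree import ElementTree as etree

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Made-up sample data in the shape of the posted file.
SAMPLE = """\
<schedules name="Default event" location="default location">
    <event name="Morning" location="At my house" type="1"
           start="2007-01-01 08:00:00" end="2007-01-01 12:00:00" />
    <event name="Evening" location="At work" type="2"
           start="2007-01-01 18:00:00" end="2007-01-01 20:00:00" />
</schedules>
"""

def load_schedule(xml_text):
    """Parse once; return (default_event, list_of_events) as plain dicts."""
    root = etree.fromstring(xml_text)
    default = dict(root.attrib)
    events = []
    for element in root.findall("event"):
        event = dict(element.attrib)
        # Convert the timestamps so they compare as real dates.
        event["start"] = datetime.strptime(event["start"], TIME_FORMAT)
        event["end"] = datetime.strptime(event["end"], TIME_FORMAT)
        events.append(event)
    return default, events

def get_current_events(default, events, now):
    """Return the events running at `now`, or the default event if none are."""
    current = [e for e in events if e["start"] <= now <= e["end"]]
    return current or [default]

default, events = load_schedule(SAMPLE)
print(get_current_events(default, events, datetime(2007, 1, 1, 9, 0)))
```

The XML is then only touched when the file changes, and each lookup is a pass over a small Python list.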

Stefan

From robert.rawlins at thinkbluemedia.co.uk  Mon Jul  2 17:19:02 2007
From: robert.rawlins at thinkbluemedia.co.uk (Robert Rawlins - Think Blue)
Date: Mon, 2 Jul 2007 16:19:02 +0100
Subject: [XML-SIG] Help Needed (Will pay if someone is interested)
In-Reply-To: <468915DF.8060701@behnel.de>
References: <002701c7bc9a$fa7e5fc0$ef7b1f40$@rawlins@thinkbluemedia.co.uk>
	<468915DF.8060701@behnel.de>
Message-ID: <006901c7bcbc$551d5e10$ff581a30$@rawlins@thinkbluemedia.co.uk>

Hi Stefan.

Thanks for getting back to me so quickly, I've been tearing my hair out on this one :-)

>> Why not read the file into a more suitable in-memory data structure and search from there?

I'd be more than happy to do something like this, I just have no idea how. What type of data structure do you think would be simple?

Thanks for the cElementTree example. The code already looks a lot cleaner than the minidom stuff I was working on, jeez that stuff was messy lol.

You'll have to excuse any naivety, as I'm relatively new to both XML and Python, and mixing the two is making my head spin :-D

Rob



From dkuhlman at rexx.com  Mon Jul  2 20:21:46 2007
From: dkuhlman at rexx.com (Dave Kuhlman)
Date: Mon, 2 Jul 2007 11:21:46 -0700
Subject: [XML-SIG] lxml 1.3 released
In-Reply-To: <4687CA7C.9010409@behnel.de>
References: <467E51B2.4020207@behnel.de> <4687BAFA.1080201@comcast.net>
	<4687CA7C.9010409@behnel.de>
Message-ID: <20070702182146.GA10229@cutter.rexx.com>

On Sun, Jul 01, 2007 at 05:38:36PM +0200, Stefan Behnel wrote:
> Hi,
> 
> Gloria W wrote:
> > There's no chance of getting an extension to this module which supports
> > DOM2, is there? I cannot work with the current PyXML DOM2 support. It is
> > inflexible (does not allow subtree construction/insertion), is buggy,
> > and bloated.
> 
> Well, "bloat" is a word I would use for any DOM implementation.
> 
> lxml is actually quite the opposite of the three: extremely flexible, safe and
> simple.

Stefan -

Just so you don't take silence as thank-less-ness, thank you for
lxml.  I use it frequently.  My rst2odt writer for Docutils
(converts reStructuredText to .odt files for OpenOffice oowriter) is
built on it.  It's great.  Thanks much.

Dave


-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman


From robert.rawlins at thinkbluemedia.co.uk  Tue Jul 10 16:32:48 2007
From: robert.rawlins at thinkbluemedia.co.uk (Robert Rawlins - Think Blue)
Date: Tue, 10 Jul 2007 15:32:48 +0100
Subject: [XML-SIG] Parsing Help
Message-ID: <022501c7c2ff$30c4ee90$924ecbb0$@rawlins@thinkbluemedia.co.uk>

Hello Guys,

 
I'm looking for some help building a function which can parse some XML for
me using ElementTree. The document is of a very consistent format and I've
copied an example of the document below.

 
<?xml version="1.0" encoding="UTF-8" ?>

<record>
        <attribute id="0x0000">
                <uint32 value="0x00010005" />
        </attribute>

        <attribute id="0x0001">
                <sequence>
                        <uuid value="0x1105" />
                </sequence>
        </attribute>

        <attribute id="0x0003">
                <uuid value="0xe005" />
        </attribute>

        <attribute id="0x0004">
                <sequence>
                        <sequence>
                                <uuid value="0x0100" />
                        </sequence>
                        <sequence>
                                <uuid value="0x0003" />
                                <uint8 value="0x05" />
                        </sequence>
                        <sequence>
                                <uuid value="0x0008" />
                        </sequence>
                </sequence>
        </attribute>

        <attribute id="0x0005">
                <sequence>
                        <uuid value="0x1002" />
                </sequence>
        </attribute>

        <attribute id="0x0009">
                <sequence>
                        <sequence>
                                <uuid value="0x1105" />
                                <uint16 value="0x0100" />
                        </sequence>
                </sequence>
        </attribute>

        <attribute id="0x0100">
                <text value="OBEX Object Push" />
        </attribute>

        <attribute id="0x0303">
                <sequence>
                        <uint8 value="0x01" />
                        <uint8 value="0x03" />
                        <uint8 value="0x05" />
                        <uint8 value="0x06" />
                        <uint8 value="0xff" />
                </sequence>
        </attribute>
</record>

 
Now, the piece of information I'm looking to retrieve is inside the
<attribute id="0x0004"> element and is, in this example, <uint8 value="0x05" />.
However, I want the function to return the plain integer value and not the
uint8 hex-encoded version, so instead of returning '0x05' it just needs to
return '5', which is the standard integer version.

 
I will be passing this XML into the function as a string, so the function
will be formed something like this:

 
def myFunction(XmlAsString):
    # Parse the XML and extract my value....
    # Return the value as an integer...

 
I'm not sure on the best method to do this, I just want something nice and
quick, lightweight and that's not resource hungry. Can anyone offer some
advice on this?

 
Thanks guys,

 
Rob


From dkuhlman at rexx.com  Tue Jul 10 17:53:22 2007
From: dkuhlman at rexx.com (Dave Kuhlman)
Date: Tue, 10 Jul 2007 08:53:22 -0700
Subject: [XML-SIG] Parsing Help
In-Reply-To: <022501c7c2ff$30c4ee90$924ecbb0$@rawlins@thinkbluemedia.co.uk>
References: <022501c7c2ff$30c4ee90$924ecbb0$@rawlins@thinkbluemedia.co.uk>
Message-ID: <20070710155322.GA21020@cutter.rexx.com>

On Tue, Jul 10, 2007 at 03:32:48PM +0100, Robert Rawlins - Think Blue wrote:
> Hello Guys,
> 
>  
> 
> I'm looking for some help building a function which can parse some XML for
> me using ElementTree. The document is of a very consistent format and I've
> copied an example of the document below.
> 

Here are some suggestions.

Import ElementTree or lxml:

    from xml.etree import ElementTree as etree

Or:

    from lxml import etree

Parse the string:

    root = etree.fromstring(xmlstring)

Iterate over the nodes in the tree:

    for node in root.getiterator():

Check for the "attribute" tag:

    if node.tag == 'attribute':
    # But, use something like the following if there is a namespace
    # (nsmap is an lxml extension, not available in plain ElementTree).
    #if node.tag == '{%s}attribute' % (node.nsmap['mynamespace'], ):

Get the "id" attribute (or None if there isn't one):

    charid = node.get('id', None)
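
Put together, the steps above might look like the following sketch. The function name, the attribute_id parameter and the shortened sample record are assumptions for illustration; the sketch returns the first uint8 found inside the matching attribute element and converts its hexadecimal text with int(..., 16).

```python
from xml.etree import ElementTree as etree

# Shortened, made-up excerpt in the shape of the posted record.
SAMPLE = """\
<record>
    <attribute id="0x0003">
        <uuid value="0xe005" />
    </attribute>
    <attribute id="0x0004">
        <sequence>
            <sequence>
                <uuid value="0x0100" />
            </sequence>
            <sequence>
                <uuid value="0x0003" />
                <uint8 value="0x05" />
            </sequence>
        </sequence>
    </attribute>
</record>
"""

def get_uint8_value(xml_as_string, attribute_id="0x0004"):
    """Return the first uint8 under the given attribute as an int, or None."""
    root = etree.fromstring(xml_as_string)
    for attribute in root.findall("attribute"):
        if attribute.get("id") != attribute_id:
            continue
        # ".//uint8" searches all descendants, however deeply nested.
        node = attribute.find(".//uint8")
        if node is not None:
            # int() with base 16 accepts the "0x" prefix.
            return int(node.get("value"), 16)
    return None

print(get_uint8_value(SAMPLE))
```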

Enough to get you started?

Dave


-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman

From tosh54 at gmail.com  Mon Jul 16 04:39:15 2007
From: tosh54 at gmail.com (Peter Hoffmann)
Date: Sun, 15 Jul 2007 19:39:15 -0700
Subject: [XML-SIG] add Namespace Defaulting to ElementTree.write()
Message-ID: <1184553555.324929.149040@n2g2000hse.googlegroups.com>

Hi!

As I sometimes have to look at or even edit XML markup generated by
ElementTree, it would be a lot easier if ElementTree could use
Namespace Defaulting as described in http://www.w3.org/TR/REC-xml-names/#defaulting

Here is an example of what I mean:
### raw input
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
    <title>Cheaper by the Dozen</title>
    <isbn:number>1568491379</isbn:number>
</book>

### normal ElementTree output
<ns0:book xmlns:ns0="urn:loc.gov:books">
    <ns0:title>Cheaper by the Dozen</ns0:title>
    <ns1:number xmlns:ns1="urn:ISBN:0-395-36341-6">1568491379</ns1:number>
</ns0:book>

### with Patch/set default_namespace="urn:loc.gov:books"
<book xmlns="urn:loc.gov:books">
    <title>Cheaper by the Dozen</title>
    <ns1:number xmlns:ns1="urn:ISBN:0-395-36341-6">1568491379</ns1:number>
</book>

I wrote a small patch against ElementTree.py (Python 2.5.1,
r251:54863, May  2 2007, 16:56:35) so that one can set a namespace to
be used as the default namespace when serialising an XML tree. You can
find it at http://user.cs.tu-berlin.de/~tosh/elementtree/  Some basic
tests are provided in selftest.py and some examples in test.py.


Is there any chance that the patch, or functionality like it, gets added to
ElementTree?


Regards Peter


From stefan_ml at behnel.de  Mon Jul 16 08:29:20 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 16 Jul 2007 08:29:20 +0200
Subject: [XML-SIG] add Namespace Defaulting to ElementTree.write()
In-Reply-To: <1184553555.324929.149040@n2g2000hse.googlegroups.com>
References: <1184553555.324929.149040@n2g2000hse.googlegroups.com>
Message-ID: <469B1040.10903@behnel.de>


Peter Hoffmann wrote:
> As I somtimes have to look  at or even edit xml markup generated by
> ElementTree, it would be a lot easier if ElementTree could use
> Namespace Defaultig  as described in http://www.w3.org/TR/REC-xml-names/#defaulting
> 
> Here is an example what I mean:
> ### raw input
> <book xmlns='urn:loc.gov:books'
>       xmlns:isbn='urn:ISBN:0-395-36341-6'>
>     <title>Cheaper by the Dozen</title>
>     <isbn:number>1568491379</isbn:number>
> </book>
> 
> ### normal ElementTree output
> <ns0:book xmlns:ns0="urn:loc.gov:books">
>     <ns0:title>Cheaper by the Dozen</ns0:title>
>     <ns1:number xmlns:ns1="urn:ISBN:0-395-36341-6">1568491379</
> ns1:number>
> </ns0:book>
> 
> ### with Patch/set default_namespace="urn:loc.gov:books"
> <book xmlns="urn:loc.gov:books">
>     <title>Cheaper by the Dozen</title>
>     <ns1:number xmlns:ns1="urn:ISBN:0-395-36341-6">1568491379</
> ns1:number>
> </book>


lxml.etree has been using a property called "nsmap" since the beginning, which
is already more generic than just a default namespace. If ElementTree wants to
adopt such a feature, I'd be happy if it could keep up compatibility from its
own side here.
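
For readers on a current Python: the standard library's ElementTree.write() accepts a default_namespace argument, which gives output close to the patched example quoted above. The sketch below is not Peter's patch and does not use lxml's nsmap; it only shows the stdlib calls, with the "isbn" prefix registered through register_namespace() as an illustrative choice.

```python
import io
from xml.etree import ElementTree as etree

BOOKS = "urn:loc.gov:books"
ISBN = "urn:ISBN:0-395-36341-6"

def serialize(root, default_namespace):
    """Serialise `root` with one namespace emitted as the default xmlns."""
    buffer = io.StringIO()
    etree.ElementTree(root).write(
        buffer, encoding="unicode", default_namespace=default_namespace)
    return buffer.getvalue()

# Ask for a readable prefix instead of a generated one (global setting).
etree.register_namespace("isbn", ISBN)

book = etree.Element("{%s}book" % BOOKS)
etree.SubElement(book, "{%s}title" % BOOKS).text = "Cheaper by the Dozen"
etree.SubElement(book, "{%s}number" % ISBN).text = "1568491379"

output = serialize(book, BOOKS)
print(output)
```

Note that with default_namespace set, every element in the tree has to carry a namespace; unqualified tags make the serialiser raise an error.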

Stefan

From dantrevino at gmail.com  Mon Jul 30 19:53:05 2007
From: dantrevino at gmail.com (Dan Trevino)
Date: Mon, 30 Jul 2007 13:53:05 -0400
Subject: [XML-SIG] problem parsing msproject xml
Message-ID: <d9a21a9b0707301053w2051a79boe374bd1b6eba8b01@mail.gmail.com>

I'm trying to parse MS Project XML.  The main thing I'm trying to get at
is the task name, which is basically in this structure:
<Task>
  <UID>1</UID>
  <Name>do step 1</Name>    <-- i want the text from here
...
</Task>
<Resource>
  <Name>John Doe</Name>
...
</Resource>

I'm having difficulty figuring out which methods to use to access the
data.  I can't get to "Name" directly because it is also used for
project resources, so I need the task name specifically.  Where do I
go from here?

==============================
>>> prjdoc = minidom.parse('prj.xml')
>>> tasklist = prjdoc.getElementsByTagName("Task")
>>> for task in tasklist:
...     taskname = task.getElementsByTagName('Name')
...     print taskname
...
[]
[<DOM Element: Name at 0x5e89a08>]
[<DOM Element: Name at 0x5e99ad0>]
[<DOM Element: Name at 0x5eb00d0>]
[<DOM Element: Name at 0x5efb698>]
[<DOM Element: Name at 0x5f0dc60>]

================================
TIA,
dan

From billk at sunflower.com  Tue Jul 31 19:24:47 2007
From: billk at sunflower.com (Bill Kinnersley)
Date: Tue, 31 Jul 2007 12:24:47 -0500
Subject: [XML-SIG] problem parsing msproject xml
In-Reply-To: <d9a21a9b0707301053w2051a79boe374bd1b6eba8b01@mail.gmail.com>
References: <d9a21a9b0707301053w2051a79boe374bd1b6eba8b01@mail.gmail.com>
Message-ID: <46AF705F.308@sunflower.com>

Never used DOM and never written a line of Python, but maybe even I know 
the answer to this one!

The minidom documentation suggests that instead of

	print taskname

you should be saying

	print taskname.firstChild.data


Bill K

Dan Trevino wrote:
> I'm trying to parse project xml.  The main thing i'm trying to get at
> is the task name, which is basically in this structure:
> <Task>
>   <UID>1</UID>
>   <Name>do step 1</Name>    <-- i want the text from here
> ...
> </Task>
> <Resource>
>   <Name>John Doe</Name>
> ...
> </Resource>
> 
> I'm having difficulty figuring out which methods to use to access the
> data.  I cant get to "Name" directly because it is used also for
> project resources....so I need the task name specifically.  Where do I
> go from here:
> 
> ==============================
>>>> prjdoc = minidom.parse('prj.xml')
>>>> tasklist = prjdoc.getElementsByTagName("Task")
>>>> for task in tasklist:
> ...     taskname = task.getElementsByTagName('Name')
> ...     print taskname
> ...
> []
> [<DOM Element: Name at 0x5e89a08>]
> [<DOM Element: Name at 0x5e99ad0>]
> [<DOM Element: Name at 0x5eb00d0>]
> [<DOM Element: Name at 0x5efb698>]
> [<DOM Element: Name at 0x5f0dc60>]
> 
> ================================
> TIA,
> dan


From dantrevino at gmail.com  Tue Jul 31 20:02:26 2007
From: dantrevino at gmail.com (Dan Trevino)
Date: Tue, 31 Jul 2007 14:02:26 -0400
Subject: [XML-SIG] Fwd:  problem parsing msproject xml
In-Reply-To: <d9a21a9b0707311102j45e136bdx60598d8e3ed5d140@mail.gmail.com>
References: <d9a21a9b0707301053w2051a79boe374bd1b6eba8b01@mail.gmail.com>
	<46AF705F.308@sunflower.com>
	<d9a21a9b0707311102j45e136bdx60598d8e3ed5d140@mail.gmail.com>
Message-ID: <d9a21a9b0707311102s61b4b207s15b0ab279638f5b6@mail.gmail.com>

Thanks, I tried this, but:

>>> prjdoc = xml.dom.minidom.parse('prj.xml')
>>> tasklist = prjdoc.getElementsByTagName('Task')
>>> for task in tasklist:
...     taskname = task.getElementsByTagName('Name')
...     print taskname.firstChild.data
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
AttributeError: 'NodeList' object has no attribute 'firstChild'
>>>
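
The traceback arises because getElementsByTagName() returns a NodeList, not a single element, so each entry has to be picked out before asking for firstChild. A minimal sketch of that, assuming a made-up document whose wrapper elements (Project, Tasks, Resources) are only guesses at the real file's layout:

```python
from xml.dom import minidom

# Made-up sample; the first Task has no Name, like the empty [] result above.
SAMPLE = (
    "<Project><Tasks>"
    "<Task><UID>0</UID></Task>"
    "<Task><UID>1</UID><Name>do step 1</Name></Task>"
    "</Tasks><Resources>"
    "<Resource><Name>John Doe</Name></Resource>"
    "</Resources></Project>"
)

def task_names(document):
    """Collect the text of every Name element found below a Task element."""
    names = []
    for task in document.getElementsByTagName("Task"):
        # Iterate the NodeList; an empty list simply yields nothing.
        for name in task.getElementsByTagName("Name"):
            # An empty <Name/> has no text node, so guard against None.
            if name.firstChild is not None:
                names.append(name.firstChild.data)
    return names

document = minidom.parseString(SAMPLE)
print(task_names(document))
```

Because the search starts from each Task element, the Name elements under Resource are never visited.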

On 7/31/07, Bill Kinnersley <billk at sunflower.com> wrote:
> Never used DOM and never written a line of Python, but maybe even I know
> the answer to this one!
>
> The minidom documentation suggests that instead of
>
>         print taskname
>
> you should be saying
>
>         print taskname.firstChild.data
>
>
> Bill K
>
> Dan Trevino wrote:
> > I'm trying to parse project xml.  The main thing i'm trying to get at
> > is the task name, which is basically in this structure:
> > <Task>
> >   <UID>1</UID>
> >   <Name>do step 1</Name>    <-- i want the text from here
> > ...
> > </Task>
> > <Resource>
> >   <Name>John Doe</Name>
> > ...
> > </Resource>
> >
> > I'm having difficulty figuring out which methods to use to access the
> > data.  I cant get to "Name" directly because it is used also for
> > project resources....so I need the task name specifically.  Where do I
> > go from here:
> >
> > ==============================
> >>>> prjdoc = minidom.parse('prj.xml')
> >>>> tasklist = prjdoc.getElementsByTagName("Task")
> >>>> for task in tasklist:
> > ...     taskname = task.getElementsByTagName('Name')
> > ...     print taskname
> > ...
> > []
> > [<DOM Element: Name at 0x5e89a08>]
> > [<DOM Element: Name at 0x5e99ad0>]
> > [<DOM Element: Name at 0x5eb00d0>]
> > [<DOM Element: Name at 0x5efb698>]
> > [<DOM Element: Name at 0x5f0dc60>]
> >
> > ================================
> > TIA,
> > dan
>
> _______________________________________________
> XML-SIG maillist  -  XML-SIG at python.org
> http://mail.python.org/mailman/listinfo/xml-sig
>

From stefan_ml at behnel.de  Tue Jul 31 20:28:19 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 31 Jul 2007 20:28:19 +0200
Subject: [XML-SIG] problem parsing msproject xml
In-Reply-To: <d9a21a9b0707301053w2051a79boe374bd1b6eba8b01@mail.gmail.com>
References: <d9a21a9b0707301053w2051a79boe374bd1b6eba8b01@mail.gmail.com>
Message-ID: <46AF7F43.9050608@behnel.de>


Dan Trevino wrote:
> I'm trying to parse project xml.  The main thing i'm trying to get at
> is the task name, which is basically in this structure:
> <Task>
>   <UID>1</UID>
>   <Name>do step 1</Name>    <-- i want the text from here
> ...
> </Task>
> <Resource>
>   <Name>John Doe</Name>
> ...
> </Resource>
> 
> I'm having difficulty figuring out which methods to use to access the
> data.  I cant get to "Name" directly because it is used also for
> project resources....so I need the task name specifically.

Try lxml.etree:

   >>> # untested
   >>> from lxml import etree
   >>> tree = etree.parse("project.xml")
   >>> print tree.xpath("//Task/Name/text()")
   ["do step 1", ...]

or if you don't like XPath:

   >>> # untested
   >>> from lxml import etree
   >>> tree = etree.parse("project.xml")
   >>> for task in tree.getiterator("Task"):
   ...     for name in task.findall("Name"):
   ...         print name.text
   do step 1
   ...
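
If lxml is not at hand, the second variant can be sketched with the standard library's ElementTree as well. The function name and sample documents are assumptions; the optional namespace argument is there because a file that declares a default namespace on its root element needs fully qualified tag names, and the URI used below is purely hypothetical.

```python
from xml.etree import ElementTree as etree

PLAIN = (
    "<Project><Tasks><Task><UID>1</UID><Name>do step 1</Name></Task></Tasks>"
    "<Resources><Resource><Name>John Doe</Name></Resource></Resources>"
    "</Project>"
)
# Hypothetical namespace URI, for illustration only.
NAMESPACED = (
    '<Project xmlns="urn:example:project">'
    "<Tasks><Task><Name>do step 1</Name></Task></Tasks></Project>"
)

def task_names(xml_text, namespace=""):
    """Return the text of each Name that is a direct child of a Task."""
    prefix = "{%s}" % namespace if namespace else ""
    root = etree.fromstring(xml_text)
    return [
        name.text
        for task in root.iter(prefix + "Task")
        for name in task.findall(prefix + "Name")
    ]

print(task_names(PLAIN))
print(task_names(NAMESPACED, "urn:example:project"))
```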

http://codespeak.net/lxml

Stefan