From neilmunro at gmail.com  Fri Feb 13 23:11:40 2009
From: neilmunro at gmail.com (Neil Munro)
Date: Fri, 13 Feb 2009 22:11:40 +0000
Subject: [XML-SIG] XML processing
Message-ID: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>

Hello all,

I'm a final-year university student in the UK and I've chosen Python
as my language of choice for implementing my desired solution, but I'm
having trouble getting XML working. I've done a lot of reading and
discovered lots of different libraries and ways of doing things, and
now I'm unsure which one to use and how.

I'm most familiar with DOM and SAX now, or rather with an overview of them.
From what I understand, DOM loads the whole document into memory and keeps
it there until the user removes it, whereas SAX only holds tiny bits of it
in memory at a time. So my impression is that DOM is great for doing a lot
of XML editing, whereas SAX is great for reading a file in and extracting
the information from it as you go.
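To make the DOM/SAX difference concrete, here is a minimal sketch using only the standard library (xml.dom.minidom and xml.sax). The <inventory>/<item> document and the helper names are hypothetical. Both functions extract the same names: the DOM version builds the whole tree first and then queries it, while the SAX version only sees a stream of events and keeps whatever its handler chooses to store.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, for illustration only.
SAMPLE = b'<inventory><item name="bolt"/><item name="nut"/><item name="washer"/></inventory>'


def names_with_dom(data):
    """DOM: parse the whole document into a tree, then query the tree."""
    doc = minidom.parseString(data)  # the entire document is now in memory
    names = [e.getAttribute("name") for e in doc.getElementsByTagName("item")]
    doc.unlink()  # drop the tree's internal references when done
    return names


class ItemNameHandler(xml.sax.ContentHandler):
    """SAX: react to parser events; nothing is kept unless we store it."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        if name == "item":
            self.names.append(attrs.get("name"))


def names_with_sax(data):
    handler = ItemNameHandler()
    xml.sax.parseString(data, handler)  # events are pushed into the handler
    return handler.names


if __name__ == "__main__":
    print(names_with_dom(SAMPLE))
    print(names_with_sax(SAMPLE))
```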

For my initial XML usage (my program is going to deal with a lot of XML
files), I have used SAX to write a basic preferences file, which is part of
my program's design, and that works. Now I am at a loss to understand how to
read the XML data back in: there are too many libraries to choose from, and
different examples show different things, so it's hard to fully understand
how things work. If someone can point me to tutorials and answer questions
on them, I'd appreciate it.
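A minimal sketch of reading a SAX-written preferences file back in, assuming the file was produced with xml.sax.saxutils.XMLGenerator and uses a hypothetical <preferences><pref name="...">value</pref></preferences> layout (the real file format isn't shown in the post). Reading back uses xml.etree.ElementTree, which has been part of the standard library since Python 2.5; write_prefs and read_prefs are illustrative names only.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def write_prefs(stream, prefs):
    """Write a dict as <preferences><pref name="...">value</pref>...</preferences>."""
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("preferences", AttributesImpl({}))
    for name, value in sorted(prefs.items()):
        gen.startElement("pref", AttributesImpl({"name": name}))
        gen.characters(str(value))
        gen.endElement("pref")
    gen.endElement("preferences")
    gen.endDocument()


def read_prefs(stream):
    """Read the same layout back into a dict (values come back as strings)."""
    root = ET.parse(stream).getroot()
    return {p.get("name"): (p.text or "") for p in root.findall("pref")}


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for a real file opened in binary mode
    write_prefs(buf, {"colour": "blue", "font_size": 12})
    buf.seek(0)
    print(read_prefs(buf))
```

Note that every value comes back as text; converting "12" back to an integer is up to the application.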

Many thanks
Neil Munro

From john at nmt.edu  Sat Feb 14 02:06:29 2009
From: john at nmt.edu (John W. Shipman)
Date: Fri, 13 Feb 2009 18:06:29 -0700 (MST)
Subject: [XML-SIG] XML processing
In-Reply-To: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>
References: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0902131800160.7239@minnie.tcct.nmt.edu>

On Fri, 13 Feb 2009, Neil Munro wrote:

+--
| I'm a final year university student in the UK and I've chosen
| Python as my language of choice for implementing my desired solution...
|
| ...if someone can point me to tutorials and answer questions on them
+--

Here's some documentation for my Python XML tool of choice.  I'll
be happy to answer questions.

     http://www.nmt.edu/tcc/help/pubs/pylxml/

I avoided this approach, Fredrik Lundh's ElementTree model, for
some time because of its differences from DOM, but what won me
over was its screaming speed.  After I slurped a 500KB file into
memory in about 300 msec, I was a convert.
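For a first taste of the ElementTree model, a minimal sketch using the standard library's xml.etree.ElementTree (lxml.etree offers a largely compatible API for calls like these). The <library> document and the summarize() helper are made up for illustration. Attributes are read with get() and child text with findtext(), both of which accept a default for missing data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
SAMPLE = """<library>
  <book id="b1" year="1999"><title>First Book</title></book>
  <book id="b2"><title>Second Book</title><author>A. N. Other</author></book>
</library>"""


def summarize(xml_text):
    """Return one dict per <book>, filling in defaults for missing data."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):  # direct <book> children of the root
        books.append({
            "id": book.get("id"),                 # attribute lookup
            "year": book.get("year", "unknown"),  # attribute with a default
            "title": book.findtext("title"),      # text of the first <title>
            "author": book.findtext("author", default="(none)"),
        })
    return books


if __name__ == "__main__":
    for entry in summarize(SAMPLE):
        print(entry)
```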

The last section of the above document contains my adaptation of
Fredrik Lundh's builder.py module, which makes the construction
of XML so very easy.  I use it for all my new dynamic Web and
other XML generation tasks now.
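Fredrik Lundh's builder.py and John's adaptation are not reproduced here, so their exact API isn't shown. As a stand-in, the sketch below builds XML with the plain Element/SubElement calls from xml.etree.ElementTree; lxml also ships a builder-style E factory in lxml.builder. The page layout and build_page() are hypothetical. One advantage of building a tree rather than concatenating strings is that special characters in text are escaped automatically on output.

```python
import xml.etree.ElementTree as ET


def build_page(title, items):
    """Build a small HTML page as an element tree and serialise it."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    ul = ET.SubElement(body, "ul")
    for item in items:
        ET.SubElement(ul, "li").text = item  # escaped when serialised
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(build_page("Links", ["one", "two"]))
    print(build_page("Q&A", ["x < y"]))  # & and < come out as entities
```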

This document does not discuss the lxml package's toolset for the
equivalent of SAX (when the document doesn't fit in memory); for
that, refer to the main site:

     http://codespeak.net/lxml/
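For documents that don't fit in memory, ElementTree's iterparse() gives a SAX-like, incremental alternative (lxml.etree.iterparse works along the same lines). The sketch below rests on a few assumptions: the records are direct children of the root element, and stream_records() and the <log>/<rec> layout are hypothetical. Clearing the root after each record discards the elements already processed, so memory use stays roughly flat instead of growing with the file.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield (attributes, text) for each <tag> child of the root, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield dict(elem.attrib), elem.text
            root.clear()  # throw away the records handled so far


if __name__ == "__main__":
    data = io.BytesIO(b'<log><rec id="1">alpha</rec><rec id="2">beta</rec></log>')
    for attrs, text in stream_records(data, "rec"):
        print(attrs, text)
```

Note that root.clear() also wipes the root's own attributes, so read them right after the first event if you need them.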

Best regards,
John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center,
Speare 119, Socorro, NM 87801, (505) 835-5950, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber

From joshua.r.english at gmail.com  Sat Feb 14 20:01:01 2009
From: joshua.r.english at gmail.com (Josh English)
Date: Sat, 14 Feb 2009 11:01:01 -0800
Subject: [XML-SIG] XML processing
In-Reply-To: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>
References: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>
Message-ID: <e53a3a5d0902141101k4d461855kd82dc8917cc4da13@mail.gmail.com>

I started with DOM processing, but the computer I had back then
couldn't handle XML files larger than 60K or so (serious memory
limitations). So I learned SAX to extract data from my XML, but had to
revert to DOM to put data back in. Since the two worked differently, I
ended up creating a DOM element and then passing the XML text back to a
SAX parser. It was ugly, but it got me around the file-size limit.

I've replaced everything with ElementTree. I'm on a computer without
memory limitations (well, I haven't tried to parse a file too large
yet), and ElementTree does a good job of extracting information as
well as writing human-readable XML.
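On writing human-readable XML with ElementTree: since Python 3.9 the standard library has ElementTree.indent(); on older versions a common workaround is to re-parse the output with minidom and call toprettyxml(). A minimal sketch covering both (to_pretty_xml() is an illustrative name; note that indent() modifies the tree in place).

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def to_pretty_xml(root, indent="  "):
    """Return `root` serialised with one element per line."""
    if hasattr(ET, "indent"):  # ElementTree.indent() was added in Python 3.9
        ET.indent(root, space=indent)  # adds whitespace to the tree in place
        return ET.tostring(root, encoding="unicode")
    # Fallback for older versions: let minidom do the pretty-printing.
    return minidom.parseString(ET.tostring(root)).toprettyxml(indent=indent)


if __name__ == "__main__":
    notes = ET.Element("notes")
    for i, text in enumerate(["Buy milk", "Call Bob"], start=1):
        ET.SubElement(notes, "note", id=str(i)).text = text
    print(to_pretty_xml(notes))
```

On 3.9+, writing the indented tree to a data file could then be done with ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True).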

I think the ElementTree page has a good tutorial. It's good enough for
me, at least.

The only disadvantage I've run into is that my implementation spends a lot
of time parsing and overwriting my XML data files, so that overhead
is costly, but the application is small enough, and the network isn't
getting bogged down, so I can get away with it.


I've got some very old examples at this page:
http://www.spiritone.com/~english/code/ims.html

The code isn't well commented, but hey, this is Python  : )


On 2/13/09, Neil Munro <neilmunro at gmail.com> wrote:


-- 
Josh English
Joshua.R.English at gmail.com
http://joshenglish.livejournal.com

From stefan_ml at behnel.de  Sat Feb 14 22:37:13 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Feb 2009 22:37:13 +0100
Subject: [XML-SIG] XML processing
In-Reply-To: <e53a3a5d0902141101k4d461855kd82dc8917cc4da13@mail.gmail.com>
References: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>
	<e53a3a5d0902141101k4d461855kd82dc8917cc4da13@mail.gmail.com>
Message-ID: <49973989.5010606@behnel.de>

Hi,

Josh English wrote:
> The only disadvantage I've run into is my implementation spends a lot
> of time parsing and over writing my XML data files, so that overhead
> is costly

Sounds like you should give lxml a try. It's about as fast as cElementTree
on parsing, but several times faster on serialisation.

http://codespeak.net/lxml/performance.html#parsing-and-serialising

Stefan


From billk at sunflower.com  Sun Feb 15 00:53:48 2009
From: billk at sunflower.com (Bill Kinnersley)
Date: Sat, 14 Feb 2009 17:53:48 -0600
Subject: [XML-SIG] XML processing
In-Reply-To: <Pine.LNX.4.64.0902131800160.7239@minnie.tcct.nmt.edu>
References: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>
	<Pine.LNX.4.64.0902131800160.7239@minnie.tcct.nmt.edu>
Message-ID: <4997598C.3070602@sunflower.com>

John W. Shipman wrote:
> 
> This document does not discuss the lxml package's toolset for the
> equivalent of SAX (when the document doesn't fit in memory);

Can anyone add any substance to this remark?  With today's typical 
system RAM of 2GB to 3GB, is it even worth consideration any more that a 
document might not fit in memory?

Offhand I'd guess the size of the XML file and the size of the DOM tree 
would be in the same ballpark.  So unless I've got more than 500MB of 
XML to read, I'm clear.  Right or wrong?


From stefan_ml at behnel.de  Sun Feb 15 09:11:26 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Feb 2009 09:11:26 +0100
Subject: [XML-SIG] XML processing
In-Reply-To: <4997598C.3070602@sunflower.com>
References: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>	<Pine.LNX.4.64.0902131800160.7239@minnie.tcct.nmt.edu>
	<4997598C.3070602@sunflower.com>
Message-ID: <4997CE2E.1030907@behnel.de>

Hi,

Bill Kinnersley wrote:
> Can anyone add any substance to this remark?  With today's typical
> system RAM of 2GB to 3GB, is it even worth consideration any more that a
> document might not fit in memory?

At least it allows you to parse pretty large documents. But think of
handling more than one document in parallel. In that case, you'd still want
to make sure things don't hit the swap disk.


> Offhand I'd guess the size of the XML file and the size of the DOM tree
> would be in the same ballpark.  So unless I've got more than 500MB of
> XML to read, I'm clear.  Right or wrong?

Wrong. Especially the stdlib's minidom is terribly memory hungry. Fredrik
has some benchmarks and memory size hints on his cElementTree page.

http://effbot.org/zone/celementtree.htm#benchmarks

Here are some other benchmarks from Ian Bicking on HTML parsers:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Sadly, I do not know of any direct comparison of lxml.etree and
cElementTree regarding memory usage, but my guess is that cET is still a
bit better than lxml.etree (which is already impressively memory-friendly).
A quick comparison for a 3.4MB XML file with a lot of text and very short
tag names (the Old Testament in English) gave me almost exactly the same
parsing time. When done, I had a 17MB Python interpreter for lxml.etree
and a 10MB interpreter for cET. Depending on your XML, these numbers may
shift in either direction, as the two libraries optimise their time and
memory usage very differently.

For minidom, I get about 60MB, where Fredrik got 80MB. That's still about a
factor of 17-23 compared to the serialised XML file, whereas lxml and cET
end up with a factor of 3-5. Your assumption that you can use a system with
3GB of RAM to parse a 500MB XML file into an in-memory tree can easily turn
out wrong for XML files with more tags and shorter text content (say,
numbers), or for documents in non-European languages.
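To get a feel for these ratios on your own data, one rough approach is to compare the peak allocation traced by the standard library's tracemalloc module (Python 3.4+) while each parser builds its tree. This is a sketch under assumptions: the generated <data>/<row> document is synthetic and number-heavy (the case warned about above), the helper names are made up, and tracemalloc only sees memory allocated through Python's allocators, so the figures won't match process-level numbers like the 17MB/10MB/60MB quoted above.

```python
import tracemalloc
import xml.etree.ElementTree as ET
from xml.dom import minidom


def make_sample(n_rows):
    """Hypothetical test data: many small elements holding short numbers."""
    rows = "".join("<row><a>%d</a><b>%d</b></row>" % (i, 2 * i) for i in range(n_rows))
    return ("<data>%s</data>" % rows).encode("ascii")


def peak_parse_bytes(parse, data):
    """Peak Python-heap allocation (in bytes) traced while `parse(data)` runs."""
    tracemalloc.start()
    try:
        tree = parse(data)  # keep the tree referenced until we read the counter
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    data = make_sample(20000)
    for label, parse in (("minidom", minidom.parseString), ("ElementTree", ET.fromstring)):
        ratio = peak_parse_bytes(parse, data) / float(len(data))
        print("%-12s peak is about %.1f x the %d-byte input" % (label, ratio, len(data)))
```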

Stefan

From stefan_ml at behnel.de  Sun Feb 15 12:22:44 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Feb 2009 12:22:44 +0100
Subject: [XML-SIG] XML processing
In-Reply-To: <4997CE2E.1030907@behnel.de>
References: <da8be25a0902131411m604203e9p952e438769a89d83@mail.gmail.com>	<Pine.LNX.4.64.0902131800160.7239@minnie.tcct.nmt.edu>	<4997598C.3070602@sunflower.com>
	<4997CE2E.1030907@behnel.de>
Message-ID: <4997FB04.7030407@behnel.de>


Stefan Behnel wrote:
> For minidom, I get about 60MB, where Fredrik got 80MB. That's still about a
> factor of 17-23 compared to the serialised XML file, whereas lxml and cET
> end up with a factor of 3-5. Your assumption that you can use a system with
> 3GB of RAM to parse a 500MB XML file into an in-memory tree can easily turn
> wrong for XML files with more tags and shorter text content (say, numbers),
> or for documents with non-european languages.

I should add that this was measured on a 32 bit system. 64 bit systems will
require even more memory to store the tree, almost twice as much for each
element.
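Whether this applies depends on how your Python interpreter was built, not only on the operating system. A quick standard-library check (pointer_bits() is just an illustrative helper):

```python
import struct
import sys


def pointer_bits():
    """Width of a C pointer (in bits) for the running interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("This is a %d-bit Python build" % pointer_bits())
    # sys.maxsize > 2**32 is another common way to detect a 64-bit build.
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2**32))
```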

Stefan


From davidgshi at yahoo.co.uk  Wed Feb 18 10:58:41 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 18 Feb 2009 09:58:41 +0000 (GMT)
Subject: [XML-SIG] Concise way to create a form with listing of checkboxes
Message-ID: <8497.82521.qm@web26303.mail.ukl.yahoo.com>

Hello,

Can anyone suggest the best concise way to create a form with a list of checkboxes?

Regards.
David
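One possible answer, sketched with xml.etree.ElementTree from the standard library: build the form as an element tree and serialise it with method="html", which writes empty tags such as <input> and <br> without closing tags. The function name, the /save action URL and the colour options are hypothetical; any templating approach would work equally well.

```python
import xml.etree.ElementTree as ET


def checkbox_form(action, field_name, options, checked=()):
    """Return an HTML <form> with one labelled checkbox per (value, label) pair."""
    form = ET.Element("form", method="post", action=action)
    for value, label in options:
        wrapper = ET.SubElement(form, "label")
        box = ET.SubElement(wrapper, "input",
                            type="checkbox", name=field_name, value=value)
        if value in checked:
            box.set("checked", "checked")
        box.tail = " " + label  # visible label text follows the checkbox
        ET.SubElement(form, "br")
    ET.SubElement(form, "input", type="submit", value="Submit")
    # method="html" writes <input> and <br> as void tags (no closing tag).
    return ET.tostring(form, encoding="unicode", method="html")


if __name__ == "__main__":
    colours = [("red", "Red"), ("green", "Green"), ("blue", "Blue")]
    print(checkbox_form("/save", "colour", colours, checked=("green",)))
```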



From davidgshi at yahoo.co.uk  Wed Feb 18 12:08:47 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 18 Feb 2009 11:08:47 +0000 (GMT)
Subject: [XML-SIG] Looking for link to HTMLgen.py, tutorial/documentation
Message-ID: <73761.71999.qm@web26306.mail.ukl.yahoo.com>

Hello.

I am looking for HTMLgen.py, tutorial/documentation.

Regards.
David



From stefan_ml at behnel.de  Wed Feb 18 12:20:57 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 18 Feb 2009 12:20:57 +0100 (CET)
Subject: [XML-SIG] Looking for link to HTMLgen.py, tutorial/documentation
In-Reply-To: <73761.71999.qm@web26306.mail.ukl.yahoo.com>
References: <73761.71999.qm@web26306.mail.ukl.yahoo.com>
Message-ID: <63418.213.61.181.86.1234956057.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

David Shi wrote:
> I am looking for HTMLgen.py, tutorial/documentation.

http://www.google.de/search?q=HTMLgen.py&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a

HTH,

Stefan


From davidgshi at yahoo.co.uk  Wed Feb 18 16:56:31 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 18 Feb 2009 15:56:31 +0000 (GMT)
Subject: [XML-SIG] Where is the latest version of formtools and
	tutorial/documentation
Message-ID: <180216.86308.qm@web26305.mail.ukl.yahoo.com>

Hello.

Can you tell me where I can download the latest version of formtools.py and associated tutorial/documentation?

Regards.
David



From narendar.n at gmail.com  Wed Feb 18 21:07:27 2009
From: narendar.n at gmail.com (narendar rao)
Date: Wed, 18 Feb 2009 15:07:27 -0500
Subject: [XML-SIG] need help with removeChild()
Message-ID: <b840c4fe0902181207oba6a5b3t25e6c7ea06b939ab@mail.gmail.com>

Hi,

I am a newbie to Python and am trying to edit an XML file.

I am attaching the XML file.

I am trying to remove the node that has a certain attribute value.

For example, I want to remove a <c> node after checking whether its Name
attribute value equals the command-line parameter passed in.

In RemoveConfiguration, I want to iterate over the nodes, check the
attribute value, and then remove the child.

If we use removeChild(), will it save the changes to the file that we are
editing? If it doesn't, can someone show me how to do that?

Can someone help me write this code? It would help me understand a lot
of things.
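Since earlier posts on this list recommend ElementTree, here is a minimal ElementTree sketch of the same task, offered as an alternative to minidom rather than a fix of the attached script. It assumes the attached (scrubbed) XML file contains <c> elements with a Name attribute, as described above; remove_by_attribute() and remove_and_save() are hypothetical helpers. As with minidom's removeChild(), removing elements only changes the tree in memory, so the file is rewritten explicitly with tree.write().

```python
import sys
import xml.etree.ElementTree as ET


def remove_by_attribute(root, tag, attr, value):
    """Remove every <tag> element whose `attr` equals `value`; return how many."""
    # ElementTree elements do not link back to their parent, so collect
    # (parent, child) pairs first and only then modify the tree.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.tag == tag and child.get(attr) == value]
    for parent, child in doomed:
        parent.remove(child)
    return len(doomed)


def remove_and_save(path, tag, attr, value):
    """Edit the file at `path`; nothing changes on disk until write() runs."""
    tree = ET.parse(path)
    removed = remove_by_attribute(tree.getroot(), tag, attr, value)
    if removed:
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return removed


if __name__ == "__main__":
    # Hypothetical usage: python remove_c.py config.xml write
    print(remove_and_save(sys.argv[1], "c", "Name", sys.argv[2]))
```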

Thanks,
Narender
-------------- next part --------------
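import os
import sys
from xml.dom import minidom


def Usage():
    prog = os.path.basename(sys.argv[0])
    print("\n Usage: %s <folder> <Name attribute value>" % prog)
    print("\n Example: %s G:\\test write" % prog)
    sys.exit(1)


def CheckArgs():
    if len(sys.argv) < 3:
        Usage()


def RemoveConfiguration(path, doc, sValue):
    # Remove every <c> element whose Name attribute equals sValue.
    # getElementsByTagName() returns a new list (not a live view), so it
    # is safe to detach nodes from the tree while looping over it.
    removed = 0
    for node in doc.getElementsByTagName('c'):
        if node.getAttribute('Name') == sValue:
            node.parentNode.removeChild(node)
            node.unlink()  # break internal references of the detached node
            removed += 1
    # removeChild() only changes the in-memory tree; the file on disk is
    # untouched until the document is written back out explicitly.
    if removed:
        with open(path, 'wb') as f:
            f.write(doc.toxml('utf-8'))
    return removed


def ProcessFolder(sSrcFolder, sValue):
    # os.walk() also works on Python 3, where os.path.walk() no longer exists.
    for root, dirs, files in os.walk(sSrcFolder):
        for name in files:
            if not name.lower().endswith('.xml'):
                continue
            path = os.path.join(root, name)
            print(path)
            try:
                doc = minidom.parse(path)
                RemoveConfiguration(path, doc, sValue)
                doc.unlink()
            except Exception as e:
                print("Error in processing %s: %s" % (path, e))


if __name__ == "__main__":
    CheckArgs()
    sSrcFolder = sys.argv[1]
    sSrcAttribute = sys.argv[2]
    ProcessFolder(sSrcFolder, sSrcAttribute)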
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xmlfile.xml
Type: text/xml
Size: 119 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/xml-sig/attachments/20090218/d6ec85a4/attachment.bin>