From neilmunro at gmail.com Fri Feb 13 23:11:40 2009
From: neilmunro at gmail.com (Neil Munro)
Date: Fri, 13 Feb 2009 22:11:40 +0000
Subject: [XML-SIG] XML processing
Message-ID:

Hello all,

I'm a final-year university student in the UK and I've chosen Python as the language for implementing my project, but I'm having trouble getting XML working. I've done a lot of reading and discovered lots of different libraries and ways of doing things, and now I'm unsure which one to use and how.

I'm most familiar with DOM and SAX, or rather with the overview of them. From what I understand, DOM loads the whole document into memory and retains it until the user removes it, whereas SAX holds only tiny bits in memory at once. My impression is that DOM is great for doing a lot of XML editing, whereas SAX is great for reading a file in and extracting information from it as you go.

For my initial XML usage (my program is going to deal with a lot of XML files) I have used SAX to write a basic preferences file (which is part of my program's design), and that works. Now I am at a loss to understand how to read XML data back in. There are too many libraries to choose from, and different examples show different things, so it's hard to fully understand how things work. If someone can point me to tutorials and answer questions on them, I'd appreciate it.

Many thanks,
Neil Munro

From john at nmt.edu Sat Feb 14 02:06:29 2009
From: john at nmt.edu (John W. Shipman)
Date: Fri, 13 Feb 2009 18:06:29 -0700 (MST)
Subject: [XML-SIG] XML processing
In-Reply-To:
References:
Message-ID:

On Fri, 13 Feb 2009, Neil Munro wrote:

+--
| I'm a final year university student in the UK and I've chosen
| python as my language of choice for implementing my desired solution...
|
| ...if someone can point me to tutorials and answer questions on them
+--

Here's some documentation for my Python XML tool of choice.
I'll be happy to answer questions.

http://www.nmt.edu/tcc/help/pubs/pylxml/

I avoided this approach, Fredrik Lundh's ElementTree model, for some time because of its differences from DOM, but what won me over was screaming speed. After I slurped a 500KB file into memory in about 300msec, I was a convert.

The last section of the above document contains my adaptation of Fredrik Lundh's builder.py module, which makes the construction of XML so very easy. I use it for all my new dynamic Web and other XML generation tasks now.

This document does not discuss the lxml package's toolset for the equivalent of SAX (when the document doesn't fit in memory); for that, refer to the main site:

http://codespeak.net/lxml/

Best regards,
John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center,
Speare 119, Socorro, NM 87801, (505) 835-5950, http://www.nmt.edu/~john
``Let's go outside and commiserate with nature.'' --Dave Farber

From joshua.r.english at gmail.com Sat Feb 14 20:01:01 2009
From: joshua.r.english at gmail.com (Josh English)
Date: Sat, 14 Feb 2009 11:01:01 -0800
Subject: [XML-SIG] XML processing
In-Reply-To:
References:
Message-ID:

I started with DOM processing, but the computer I had back then couldn't handle XML files larger than 60K or so. (Serious memory limitations.) So I learned SAX to extract data from my XML, but had to revert to DOM to put data back in. Since these worked differently, I ended up creating a DOM element, then passing the XML text back to a SAX parser. It was ugly, but I got around the file size.

I've replaced everything with ElementTree. I'm on a computer without memory limitations (well, I haven't tried to parse a file too large yet), and ElementTree does a good job of extracting information as well as writing human-readable XML.

I think the ElementTree page has a good tutorial. It's good enough for me, at least.
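As a rough illustration of the ElementTree approach recommended in this thread, reading a small preferences file back in might look like the sketch below. It uses the standard library's xml.etree.ElementTree and modern Python syntax; the preferences layout (a `preferences` root with `option` children carrying a `name` attribute) and the helper name `load_preferences` are invented for illustration and are not taken from anyone's actual program.

```python
import xml.etree.ElementTree as ET

# Hypothetical preferences document; element and attribute names are
# assumptions made up for this example.
PREFS_XML = """<preferences>
  <option name="theme">dark</option>
  <option name="font_size">12</option>
</preferences>"""


def load_preferences(xml_text):
    """Parse preferences XML and return a {name: text} dictionary."""
    root = ET.fromstring(xml_text)
    # findall("option") returns the direct <option> children of the root.
    return {opt.get("name"): opt.text for opt in root.findall("option")}


prefs = load_preferences(PREFS_XML)
print(prefs)
```

For a file on disk, `ET.parse(path).getroot()` yields the same kind of root element. All values come back as strings, so numeric settings would still need converting by the caller.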
The only disadvantage I've run into is that my implementation spends a lot of time parsing and overwriting my XML data files, so that overhead is costly, but the application is small enough, and the network isn't getting bogged down, so I can get away with it.

I've got some very old examples at this page:
http://www.spiritone.com/~english/code/ims.html

The code isn't well commented, but hey, this is Python : )

On 2/13/09, Neil Munro wrote:

--
Josh English
Joshua.R.English at gmail.com
http://joshenglish.livejournal.com

From stefan_ml at behnel.de Sat Feb 14 22:37:13 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Feb 2009 22:37:13 +0100
Subject: [XML-SIG] XML processing
In-Reply-To:
References:
Message-ID: <49973989.5010606@behnel.de>

Hi,

Josh English wrote:
> The only disadvantage I've run into is my implementation spends a lot
> of time parsing and overwriting my XML data files, so that overhead
> is costly

Sounds like you should give lxml a try. It's about as fast as cElementTree on parsing, but several times faster on serialisation.

http://codespeak.net/lxml/performance.html#parsing-and-serialising

Stefan

From billk at sunflower.com Sun Feb 15 00:53:48 2009
From: billk at sunflower.com (Bill Kinnersley)
Date: Sat, 14 Feb 2009 17:53:48 -0600
Subject: [XML-SIG] XML processing
In-Reply-To:
References:
Message-ID: <4997598C.3070602@sunflower.com>

John W. Shipman wrote:
>
> This document does not discuss the lxml package's toolset for the
> equivalent of SAX (when the document doesn't fit in memory);

Can anyone add any substance to this remark? With today's typical system RAM of 2GB to 3GB, is it even worth consideration any more that a document might not fit in memory?

Offhand I'd guess the size of the XML file and the size of the DOM tree would be in the same ballpark. So unless I've got more than 500MB of XML to read, I'm clear. Right or wrong?
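To make the "equivalent of SAX" point concrete, here is a minimal sketch of incremental parsing for documents too large to hold as a full tree. It uses `iterparse` from the standard library's xml.etree.ElementTree (lxml.etree offers a function of the same name); this is not necessarily the toolset John had in mind, and the sample document, the `entry` tag, and the helper `count_records` are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag):
    """Count elements named `tag` without keeping their content in memory.

    `source` may be a filename or a binary file-like object.
    """
    count = 0
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Discard the element's text, attributes and children so that
            # memory use stays roughly flat. The emptied element itself is
            # still referenced by its parent.
            elem.clear()
    return count


# Hypothetical sample data standing in for a very large file.
SAMPLE = b"<log><entry>a</entry><entry>b</entry><entry>c</entry></log>"
print(count_records(io.BytesIO(SAMPLE), "entry"))
```

The trade-off is the same as with SAX: each record is handled once as it streams past, so this suits extraction rather than editing the document in place.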
From stefan_ml at behnel.de Sun Feb 15 09:11:26 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Feb 2009 09:11:26 +0100
Subject: [XML-SIG] XML processing
In-Reply-To: <4997598C.3070602@sunflower.com>
References: <4997598C.3070602@sunflower.com>
Message-ID: <4997CE2E.1030907@behnel.de>

Hi,

Bill Kinnersley wrote:
> Can anyone add any substance to this remark? With today's typical
> system RAM of 2GB to 3GB, is it even worth consideration any more that a
> document might not fit in memory?

At least, it allows you to parse pretty large documents. But think of parallel handling of more than one document. In that case, you'd still want to make sure things don't hit the swap disk.

> Offhand I'd guess the size of the XML file and the size of the DOM tree
> would be in the same ballpark. So unless I've got more than 500MB of
> XML to read, I'm clear. Right or wrong?

Wrong. The stdlib's minidom in particular is terribly memory hungry. Fredrik has some benchmarks and memory size hints on his cElementTree page.

http://effbot.org/zone/celementtree.htm#benchmarks

Here are some other benchmarks from Ian Bicking on HTML parsers:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Sadly, I do not know of any direct comparison of lxml.etree and cElementTree regarding memory usage, but my guess is that cET is still a bit better than lxml.etree (which is impressively memory friendly already).

A quick comparison for a 3.4MB XML file with a lot of text and very short tag names (the Old Testament in English) gave me almost exactly the same time for parsing. When done, I had a 17MB Python interpreter for lxml.etree and a 10MB interpreter for cET. Depending on your XML, this may change in either direction, as both optimise their time and memory usage very differently.

For minidom, I get about 60MB, where Fredrik got 80MB. That's still about a factor of 17-23 compared to the serialised XML file, whereas lxml and cET end up with a factor of 3-5.
Your assumption that you can use a system with 3GB of RAM to parse a 500MB XML file into an in-memory tree can easily turn out wrong for XML files with more tags and shorter text content (say, numbers), or for documents in non-European languages.

Stefan

From stefan_ml at behnel.de Sun Feb 15 12:22:44 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Feb 2009 12:22:44 +0100
Subject: [XML-SIG] XML processing
In-Reply-To: <4997CE2E.1030907@behnel.de>
References: <4997598C.3070602@sunflower.com> <4997CE2E.1030907@behnel.de>
Message-ID: <4997FB04.7030407@behnel.de>

Stefan Behnel wrote:
> For minidom, I get about 60MB, where Fredrik got 80MB. That's still about a
> factor of 17-23 compared to the serialised XML file, whereas lxml and cET
> end up with a factor of 3-5. Your assumption that you can use a system with
> 3GB of RAM to parse a 500MB XML file into an in-memory tree can easily turn
> out wrong for XML files with more tags and shorter text content (say, numbers),
> or for documents in non-European languages.

I should add that this was measured on a 32-bit system. 64-bit systems will require even more memory to store the tree, almost twice as much for each element.

Stefan

From davidgshi at yahoo.co.uk Wed Feb 18 10:58:41 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 18 Feb 2009 09:58:41 +0000 (GMT)
Subject: [XML-SIG] Concise way to create a form with listing of checkboxes
Message-ID: <8497.82521.qm@web26303.mail.ukl.yahoo.com>

Hello,

Can anyone suggest the most concise way to create a form with a listing of checkboxes?

Regards.

David

From davidgshi at yahoo.co.uk Wed Feb 18 12:08:47 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 18 Feb 2009 11:08:47 +0000 (GMT)
Subject: [XML-SIG] Looking for link to HTMLgen.py, tutorial/documentation
Message-ID: <73761.71999.qm@web26306.mail.ukl.yahoo.com>

Hello.
I am looking for HTMLgen.py and its tutorial/documentation.

Regards.

David

From stefan_ml at behnel.de Wed Feb 18 12:20:57 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 18 Feb 2009 12:20:57 +0100 (CET)
Subject: [XML-SIG] Looking for link to HTMLgen.py, tutorial/documentation
In-Reply-To: <73761.71999.qm@web26306.mail.ukl.yahoo.com>
References: <73761.71999.qm@web26306.mail.ukl.yahoo.com>
Message-ID: <63418.213.61.181.86.1234956057.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

David Shi wrote:
> I am looking for HTMLgen.py, tutorial/documentation.

http://www.google.de/search?q=HTMLgen.py&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a

HTH,
Stefan

From davidgshi at yahoo.co.uk Wed Feb 18 16:56:31 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 18 Feb 2009 15:56:31 +0000 (GMT)
Subject: [XML-SIG] Where is the latest version of formtools and tutorial/documentation
Message-ID: <180216.86308.qm@web26305.mail.ukl.yahoo.com>

Hello.

Can you tell me where I can download the latest version of formtools.py and the associated tutorial/documentation?

Regards.

David

From narendar.n at gmail.com Wed Feb 18 21:07:27 2009
From: narendar.n at gmail.com (narendar rao)
Date: Wed, 18 Feb 2009 15:07:27 -0500
Subject: [XML-SIG] need help with removeChild()
Message-ID:

Hi,

I am a newbie to Python and am trying to edit an XML file. I am attaching the XML file. I am trying to remove the node which has a certain attribute value. For example, I want to remove a node after checking whether its attribute value (Name) equals the command-line parameter passed.

In RemoveConfiguration, I want to iterate over the nodes, check the attribute value, and then remove the child. If we use removeChild(), will it save the changes to the file that we are editing? If it doesn't, can someone show me how to do that?
Can someone help me in writing this code? It will help me a lot in understanding some of these things.

Thanks,
Narender

-------------- next part --------------
import os,sys
from xml.dom import minidom

def Usage():
    print "\n Usage: '%s' Folder location Attribute value"
    print "\n Example: '%s' G:\test write"
    sys.exit(1)

def CheckArgs():
    if len (sys.argv) < 3:
        Usage()
        sys.exit(-1)

def RemoveConfiguration(file,doc,sSrcAttribute):
    node = doc.childNodes[0]

def callBack(ctx,root,files):
    for file in files:
        file = root + "\\" + file
        if not file.lower().endswith('.xml'):
            continue
        try:
            print file
            doc = minidom.parse(file)
            RemoveConfiguration(file,doc,u'Name')
        except:
            print "Error in Processing"

CheckArgs()
sSrcFolder = sys.argv[1]
sSrcAttribute = sys.argv[2]
os.path.walk(sSrcFolder,callBack,0)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xmlfile.xml
Type: text/xml
Size: 119 bytes
Desc: not available
URL:
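On the removeChild() question: removeChild() only changes the in-memory DOM tree, so the document has to be written back out explicitly for the file to change. A minimal sketch with xml.dom.minidom follows, in modern Python syntax rather than the Python 2 of the script above. The attached xmlfile.xml was scrubbed from the archive, so the sample structure (`Configurations` containing `Configuration` elements with a `Name` attribute) and the helper names `remove_by_attribute` and `save` are assumptions for illustration only.

```python
from xml.dom import minidom

# Hypothetical stand-in for the scrubbed xmlfile.xml attachment.
SAMPLE = (
    '<Configurations>'
    '<Configuration Name="read"/>'
    '<Configuration Name="write"/>'
    '</Configurations>'
)


def remove_by_attribute(doc, attr_name, attr_value):
    """Remove every non-root element whose attribute equals attr_value.

    Returns the number of elements removed. Only the in-memory tree changes.
    """
    removed = 0
    # Copy the node list first so removals do not disturb the iteration.
    for node in list(doc.getElementsByTagName("*")):
        if node is doc.documentElement:
            continue  # keep the root so the document stays well-formed
        if node.hasAttribute(attr_name) and node.getAttribute(attr_name) == attr_value:
            node.parentNode.removeChild(node)
            removed += 1
    return removed


def save(doc, path):
    """Write the modified tree back to disk; removeChild() does not do this."""
    with open(path, "w", encoding="utf-8") as handle:
        doc.writexml(handle, encoding="utf-8")


doc = minidom.parseString(SAMPLE)
removed = remove_by_attribute(doc, "Name", "write")
print(removed, doc.documentElement.toxml())
```

In a directory walk like the one in the script above, the same pattern would be parse, remove, then call `save(doc, path)` only when the returned count is non-zero, so untouched files are not rewritten.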