XML document causes pickle to go into infinite recursion

Stefan Behnel stefan.behnel-n05pAM at web.de
Thu Nov 1 04:57:01 EDT 2007


Orest Kozyar wrote:
> I'm working on a CGI script that pulls XML data from a public database
> (Medline) and caches this data using shelve to minimize load on the
> database.  In general, the script works quite well, but keeps crashing
> every time I try to pickle a particular XML document.  Below is a
> script that illustrates the problem, followed by the stack trace that
> is generated (thanks to Kent Johnson who helped me refine the
> script).  I'd appreciate any advice for solving this particular
> problem.  Someone on Python-Tutor suggested that the XML document has
> a circular reference, but I'm not sure exactly what this means, or why
> the document would have a reference to itself.

minidom creates a pretty complete tree data structure, with loads of backlinks
to parent elements etc. That's where your circular references come from.
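
To make the backlinks visible, here is a minimal sketch on a tiny stand-in
document (the element names are made up, not taken from your Medline record):

```python
from xml.dom import minidom

# Hypothetical two-element document standing in for the real record.
doc = minidom.parseString("<article><title>Example</title></article>")
article = doc.documentElement
title = article.firstChild

# Each node points back up at its parent and at the owning document ...
print(title.parentNode is article)     # True
print(title.ownerDocument is doc)      # True

# ... while the parent points down at its children, which closes the loop.
print(article.childNodes[0] is title)  # True
```

So following the references from any one node eventually reaches every other
node in the document, including the node you started from.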

I don't know why you want to use pickle here (and not serialised XML or the
plain in-memory tree), but if memory consumption is an issue, try
cElementTree, which comes with Python 2.5 (or as an external module for older
versions). It's faster, more memory friendly and easier to use than minidom.
There's also lxml.objectify, in case you can't live without pickling.
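
A rough sketch of the "serialised XML" route, assuming a current Python 3,
where the ElementTree API lives in xml.etree.ElementTree. The helper names,
the cache key and the sample document are made up for illustration. shelve
still pickles the value, but the value is now a plain string instead of a
node tree:

```python
import os
import shelve
import tempfile
import xml.etree.ElementTree as ET


def store_xml(cache, key, element):
    # Store the serialised document text rather than the node objects.
    cache[key] = ET.tostring(element, encoding="unicode")


def load_xml(cache, key):
    # Re-parse the stored text into a fresh tree on the way out.
    return ET.fromstring(cache[key])


# Hypothetical record standing in for a document fetched from Medline.
record = ET.fromstring("<article><title>Example</title></article>")

with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "cache")) as cache:
        store_xml(cache, "12345", record)
        restored = load_xml(cache, "12345")

print(restored.findtext("title"))  # Example
```

The cost is one parse per cache hit, which is usually small compared to
fetching the record from the remote database again.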

http://effbot.org/zone/celementtree.htm
http://codespeak.net/lxml

Stefan


