xpathEval fails for large files

Paul Boddie paul at boddie.org.uk
Tue Jul 22 06:28:09 EDT 2008


On 22 Jul, 11:00, Kanchana <kanchana.senevirat... at gmail.com> wrote:
>
> I tried to extract some data with xpathEval. The path matches more
> than 100,000 elements.
>
> import libxml2
>
> doc = libxml2.parseFile("test.xml")
> ctxt = doc.xpathNewContext()
> result = ctxt.xpathEval('//src_ref/@editions')
> ctxt.xpathFreeContext()
> doc.freeDoc()

Another note on libraries: if you want a pure Python library which
works on top of libxml2 and the bundled Python bindings, consider
libxml2dom [1].
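
For example, the query above might look something like this with it (a
minimal sketch; I'm assuming the matched attribute nodes expose the
usual DOM nodeValue property):

import libxml2dom

d = libxml2dom.parse("test.xml")
# xpath returns a list of matching nodes; no explicit freeing of
# contexts or documents is needed, unlike with the raw bindings.
editions = [a.nodeValue for a in d.xpath("//src_ref/@editions")]

Note that this doesn't help with memory, since libxml2dom still loads
the whole document, but it does tidy up the resource management.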

> this gets stuck on the following line and results in high CPU usage:
>
> result = ctxt.xpathEval('//src_ref/@editions')
>
> Any suggestions to resolve this?

How big is your document, and how much memory is the process using
after you have parsed it? You often can't handle very large documents
effectively by loading them completely into memory: if the parsed tree
needs more main memory than your system has available, the process
starts swapping and every operation on the document becomes very slow.
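
To get a rough figure for the second question, the resource module in
the standard library reports the peak resident set size of the process
(a minimal sketch; note that the units of ru_maxrss vary by platform,
kilobytes on Linux):

import resource

# Peak resident set size of the current process so far.
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss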

> Is there a better alternative for handling large documents?

Fredrik pointed out a few. There are also xml.dom.pulldom and xml.sax
in the standard library - the latter mostly attractive if you have
previous experience with it - both providing stream-based processing
of documents, if you don't mind writing more code.
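
For instance, a stream-based version of your query with xml.sax might
look something like this (a sketch only: I'm assuming the element and
attribute names from your expression, src_ref and editions):

import xml.sax

class EditionsHandler(xml.sax.ContentHandler):

    # Collect the editions attribute from every src_ref element,
    # without ever building the whole tree in memory.

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.editions = []

    def startElement(self, name, attrs):
        if name == "src_ref":
            value = attrs.get("editions")
            if value is not None:
                self.editions.append(value)

handler = EditionsHandler()
xml.sax.parse("test.xml", handler)
print len(handler.editions)

The memory footprint stays roughly constant regardless of document
size, since each element is discarded as soon as it has been seen.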

Paul

[1] http://www.python.org/pypi/libxml2dom


