xpathEval fails for large files

Jim Washington jwashin at vt.edu
Tue Jul 22 07:48:01 EDT 2008


Kanchana wrote:
> Hi,
> 
> I tried to extract some data with xpathEval. The path matches more
> than 100,000 elements.
> 
> doc = libxml2.parseFile("test.xml")
> ctxt = doc.xpathNewContext()
> result = ctxt.xpathEval('//src_ref/@editions')
> ctxt.xpathFreeContext()
> doc.freeDoc()
> 
> this gets stuck on the following line and results in high CPU usage:
> result = ctxt.xpathEval('//src_ref/@editions')
> 
> Any suggestions to resolve this?
> 
> Is there any better alternative to handle large documents?

One option might be an XML database.  I'm familiar with Sedna (
http://modis.ispras.ru/sedna/ ).

In practice, you store the document in the database, and let the
database do the extracting for you.  Sedna does XQuery, which is a very
nice way to get just what you want out of your document or collection of
documents.
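
For instance, the poster's XPath carries over almost verbatim.  Assuming
the document has been loaded into Sedna under the name "test" (a made-up
name here), the XQuery would be roughly:

   doc("test")//src_ref/@editions

The server evaluates the query and sends back just the matching items, so
the client never has to build the whole document tree in memory.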

Good:
   It's free (Apache 2.0 license)
   It's cross-platform (Windows x86, Linux x86, FreeBSD, Mac OS X)
   It has Python bindings (zif.sedna at the cheese shop and others; see
the sketch after this list).
   It's pretty fast, particularly if you set up indexes.
   Document and document collection size are limited only by disk space.
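
To make the bindings point concrete, here is a minimal sketch of querying
Sedna from Python via zif.sedna.  The class and method names below
(SednaProtocol, execute) and the argument order are approximations from
memory, not gospel; check the zif.sedna documentation for the real API:

   # Minimal sketch -- names and argument order are approximate.
   from zif.sedna import protocol

   # connect to a running Sedna server (host, database, login, password;
   # SYSTEM/MANAGER are Sedna's default credentials)
   conn = protocol.SednaProtocol('localhost', 'testdb', 'SYSTEM', 'MANAGER')

   # the server does the extracting; only matching items come back
   result = conn.execute(u'doc("test")//src_ref/@editions')
   for item in result:
       print item
   conn.close()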

Not so good:
   Sedna runs as a server.  Expect it to use on the order of 100 MB of
RAM per database.  A database can contain many, many documents, so you
probably only want one database anyway.

Disclosure: I'm the author of the zif.sedna package, and I'm
interpreting the fact that I have not received much feedback as "It
works pretty well" :)


- Jim Washington


