RDF/XML Parser for "Qualified Dublin Core" Database File

Brandon McGinty brandon.mcginty at gmail.com
Wed May 30 00:36:22 EDT 2007


Hi All,

My goal is to read the www.gutenberg.org RDF catalog, parse it into a Python
structure, and pull out the data for each record.

The catalog is a Dublin Core RDF/XML file, divided into one section per book,
each containing the details for that book.

I have researched this problem extensively.

I've tried tools such as pyrple, SAX, DOM, and minidom, along with some
others, both from the standard library and third-party.

None of these tools has been able to read the file successfully, and those
that can see the data at all can take up to half an hour to load it with
2 GB of RAM.

So you all know what I'm talking about, the file is located at:

http://www.gutenberg.org/feeds/catalog.rdf.bz2

Does anyone have suggestions for a parser or converter that would let me
view this file and extract data from it?
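
To make the question concrete, here is a minimal sketch of the kind of
record-at-a-time extraction described above, using incremental parsing so
that the whole tree is never held in memory at once. The element names
(pgterms:etext, dc:title, dc:creator), the pgterms namespace URI, and the
helper names text_of and iter_records are assumptions for illustration and
are not verified against the real catalog; the sample data is made up.

```python
import bz2
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Assumption: namespace URI and element name used for book records.
PGTERMS_NS = "http://www.gutenberg.org/rdfterms/"
RECORD_TAG = "{%s}etext" % PGTERMS_NS


def text_of(elem, tag):
    """Return the flattened text of the first matching child, or None."""
    child = elem.find(tag)
    if child is None:
        return None
    return "".join(child.itertext()).strip()


def iter_records(stream, record_tag=RECORD_TAG):
    """Yield one dict per record while parsing the stream incrementally."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first "start" event is the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            record = {
                "id": elem.get("{%s}ID" % RDF_NS),
                "title": text_of(elem, "{%s}title" % DC_NS),
                "creator": text_of(elem, "{%s}creator" % DC_NS),
            }
            root.clear()  # drop finished elements so they can be freed
            yield record


# Made-up sample standing in for the real catalog (hypothetical structure).
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext1">
    <dc:title>Example Title One</dc:title>
    <dc:creator>Example Author</dc:creator>
  </pgterms:etext>
  <pgterms:etext rdf:ID="etext2">
    <dc:title>Example Title Two</dc:title>
  </pgterms:etext>
</rdf:RDF>"""

# Decompress on the fly instead of unpacking the archive first.
compressed = io.BytesIO(bz2.compress(SAMPLE))
with bz2.open(compressed, "rb") as stream:
    records = list(iter_records(stream))
```

For the real file, the same function could be given
bz2.open("catalog.rdf.bz2", "rb") in place of the in-memory sample, assuming
the archive has been downloaded locally under that name and the record tag
is adjusted to whatever the catalog actually uses.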

Any help is appreciated.

Thanks,

Brandon McGinty

brandon.mcginty at gmail.com