python xml DOM? pulldom? SAX?
Fredrik Lundh
fredrik at pythonware.com
Mon Aug 29 12:38:06 EDT 2005
"jog" wrote:
> I want to get text out of some nodes of a huge xml file (1,5 GB). The
> architecture of the xml file is something like this
> I want to combine the text out of page:title and page:revision:text for
> every single page element. One by one I want to index these combined
> texts (so for each page one index)
here's one way to do it:
    try:
        import cElementTree as ET
    except ImportError:
        from elementtree import ElementTree as ET

    for event, elem in ET.iterparse(file):
        if elem.tag == "page":
            title = elem.findtext("title")
            revision = elem.findtext("revision/text")
            print title, revision
            elem.clear() # won't need this any more
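
for reference, here's a rough sketch of the same loop on a current Python 3, where ElementTree ships in the standard library as xml.etree.ElementTree. the helper name iter_pages, the <pages> root element and the tiny inline sample are made up for illustration; only the page/title/revision/text layout comes from the question. if the real file declares a default namespace, the tags would show up as "{uri}page" rather than "page", and the comparison would need to change to match.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(source):
    """Yield (title, text) for each <page> element, then free it.

    `source` is a filename or a binary file object. The helper name and
    the tag layout are assumptions based on the question.
    """
    # "end" events fire once an element has been read completely
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            text = elem.findtext("revision/text")
            yield title, text
            elem.clear()  # drop the page's children; won't need them any more


# made-up sample standing in for the 1.5 GB file
SAMPLE = b"""<pages>
  <page><title>A</title><revision><text>alpha</text></revision></page>
  <page><title>B</title><revision><text>beta</text></revision></page>
</pages>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, text)
```

for the real file you would pass the filename to iter_pages and feed each (title, text) pair to the indexer instead of collecting them in a list. note that the cleared page elements stay attached to the root as empty shells, so for a file this size it may also be worth clearing the root element from time to time.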
references:
http://effbot.org/zone/element-index.htm
http://effbot.org/zone/celementtree.htm (for best performance)
http://effbot.org/zone/element-iterparse.htm
</F>