[XML-SIG] Reading characters
Albert Chin
xml-sig@python.org
Tue, 21 May 2002 18:33:38 -0500
On Wed, May 22, 2002 at 12:07:08AM +0200, Martin v. Loewis wrote:
> Albert Chin <xml-sig@thewrittenword.com> writes:
>
> > I have an XML element that contains a lot of data (it's a base64
> > encoded file). Reading the characters through the default characters()
> > function is slow (one line at a time). How can I read more?
> >
> > I'm using PyXML 0.7.1 and Python 2.2.1.
>
> Can you elaborate? What kind of parsing technology do you use? Expat,
> SAX, minidom, something else?
I'm using expat to the best of my knowledge:
    from xml.sax import saxexts, saxlib
    ...
    fh = open (self.path, 'r')
    xmlh = read_pkg_db (self.data)
    p = saxexts.make_parser ()
    p.setDocumentHandler (xmlh)
    p.parseFile (fh)
    fh.close ()
    p.close ()
> What do you mean by "read through"? Can you share a bit of code?
    class read_pkg_db (saxlib.HandlerBase):
        def __init__ (self, data, extract = 0):
            self.in_data = 0
            self.data = data
            self.extract = extract # whether or not to extract payload
            ...
        def startElement (self, name, attrs):
            ...
        def characters (self, ch, start, length):
            if self.in_data and self.extract:
                self.payload = self.payload + ch[start:start+length]
            ...
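
The characters() method above grows self.payload by string concatenation
on every callback, which can get expensive when the parser delivers a
large element in many small pieces. A minimal sketch of one alternative
follows: store each piece in a list and join once when the element ends.
It assumes the standard-library SAX2 interface (xml.sax.ContentHandler,
where characters() takes a single argument) rather than the saxlib
HandlerBase API used above, and the names PayloadHandler and
extract_payload are hypothetical, not part of the original code.

```python
import base64
import io
import xml.sax


class PayloadHandler(xml.sax.ContentHandler):
    """Collect the text of a <data> element without repeated concatenation."""

    def __init__(self):
        super().__init__()
        self.in_data = False
        self.chunks = []      # pieces delivered by characters()
        self.payload = None   # decoded bytes, set when </data> is seen

    def startElement(self, name, attrs):
        if name == "data":
            self.in_data = True
            self.chunks = []

    def characters(self, content):
        # The parser may call this many times per element; just store each piece.
        if self.in_data:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "data":
            self.in_data = False
            # Join once, then decode; b64decode skips the embedded newlines.
            self.payload = base64.b64decode("".join(self.chunks))


def extract_payload(xml_bytes):
    """Parse an in-memory document and return the decoded <data> payload."""
    handler = PayloadHandler()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.payload
```

This does not change how often the parser calls characters(); it only
makes each call cheap, which may or may not be the real bottleneck here.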
Sample XML file:
<?xml version="1.0"?>
<packages>
  <package name="m4" version="1.4">
    <package-manager name="pkgadd">
      <datetime type="create">2002-05-21T23:22:05Z</datetime>
      <install-name>m4</install-name>
      <pkgname-base>TWWm4</pkgname-base>
      <version>1.4</version>
      <revision>5</revision>
      <subpkg type="man">m</subpkg>
      <subpkg type="runtime">u</subpkg>
      <data checksum="445f09c2f585c9d5a5c7c8dff2ea8275"
            checksum-type="md5" encoding="base64"
            filename="data.gcpio.bz2" size="104123">
QlpoOTFBWSZTWaGX1TsBL+V/////////////////////////////////////////////4fZgDz61
ABSQpQAAbb1tvuilfS9gyp31kPqW9nX3b7Z8RSmb0zz3wAEQkAeA+xhAXIADtgKp9OgABoAAFAoA
...
      </data>
    </package-manager>
  </package>
</packages>
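
For a multi-line <data> element like the one in this sample, the size of
each characters() call is decided by the underlying parser. As a hedged
illustration only: the standard-library expat binding in later Python
releases (not 2.2.1) exposes buffer_text and buffer_size attributes that
ask it to coalesce adjacent text before calling back. The sketch below
talks to xml.parsers.expat directly instead of going through SAX, and
count_text_callbacks is a hypothetical helper for comparing the two
modes, not something from the code above.

```python
import xml.parsers.expat


def count_text_callbacks(xml_bytes, buffer_text):
    """Return (number of character-data callbacks, total characters seen)."""
    calls = []

    parser = xml.parsers.expat.ParserCreate()
    # With buffer_text on, the binding coalesces adjacent text before calling back.
    parser.buffer_text = buffer_text
    if buffer_text:
        # Assumed buffer size for this sketch; larger runs are still split.
        parser.buffer_size = 64 * 1024
    parser.CharacterDataHandler = calls.append
    parser.Parse(xml_bytes, True)
    return len(calls), sum(len(c) for c in calls)
```

Calling it twice on the same document, once with buffer_text false and
once true, shows how many callbacks each mode produces for the same text.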
--
albert chin (china@thewrittenword.com)