[XML-SIG] speed question re DOM parsing
Walter Underwood
wunder@ultraseek.com
Thu, 22 Jun 2000 09:37:56 -0700
I know this is a reply to a really old post, but the talk about
speedup reminded me that I hadn't answered it.
This is worth posting generally, because it is applicable to
lots of SAX handlers (add it to the documentation?).
--On Wednesday, May 31, 2000 8:58 PM -0600 Bjorn Pettersen
<bjorn@roguewave.com> wrote:
>
> After some profiling, I found that most of the time was going into the
> else branch in the cdata method. This branch is growing a string
> character by character by saying:
>
> elem.first_cdata = elem.first_cdata + data
I had one of those in my character data handler too. Parsing the
Old Testament took about 45 min, as I remember. The copies and
reallocs in concatenation are O(n**2). Save all the strings in
a list, then use string.join at the end. This is linear.
Here are the relevant fragments of the class with the handlers:
class XMLToText:
def __init__(self):
self.text = []
def cdata(self, data):
self.text.append(data)
def finish(self):
self.text = string.join(self.text,u'')
Note the Unicode string constant -- remove that for Python 1.5,
and add code to handle the UTF-8, if necessary.
wunder
--
Walter R. Underwood
Senior Staff Engineer, Ultraseek Corp.
http://www.ultraseek.com/