[XML-SIG] speed question re DOM parsing
Greg Stein
gstein@lyra.org
Fri, 23 Jun 2000 04:14:11 -0700
On Fri, Jun 23, 2000 at 12:12:08PM +0200, Juergen Hermann wrote:
> On Thu, 22 Jun 2000 19:41:53 -0700, Greg Stein wrote:
>
> >Exactly. Bjorn solved this with StringIO. A timing comparison against
> >string.join is an important test before using either approach.
>
> The two runs I gave it (on Win/NT)...
>
> Length of testtext is 1292
> adding 39.687
> format 189.71
> join 47.034
> chararray 67.323
> stringio 33.011
>
> Length of testtext is 1292
> adding 40.573
> format 191.327
> join 47.09
> chararray 65.256
> stringio 32.65
>
> The result is obvious, and also what I expected.
well... not so obvious. You're appending characters. I commented out all but
the join and stringio tests, cut the iterations down some, and changed
testtext to read:
    testtext = ['x'*1000] * 100

That produced the following numbers:

    join      3.42
    stringio  4.67

Changing testtext to "testtext = ['x'*100] * 1000" produced:

    join      12.52
    stringio  10.35
In other words, the fastest mechanism depends on the length of the input
pieces: join won with 1000-character pieces, and stringio won with
100-character pieces. The crossover seems to occur right around 500
characters per piece in my off-the-cuff tests.
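
A rough sketch of the kind of comparison described above, for anyone who wants to repeat it. This is not the original test script: the function names and the repeat count are made up for illustration, and io.StringIO is used as a modern stand-in for cStringIO, so the absolute numbers (and possibly the crossover point) will differ from the figures quoted in this thread.

```python
import timeit
from io import StringIO  # assumption: stand-in for the cStringIO module discussed here


def build_with_join(pieces):
    # Concatenate all pieces in a single join call.
    return "".join(pieces)


def build_with_stringio(pieces):
    # Write each piece into an in-memory buffer, then read it back once.
    buf = StringIO()
    for piece in pieces:
        buf.write(piece)
    return buf.getvalue()


def compare(piece_len, count, repeat=200):
    # Hypothetical helper: time both strategies on `count` pieces of `piece_len` chars.
    pieces = ["x" * piece_len] * count
    # Both strategies must build the same string before timing means anything.
    assert build_with_join(pieces) == build_with_stringio(pieces)
    return {
        "join": timeit.timeit(lambda: build_with_join(pieces), number=repeat),
        "stringio": timeit.timeit(lambda: build_with_stringio(pieces), number=repeat),
    }


if __name__ == "__main__":
    # The two input shapes used in the message: few long pieces, many short pieces.
    for piece_len, count in [(1000, 100), (100, 1000)]:
        print(piece_len, count, compare(piece_len, count))
```

Trying a few piece lengths between 100 and 1000 is one way to look for the crossover on a given interpreter.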
I think that I'd choose cStringIO when present; otherwise choose .join().
Unfortunately, the code would get ugly for that, so it really means going
with one pattern. Assuming that cStringIO is always present is probably best
(it is enabled by default). The plain StringIO module uses .join, so that
is a nice fallback.
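
One way to keep a single code pattern while still preferring cStringIO is to hide the choice in the import, roughly as sketched below. The third fallback to io.StringIO is an assumption added so that the snippet also runs on Python 3, where neither of the older modules exists; concat() is a hypothetical helper, not part of any patch discussed here.

```python
try:
    from cStringIO import StringIO        # C implementation (Python 2 only)
except ImportError:
    try:
        from StringIO import StringIO     # pure-Python fallback (Python 2 only)
    except ImportError:
        from io import StringIO           # assumption: Python 3 replacement


def concat(pieces):
    # Hypothetical helper: the calling code is identical whichever import succeeded.
    buf = StringIO()
    for piece in pieces:
        buf.write(piece)
    return buf.getvalue()
```

The rest of the code then only ever sees the name StringIO, so there is one pattern to maintain.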
oh... and regarding the patch: adding a __getattr__ to the element seems
wrong. I'd recommend instantiating a StringIO in start() and placing it into
the elem instance as _buf. On a call to end(), call getvalue(), store the
value into first_cdata, and toss the StringIO object. (It has to be tossed
since there isn't a common way to "reset and truncate" a StringIO.)
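
A minimal sketch of that suggestion, under stated assumptions: the Element and Handler classes, the cdata() callback, and the stack attribute are hypothetical names invented for illustration, not the code from the patch. Only start(), end(), _buf, and first_cdata come from the description above. The sketch also puts all of an element's own text into one buffer, which is a simplification.

```python
from io import StringIO  # assumption: stand-in for cStringIO / StringIO


class Element:
    # Hypothetical element type; only first_cdata and _buf come from the message.
    def __init__(self, name):
        self.name = name
        self.first_cdata = ""
        self._buf = None


class Handler:
    # Hypothetical parser callbacks in the start/cdata/end style.
    def __init__(self):
        self.stack = []

    def start(self, name):
        elem = Element(name)
        elem._buf = StringIO()          # one buffer per open element
        self.stack.append(elem)

    def cdata(self, text):
        # Append text to the innermost open element's buffer.
        self.stack[-1]._buf.write(text)

    def end(self, name):
        elem = self.stack.pop()
        elem.first_cdata = elem._buf.getvalue()
        elem._buf = None                # toss the buffer instead of resetting it
        return elem
```

With this shape the element needs no __getattr__: first_cdata is an ordinary string attribute once end() has run.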
Cheers,
-g
--
Greg Stein, http://www.lyra.org/