[SciPy-dev] matlab io - request for testing

Nathaniel Smith njs at pobox.com
Sun Feb 22 05:05:41 EST 2009


On Sun, Feb 22, 2009 at 1:01 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> On Fri, Feb 20, 2009 at 10:28 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> I finally got a chance to test with my nasty file, and with r5561, it
>> now takes ~32 minutes of cpu time to load (as compared to ~5 minutes
>> for 0.7.0, and 3 seconds for 0.6.0). All the time is in
>> zlibstreams.py:read.
>
> Could you check current SVN again and see how it works?

It's down to 4 seconds. Yay.

> I've sped up zlibstreams and it's now saving memory on the read, at
> the cost of about a 12% drop in speed, which I now think is due to
> the overhead of an extra function call on each of many small reads.
>
> I'm unsure whether I want to leave zlibstreams in.  It has the
> advantage of making skipping variables much faster and more memory
> efficient, and maybe some increase in memory efficiency as the
> variable is read, but still, the small performance penalty is
> annoying.

IMHO, if it lets one load gigabyte-sized matrices without allocating
gigabyte-sized temporaries, then that's a qualitative difference that's
worth a small slowdown. If not, then neither the memory savings nor the
slowdown is large enough for me to care much. (I don't tend to
save/load matlab files in my inner loops, personally.)
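
To make the "no gigabyte temporaries" case concrete, here is a minimal
sketch of incremental decompression using the standard library's
zlib.decompressobj. It is not the zlibstreams.py code; the names
iter_decompressed and chunk_size are made up for illustration, and it
is written for a current Python rather than the one in use here.

```python
import io
import zlib


def iter_decompressed(fileobj, chunk_size=64 * 1024):
    """Yield decompressed pieces of a zlib stream read from fileobj.

    Hypothetical helper: it reads compressed input in chunk_size pieces
    and asks zlib for at most chunk_size bytes of output per call, so
    the whole decompressed variable is never requested in one go.
    """
    decomp = zlib.decompressobj()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # The second argument (max_length) caps the output of this call;
        # input that was not consumed is kept in decomp.unconsumed_tail.
        data = decomp.decompress(chunk, chunk_size)
        if data:
            yield data
        while decomp.unconsumed_tail:
            data = decomp.decompress(decomp.unconsumed_tail, chunk_size)
            if data:
                yield data
    # Return whatever output zlib is still holding internally.
    tail = decomp.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    payload = b"abc" * 500000
    compressed = io.BytesIO(zlib.compress(payload))
    total = sum(len(piece) for piece in iter_decompressed(compressed, 4096))
    print(total == len(payload))
```

A consumer that skips a variable can simply discard the pieces, while
one that reads it can copy each piece into a preallocated array. The
cost is the one already discussed: more, smaller calls.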

The thing that does make me nervous is this code's fragility (as has
been demonstrated repeatedly now). It's really non-obvious how small
changes will affect its performance characteristics. Having read your
changes, it isn't at all obvious to me why it's faster now. And e.g. I
had to read StringIO.py to understand why you were recreating the
StringIO object on every __fill. Looking at zlibstreams.py alone, that
recreation appears wasteful and a candidate for removal, but now I think
that removing it could make things super-slow again. Basically, I just
don't want to have
to come back at every release and complain about my weird files
again...

-- Nathaniel


