[SciPy-dev] huge speed regression in loadmat from 0.6.0 to 0.7.0

Scott David Daniels Scott.Daniels at Acm.Org
Fri Feb 13 16:24:55 EST 2009


My earlier follow-up bounced, so just to close the loop:

Ryan May wrote:
 > ... <places bag over head> I can't believe I didn't notice you weren't
 > the OP.  And yeah, I forgot the loop control.  Clearly, this is
 > evidence that I shouldn't start my day with creating a patch, though I
 > did at least have the sense to run the test suite.  Obviously, the
 > tests don't exercise a code path that hits the len(self.data) < bytes case.
Actually, this issue is hard to test as a black box, since
over-filling still works correctly, just inefficiently.

 > As far as bytes goes, it isn't initialized to -1, but rather
 > read_to_end is a boolean set to the value of (bytes == -1), so
 > that you can pass bytes in as -1 and read all the data.

Right.  I figured that out when I actually went back to the original to
make a patch.  Many eyes make bugs shallow.  There is one thing I did
that you might want to incorporate:
     Instead of:
          self_data = []
     Use:
          self_data = [self.data]

     And then at the bottom, instead of:
          self.data += ''.join(self_data)
     Use:
          self.data = ''.join(self_data)

This way, the full length of the result is known before anything is
combined, so you get a single large buffer allocation rather than two.
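For the archive, here is a rough sketch of how those pieces fit together.
StreamReader, fileobj and blocksize are names I made up for illustration;
bytes, read_to_end and self.data are taken from this thread, and none of
this is the actual scipy.io reader code:

     class StreamReader(object):
         """Toy reader, only here to illustrate the chunk-list idea."""

         def __init__(self, fileobj, blocksize=4096):
             self.fileobj = fileobj
             self.blocksize = blocksize
             self.data = ''              # bytes buffered so far

         def read(self, bytes=-1):
             """Buffer at least `bytes` bytes; bytes == -1 means read to end."""
             read_to_end = (bytes == -1)
             # Seed the chunk list with what is already buffered, so the
             # final join sees the complete result and allocates its
             # buffer exactly once.
             self_data = [self.data]
             have = len(self.data)
             while read_to_end or have < bytes:
                 chunk = self.fileobj.read(self.blocksize)
                 if not chunk:           # hit end of file
                     break
                 self_data.append(chunk)
                 have += len(chunk)
             # One join instead of repeated += on a string, which is what
             # made the old code slow.
             self.data = ''.join(self_data)
             return self.data

A quick sanity check (Python 3 here, just for brevity):

     from io import StringIO
     r = StreamReader(StringIO('x' * 10000))
     print(len(r.read(5000)))   # 8192: whole 4096-char blocks until >= 5000
     print(len(r.read(-1)))     # 10000: the rest of the stream

The reason seeding the list with self.data helps is that str.join first
adds up the lengths of all the pieces and only then allocates, so the
full result is built with a single large allocation instead of two (one
for the join, one for the +=).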

--Scott David Daniels
Scott.Daniels at Acm.Org

